Path: ...!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: How to properly use py-webrtcvad?
Date: 26 Jan 2025 10:37:40 GMT
Organization: Stefan Ram
Lines: 65
Expires: 1 Jan 2026 11:59:58 GMT
Message-ID: <audio-20250126113719@ram.dialup.fu-berlin.de>
References: <mailman.93.1737582833.2912.python-list@python.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de xKw2liOFsDXzl6mpAuxWjAq68x3fkSEFErMKujq/CVmvrf
Cancel-Lock: sha1:ltvlsJh6guhOiFBB8qtc54Qx6SE= sha256:2j5Be8UHbYhjjagEFJl29t+fJZyEgISE3CetvAaVCV0=
X-Copyright: (C) Copyright 2025 Stefan Ram. All rights reserved.
	Distribution through any means other than regular usenet
	channels is forbidden. It is forbidden to publish this
	article in the Web, to change URIs of this article into links,
        and to transfer the body without this notice, but quotations
        of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
	services to mirror the article in the web. But the article may
	be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: en-US
Bytes: 3219

marc nicole <mk1853387@gmail.com> wrote or quoted:
>return _webrtcvad.process(self._vad, sample_rate, buf, length)
>>Error: Error while processing frame

  (I was not able to check the following tips myself! 
  So, please read them as a mere wild guess!)

  That error you're running into - it's possibly because the
  audio format webrtcvad wants isn't jibing with what you're
  feeding it. Let me break it down for you:

  WebRTC VAD is picky about its audio, like a foodie at a farmers
  market:

  - It wants 16-bit mono PCM, nothing fancy

  - Sample rates have got to be 8000, 16000, 32000, or 48000 Hz

  - Each frame has to be exactly 10, 20, or 30 ms of audio, like
    clockwork, not a sample more or less
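  Those three rules boil down to plain arithmetic: one frame holds
  exactly rate * duration / 1000 samples, at 2 bytes per sample.
  Here's a little standard-library sketch (the names `frame_bytes`
  and `frame_duration_ms` are made up for illustration) that checks
  a buffer before it ever reaches webrtcvad, so a bad length gives
  you a readable message instead of "Error while processing frame".
  It assumes the limits listed above are the ones that matter.

```python
# Hypothetical pre-flight check for frames handed to webrtcvad.
# Assumption: these limits are the ones listed in the post.
VALID_RATES = (8000, 16000, 32000, 48000)   # Hz
VALID_DURATIONS_MS = (10, 20, 30)           # ms
BYTES_PER_SAMPLE = 2                        # 16-bit mono PCM


def frame_bytes(sample_rate: int, duration_ms: int) -> int:
    """Return how many bytes one frame of this rate and duration has."""
    if sample_rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate} Hz")
    if duration_ms not in VALID_DURATIONS_MS:
        raise ValueError(f"unsupported frame duration: {duration_ms} ms")
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


def frame_duration_ms(buf: bytes, sample_rate: int) -> int:
    """Return the duration of buf in ms, or raise if it is not a legal frame."""
    for duration_ms in VALID_DURATIONS_MS:
        if len(buf) == frame_bytes(sample_rate, duration_ms):
            return duration_ms
    raise ValueError(
        f"{len(buf)} bytes is not a 10/20/30 ms frame at {sample_rate} Hz"
    )


# 30 ms of silence at 16000 Hz: 480 samples, 960 bytes.
print(frame_bytes(16000, 30))                       # 960
print(frame_duration_ms(b"\x00\x00" * 480, 16000))  # 30
```

  Calling `frame_duration_ms(data, self.RATE)` right before
  `is_speech` should make a wrong buffer size jump out immediately.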

  Tweak your PyAudio setup like you're fine-tuning a classic car:

  Python

self.FORMAT = pyaudio.paInt16
self.CHANNELS = 1
self.RATE = 16000
self.FRAMES_PER_BUFFER = 480  # 30 ms at 16000 Hz, smooth as a SoCal highway
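  Heads up: PyAudio's frames_per_buffer counts PortAudio frames
  (one sample per channel), not bytes, which is why 480 and not
  960 is the number to use for mono 16-bit audio here. If you want
  another rate or frame length, a quick way to see every legal
  FRAMES_PER_BUFFER value is to enumerate the combinations. The
  helper name `legal_buffer_sizes` is hypothetical, and it assumes
  the rate and duration limits listed above.

```python
from __future__ import annotations

# Assumed limits, taken from the list in the post.
RATES_HZ = (8000, 16000, 32000, 48000)
DURATIONS_MS = (10, 20, 30)


def legal_buffer_sizes() -> dict[int, list[int]]:
    """Map each sample rate to its legal mono frames-per-buffer values."""
    return {
        rate: [rate * ms // 1000 for ms in DURATIONS_MS]
        for rate in RATES_HZ
    }


for rate, sizes in legal_buffer_sizes().items():
    print(f"{rate:>5} Hz: {sizes}")
```

  For 16000 Hz that gives [160, 320, 480], so 480 is the 30 ms
  option; 44100 Hz isn't in the table at all.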

  Give your audio reading loop a makeover:

  Python

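# RECORD_SECONDS is assumed to be defined elsewhere in your class
for _ in range(int(self.RATE / self.FRAMES_PER_BUFFER * self.RECORD_SECONDS)):
    # One read = FRAMES_PER_BUFFER samples = exactly one 30 ms frame
    data = self.stream.read(self.FRAMES_PER_BUFFER)
    is_speech = self.vad.is_speech(data, self.RATE)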
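  With PyAudio, each read of FRAMES_PER_BUFFER frames should
  already hand you exactly one VAD frame. If your audio comes from
  somewhere that delivers odd-sized chunks instead (a file, a
  socket, a callback), one way to keep webrtcvad happy is to
  re-slice the byte stream into exact-size frames and carry the
  leftovers over. A minimal sketch; `exact_frames` is a made-up
  name, and the chunks are assumed to be 16-bit mono PCM already.

```python
from typing import Iterable, Iterator


def exact_frames(chunks: Iterable[bytes], frame_size: int) -> Iterator[bytes]:
    """Re-slice arbitrary byte chunks into frames of exactly frame_size bytes.

    Leftover bytes are carried over to the next chunk; an incomplete frame
    at the very end is dropped, since webrtcvad would reject it anyway.
    """
    pending = bytearray()
    for chunk in chunks:
        pending.extend(chunk)
        while len(pending) >= frame_size:
            yield bytes(pending[:frame_size])
            del pending[:frame_size]


# 30 ms at 16000 Hz is 480 samples * 2 bytes = 960 bytes per frame.
chunks = [b"\x00" * 700, b"\x00" * 1500, b"\x00" * 100]
frames = list(exact_frames(chunks, 960))
print(len(frames), {len(f) for f in frames})  # 2 {960}
```

  Each yielded frame can then go straight into
  `self.vad.is_speech(frame, self.RATE)`.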

  Make sure your audio data is on point:

  Python

import numpy as np

# Turn that audio data into a numpy array, like magic
audio_array = np.frombuffer(data, dtype=np.int16)

# If it's not mono, keep just the first channel of the interleaved
# samples - no stereo allowed at this party
if self.CHANNELS > 1:
    audio_array = audio_array[::self.CHANNELS]

# Back to bytes it goes
audio_bytes = audio_array.tobytes()

is_speech = self.vad.is_speech(audio_bytes, self.RATE)
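  If you do record in stereo and would rather average the channels
  than keep just the first one, the standard `array` module can do
  it without numpy. A sketch only: `downmix_to_mono` is a
  hypothetical name, and it assumes native-byte-order 16-bit
  samples, interleaved the way paInt16 delivers them.

```python
from array import array


def downmix_to_mono(data: bytes, channels: int) -> bytes:
    """Average interleaved 16-bit channels into a single mono channel.

    Assumes native byte order and that len(data) is a whole number of
    frames. Integer averaging with // rounds toward negative infinity.
    """
    if channels == 1:
        return data
    samples = array("h")
    samples.frombytes(data)
    mono = array("h", (
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples), channels)
    ))
    return mono.tobytes()


stereo = array("h", [100, 200, -100, -300]).tobytes()  # two stereo frames
print(array("h", downmix_to_mono(stereo, 2)).tolist())  # [150, -200]
```

  The average of int16 values always fits back into int16, so no
  clipping is needed.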

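  Crank up that VAD aggressiveness (the mode runs from 0 to 3;
  it tunes how eagerly non-speech gets filtered out, so it won't
  fix the error by itself):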

  Python

self.vad = webrtcvad.Vad(3)  # 3 is as aggressive as LA traffic

  (Just remember to adjust your sample rate and frame duration
  to fit your needs, as long as they stay within the allowed
  values listed above.)