From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: How to properly use py-webrtcvad?
Date: 26 Jan 2025 10:37:40 GMT
X-Copyright: (C) Copyright 2025 Stefan Ram. All rights reserved. Distribution through any means other than regular usenet channels is forbidden. It is forbidden to publish this article in the Web, to change URIs of this article into links, and to transfer the body without this notice, but quotations of parts in other Usenet posts are allowed.

marc nicole wrote or quoted:
>return _webrtcvad.process(self._vad, sample_rate, buf, length)
>>Error: Error while processing frame

  (I was not able to check the following tips myself!
  So, please read them as a mere wild guess!)

  That error you're running into - it's possibly because the audio
  format webrtcvad wants isn't jibing with what you're feeding it.
  Let me break it down for you:

  WebRTC VAD is picky about its audio, like a foodie at a farmers
  market:

  - It wants 16-bit mono PCM, nothing fancy
  - Sample rates have to be 8000, 16000, 32000, or 48000 Hz
  - Frame durations should be 10, 20, or 30 ms, like clockwork

  Tweak your PyAudio setup like you're fine-tuning a classic car:

    self.FORMAT = pyaudio.paInt16
    self.CHANNELS = 1
    self.RATE = 16000
    # 480 samples = 30 ms at 16000 Hz, smooth as a SoCal highway
    self.FRAMES_PER_BUFFER = 480

  Give your audio reading loop a makeover, so that every read
  returns exactly one 30 ms frame:

    n_frames = int(self.RATE / self.FRAMES_PER_BUFFER * self.RECORD_SECONDS)
    for _ in range(n_frames):
        data = self.stream.read(self.FRAMES_PER_BUFFER)
        is_speech = self.vad.is_speech(data, self.RATE)

  Make sure your audio data is on point:

    import numpy as np

    # Turn that audio data into a numpy array of 16-bit samples
    audio_array = np.frombuffer(data, dtype=np.int16)

    # If it's not mono, keep only the first channel of the
    # interleaved samples - no stereo allowed at this party
    if self.CHANNELS > 1:
        audio_array = audio_array[::self.CHANNELS]

    # Back to bytes it goes
    audio_bytes = audio_array.tobytes()
    is_speech = self.vad.is_speech(audio_bytes, self.RATE)

  Crank up that VAD aggressiveness (the mode goes from 0 to 3):

    self.vad = webrtcvad.Vad(3)  # 3 is as aggressive as LA traffic

  (Just remember to adjust your sample rate and frame duration to
  fit your needs.)
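  (Also unchecked against the library itself, so the same caveat
  applies!) The constraints above boil down to simple arithmetic: a
  frame of 16-bit mono PCM must be exactly rate * duration / 1000 * 2
  bytes long. Here is a rough sketch for checking that before a
  buffer ever reaches is_speech. The names "expected_frame_bytes" and
  "split_frames" are made up for this illustration and are not part
  of py-webrtcvad; the sketch also assumes the data is already mono.

```python
# Hypothetical helpers, not part of py-webrtcvad.
# Assumes 16-bit mono PCM, as listed in the requirements above.

VALID_RATES = (8000, 16000, 32000, 48000)   # Hz
VALID_FRAME_MS = (10, 20, 30)               # frame durations in ms
BYTES_PER_SAMPLE = 2                        # 16-bit samples


def expected_frame_bytes(rate, frame_ms):
    """Return the byte length of one frame, or raise ValueError."""
    if rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {rate}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_ms} ms")
    return rate * frame_ms // 1000 * BYTES_PER_SAMPLE


def split_frames(pcm, rate, frame_ms=30):
    """Yield consecutive full frames from pcm; a short tail is dropped."""
    size = expected_frame_bytes(rate, frame_ms)
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    # 1024 samples per read is a common PyAudio buffer size, but it
    # is not a whole number of 10, 20, or 30 ms frames at 16000 Hz.
    buffer = bytes(1024 * BYTES_PER_SAMPLE)
    frames = list(split_frames(buffer, 16000, 30))
    print(expected_frame_bytes(16000, 30), len(frames))
```

  Each frame that comes out of "split_frames" could then be handed
  to self.vad.is_speech(frame, self.RATE) in place of the raw buffer.
  If the error goes away with that, the frame length was the culprit.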