Path: ...!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: How to properly use py-webrtcvad?
Date: 26 Jan 2025 10:37:40 GMT
Organization: Stefan Ram
Lines: 65
Expires: 1 Jan 2026 11:59:58 GMT
Message-ID: <audio-20250126113719@ram.dialup.fu-berlin.de>
References: <mailman.93.1737582833.2912.python-list@python.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de xKw2liOFsDXzl6mpAuxWjAq68x3fkSEFErMKujq/CVmvrf
Cancel-Lock: sha1:ltvlsJh6guhOiFBB8qtc54Qx6SE= sha256:2j5Be8UHbYhjjagEFJl29t+fJZyEgISE3CetvAaVCV0=
X-Copyright: (C) Copyright 2025 Stefan Ram. All rights reserved.
Distribution through any means other than regular usenet
channels is forbidden. It is forbidden to publish this
article in the Web, to change URIs of this article into links,
and to transfer the body without this notice, but quotations
of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
services to mirror the article in the web. But the article may
be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: en-US
Bytes: 3219
marc nicole <mk1853387@gmail.com> wrote or quoted:
>return _webrtcvad.process(self._vad, sample_rate, buf, length)
>>Error: Error while processing frame
(I was not able to check the following tips myself!
So, please read them as a mere wild guess!)
That error you're running into - it's possibly because the
audio you're feeding webrtcvad isn't jibing with the format
it wants. Let me break it down for you:
WebRTC VAD is picky about its audio, like a foodie at a farmers
market:
- It wants 16-bit mono PCM, nothing fancy
- Sample rate has got to be 8000, 16000, 32000, or 48000 Hz
- Each frame has to be exactly 10, 20, or 30 ms long, like clockwork
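To put numbers on that list, here is a rough sketch that works out how many bytes one frame has to hold and checks a buffer against it. The names frame_byte_count and is_valid_frame are made up for this example; they are not part of webrtcvad.

```python
VALID_RATES = (8000, 16000, 32000, 48000)   # Hz
VALID_FRAME_MS = (10, 20, 30)               # frame durations in ms
BYTES_PER_SAMPLE = 2                        # 16-bit mono PCM

def frame_byte_count(rate, frame_ms):
    """Bytes in one mono 16-bit frame lasting frame_ms milliseconds."""
    if rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {rate}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_ms} ms")
    return rate * frame_ms // 1000 * BYTES_PER_SAMPLE

def is_valid_frame(buf, rate):
    """True if buf is exactly 10, 20, or 30 ms of mono 16-bit audio."""
    if rate not in VALID_RATES:
        return False
    return len(buf) in {frame_byte_count(rate, ms) for ms in VALID_FRAME_MS}
```

For instance, a buffer of 1024 samples at 16000 Hz is 2048 bytes, or 64 ms, so is_valid_frame would turn it down.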
Tweak your PyAudio setup like you're fine-tuning a classic car:
Python
self.FORMAT = pyaudio.paInt16
self.CHANNELS = 1
self.RATE = 16000
self.FRAMES_PER_BUFFER = 480 # 30 ms at 16000 Hz, smooth as a SoCal highway
Give your audio reading loop a makeover:
Python
for i in range(0, int(self.RATE / self.FRAMES_PER_BUFFER * self.RECORD_SECONDS)):
    # With the settings above, each read hands back one 30 ms frame
    data = self.stream.read(self.FRAMES_PER_BUFFER)
    is_speech = self.vad.is_speech(data, self.RATE)
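If you'd rather not tie your read size to the frame size, one option is to slice whatever you read into frames before handing them to the VAD. A minimal sketch, assuming mono 16-bit PCM bytes; split_frames is a hypothetical helper, not something webrtcvad or PyAudio provides:

```python
def split_frames(pcm, rate, frame_ms=30):
    """Yield consecutive frames of frame_ms milliseconds from mono 16-bit PCM.

    A trailing chunk shorter than one full frame is dropped.
    """
    frame_len = rate * frame_ms // 1000 * 2   # 2 bytes per 16-bit sample
    for start in range(0, len(pcm) - frame_len + 1, frame_len):
        yield pcm[start:start + frame_len]

# Possible use inside the reading loop (names as in the snippets above):
#     for frame in split_frames(data, self.RATE):
#         is_speech = self.vad.is_speech(frame, self.RATE)
```

Dropping the leftover tail keeps the sketch short; in a real loop you would probably carry it over and prepend it to the next read.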
Make sure your audio data is on point:
Python
import numpy as np
# Turn that audio data into a numpy array, like magic
audio_array = np.frombuffer(data, dtype=np.int16)
# If it's not mono, make it mono - no stereo allowed at this party
# (the slice keeps the first channel only and drops the rest)
if self.CHANNELS > 1:
    audio_array = audio_array[::self.CHANNELS]
# Back to bytes it goes
audio_bytes = audio_array.tobytes()
is_speech = self.vad.is_speech(audio_bytes, self.RATE)
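The slice above just throws the other channels away. If you'd rather average them, here is a sketch using only the standard library instead of numpy. It assumes interleaved 16-bit samples in native byte order; downmix_to_mono is a made-up name for this example.

```python
from array import array

def downmix_to_mono(pcm, channels):
    """Average interleaved 16-bit channels into a single mono channel."""
    if channels == 1:
        return pcm                      # already mono, nothing to do
    samples = array("h")                # signed 16-bit, native byte order
    samples.frombytes(pcm)
    mono = array("h", (
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples) - channels + 1, channels)
    ))
    return mono.tobytes()
```

The result has one sample per input frame, so 480 stereo frames still come out as one 30 ms mono frame at 16000 Hz.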
Crank up that VAD aggressiveness (modes run from 0 to 3, and
3 is the most aggressive about filtering out non-speech):
Python
self.vad = webrtcvad.Vad(3) # 3 is as aggressive as LA traffic
(Just remember to adjust your sample rate and frame duration
to fit your needs.)