From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: How to check whether audio bytes contain empty noise or actual voice/signal?
Date: 26 Oct 2024 11:16:13 GMT
Organization: Stefan Ram
Message-ID: <nn-20241026120839@ram.dialup.fu-berlin.de>
References: <mailman.48.1729873488.4695.python-list@python.org>
X-Copyright: (C) Copyright 2024 Stefan Ram. All rights reserved.
	Distribution through any means other than regular usenet
	channels is forbidden. It is forbidden to publish this
	article in the Web, to change URIs of this article into links,
        and to transfer the body without this notice, but quotations
        of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
	services to mirror the article in the web. But the article may
	be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: en-US

marc nicole <mk1853387@gmail.com> wrote or quoted:
>I have a hard time finding a way to check whether audio data samples are
>containing empty noise or actual significant voice/noise.

  Or you could have a human listen to some of the audio files, estimate
  an "empty-noise ratio" between 0 and 1 for each one, store that number
  (as a float) in the file name, and then train a neural net on the
  labeled files. E.g.,

0.99.wav  # very empty
0.992.wav # very empty file #2
0.993.wav # very empty file #3

0.00.wav  # very not empty file
0.002.wav # very not empty file #2
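
  The naming scheme above can be checked before any training starts.
  What follows is only a minimal sketch: the helper name
  "label_from_filename" and the rule that a label must lie in [0, 1]
  are my assumptions for illustration, not something fixed by the
  scheme itself.

```python
import os

def label_from_filename(filename):
    """Return the float encoded in a name such as '0.992.wav'.

    Hypothetical helper (an assumption for illustration): raises
    ValueError if the extension is not .wav, if the stem is not a
    number, or if the number lies outside [0, 1].
    """
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext.lower() != '.wav':
        raise ValueError(f"not a .wav file: {filename!r}")
    value = float(stem)  # raises ValueError for stems like 'notes'
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"label out of range [0, 1]: {value}")
    return value

# The example file names from above
for name in ['0.99.wav', '0.992.wav', '0.00.wav', '0.002.wav']:
    print(name, label_from_filename(name))
```

  Rejecting bad names early keeps a stray file such as "notes.wav"
  from aborting the loading loop halfway through a directory.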

  One possible approach:

import os
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
import librosa

## Data Preparation

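# Function to extract audio features: 13 MFCCs averaged over time
def extract_features(file_path):
    # librosa.load resamples to 22050 Hz mono by default
    audio, sr = librosa.load(file_path)
    # mfccs has shape (13, n_frames)
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    # Average over the frames to get one 13-element vector per file
    return np.mean(mfccs.T, axis=0)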

# Load data from directory
directory = 'd' # for example
X = []
y = []

for filename in os.listdir(directory):
    if filename.endswith('.wav'):
        file_path = os.path.join(directory, filename)
        X.append(extract_features(file_path))
        y.append(float(filename[:-4]))  # File name minus '.wav' is the label p (the empty-noise ratio)

X = np.array(X)
y = np.array(y)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

## Neural Network Model

model = Sequential([
    Dense(64, activation='relu', input_shape=(13,)),
    Dense(32, activation='relu'),
    Dense(1)  # linear output: predictions are not limited to [0, 1]
])

model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')

## Training

model.fit(X_train_scaled, y_train, epochs=100, batch_size=32, validation_split=0.2, verbose=1)

## Evaluation

test_loss = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"Test Loss: {test_loss}")

## Prediction Function

def predict_p(audio_file):
    features = extract_features(audio_file)
    scaled_features = scaler.transform(features.reshape(1, -1))
    prediction = model.predict(scaled_features)
    return prediction[0][0]

# Example usage
new_audio_file = 'path/to/new/audio/file.wav'
predicted_p = predict_p(new_audio_file)
print(f"Predicted p value: {predicted_p}")
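
  The last layer, Dense(1), is linear, so a predicted p can land a
  little below 0 or above 1. If a yes/no answer is wanted, the value
  can be clamped and compared with a cutoff. This is only a sketch:
  the names "clamp_ratio" and "is_mostly_empty" and the cutoff of 0.5
  are assumptions of mine and would have to be tuned on real
  recordings.

```python
def clamp_ratio(p):
    """Clamp a raw prediction to the interval [0, 1]."""
    return min(1.0, max(0.0, float(p)))

def is_mostly_empty(p, threshold=0.5):
    """Return True if the clamped ratio reaches the cutoff.

    The default threshold of 0.5 is an arbitrary assumption.
    """
    return clamp_ratio(p) >= threshold

# Made-up raw predictions, including two outside [0, 1]
for raw in (-0.07, 0.31, 1.04):
    print(raw, clamp_ratio(raw), is_mostly_empty(raw))
```

  With the script above, the call would look like
  is_mostly_empty(predict_p(new_audio_file)).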