From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: How to check whether audio bytes contain empty noise or actual voice/signal?
Date: 26 Oct 2024 11:16:13 GMT
X-Copyright: (C) Copyright 2024 Stefan Ram. All rights reserved. Distribution through any means other than regular usenet channels is forbidden. It is forbidden to publish this article in the Web, to change URIs of this article into links, and to transfer the body without this notice, but quotations of parts in other Usenet posts are allowed.

marc nicole wrote or quoted:
>I have a hard time finding a way to check whether audio data samples are
>containing empty noise or actual significant voice/noise.

Or, you could have a human do a quick listen to some audio files to gauge the "empty-noise ratio," then use that number, written as a float, as the filename, and finally train a neural net on these files.
E.g.,

0.99.wav   # very empty
0.992.wav  # very empty file #2
0.993.wav  # very empty file #3
0.00.wav   # very not empty file
0.002.wav  # very not empty file #2

One possible approach:

import os
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam
import librosa

## Data Preparation

# Function to extract audio features:
# 13 MFCCs per frame, averaged over time into one 13-element vector
def extract_features(file_path):
    audio, sr = librosa.load(file_path)
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    return np.mean(mfccs.T, axis=0)

# Load data from directory
directory = 'd'  # for example
X = []
y = []
for filename in sorted(os.listdir(directory)):
    if filename.endswith('.wav'):
        file_path = os.path.join(directory, filename)
        X.append(extract_features(file_path))
        # Assuming the filename (minus '.wav') is the p value
        y.append(float(filename[:-4]))

X = np.array(X)
y = np.array(y)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Feature scaling (fit on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

## Neural Network Model
model = Sequential([
    Input(shape=(13,)),  # one value per MFCC
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1)  # linear output: regression on p
])

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='mean_squared_error')

## Training
model.fit(X_train_scaled, y_train, epochs=100, batch_size=32,
          validation_split=0.2, verbose=1)

## Evaluation
test_loss = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"Test Loss: {test_loss}")

## Prediction Function
def predict_p(audio_file):
    features = extract_features(audio_file)
    scaled_features = scaler.transform(features.reshape(1, -1))
    prediction = model.predict(scaled_features)
    return float(prediction[0][0])  # plain Python float

# Example usage
new_audio_file = 'path/to/new/audio/file.wav'
predicted_p = predict_p(new_audio_file)
print(f"Predicted p value: {predicted_p}")
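
The loading loop above relies on every '.wav' name being a float, so a single stray file such as 'notes.wav' would stop it with a ValueError. A minimal sketch of a more forgiving label parser follows. The helper name label_from_filename and the rule that a label must lie in [0, 1] are assumptions made for illustration, not part of the approach above.

```python
from pathlib import Path
from typing import Optional


def label_from_filename(filename: str) -> Optional[float]:
    """Return the label encoded in a name like '0.992.wav', or None if unusable.

    Hypothetical helper: the name and the [0, 1] range check are assumptions.
    """
    path = Path(filename)
    # Only .wav files carry a label under this naming convention.
    if path.suffix.lower() != ".wav":
        return None
    try:
        # The stem is the name without its last extension: '0.992.wav' -> '0.992'.
        value = float(path.stem)
    except ValueError:
        return None
    # Assumption: a ratio outside [0, 1] (or NaN) is a labelling mistake.
    if not 0.0 <= value <= 1.0:
        return None
    return value
```

In the loop, one could call this helper on each filename and skip the file whenever it returns None, instead of calling float(filename[:-4]) directly.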