Path: ...!feeds.phibee-telecom.net!news.mixmin.net!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rbowman <bowman@montana.com>
Newsgroups: comp.os.linux.advocacy
Subject: Re: If you're a fucking moron
Date: 16 Oct 2024 01:39:10 GMT
Lines: 31
Message-ID: <ln8jpuFlipbU3@mid.individual.net>
References: <dvhpfjd1rh7uheoien02arle31q9fhcd57@4ax.com>
 <8m6dnczkO_GAPpz6nZ2dnZfqnPSdnZ2d@giganews.com>
 <slrnvgdh2b.2nfc6.candycanearter07@candydeb.host.invalid>
 <pan$78b32$3adf0bd2$1deb5e7a$b8c37cfa@linux.rocks>
 <slrnvgg5g4.1gkvp.candycanearter07@candydeb.host.invalid>
 <lmqvr7FiegjU1@mid.individual.net>
 <slrnvgtmjv.3lkna.candycanearter07@candydeb.host.invalid>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: individual.net SVPV3MHFzJTWs/zA9EMWTAibmq21LyQoNdqfnc7C8EQMCPUi3Z
Cancel-Lock: sha1:M3mFdOsgdDx+vmg9MC543luaaHQ= sha256:sCt/FDPKQatr5Ye6D8NWxoHm4HbFHvo++G+ESMPz9Ss=
User-Agent: Pan/0.149 (Bellevue; 4c157ba)
Bytes: 2622

On Tue, 15 Oct 2024 21:20:04 -0000 (UTC), candycanearter07 wrote:

> No wonder youtube autocaptions are so unreliable.

One project for a TinyML course I took was using an Arduino Nano 33 BLE
Sense to handle wake words. Those are phrases like 'Alexa'. Currently
the wake word triggers the system but almost all subsequent speech
processing is done by a server in the cloud. The objective someday is
to have the capability in a phone or an edge device to handle the whole
process. Eliminating a massive backend would be a big savings and would
also address the privacy issues.

Anyway, I could train the board to recognize a few words like start,
stop, up, and down. Some were more reliable than others. Messing
around, I could get some feel for what the neural network model was
looking for, so to speak, and trick it. That's the problem with NNs:
it's not clear what they are really doing even when you understand the
process.

In this case the microphone output was sampled by an A/D converter and
used to create a spectrogram.

https://en.wikipedia.org/wiki/Spectrogram

Ultimately, deciding whether the spoken command was 'start' or 'stop'
came down to image classification using the spectrogram. There is
clipping, scaling, and other manipulation to simplify the image all
along the way, but it worked. Mostly.

Autocaptioning probably breaks the speech into phonemes to be more
flexible, but given accents, inflections, poor pronunciations, and the
other factors human listeners are skilled at handling, it is a
challenge.
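
For anyone curious, the spectrogram step looks roughly like this in
Python with scipy. The 16 kHz sample rate and 30 ms windows are my
guesses at typical keyword-spotting settings, not necessarily what the
course material used:

import numpy as np
from scipy.signal import spectrogram

fs = 16000                           # assumed ADC sample rate
t = np.arange(fs) / fs               # one second of samples
audio = np.sin(2 * np.pi * 440 * t)  # stand-in for real mic output

# Short-time FFT: 30 ms windows with a 10 ms hop, a common choice
# for keyword spotting
freqs, times, Sxx = spectrogram(audio, fs=fs, nperseg=480, noverlap=320)

# Log scaling compresses the dynamic range before classification
log_spec = np.log(Sxx + 1e-10)
print(log_spec.shape)                # (freq bins, time frames), the "image"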
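
And the classification step, equally rough: a toy Keras model of the
kind these courses use, small enough to quantize down for the Nano's
flash. The layer sizes and the 49x40 input are assumptions on my part:

import tensorflow as tf

# Four keywords: start, stop, up, down
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(49, 40, 1)),  # time frames x freq bins
    tf.keras.layers.Conv2D(8, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# After training, tf.lite.TFLiteConverter.from_keras_model() plus int8
# quantization gets it down to something a microcontroller can run.

The "trick it" part falls out of this picture: the model only ever sees
the spectrogram image, so anything that produces a similar-looking
image scores as the word.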