A Simplified Block Diagram of the ASR Process in Kaldi. NGC Nvidia – Kaldi Container; Oxinabox – Kaldi Notes; KWS14 – Kaldi Lattices
• Costs: Negative log probabilities, so a higher cost means a lower probability.
• Frame: Each 10 ms of audio is turned, using MFCC, into a fixed-size vector called a frame.
• Beam: The pruning width; the cutoff is Best Cost + Beam (the beam is usually around 10 to 16).
• Cutoff: The maximum allowed cost; all costs higher than this value will…
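The relationship between cost, beam, and cutoff can be sketched in a few lines of Python. This is only an illustration, not Kaldi's decoder code: the helper names `neg_log_cost` and `prune_by_beam` and the token dictionary are hypothetical, and it assumes the cutoff is the best (lowest) cost plus the beam, since costs are negative log probabilities and lower is better.

```python
import math

def neg_log_cost(prob):
    """Convert a probability into a cost (negative natural log)."""
    return -math.log(prob)

def prune_by_beam(token_costs, beam=13.0):
    """Keep only tokens whose cost is within `beam` of the best cost.

    token_costs: dict mapping a (hypothetical) state name to its cost.
    Returns the surviving tokens and the cutoff that was applied.
    """
    best_cost = min(token_costs.values())  # lowest cost = most probable
    cutoff = best_cost + beam              # assumed: cutoff = best cost + beam
    survivors = {s: c for s, c in token_costs.items() if c <= cutoff}
    return survivors, cutoff

# Three made-up tokens active on one frame.
tokens = {
    "s1": neg_log_cost(0.5),
    "s2": neg_log_cost(1e-3),
    "s3": neg_log_cost(1e-9),
}
survivors, cutoff = prune_by_beam(tokens, beam=13.0)
print(sorted(survivors), round(cutoff, 3))
```

With a beam of 13, the least likely token falls above the cutoff and is dropped, while the other two survive. A wider beam keeps more hypotheses alive at the price of more computation.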
Measure Microphone Latency in Linux with ALSA. The command below plays a tone signal through the speaker and receives it back through the mic. Measuring the phase difference reveals the round-trip latency.
alsa_delay hw:1,0 hw:0,0 44100 256 2 1 1
Here hw:1,0 refers to the recording device, which can be retrieved from arecord…
Let’s Enhance Kaldi. Here are some links along the way. It looks like YouTube has progressed a lot over the last couple of years, so here is a bunch of random videos that make up my favorite playlist for learning all the cool stuff under Kaldi’s hood. YouTube: Keith Chugg (USC) – Viterbi Algorithm Lim…
Thanks to this marvelous framework, a trained model is now at my disposal, with a WER of exactly zero percent over a 10-minute continuous speech file. The final piece of this puzzle would be implementing a semi-online decoding tool using GStreamer. As always, useful links for further inspection: GStreamer – Dynamic pipelines Function that save lives!…
On the way to developing a driver for the Scarlett Solo Gen3 to harness the power of the Shure SM57 dynamic microphone. Useful links to preserve: Microsoft – Universal Audio Architecture: Guideline for Sound Cards Without Proprietary Drivers; Microsoft – Introduction to Port Class; Microsoft – AVStream Overview; Microsoft – WDM Audio Terminology; Microsoft – Kernel…
So the third year has passed. I mostly worked on developing a couple of hardware projects. Halsey's music was a big passion there. Learning all the cool ML stuff is now one of my top priorities. Combine it with the emergence of Talon, a powerful C2 grammar framework by Ryan Hileman, and wav2letter, a game-changing…
Here I am, once more pursuing old-fashioned machine learning. I’ll keep it short and write down useful links. Books: Dan Povey – HTK Book; Ian Goodfellow – Deep Learning. Papers: IEEE – Uncertainty Decoding with SPLICE for Noise Robust Speech Recognition. YouTube: Hannes van Lier – Basic Introduction to Speech Recognition (HMM & Neural…
The combination of FMCOMMS3 and PetaLinux works only with Ubuntu 16.04 LTS, PetaLinux 2018.3, and Vivado 2018.3. Required packages:
sudo apt-get install -y gcc git make net-tools libncurses5-dev tftpd zlib1g-dev libssl-dev flex bison libselinux1 gnupg wget diffstat chrpath socat xterm autoconf libtool tar unzip texinfo gcc-multilib build-essential libsdl1.2-dev libglib2.0-dev zlib1g:i386 screen pax gzip
Installing…
ADS covers a broad range of tasks, from IC design to RF simulation. Here we explore how to prepare your workspace to start the layout phase after schematic design. ADS comes with tons of ready-to-use parts, which are available at <ADS>/ADS/oalibs/componentLib/. Here I demonstrate how to add and use the RF_Passive_SMT library in…