Kaldi Delta Features

Posted by Bijan in Speech Recognition

Delta-delta features were proposed by S. Furui in 1986 and by Hermann Ney in 1990. The idea is simply to append the first and second derivatives of the cepstrum to the feature vector. Doing so is said to capture spectral dynamics and improve overall accuracy. The only problem is that in a discrete signal space getting derivative from the…
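As a rough illustration of approximating a derivative on discrete frames, here is a minimal sketch of the commonly used regression formula d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²). The function names, the window of 2 frames, and the edge handling (reusing the first and last frame) are assumptions made for this example; this is not Kaldi's own code and may not be the exact method the post goes on to describe.

```python
def delta(frames, window=2):
    """Regression-based derivative of each feature dimension over time.

    frames: list of equal-length lists of floats, one list per frame.
    window: number of frames of context on each side (assumed value).
    """
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, window + 1))

    def frame_at(t):
        # Clamp the index so the edges reuse the first or last frame.
        return frames[min(max(t, 0), num_frames - 1)]

    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = sum(
                n * (frame_at(t + n)[d] - frame_at(t - n)[d])
                for n in range(1, window + 1)
            )
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Append first and second derivatives to every frame."""
    d1 = delta(frames, window)
    d2 = delta(d1, window)  # second derivative = delta of the deltas
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```

With 13 cepstral coefficients per frame, add_deltas would produce 39-dimensional frames: the original features followed by their deltas and delta-deltas.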

Kaldi FSTs Jargons

Posted by Bijan in Speech Recognition

• Lattices: A graph containing states (nodes) and arcs (edges), in which each state represents one 10 ms frame.
• Arcs: Go from one state to another. A state's arcs can be accessed with an arc iterator, and an arc retains only its next state. Each arc has a weight, an input label, and an output label.
• States: Are simple…
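The relationship between states, arcs, and arc iterators can be sketched with a toy structure. The field names ilabel, olabel, weight, and nextstate follow OpenFst's arc naming; the Graph class and its methods are hypothetical helpers written for this example, not Kaldi or OpenFst APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Arc:
    ilabel: int      # input label
    olabel: int      # output label
    weight: float    # cost of taking this arc
    nextstate: int   # an arc only remembers where it goes, not where it starts


@dataclass
class Graph:
    # Maps a state id to the list of arcs leaving that state.
    arcs: dict = field(default_factory=dict)

    def add_state(self):
        state = len(self.arcs)
        self.arcs[state] = []
        return state

    def add_arc(self, src, arc):
        self.arcs[src].append(arc)

    def arc_iterator(self, state):
        # Iterate over the arcs that leave the given state.
        return iter(self.arcs[state])
```

Because an arc stores only its destination, the source state is implied by which state's arc list it belongs to.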

Kaldi Lattice Decoder

Posted by Bijan in Speech Recognition

• Link: Same as an arc.
• Token: Same as a state. Tokens have costs.
• FrameToks: A linked list that contains all the tokens in a single frame.
• Adaptive Beam: Used for pruning before the lattice is created and throughout decoding.
• NEmitting Tokens: Non-emitting tokens are tokens that are generated from emitting token…

Kaldi GMM Overview

Posted by Bijan in Speech Recognition

A Simplified Block Diagram of the ASR Process in Kaldi
NGC Nvidia – Kaldi Container
Oxinabox – Kaldi Notes
KWS14 – Kaldi Lattices

Notes on Kaldi

Posted by Bijan in Speech Recognition

• Costs: Negative log probabilities, so a higher cost means a lower probability.
• Frame: Each 10 ms of audio, which MFCC turns into a fixed-size vector called a frame.
• Beam: The cutoff would be Best Cost + Beam (the beam is around 10 to 16).
• Cutoff: The maximum cost; all costs higher than this value will…
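Beam pruning can be sketched in a few lines. This assumes the cutoff is the best (lowest) cost plus the beam, which follows from costs being negative log probabilities; the function name, the dictionary layout, and the default beam of 13 are hypothetical choices for illustration, not Kaldi's decoder code.

```python
def prune(token_costs, beam=13.0):
    """Keep only the tokens whose cost is within `beam` of the best cost.

    token_costs: dict mapping a state id to its accumulated cost.
    """
    best_cost = min(token_costs.values())  # lowest cost = most probable
    cutoff = best_cost + beam
    return {state: cost for state, cost in token_costs.items() if cost <= cutoff}
```

A wider beam keeps more tokens alive per frame, which makes decoding slower but less likely to discard the path that would have turned out best.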

It’s All About The Latency

Posted by Bijan in Electrical Engineering, Speech Recognition

Measure Microphone Latency in Linux with ALSA
The command below plays a tone through the speaker and records it back through the mic. Measuring the phase difference reveals the round-trip latency.
alsa_delay hw:1,0 hw:0,0 44100 256 2 1 1
Here hw:1,0 refers to the recording device that can be retrieved from arecord…

Speech Recognition II (Developing Kaldi)

Posted by Bijan in Speech Recognition

Let’s enhance Kaldi. Here are some links along the way. It looks like YouTube has progressed a lot over the last couple of years, so here is just a bunch of random videos that make up my favorite playlist for learning all the cool stuff under Kaldi’s hood. YouTube Keith Chugg (USC) – Viterbi Algorithm Lim…

Online Kaldi Decoding

Posted by Bijan in Speech Recognition

Thanks to this marvelous framework, a trained model is now at my disposal, with a WER of exactly zero percent over a 10-minute continuous speech file. The final piece of this puzzle would be implementing a semi-online decoding tool using GStreamer. As always, useful links for further inspection: GStreamer – Dynamic pipelines Function that save lives!…

WDM, WDK, DDK, HDK, SDK and ….

Posted by Bijan in Software, Windows

On the way to developing a driver for the Scarlett Solo Gen3 to harness the power of the Shure SM57 dynamic microphone. Useful links to preserve:
Microsoft – Universal Audio Architecture: Guidelines for Sound Cards Without a Proprietary Driver
Microsoft – Introduction to Port Class
Microsoft – AVStream Overview
Microsoft – WDM Audio Terminology
Microsoft – Kernel…

HaLseY and TaLoN!

Posted by Bijan in Zest

So the third year has passed. I mostly worked on developing a couple of hardware projects. Halsey's music was a big passion there. Learning all the cool ML stuff is now one of my top priorities. Combine it with the emergence of Talon, a powerful C2 grammar framework by Ryan Hileman, and wav2letter, a game-changing…

BijoKH
