The command below plays a tone signal out of the speaker and records it back through the mic. Measuring the phase difference reveals the round-trip latency.
alsa_delay hw:1,0 hw:0,0 44100 256 2 1 1
Here hw:1,0 refers to the recording device, which can be found with arecord -l, and hw:0,0 refers to the playback device, which can likewise be found with aplay -l.
44100 is the sampling rate and 256 is the buffer size. 256 works best for me: lower numbers corrupt the test and higher numbers just bring more latency to the table. I don't know exactly what the nfrags, input and output arguments are, but 2, 1 and 1 respectively work magically for me. I was just tinkering around and found these numbers; no others work for me.
1. Focusrite Scarlett Solo latency: 2.5 ms
2. Shure SM57 mic latency: 2.5 ms
3. Overall delay: 14 ms in non-RT mode
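As a rough back-of-the-envelope check (my own arithmetic, not something alsa_delay reports), a single 256-frame period at 44100 Hz already holds about 5.8 ms of audio, and up to nfrags of those periods can be buffered, so the buffer settings alone account for a good chunk of that overall delay:
echo "scale=2; 256 * 1000 / 44100" | bc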
You can tinker with the effect of latency using
pactl load-module module-loopback latency_msec=15
To end the loopback mode
pactl unload-module module-loopback
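Side note: if more than one loopback is loaded, pactl load-module prints the index of the module it just created, so you can capture it and unload only that instance (a small sketch using the same 15 ms example value):
idx=$(pactl load-module module-loopback latency_msec=15)
pactl unload-module "$idx"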
As always, useful links:
Arun Raghavan – Beamforming in PulseAudio
Arch Linux Wiki – Professional Audio, Realtime kernel
Let's enhance Kaldi. Here are some links I picked up along the way. It looks like YouTube has progressed a lot over the last couple of years, so basically here is a bunch of videos making up my favorite playlist for learning all the cool stuff under Kaldi's hood.
LatticeFasterDecoder and related concepts:
Lattices: A more complex form of FSTs. The first-version decoders were based on FSTs (like faster-decoder and the online decoders). For Minimum Bayes Risk calculation, using lattices gives you a better-paved way.
faster-decoder: The old decoder, very simple for understanding how the decoding process is done.
lattice-faster-decoder: The general decoder; same as faster-decoder but outputs lattices instead of FSTs.
DecodableInterface: An interface that connects the decoder to the features; the decoder uses this Decodable object to pull CMVN features from it.
BestPath: An FST constructed from the best path (the path with maximum likelihood) in the decoded FST.
nBestPath: An FST constructed from the top N best paths in the decoded FST.
GetLinearSymbolSequence: The final step in the recognition process; it takes a BestPath FST or Lattice and outputs the recognized words together with the path weight. CompactLattices need to be converted first using ConvertLattice.
Strongly Connected Component: A set in which every component is reachable (in both directions) from its members.
ProcessEmitting: The step that pulls log-likelihoods from the decodable object.

Thanks to this marvelous framework, a trained model is at my disposal with a WER of absolutely zero percent over a 10-minute continuous speech file. The final piece of this puzzle will be implementing a semi-online decoding tool using GStreamer; see the command-line sketch below for how the lattice pieces fit together. As always, useful links for further inspection.
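A hedged sketch, assuming standard Kaldi binaries and decoded lattices already on disk; lat.1 and words.txt are placeholder file names, not from my setup:
# Single best path per utterance (maximum likelihood), printed as words via the symbol table
lattice-best-path ark:lat.1 ark,t:- | utils/int2sym.pl -f 2- words.txt
# Top 10 paths instead; nbest-to-linear splits every n-best entry into alignments and word ids
lattice-to-nbest --n=10 ark:lat.1 ark:- | nbest-to-linear ark:- ark:/dev/null ark,t:- | utils/int2sym.pl -f 2- words.txt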
gst_caps_to_string(caps) converts a GstCaps structure into a readable string, which is handy for debugging what the pipeline actually negotiated.

Here I am, pursuing old-fashioned machine learning once more. I'll keep it short and just write down the useful links.