mcrblg-header-image

search

It’s All About The Latency

Published on 2021-09-18 in Electrical Engineering, Speech Recognition

Measure Microphone Latency in Linux with Alsa

The command below generates a tone signal out of the speaker and receives it back through the mic. Measuring the phase diff will reveal the round-trip latency.

alsa_delay hw:1,0 hw:0,0 44100 256 2 1 1

Here hw:1,0 refer to the recording device that can be retrieved from arecord -l and hw:0,0 refer to the playback device. Again can be retrieved from aplay -l .

The 44100 is the sampling rate. 256 is the buffer size. 256 works best for me. Lower numbers corrupt the test and higher numbers just bring more latency to the table. Don’t know exactly what nfrags input and output arguments are but 2 1 and 1 respectively works magically for me. I just tinkering around and found these numbers. No other number works for me.

My Setup

1. Focusrite Scarlett Solo Latency: 2.5ms

2. Shure SM57 Mic Latency:  2.5ms

3. OverAll Delay: 14ms with non-RT mode

LoopBack

You can tinker around the effect of latency with

pactl load-module module-loopback latency_msec=15

To end the loopback mode

pactl unload-module module-loopback

As Always Useful links


PulseAudio – Latency Control

Arun Raghavan – Beamforming in PulseAudio

Arch Linux Wiki – Professional Audio, Realtime kernel


Speech Recognition II(Developing Kaldi)

Published on 2021-08-02 in Speech Recognition

Let’s Enhance Kaldi, Here are some links along the way. Look like YouTube is progressing a lot during the last couple of years so basically here is just a bunch of random videos creating my favorite playlist to learn all the cool stuff under the Kaldi’s hood.

YouTube

  1. Keith Chugg (USC) – Viterbi Algorithm
  2. Lim Zhi Hao (NTU) – WFST: A Nice Channel On Weighted Finite State Transducers
  3. Dan Povey (JHU) – ICASSP 2011 Kaldi Workshop: Dan Explaining Kaldi Basics
  4. Luis Serrano – The Covariance Matrix: To Understand GMM Acoustic Modeling

Kaldi

  1. Mehryar Mohri (NYU) – Speech Recognition with WFST: A joint work of RWTH and NYU
  2. Mehryar Mohri (NYU), Afshin Rostamizadeh – Foundations of Machine Learning
  3. George Doddington (US DoD) ICASSP 2011 – Human Assisted Speaker Recognition
  4. GitHub Kaldi – TED-LIUM Result: GMM, SGMM, Triple Deltas Comparison
  5. EE Columbia University – Speech Recognition Spring 2016
  6. D. Povey – Generating Lattices in the WFST : For understanding LattceFasterDecoder

Notes


Online Kaldi Decoding

Published on 2021-07-28 in Speech Recognition

Thanks to this marvelous framework, a trained model is at disposal with WER of absolute zero percent over the 10 minutes of continuous speech file. The final piece to this puzzle would be implementing a semi-online decoding tool using GStreamer. As always useful links for further inspection

  1. GStreamer – Dynamic pipelines
  2. Function that save lives! gst_caps_to_string(caps)
  3. GStreamer – GstBufferPool
  4. StackOverFlow – Gstreamer gst_buffer_pool_acquire_buffer function is slow on ARM
  5. GitHub – Alumae: GST-Kaldi-NNet2-Online
  6. StackOverFlow – How to create GstBuffer

Lost in the Vast Ocean of Speech Recognition

Published on 2020-12-08 in Speech Recognition

Here I am, pursuing once more the old-fashioned machine learning. I’ll keep it short and write down useful links

Books

  1. Dan Povey – HTK Book
  2. Ian Goodfellow – Deep Learning

Papers

  1. IEEE – Uncertainty Decoding with SPLICE for Noise Robust Speech Recognition

YouTube

  1. Hannes van Lier – Basic Introduction to Speech Recognition (HMM & Neural Networks)
  2. Luis Serrano – A friendly introduction to Bayes Theorem and Hidden Markov Models
  3. Djp3 – Hidden Markov Models, The forward-backward algorithm

Kaldi

  1. Kaldi ASR – FrameWork Wiki
  2. Kaldi Wiki – Kaldi Tutorial #1
  3. Dan Povey’s Homepage: Former Professor at John Hopkins University, Author of HTK Book
  4. KalDi WiKi – Kaldi for Dummies tutorial

Wikipedia

  1. Wikipedia – Dempster–Shafer theory
  2. Wikipedia – Expectation–maximization algorithm
  3. Wikipedia – Cepstral mean and variance normalization
  4. Wikipedia – Baum–Welch algorithm
  5. Wikipedia – Mutual Information

Blog Posts

  1. Medium – Jonathan Hui: ASR Model Training
  2. Medium – Jonathan Hui: Maximum Mutual Information Estimation (MMIE)
  3. Medium – Jonathan Hui: Weighted Finite-State Transducers

Others

  1. KDE Simon – CMU SPHINX Based ASR
  2. Speech Research International Language Model (SRILM)

Jargons

  1. LVCSR: Large Vocabulary Continuous Speech Recognition

next posts ›
close
menu