Online Kaldi Decoding

Published on 2021-07-28 in Speech Recognition

Thanks to this marvelous framework, I now have a trained model with a WER of exactly zero percent on a 10-minute continuous speech file. The final piece of the puzzle is implementing a semi-online decoding tool using GStreamer. As always, here are some useful links for further reading:

  1. GStreamer – Dynamic pipelines
  2. A function that saves lives: gst_caps_to_string(caps)
  3. GStreamer – GstBufferPool
  4. Stack Overflow – Gstreamer gst_buffer_pool_acquire_buffer function is slow on ARM
  5. GitHub – Alumae: GST-Kaldi-NNet2-Online
  6. Stack Overflow – How to create GstBuffer
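
For reference, the WER figure quoted above is conventionally the word-level edit distance between the reference transcript and the decoder output, divided by the number of reference words. Below is a minimal Python sketch of that definition. It is not the scoring used for the model in this post (Kaldi ships its own compute-wer tool), and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only; the name and signature are assumptions,
    not part of Kaldi or GStreamer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A WER of zero means the hypothesis matches the reference word for word. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.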

Lost in the Vast Ocean of Speech Recognition

Published on 2020-12-08 in Speech Recognition

Here I am, once more pursuing old-fashioned machine learning. I’ll keep it short and just write down some useful links.

Books

  1. Dan Povey – HTK Book
  2. Ian Goodfellow – Deep Learning

Papers

  1. IEEE – Uncertainty Decoding with SPLICE for Noise Robust Speech Recognition

YouTube

  1. Hannes van Lier – Basic Introduction to Speech Recognition (HMM & Neural Networks)
  2. Luis Serrano – A friendly introduction to Bayes Theorem and Hidden Markov Models
  3. Djp3 – Hidden Markov Models, The forward-backward algorithm

Kaldi

  1. Kaldi ASR – Framework Wiki
  2. Kaldi Wiki – Kaldi Tutorial #1
  3. Dan Povey’s Homepage: Former Professor at Johns Hopkins University, Author of HTK Book
  4. Kaldi Wiki – Kaldi for Dummies tutorial

Wikipedia

  1. Wikipedia – Dempster–Shafer theory
  2. Wikipedia – Expectation–maximization algorithm
  3. Wikipedia – Cepstral mean and variance normalization
  4. Wikipedia – Baum–Welch algorithm
  5. Wikipedia – Mutual Information

Blog Posts

  1. Medium – Jonathan Hui: ASR Model Training
  2. Medium – Jonathan Hui: Maximum Mutual Information Estimation (MMIE)
  3. Medium – Jonathan Hui: Weighted Finite-State Transducers

Others

  1. KDE Simon – CMU SPHINX Based ASR
  2. SRI Language Modeling Toolkit (SRILM)

Jargon

  1. LVCSR: Large Vocabulary Continuous Speech Recognition
