mcrblg-header-image

search

Speech Recognition II(Developing Kaldi)

Published on 2021-08-02 in Speech Recognition

Let’s Enhance Kaldi, Here are some links along the way. Look like YouTube is progressing a lot during the last couple of years so basically here is just a bunch of random videos creating my favorite playlist to learn all the cool stuff under the Kaldi’s hood.

YouTube

  1. Keith Chugg (USC) – Viterbi Algorithm
  2. Lim Zhi Hao (NTU) – WFST: A Nice Channel On Weighted Finite State Transducers
  3. Dan Povey (JHU) – ICASSP 2011 Kaldi Workshop: Dan Explaining Kaldi Basics
  4. Luis Serrano – The Covariance Matrix: To Understand GMM Acoustic Modeling

Kaldi

  1. Mehryar Mohri (NYU) – Speech Recognition with WFST: A joint work of RWTH and NYU
  2. Mehryar Mohri (NYU), Afshin Rostamizadeh – Foundations of Machine Learning
  3. George Doddington (US DoD) ICASSP 2011 – Human Assisted Speaker Recognition
  4. GitHub Kaldi – TED-LIUM Result: GMM, SGMM, Triple Deltas Comparison
  5. EE Columbia University – Speech Recognition Spring 2016
  6. D. Povey – Generating Lattices in the WFST : For understanding LattceFasterDecoder

Notes


Online Kaldi Decoding

Published on 2021-07-28 in Speech Recognition

Thanks to this marvelous framework, a trained model is at disposal with WER of absolute zero percent over the 10 minutes of continuous speech file. The final piece to this puzzle would be implementing a semi-online decoding tool using GStreamer. As always useful links for further inspection

  1. GStreamer – Dynamic pipelines
  2. Function that save lives! gst_caps_to_string(caps)
  3. GStreamer – GstBufferPool
  4. StackOverFlow – Gstreamer gst_buffer_pool_acquire_buffer function is slow on ARM
  5. GitHub – Alumae: GST-Kaldi-NNet2-Online
  6. StackOverFlow – How to create GstBuffer

Lost in the Vast Ocean of Speech Recognition

Published on 2020-12-08 in Speech Recognition

Here I am, pursuing once more the old-fashioned machine learning. I’ll keep it short and write down useful links

Books

  1. Dan Povey – HTK Book
  2. Ian Goodfellow – Deep Learning

Papers

  1. IEEE – Uncertainty Decoding with SPLICE for Noise Robust Speech Recognition

YouTube

  1. Hannes van Lier – Basic Introduction to Speech Recognition (HMM & Neural Networks)
  2. Luis Serrano – A friendly introduction to Bayes Theorem and Hidden Markov Models
  3. Djp3 – Hidden Markov Models, The forward-backward algorithm

Kaldi

  1. Kaldi ASR – FrameWork Wiki
  2. Kaldi Wiki – Kaldi Tutorial #1
  3. Dan Povey’s Homepage: Former Professor at John Hopkins University, Author of HTK Book
  4. KalDi WiKi – Kaldi for Dummies tutorial

Wikipedia

  1. Wikipedia – Dempster–Shafer theory
  2. Wikipedia – Expectation–maximization algorithm
  3. Wikipedia – Cepstral mean and variance normalization
  4. Wikipedia – Baum–Welch algorithm
  5. Wikipedia – Mutual Information

Blog Posts

  1. Medium – Jonathan Hui: ASR Model Training
  2. Medium – Jonathan Hui: Maximum Mutual Information Estimation (MMIE)
  3. Medium – Jonathan Hui: Weighted Finite-State Transducers

Others

  1. KDE Simon – CMU SPHINX Based ASR
  2. Speech Research International Language Model (SRILM)

Jargons

  1. LVCSR: Large Vocabulary Continuous Speech Recognition

next posts ›
close
menu