RegEx: My Greatest Fear

Published on 2023-01-30 in Linux

Regex, sed, and AWK look like the freaks of programming, but they are pretty simple once you get past the beginning.

Here I summarize some of the most amazing ones for RegEx:

  1. To Be or Not To Be, is possible with: LookAround
  2. Stop Worrying! Regex101 is all you need to know
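
As a small illustration of what lookaround does, here is a sketch using Python's `re` module. The sample sentence and both patterns are my own and are not taken from the linked posts.

```python
import re

text = "to be or not to be"

# Positive lookahead: match "to" only when " be" follows.
# The " be" part is checked but not consumed by the match.
to_before_be = re.findall(r"to(?= be)", text)

# Negative lookbehind: match "be" only when it is NOT preceded by "not to ".
be_without_not = [m.start() for m in re.finditer(r"(?<!not to )be", text)]

print(to_before_be)
print(be_without_not)
```

The lookbehind here has a fixed width, which Python's `re` requires; other regex flavours have different rules, so it is worth checking the pattern in Regex101 with the right flavour selected.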

Qt MSVC vs MinGW in Windows

Published on 2022-12-31 in Software, Windows

I started using WinRT with Qt today, and after a long time with MinGW I'm switching to MSVC on Windows. Here is why:

  1. MinGW is open source, but deep down, if you are on Win32, the MSVC compiler always offers better API compatibility
  2. WinRT is available only on MSVC
  3. MSVC deals better with memory access violations through SEH (Structured Exception Handling), which MinGW doesn't offer
  4. MSVC produces PDB files that can help when your program crashes: you can generate a crash dump and debug it with WinDbg
  5. If you ever run into a DLL that simply doesn't work with your project, it's probably because the MinGW and MSVC ABIs are not the same and that DLL was compiled with MSVC, not MinGW. Same OS and still a different ABI, sounds too Windowsy to me
  6. Because you are on Windows, show some support to the closed-source community!

COM Object and C++

Published on 2022-07-31 in Software, Windows
• CoInitialize: Initializes the COM library on the calling thread and sets the thread's concurrency model to single-threaded apartment (STA)
• CoInitializeEx: More advanced version of CoInitialize that lets you specify the thread's concurrency model and creates a new apartment for the thread if one is required
• CoUninitialize: Should be called in the destructor, once for each successful call to CoInitialize or CoInitializeEx

And Bash Is Awesome!!!

Published on 2022-07-08 in Linux

Here I list cool bash tricks I learned:

Bash Heredoc


Chrome DevTools Remote Control in Linux Bash

Published on 2022-05-06 in Linux

OK, the title is a bit long, but why would Google create such a nice debug interface and then make it so difficult to access?

1. Open Chrome with remote debugging enabled

chromium --remote-debugging-port=9222 https://github.com/

2. Install websocat to open a WebSocket connection to Chrome

sudo pacman -S websocat

3. Find the magic Chrome WebSocket URL. To do that, visit the following URL and look for the webSocketDebuggerUrl field of the page you want to control

http://127.0.0.1:9222/json/list

4. Connect to the websocket

websocat ws://127.0.0.1:9222/devtools/page/<GUID>

5. Execute the magic command. Here it just scrolls the page

{"id": 1, "method": "Runtime.evaluate", "params": {"expression": "document.documentElement.scrollTop = 600"}}
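
Steps 3 and 5 can also be scripted. The following is a minimal sketch in Python, assuming Chrome is listening on port 9222 as in step 1. The function names are my own. `webSocketDebuggerUrl` and `type` are fields of the JSON that `/json/list` returns. The sketch only builds the URL and the message; the WebSocket connection itself is still left to websocat.

```python
import json
import urllib.request

DEBUG_ENDPOINT = "http://127.0.0.1:9222/json/list"  # same URL as step 3


def fetch_targets(endpoint=DEBUG_ENDPOINT):
    """Download the list of debuggable targets from a running Chrome."""
    with urllib.request.urlopen(endpoint) as response:
        return json.load(response)


def first_page_ws_url(targets):
    """Return the WebSocket URL of the first target whose type is 'page'."""
    for target in targets:
        if target.get("type") == "page" and "webSocketDebuggerUrl" in target:
            return target["webSocketDebuggerUrl"]
    return None


def scroll_command(message_id, top):
    """Build the Runtime.evaluate message from step 5 as a JSON string."""
    return json.dumps({
        "id": message_id,
        "method": "Runtime.evaluate",
        "params": {"expression": f"document.documentElement.scrollTop = {top}"},
    })
```

With Chrome running, `first_page_ws_url(fetch_targets())` gives the URL to hand to websocat, and the string from `scroll_command(1, 600)` can be pasted into the websocat session.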

Few Notes


Kaldi Confidence Score

Published on 2022-04-30 in Speech Recognition

To calculate a word-level confidence score, Kaldi uses a method called MBR (Minimum Bayes Risk) decoding. MBR decoding is a decoding process that minimizes the expected word-level error rate (instead of minimizing the cost of the whole utterance) to produce the result. This may not give the most accurate result, but it can be used to calculate a confidence score up to some level. Just don't expect too much, as the scores are not very accurate.

Here are some key concepts:

1. Levenshtein Distance: The Levenshtein distance, or edit distance, measures the difference between two sentences. It counts the minimum number of word insertions, deletions, and substitutions needed to turn one into the other. Let's say X and Y are the two word sequences shown below. The Levenshtein distance would be 3, where Ɛ represents the empty word

To calculate the Levenshtein distance you can use the following recursive algorithm, where A and B are word sequences with length of N+1

As in all recursive algorithms, to decrease the amount of duplicate computation Kaldi uses the memoization technique and stores the above three cases in a1, a2 and a3 respectively
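
A memoized recursion for the word-level edit distance can be sketched in Python as follows. This is the textbook recurrence, not Kaldi's code. The names `a1`, `a2` and `a3` only echo the text above, and which name goes with which of the three cases is my assumption.

```python
from functools import lru_cache


def levenshtein(a, b):
    """Word-level edit distance between two sequences of words."""
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)  # memoization: each (i, j) pair is computed once
    def dist(i, j):
        # Distance between the first i words of a and the first j words of b.
        if i == 0:
            return j  # insert the remaining j words
        if j == 0:
            return i  # delete the remaining i words
        a1 = dist(i - 1, j - 1) + (a[i - 1] != b[j - 1])  # match or substitution
        a2 = dist(i - 1, j) + 1  # deletion
        a3 = dist(i, j - 1) + 1  # insertion
        return min(a1, a2, a3)

    return dist(len(a), len(b))
```

For example, `levenshtein("a b c".split(), "a x c d".split())` counts one substitution and one insertion. The recursion depth grows with the sentence length, so for very long sequences an iterative table would be the safer choice.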

2. Forward-Backward Algorithm: Let's say you want to calculate the probability of seeing a waveform (or MFCC features) given a path in a lattice (or an HMM FST). Then the Forward-Backward Algorithm is nothing more than an optimized way to compute this probability.
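
To make the idea concrete, here is a minimal sketch of the forward and backward passes on a tiny dense HMM in Python. It is a toy, not Kaldi's lattice implementation. The function names and the input layout (`init[s]`, `trans[r][s]`, and `emit[t][s]` as the likelihood of frame t in state s) are assumptions made for the example.

```python
def forward(init, trans, emit):
    """alpha[t][s]: probability of frames 0..t with the path ending in state s."""
    n = len(init)
    alpha = [[init[s] * emit[0][s] for s in range(n)]]
    for t in range(1, len(emit)):
        prev = alpha[-1]
        alpha.append([
            emit[t][s] * sum(prev[r] * trans[r][s] for r in range(n))
            for s in range(n)
        ])
    return alpha


def backward(trans, emit):
    """beta[t][s]: probability of the frames after t, given state s at frame t."""
    n = len(trans)
    beta = [[1.0] * n]  # nothing left to observe after the last frame
    for t in range(len(emit) - 1, 0, -1):
        nxt = beta[0]
        beta.insert(0, [
            sum(trans[s][r] * emit[t][r] * nxt[r] for r in range(n))
            for s in range(n)
        ])
    return beta


def posteriors(init, trans, emit):
    """Per-frame probability of being in each state, given all frames."""
    alpha, beta = forward(init, trans, emit), backward(trans, emit)
    total = sum(alpha[-1])  # likelihood of the whole observation sequence
    return [[a * b / total for a, b in zip(at, bt)] for at, bt in zip(alpha, beta)]
```

The saving comes from reusing the partial sums in `alpha` and `beta` instead of enumerating every path, so the cost grows linearly with the number of frames rather than exponentially.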

3. Gamma Calculation: TBA

4. MBR Decoding: TBA


Kaldi Delta Features

Published on 2022-04-13 in Speech Recognition

Delta-delta features were proposed by S. Furui in 1986 and by Hermann Ney in 1990. The idea is simply to add the first and second derivatives of the cepstrum to the feature vector. By doing that, they say, the features capture spectral dynamics and improve overall accuracy.

The only problem is that, in a discrete signal space, taking the derivative of the signal increases the spontaneous noise level, so instead of a simple first- and second-order derivative, HTK proposed a differentiation filter. This filter is basically a low-pass filter convolved with the discrete signal derivative to smooth out the result and remove unwanted noise. In Fig. 1 you can see the result of a simple second derivative vs. the proposed differentiation filter.

Fig. 1. Plain derivative vs. differentiation filter. Courtesy of Matlab™

The HTK filter for a delta-delta feature (order=2, window=2) is a 9-element FIR filter with the following coefficients (Θ is the window size, which is 2 in HTK)
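
The coefficients can be reproduced with a short Python sketch. It assumes the usual HTK regression formula, in which the delta at frame t is the sum over θ = 1..Θ of θ·(c[t+θ] − c[t−θ]), divided by 2·Σθ². The function names are my own. Applying the 5-tap delta filter twice gives the 9-tap delta-delta filter.

```python
def delta_filter(window=2):
    """Taps theta / (2 * sum(theta^2)) for theta in -window..window."""
    norm = 2 * sum(theta * theta for theta in range(1, window + 1))
    return [theta / norm for theta in range(-window, window + 1)]


def convolve(a, b):
    """Full discrete convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


delta = delta_filter(2)               # 5 taps for the first derivative
delta_delta = convolve(delta, delta)  # 9 taps for the second derivative

print([round(c, 2) for c in delta])
print([round(c, 2) for c in delta_delta])
```

With window=2 the normaliser is 2·(1² + 2²) = 10, so the delta taps are −0.2, −0.1, 0, 0.1, 0.2, and the delta-delta taps come out symmetric around a centre value of −0.1.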

• Reverberation: The effect of sound bouncing off the walls of a room and coming back. The reverberation time is roughly between 1 and 2 seconds in an ordinary room. You can use the Sabine equation for a more accurate calculation.
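
As a rough sketch, the Sabine equation in metric units is RT60 = 0.161 · V / A, where V is the room volume in m³ and A is the total absorption, that is, the sum of each surface area in m² times its absorption coefficient. The function name and the room in the usage note are made up for illustration.

```python
def sabine_rt60(volume_m3, surfaces):
    """Reverberation time in seconds from the Sabine equation.

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    """
    absorption = sum(area * alpha for area, alpha in surfaces)
    if absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption
```

For a hypothetical 5 m × 4 m × 3 m room, `sabine_rt60(60, [(20, 0.3), (20, 0.1), (54, 0.1)])` treats the floor, the ceiling and the walls as three surfaces with invented absorption coefficients.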

IEEE ICASSP ’86 – Isolated Word Recognition Based on Emphasized Spectral Dynamics

IEEE ICASSP ’90 – Experiments on mixture-density phoneme-modelling for 1000-word DARPA task

Desh Raj Blog – Award-winning classic papers in ML and NLP


Kaldi FSTs Jargons

Published on 2022-03-29 in Speech Recognition
• Lattices: Graphs containing states (nodes) and arcs (edges), in which each state represents one 10 ms frame
• Arcs: Go from one state to another. The arcs of each state can be accessed with an arc iterator, and an arc only retains its next state. Each arc has a weight, an input label, and an output label.
• States: Simple decimal numbers starting from lat.Start() and going up to lat.NumStates() - 1. Most of the time the start state is 0
• Topological Sort: An FST is topologically sorted if it can be laid out on a horizontal axis with no arc pointing from right to left
• Note 1: You can get the number of states with lat.NumStates(); the max state is one less than that
• Note 2: You can prune lattices by creating dead-end paths. A dead-end path is a path that doesn't end up at a final state. After that, fst::Connect will trim the FST and get rid of these dead paths
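
The last two ideas can be sketched in plain Python on a bare list of arcs. This is only an illustration and does not use the OpenFst or Kaldi APIs. Arcs are reduced to hypothetical (source state, next state) pairs, and unlike fst::Connect the trimming below does not renumber the surviving states.

```python
def is_topologically_sorted(arcs):
    """True if every arc goes from a lower state id to a higher one."""
    return all(src < dst for src, dst in arcs)


def trim(start, finals, arcs):
    """Keep only arcs on paths that lead from start to some final state."""
    forward, backward = {}, {}
    for src, dst in arcs:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)

    def reach(roots, edges):
        # Depth-first search collecting every state reachable from roots.
        seen, stack = set(roots), list(roots)
        while stack:
            for nxt in edges.get(stack.pop(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # A state survives if it is reachable from start AND can reach a final state.
    keep = reach([start], forward) & reach(finals, backward)
    return [(s, d) for s, d in arcs if s in keep and d in keep]
```

With arcs `[(0, 1), (1, 2), (1, 3), (2, 4)]` and state 4 as the only final state, state 3 is a dead end, so `trim(0, [4], arcs)` drops the arc leading into it.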

Fig. 1. Topologically Sorted Graph


Kaldi Lattice Decoder

Published on 2022-02-18 in Speech Recognition
• Link: Same as an arc
• Token: Same as a state. Tokens have costs
• FrameToks: A linked list that contains all tokens in a single frame
• Adaptive Beam: Used in pruning before creating the lattice and throughout decoding
• NEmitting Tokens: Non-emitting tokens (NEmitting tokens) are tokens generated from an emitting token in the same frame; they have input label = 0 and acoustic_cost = 0
• Emitting Tokens: Tokens that pass from one frame to the next

Lattice Decoder at a Glance

Fig. 1. After First Emitting Nodes Process

Fig. 2. After Second Emitting Nodes Process


Kaldi GMM Overview

Published on 2022-01-06 in Speech Recognition

A Simplified Block Diagram of ASR Process in Kaldi


  1. NGC Nvidia – Kaldi Container
  2. Oxinabox – Kaldi Notes
  3. KWS14 – Kaldi Lattices
