COM Object and C++

Posted in Windows

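• CoInitialize: Initializes the COM library on the calling thread and sets the thread’s concurrency model to single-threaded apartment (STA).
• CoInitializeEx: A more flexible version of CoInitialize that lets the caller specify the thread’s concurrency model; it creates a new apartment for the thread if one is required.
• CoUninitialize: Closes the COM library on the calling thread. It should be called once for every successful CoInitialize or CoInitializeEx call, for example in a destructor.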

And Bash Is Awesome!!!

Posted in Linux, Uncategorized

Here I list cool Bash tricks I have learned:
– Bash Heredoc

Chrome DevTools Remote Control in Linux Bash

Posted in Linux

OK, the title is a bit long, but why did Google create such a nice debug interface and then make it so difficult to access?
1. Open Chrome with remote debugging enabled:
chromium --remote-debugging-port=9222
2. Install websocat to create a WebSocket connection to Chrome:
sudo pacman -S websocat
3. Find the magic Chrome ws URL. To do that…
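For the third step, one common approach (an assumption here, not necessarily the method the post goes on to describe) is to query the HTTP endpoint that Chrome serves on the debugging port: http://localhost:9222/json returns a JSON list of targets, each with a webSocketDebuggerUrl field. A minimal Python sketch follows; the helper names and the sample payload are made up for illustration.

```python
import json
import urllib.request


def extract_ws_urls(targets_json):
    """Return the webSocketDebuggerUrl of every target in a /json response."""
    targets = json.loads(targets_json)
    return [t["webSocketDebuggerUrl"] for t in targets if "webSocketDebuggerUrl" in t]


def fetch_ws_urls(port=9222):
    """Ask a locally running Chrome for its targets.

    Only works while Chrome is running with --remote-debugging-port=<port>.
    """
    with urllib.request.urlopen(f"http://localhost:{port}/json") as resp:
        return extract_ws_urls(resp.read().decode("utf-8"))


# Illustrative payload only; real responses carry more fields and other IDs.
SAMPLE = '[{"type": "page", "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/ABC"}]'
print(extract_ws_urls(SAMPLE))
```

A URL obtained this way can then be passed to websocat on the command line.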

Kaldi Confidence Score

Posted in Speech Recognition

To calculate word-level confidence scores, Kaldi uses a method called MBR (Minimum Bayes Risk) decoding. MBR decoding is a decoding process that minimizes the expected word-level error rate (instead of minimizing the cost of the whole utterance) to produce the result. This may not give the most accurate result, but it can be used to calculate the confidence score up to some…

Kaldi Delta Features

Posted in Speech Recognition

Delta-delta features were proposed by S. Furui in 1986 and by Hermann Ney in 1990. The idea is simply to add the first and second derivatives of the cepstrum to the feature vector. According to the authors, this captures spectral dynamics and improves overall accuracy. The only problem is that in a discrete signal space getting the derivative from the…
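As an illustration of how a derivative can be approximated on discrete frames, here is a minimal Python sketch of the widely used regression formula d_t = Σ n·(c_{t+n} − c_{t−n}) / (2·Σ n²). The function names, the window of 2, and the edge handling (repeating the first and last frame) are assumptions for this sketch; Kaldi's own implementation may differ in such details.

```python
def deltas(frames, window=2):
    """Regression-based time derivative of a sequence of feature vectors.

    frames: list of equal-length lists of floats, one list per frame.
    Frames beyond the edges are replaced by the first/last frame.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Append first and second derivatives to every feature vector."""
    d1 = deltas(frames, window)
    d2 = deltas(d1, window)  # the second derivative is the delta of the delta
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```

With 13 cepstral coefficients per frame, add_deltas would yield 39-dimensional vectors: the static coefficients, their deltas, and their delta-deltas.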

Kaldi FST Jargon

Posted in Speech Recognition

• Lattices: Graphs containing states (nodes) and arcs (edges), in which each state represents one 10ms frame.
• Arcs: Go from one state to another. The arcs of a state can be accessed with an arc iterator, and an arc only retains its next state. Each arc has a weight, an input label, and an output label.
• States: Are simple…
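The relationship between states and arcs can be sketched with a toy data structure. This is plain Python for illustration, not the OpenFst or Kaldi API; the class and field names are hypothetical, chosen to mirror the terms in the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Arc:
    ilabel: int      # input label
    olabel: int      # output label
    weight: float    # cost of taking this arc
    nextstate: int   # an arc only stores where it leads, not where it starts


@dataclass
class State:
    arcs: list = field(default_factory=list)  # outgoing arcs of this state


class Graph:
    """Toy graph: states are indexed by integer ID, arcs hang off their source state."""

    def __init__(self):
        self.states = []

    def add_state(self):
        self.states.append(State())
        return len(self.states) - 1

    def add_arc(self, src, arc):
        self.states[src].arcs.append(arc)

    def arc_iterator(self, src):
        """Iterate over the outgoing arcs of one state."""
        return iter(self.states[src].arcs)


g = Graph()
s0, s1 = g.add_state(), g.add_state()
g.add_arc(s0, Arc(ilabel=1, olabel=2, weight=0.5, nextstate=s1))
for arc in g.arc_iterator(s0):
    print(arc.ilabel, arc.olabel, arc.weight, arc.nextstate)
```

Because an arc keeps only its destination, the source state is known only from the state whose arc list is being iterated.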

Kaldi Lattice Decoder

Posted in Speech Recognition

• Link: Same as an arc
• Token: Same as a state. Tokens have costs
• FrameToks: A linked list that contains all tokens in a single frame
• Adaptive Beam: Used in pruning before creating the lattice and throughout decoding
• NEmitting Tokens: Non-emitting tokens (NEmitting Tokens) are tokens that are generated from an emitting token…

Kaldi GMM Overview

Posted in Speech Recognition

A Simplified Block Diagram of the ASR Process in Kaldi
NGC Nvidia – Kaldi Container
Oxinabox – Kaldi Notes
KWS14 – Kaldi Lattices

Notes on Kaldi

Posted in Speech Recognition

• Costs: Negative log probabilities, so a higher cost means a lower probability.
• Frame: Each 10ms of audio, which MFCC turns into a fixed-size vector called a frame.
• Beam: The cutoff is Best Cost + Beam (the beam is around 10 to 16)
• Cutoff: The maximum cost; all costs higher than this value will…
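The cost and beam definitions can be made concrete with a small Python sketch. It assumes the cutoff is the best (lowest) cost plus the beam width, and that anything above the cutoff is discarded; the function names and the token dictionary are hypothetical and are not Kaldi's decoder code.

```python
import math


def cost(prob):
    """Negative log probability: the lower the probability, the higher the cost."""
    return -math.log(prob)


def prune_tokens(token_costs, beam):
    """Keep only tokens whose cost is within `beam` of the best (lowest) cost.

    token_costs: dict mapping a token name to its cost.
    """
    cutoff = min(token_costs.values()) + beam
    return {name: c for name, c in token_costs.items() if c <= cutoff}


tokens = {"a": 1.0, "b": 5.0, "c": 20.0}
print(prune_tokens(tokens, beam=10.0))  # "c" lies above the cutoff of 11.0
```

A wider beam keeps more hypotheses alive at the price of more computation; a narrower one is faster but may prune the path that would have won.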

It’s All About The Latency

Posted in Electrical Engineering, Speech Recognition

Measure Microphone Latency in Linux with ALSA
The command below plays a tone through the speaker and records it back through the mic. Measuring the phase difference reveals the round-trip latency.
alsa_delay hw:1,0 hw:0,0 44100 256 2 1 1
Here hw:1,0 refers to the recording device, which can be retrieved from arecord…
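To relate these quantities, here is a small Python sketch that converts a delay in frames to milliseconds and a phase shift of a test tone to a time delay. Reading 44100 as the sample rate in Hz is an assumption about the command's arguments, and the function names are hypothetical; the phase formula is only unambiguous for delays shorter than one period of the tone.

```python
import math


def frames_to_ms(frames, sample_rate):
    """Convert a delay measured in audio frames to milliseconds."""
    return 1000.0 * frames / sample_rate


def delay_from_phase(phase_rad, tone_hz):
    """Time delay in seconds implied by a phase shift of a sine tone.

    Valid only modulo one period: a shift of 2*pi looks like no shift at all.
    """
    return phase_rad / (2.0 * math.pi * tone_hz)


print(frames_to_ms(441, 44100))         # 441 frames at 44.1 kHz
print(delay_from_phase(math.pi, 1000))  # half a period of a 1 kHz tone
```

Since a single tone wraps around every period, measuring longer delays requires combining several frequencies or another reference.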
