Kaldi GMM

Published on 2024-05-21 in Software, Speech Recognition

This is the golden formula of speech recognition.

The argmax function means: find the value of w that maximizes the expression, here the likelihood p(x|w) (in the full Bayes form it is weighted by the prior p(w) of the word sequence). Here x is the observed acoustic signal and w is a candidate word sequence. So basically we consider all possible sequences and, for each one of them, calculate the probability of seeing such an acoustic signal. This is a very computation-intensive process, but by using HMM and CTC we try to shrink the search space. The process of guessing the correct sequence is called decoding in the speech recognition research field.
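Written out, and assuming the formula meant here is the standard Bayes decision rule for speech recognition (the symbols w for the word sequence and x for the acoustic observation follow the text above), it reads roughly as:

```latex
\hat{w} = \operatorname*{arg\,max}_{w} \; p(w \mid x)
        = \operatorname*{arg\,max}_{w} \; \frac{p(x \mid w)\, p(w)}{p(x)}
        = \operatorname*{arg\,max}_{w} \; p(x \mid w)\, p(w)
```

The denominator p(x) does not depend on w, so it can be dropped from the argmax. p(x|w) is the acoustic model score and p(w) is the language model score.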

Transition Matrix

An HMM is just a bunch of states that transition from one state to another. A transition probability is applied on every emitting transition, and all of these probabilities can be expressed in a single matrix called the transition matrix.
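To make the idea concrete, here is a minimal sketch of a transition matrix in plain C++. This is not how Kaldi stores its transition model; the names `TransitionMatrix`, `is_row_stochastic`, `step` and the 3-state left-to-right values are all made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row i holds the probabilities of moving from state i to every state j.
using TransitionMatrix = std::vector<std::vector<double>>;

// Returns true if every row is a valid probability distribution.
bool is_row_stochastic(const TransitionMatrix& a, double tol = 1e-9) {
    for (const auto& row : a) {
        double sum = 0.0;
        for (double p : row) {
            if (p < 0.0) return false;
            sum += p;
        }
        if (std::fabs(sum - 1.0) > tol) return false;
    }
    return true;
}

// Propagates a state distribution one step: next[j] = sum_i pi[i] * a[i][j].
std::vector<double> step(const std::vector<double>& pi, const TransitionMatrix& a) {
    assert(pi.size() == a.size());
    std::vector<double> next(a.empty() ? 0 : a[0].size(), 0.0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < a[i].size(); ++j)
            next[j] += pi[i] * a[i][j];
    return next;
}

// Hypothetical 3-state left-to-right model (values invented for illustration).
TransitionMatrix example_left_to_right() {
    return {{0.6, 0.4, 0.0},
            {0.0, 0.7, 0.3},
            {0.0, 0.0, 1.0}};
}
```

Calling `step({1.0, 0.0, 0.0}, example_left_to_right())` moves all the probability mass from the first state into the first two states, which is the self-loop versus move-forward choice of a left-to-right HMM.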

• Occupation counts (.occs): the per-transition-id occupation counts. They are rarely needed, but might be used somewhere in, for example, the basis-fMLLR scripts.
• fMLLR: a feature-space transform applied on top of acoustic features such as MFCC, with a focus on speaker adaptation.
• Beam: the pruning cutoff is Best Cost + Beam (the beam is usually around 10 to 16).
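The beam cutoff can be sketched like this. It assumes costs are negated log-probabilities (lower is better), so a hypothesis survives only if its cost is within the beam of the best one. `Token` and `prune_by_beam` are hypothetical names for illustration, not Kaldi's decoder API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A partial decoding hypothesis; cost is a negated log-probability (lower is better).
struct Token {
    int state;
    double cost;
};

// Keeps only the tokens whose cost is at most best_cost + beam.
std::vector<Token> prune_by_beam(const std::vector<Token>& tokens, double beam) {
    assert(beam >= 0.0);
    if (tokens.empty()) return {};
    const double best_cost =
        std::min_element(tokens.begin(), tokens.end(),
                         [](const Token& a, const Token& b) { return a.cost < b.cost; })
            ->cost;
    const double cutoff = best_cost + beam;
    std::vector<Token> kept;
    for (const Token& t : tokens)
        if (t.cost <= cutoff) kept.push_back(t);
    return kept;
}
```

A wider beam keeps more hypotheses alive (slower, fewer search errors); a narrower one is faster but may throw away the path that would have won later.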

  1. OxinaBox Kaldi-Notes Train
  2. VpanaYotov: Decoding graph construction in Kaldi: A visual walkthrough
  3. Jonathan-Hui Medium: Speech Recognition GMM-HMM

Voltage Feedback Amplifier(VFB) vs Current Feedback Amplifier(CFB)

Published on 2024-02-06 in Electrical Engineering, Hardware Design

Voltage Feedback Amplifier (VFB) or Current Feedback Amplifier (CFB), that is the question!

This document summarizes the difference.

Current Feedback Amplifier




Voltage Feedback Amplifier



Understanding Partitions in Android MediaTek Chips

Published on 2024-01-27 in Android

Flash Samsung Galaxy A34 SP Flash Tool

Published on 2024-01-27 in Android, Software

Scatter File

To use the SP Flash Tool you need a scatter file. One easy way to find one is to look for other devices that use the same chipset and whose manufacturer releases firmware files that include the scatter file. One such manufacturer is Xiaomi, but you may find others as well.

Samsung Galaxy A34 uses the Dimensity 1080 (MT6877V), and here is the list of other devices that use this chip as well:


Devices come in two kinds, eMMC and UFS. eMMC is basically SD-card-style (MMC) storage packaged as a chip soldered on the board, hence the name embedded MMC or eMMC. Others come with UFS storage, which is also NAND flash but behind a faster interface. You can determine your device type by looking at the specs. For the A34 that is UFS 2.2.

Partition Starting Address

There’s a PIT (Partition Information Table) file inside all Samsung firmware. This file includes all partition starting addresses. I don’t know how to read it yet, though.

Small Notes

  1. If you are creating a scatter file from scratch, know that you should put all partitions inside it, with all the correct start addresses. Every time you flash even a single partition, SP Flash Tool updates the device GPT partition table based on the scatter file you supplied, so be careful or back up the ptable before you start messing around.
  2. It would be awesome if anyone knows a tool that can read the ptable. Currently I use mtkclient, but the support is not that great.
  3. The SP Flash Tool "By SRAM" / "By DRAM" option chooses where the file is first copied on the device before it is written to its actual location. Both should work under normal conditions; SRAM is used if DRAM has issues, probably during R&D.

Samsung MTK Force BRom (DM)

Published on 2024-01-23 in Android, Software

Many Samsung devices come with a MediaTek chip. These chips have a special mode called BROM or emergency mode. It is not enabled by default, but if the device ends up in a broken state it activates so that the device can be flashed without the need for a JTAG connection.
There’s a tool called Android Utility Tool that comes with very poor support, website, and documentation. On my journey I thought I would give it a shot, as the phone I was playing with wasn’t so important to me. Unfortunately, I used the tool to put my device into BROM mode and the device got bricked, with a black screen and no reaction whatsoever.
The solution was easy: get a stock ROM, extract the bootloader files, uncompress the LZ4 files, and then use write boot_section to write a preloader file to the device. Your device will come back to life.
Final note: MediaTek mode is only enabled for a few seconds after you reboot the device, so each time you want to execute an action you have to keep holding the power button, or some key combination, for a few seconds or more.

How to Downgrade Samsung SW REV. CHECK FAIL

Published on 2024-01-20 in Android

If you try to downgrade a new Samsung phone to an older firmware using Odin, you are gonna get SW REV. CHECK FAIL. Fortunately, there’s a fix for this, but it takes a little bit of patience.

Quick Guide

  1. Download and install 7-Zip.
  2. Download the required tool.
  3. Extract the content of AP_<Version>.tar.md5 into the same place as the required tools.
  4. Convert the .lz4 files to .img by dragging and dropping them onto lz4.exe.
  5. Run SignRemover.
  6. Pack the whole directory, except the .lz4 files and the tools, into a tar with 7-Zip.
  7. Do the same for the bootloader (BL_<Version>.tar.md5) and flash normally using Odin.

If needed, place vbmeta.img in the AP slot to disable AVB.

How to Downgrade Android Version in Samsung Devices if Device is in Higher Binery


Web Extension Console Firefox

Published on 2023-12-19 in Software

Developing Firefox extensions can be rough, but it shouldn’t be. Here are a few techniques to smooth out the process.

1. Dev Console

Access all extension logs by visiting about:debugging#/runtime/this-firefox and clicking on the Inspect button to see console.log logs.

For background scripts, you can also see the logs in the Browser Console by pressing Ctrl+Shift+J.

2. Try WebExtension API Live

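You can look up the WebExtension API in its documentation and try the calls in the same console at about:debugging#/runtime/this-firefox. As an example, browser.tabs.query({active: true}) returns a promise that resolves to the active tab of every window; add currentWindow: true to get only the current tab.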

3. Terms

A browser action is a button that your extension adds to the browser’s toolbar

4. Installation

If you want to install your add-on, you first need to sign it on the Mozilla platform, but this can take time. Instead, you can install Firefox Developer Edition, set xpinstall.signatures.required to false in about:config to disable signature enforcement, and then install your add-on normally.

  1. Anatomy of Extension
  2. Windows Firefox Dev Edition 100.0

How UDP Hole Punching Works

Published on 2023-12-13 in Linux, Software, Windows

TCP/UDP hole punching, or NAT traversal, works as follows:

A and B are both behind NAT and want to communicate, and you have a public relay server, S.

1. A connects to S, and B connects to S.

2. S sends A's public IP and port to B, and sends B's public IP and port to A.

3. A or B tries to connect to the other using the address S shared.

Note 1: For hole punching you don’t need UPnP IGD or port forwarding.

Note 2: UDP hole punching works more reliably than TCP hole punching, as UDP is connectionless by nature and doesn’t need a SYN packet.

Note 3: Hole punching isn’t a fully reliable technique, as a router or other firewall may see that B's IP address is different from S's IP address and block the inbound connection.

Note 4: STUN is a standard protocol that covers the address-discovery part of UDP hole punching, although you can create a custom protocol as well by following the steps above.
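The bookkeeping behind the three steps can be sketched without any real sockets. This is a toy in-memory model only: `Endpoint`, `RendezvousServer` and `NatFilter` are hypothetical names, the addresses are documentation-range examples, and the NAT is assumed to allow inbound packets only from endpoints the host has already sent to.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Public (NAT-mapped) address of a peer as seen from the outside.
struct Endpoint {
    std::string ip;
    std::uint16_t port;
    bool operator==(const Endpoint& o) const { return ip == o.ip && port == o.port; }
};

// Toy relay server S: remembers the source address each peer registered from.
class RendezvousServer {
public:
    // Step 1: a peer contacts S; S records the address it observed.
    void register_peer(const std::string& id, const Endpoint& observed) {
        peers_[id] = observed;
    }
    // Step 2: S hands out the observed address of `other`.
    // Returns false if `other` has not registered yet.
    bool lookup(const std::string& other, Endpoint& out) const {
        auto it = peers_.find(other);
        if (it == peers_.end()) return false;
        out = it->second;
        return true;
    }
private:
    std::map<std::string, Endpoint> peers_;
};

// Toy NAT filter: inbound traffic passes only from endpoints already contacted.
class NatFilter {
public:
    // Step 3: sending a packet outward opens the "hole" for replies from dst.
    void on_outbound(const Endpoint& dst) {
        allowed_.insert(std::make_pair(dst.ip, dst.port));
    }
    bool accepts_inbound(const Endpoint& src) const {
        return allowed_.count(std::make_pair(src.ip, src.port)) > 0;
    }
private:
    std::set<std::pair<std::string, std::uint16_t>> allowed_;
};
```

In this model B's NAT rejects a packet from A until B has itself sent something toward A's public address, which is why in practice both sides usually start sending as soon as S has shared the addresses.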

RNN vs CNN in Speech Recognition

Published on 2023-10-06 in Speech Recognition

RNN or CNN, that’s the question. So what should you use?

Let the battle begin

Speech recognition is a very broad task. People use speech recognition to do speech-to-text on videos, which is pre-recorded data where you can go back and forth between past and future and optimize the output a couple of times. On the other hand, voice control is also a speech recognition task, but there you need to do all the speech processing in real time and with low latency.

And now comes the big question: which technology should you use, RNN or CNN? In this post, we’re going to talk about that.

Generally, speech waveform data can be converted to 2D image-like data by using features such as MFCC (typically normalized with CMVN). From that point on it is basically an image that you can show to people, and people can learn what each word looks like. So the task is basically detecting where exactly a word is happening, and that is very similar to an object detection task. Very similar, but not quite the same. Why is that? A lot of the time we humans also have trouble detecting words, but we use the grammar of the language in the back of our heads to predict what the next word is likely to be. We combine that with the waveform data and then detect the right word. So if you say a very strange word to people, they will have trouble getting the correct text out of it. But if you teach them a couple of times, so they know these words can pop up, they will have much less trouble detecting them. The machine learning community uses the same approach.

In modern speech recognition engines, ML engineers first use a CNN to capture the features, or at least to detect how likely each word is, and then run an RNN in the background as a language model to improve the result. In the case of voice control, you don’t care about the language model, because there is no language model. You can say whatever word you want, or at least we give you that freedom, and then you just need the text of the word that you just said.

So in that case there is no use for an RNN, as there is no language model, and a 1D convolutional network is enough. It is the same as localization and object detection in classic machine learning: if you use the same technique as YOLO to slide the convolution layer along the waveform and just pick the maximum confidence score over a window, you can find the exact word happening at that time. The problem is that as the number of words increases, this technique becomes more and more challenging, so you need to develop more mature techniques. And that’s exactly why we introduced HMM. The best technique is to use a hidden Markov model to model how a word is spoken, then slide that word over the signal and find out whether it actually matches the signal or not. By using that we can do the alignment. We can do better forced alignment, use that data, and also feed it back to the HMM to increase the accuracy, and finally we create this awesome engine with a level of accuracy that no one has ever seen before.
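The "slide a filter along the waveform and pick the maximum confidence" idea can be sketched as follows. This is a toy stand-in: one fixed template plays the role of a single learned 1D convolution filter, and the dot product plays the role of the confidence score. `Detection` and `best_window` are hypothetical names, not part of any engine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Detection {
    std::size_t offset;  // start index of the best-matching window
    double score;        // dot product between the template and that window
};

// Slides `tmpl` over `signal` (valid positions only) and returns the window
// with the highest dot-product score.
Detection best_window(const std::vector<double>& signal,
                      const std::vector<double>& tmpl) {
    assert(!tmpl.empty() && signal.size() >= tmpl.size());
    Detection best{0, 0.0};
    bool first = true;
    for (std::size_t start = 0; start + tmpl.size() <= signal.size(); ++start) {
        double score = 0.0;
        for (std::size_t k = 0; k < tmpl.size(); ++k)
            score += signal[start + k] * tmpl[k];
        if (first || score > best.score) {
            best = {start, score};
            first = false;
        }
    }
    return best;
}
```

With one template per word you would run this once per word and keep the highest score, which also shows why the approach gets harder as the vocabulary grows.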

So wait for it and sleep on it.

Windows Accurate Timing

Published on 2023-09-13 in Software, Windows

There’s a lot of contradictory discussion on the internet about achieving an accurate sleep function on Windows. The problem is that most of it is very old: with the introduction of multicore processors many of the older functions broke down, but this is no longer the case in 2023.
Nowadays you can simply call the cross-platform C++11 chrono functions. With the following source code I could achieve one-millisecond accuracy, which is more than enough for my application.

#include <chrono>
#include <thread>
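Only the includes of the listing are shown above. A minimal sketch of the kind of measurement described, assuming `std::this_thread::sleep_for` and `std::chrono::steady_clock` are the calls in question (the helper name `measure_sleep_ms` is made up for illustration), could look like this:

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Sleeps for `requested` and returns how long the call really took, in milliseconds.
double measure_sleep_ms(std::chrono::microseconds requested) {
    assert(requested.count() >= 0);
    const auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(requested);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `measure_sleep_ms(std::chrono::milliseconds(1))` in a loop and logging the results is an easy way to see how far the real sleep time drifts from the requested one on a given machine. The numbers depend on the OS timer resolution and system load, so treat them as measurements rather than guarantees.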

Before that I used the QThread::msleep function, which was off by about 5 ms to 15 ms. That was a lot more than I expected, and it stayed that way even when I used the QThread::usleep function.
The issue is that if you call a sleep function on a thread, the OS scheduler puts your application to sleep, and it may take a while until the scheduler picks your application up again. To prevent this you need to specifically tell the OS to treat your application differently from the others. C++11 introduced chrono, which on Windows uses QueryPerformanceCounter from the Windows API in the background, and in my tests the Windows scheduler picked my application up at the right time.
You can go ahead and call the Windows API functions directly, but nowadays C++11 is nicely integrated into a lot of environments and it’s also a cross-platform solution. So lucky you, you don’t need to get your hands dirty anymore.

  1. YouTube – Test and Set Synchronization Primitive
  2. RandomASCii: Windows Timer
  3. Microsoft – Windows Performance Analyzer
  4. YouTube – CppCon 2017: Fedor Pikus “C++ atomics, from basic to advanced.”
