This is the golden formula of speech recognition.
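The formula itself did not survive in this copy, so here is a reconstruction from the description below; the second equality is the standard Bayes decomposition, where the language-model prior p(W) is the usual extra term (not mentioned in the text):

```latex
\hat{W} = \operatorname*{argmax}_{W} \; p(W \mid X)
        = \operatorname*{argmax}_{W} \; p(X \mid W)\, p(W)
```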
The argmax function means: find the word sequence w that maximizes p(x|w), where x is the observed acoustic signal. So, essentially, we enumerate all possible word sequences and, for each one, calculate the probability of observing that acoustic signal. This is a very computationally intensive process, but by using HMMs and CTC we can shrink the search space. The process of guessing the correct sequence is called decoding in the speech recognition research field.
An HMM is just a bunch of states with transitions from one state to another. The model is evaluated on every emitting transition, and all the transition probabilities can be collected in a matrix called the transition matrix.
• Occupation counts (.occs): the per-transition-id occupation counts. They are rarely needed; e.g., they might be used somewhere in the basis-fMLLR scripts.
• fMLLR: a feature-space speaker-adaptation transform, applied on top of acoustic features such as MFCCs, with a focus on multi-speaker adaptation.
• Beam: during decoding, the pruning cutoff is Best Cost - Beam (Beam is typically around 10 to 16).
• Deterministic FST: an FST in which each state has at most one transition for any given input label, and there are no input epsilon labels.
Voltage Feedback Amplifier (VFB) or Current Feedback Amplifier (CFB)? That is the question! This document summarizes the differences.
Preloader
: initial bootloader code that runs when the device is powered on.

Vbmeta
: Verified Boot metadata, used to verify the integrity of the boot image.

Vbmeta_system
: Verified Boot metadata for the system partition.

Vbmeta_vendor
: Verified Boot metadata for the vendor partition.

Spmfw
: Secure Partition Manager firmware.

Lk
: Little Kernel bootloader.

Boot
: kernel and ramdisk images used to boot the device.

Dtbo
: device tree blob overlay.

Tee
: Trusted Execution Environment.

Efuse
: MediaTek-specific data for RF parameters and other chip-specific properties.

Super
: holds the dynamic partitions on the device and their metadata.

Cust
: a partition that contains customer-specific data.

Rescue
: a partition that contains a recovery image that can be used to restore the device to its factory state.

Userdata
: the user's data, such as apps, photos, and documents.

To use the SP Flash Tool you need a scatter file. One easy way to find one is to look for other devices that use the same chipset, from a manufacturer that releases firmware files including the scatter file. One such manufacturer is Xiaomi, but you may find others as well.
The Samsung Galaxy A34 uses the Dimensity 1080 [MT6877v], and here is a list of other devices that use this chip as well:
Devices come in two kinds: eMMC and UFS. eMMC is just an SD card in a chip package, hence "embedded MMC", or eMMC. Other devices come with UFS storage, which is, in fancy words, NAND flash behind a faster interface. You can determine your device's type by looking at the specs; for the A34 that is UFS 2.2.
There's a PIT (Partition Information Table) file inside all Samsung firmware. This file includes all partition starting addresses. I don't know how to read it yet, though.
With the By SRAM / By DRAM option, you choose where the file is first copied on the device before it is written to its actual location. Both should work under normal conditions; SRAM is used if DRAM has issues, which is a pro during R&D.

Half of Samsung devices come with a MediaTek chip. These chips have a special mode called BROM, or emergency mode. It is not enabled by default, but if the device ends up in a broken state it activates, allowing the device to be flashed without needing a JTAG connection.
There's a tool called Android Utility Tool that comes with very shitty support, website, and documentation. In my journey, I thought I'd give it a shot, as the phone I was playing with wasn't so important to me. Unfortunately, I used the tool to put my device into BROM mode, and the device got bricked, with a black screen and no reaction whatsoever.
The solution was easy: just get a stock ROM, extract the bootloader files, decompress the LZ4 files, and then use the write boot_section option to write the preloader file to the device. Your device will revive and work again.
One final note: MediaTek BROM mode is only enabled for a few seconds after you reboot the device, so each time you want to execute an action you have to keep holding the power button, or some key combination, for a few seconds or more.
If you try to downgrade a newer Samsung phone to older firmware using Odin, you are going to get SW REV. CHECK FAIL. Fortunately, there's a fix for this, but it takes a little bit of patience:
1. Extract the Ap_<Version>.tar.md5 content in the same place as the required tools.
2. Convert the .lz4 files to .img by dragging and dropping them onto lz4.exe.
3. Remove the signatures from the images with SignRemover.
4. Compress the lz4 files and tools to a tar with 7-Zip (name it Ap_<Version>.tar.md5) and flash normally using Odin.
5. If needed, place vbmeta.img in the AP slot to disable AVB.
– How to Downgrade Android Version in Samsung Devices if Device is in Higher Binary
Developing Firefox extensions can be rough, but it shouldn't be. Here are a few techniques to smooth out the process.
Access all extension logs by visiting about:debugging#/runtime/this-firefox and clicking the Inspect button to see the console.log output.
For background scripts, you can also see the logs in the Browser Console by pressing Ctrl+Shift+J.
You can access the WebExtension API documentation here, and you can try the APIs in the same console at about:debugging#/runtime/this-firefox. For example, browser.tabs.query({ active: true, currentWindow: true }) will give you the current tab.
A browser action is a button that your extension adds to the browser's toolbar.
If you want to install your add-on, you first need to sign it on the Mozilla platform, but this can take time. Instead, you can install Firefox Developer Edition, set xpinstall.signatures.required to false to disable signature enforcement, and then install your add-on normally.
TCP/UDP hole punching, or NAT traversal, works as follows:
A and B are behind NAT and want to communicate, while you have a public relay server, S.
1. A connects to S, and B connects to S.
2. S sends A's IP and port to B, and sends B's IP and port to A.
3. A or B (or both) tries to connect to the other using the address S shared.
Note 1: For hole punching you don't need UPnP IGD or port forwarding.
Note 2: UDP hole punching works more reliably than TCP hole punching, as UDP is connectionless by nature and doesn't need a SYN handshake.
Note 3: Hole punching isn't a fully reliable technique, as a router or other firewall may see that B's IP address is different from S's and block the inbound connection.
Note 4: STUN is a standard protocol used to implement UDP hole punching, although you can create a custom protocol as well following the above steps.
RNN or CNN? That is the question. So which should you use? Let the battle begin.
Speech recognition is a very broad task. People use speech recognition to do speech-to-text on videos: pre-recorded data where you can go back and forth between past and future and refine the output a couple of times. On the other hand, voice control is also a speech recognition task, but you need to do all the speech processing in real time, with low latency.
And now comes the big question: which technology should you use, RNN or CNN? In this post, we're going to talk about that.
Generally, by using features such as MFCCs (typically normalized with CMVN), speech waveform data can be converted into 2D image data. From that point on, it is basically an image that you can show to people, and people can learn what each word looks like. So the task is basically detecting where exactly a word is happening, which is very similar to object detection. Very similar, but not quite the same. Why is that? A lot of the time we also have trouble detecting words, but we use the language's grammar in the back of our heads to predict what the next word should look like, combine that prediction with the waveform data, and then detect the right word. So if you say a very strange word to people, they will have trouble getting the correct text out of it. But if you teach them a couple of times, so that they know when these words pop up, they will have much less trouble detecting them. The machine learning community uses the same approach.
In modern speech recognition engines, ML engineers first use a CNN to capture the features, or at least detect how likely each word is, and then run an RNN in the background as a language model to improve the result. In the case of voice control, though, you don't care about the language model, because there is no language model: you can say whatever word you want (or at least we give you that freedom), and you just need the text of the word you said.
In that case there is no use for an RNN, since there is no language model, and a 1D convolutional network is enough. It is much like localization and object detection in classic machine learning: if you slide the convolution window over the waveform, the same way YOLO does over an image, and detect the maximum confidence score in each window, you can find the exact word happening at that time. The problem is that as the number of words grows, this technique becomes more and more challenging, so you need more mature techniques. That's exactly why HMMs were introduced. The idea is to use a hidden Markov model to capture how a word is spoken, then slide that model over the signal and check whether it actually matches. With that we can do better forced alignment, use that data, and feed it back to the HMM to increase the accuracy. Finally, we get an awesome engine with accuracy no one has seen before.
So wait for it and Sleep on it.
There's a lot of controversial discussion on the internet about achieving an accurate sleep function on Windows. The problem is that most of it is very old, and with the introduction of multicore processors many of the older approaches broke down; but this is no longer the case in 2023.
Nowadays you can easily call the cross-platform C++11 chrono-based sleep, and with the following source code I achieved one-millisecond accuracy, which is more than enough for my application.
#include <chrono>
#include <thread>
std::this_thread::sleep_for(std::chrono::microseconds(500));
Before, I used the QThread::msleep function, which had an accuracy of about 5 ms to 15 ms. That was a lot worse than I expected, even when I used the QThread::usleep function.
There is one issue: if you call a sleep function on a thread, the OS scheduler will put your application to sleep, and it may take a while until the scheduler picks your application up again. To prevent this, you need the OS to time your application more precisely; on Windows, the C++11 chrono clocks are backed by the QueryPerformanceCounter API, which helps ensure the scheduler picks your application up at the right time.
You can go ahead and directly call the Windows API functions, but nowadays C++11 is nicely integrated into a lot of environments, and it's also a cross-platform solution, so lucky you: you don't need to get your hands dirty anymore.