Published on 2024-11-04 in
Software
1. Create user, -M
means do not create a home directory.
useradd -M <username>
2. Create an SSH key pair, -C
sets the comment.
ssh-keygen -t rsa -b 4096 -C "<comment>"
3. Download the private key
and put it in the .ssh
folder on the client side.
4. Move the public key
to /etc/ssh/authorized_keys/
.
5. Set permissions.
chown <username> /etc/ssh/authorized_keys/<username>.pub
chmod 644 /etc/ssh/authorized_keys/<username>.pub
6. Edit sshd_config
vim /etc/ssh/sshd_config
------------------------
Match User <username>
AuthorizedKeysFile /etc/ssh/authorized_keys/<username>.pub
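Permissions get complicated elsewhere (sshd checks the ownership and mode of the key file and of the directories above it), so only use the /etc folder for the key.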
Published on 2024-05-21 in
Software,
Speech Recognition
This is the golden formula of speech recognition.
The argmax function means: find the value of w that makes p(x|w) maximal. Here x is the observed acoustic signal and w is a candidate word sequence. So basically we enumerate all possible sequences and, for each one, calculate the probability of seeing such an acoustic signal. This is a very computation-intensive process, but by using HMM and CTC we try to shrink the search space. The process of guessing the correct sequence is called decoding in the speech recognition research field.
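For reference, the decoding rule described above is usually written as follows. The prior p(w) (the language model term) and the symbol \hat{w} come from the standard textbook form and are assumptions here, not something taken from this post.

```latex
\hat{w} = \arg\max_{w} \; p(w \mid x) = \arg\max_{w} \; p(x \mid w)\, p(w)
```

When every word sequence is considered equally likely, p(w) is constant and the rule reduces to maximizing p(x|w) alone.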
Transition Matrix
An HMM is just a bunch of states that transition from one to another. Each transition has a probability, and all of them can be collected in a single matrix, called the transition matrix.
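A minimal sketch of what such a matrix looks like, assuming a hypothetical 3-state left-to-right HMM. The probabilities are made up for illustration; row i holds the probabilities of moving from state i to every state, so each row sums to 1.

```python
# Hypothetical 3-state left-to-right HMM (illustrative numbers only).
# TRANSITIONS[i][j] is the probability of moving from state i to state j.
TRANSITIONS = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]


def step(state_dist, transitions):
    """Advance a distribution over states by one time step."""
    n = len(transitions)
    return [
        sum(state_dist[i] * transitions[i][j] for i in range(n))
        for j in range(n)
    ]


# Start in state 0 with certainty, then take one step.
after_one_step = step([1.0, 0.0, 0.0], TRANSITIONS)
```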
• Occupation counts:
.occs files hold the per-transition-id occupation counts. They are rarely needed, e.g. they might be used somewhere in the basis-fMLLR scripts.
• FMLLR:
A feature-space transform for speaker adaptation. It is applied on top of acoustic features such as MFCC to handle multiple speakers.
• Beam:
Hypotheses are pruned relative to the best one. With costs (negative log probabilities) the cutoff is the best cost plus the beam, which is the same as the best score minus the beam for log-likelihoods. The beam is usually around 10 to 16.
• Deterministic FST:
An FST in which each state has at most one transition with any given input label and there are no input eps-labels.
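The beam and determinism definitions above can be made concrete with a small sketch. The data layout (a dict of hypothesis costs and a list of arc tuples) and the "<eps>" label are assumptions for illustration, not Kaldi's or OpenFst's actual data structures.

```python
EPS = "<eps>"  # assumed spelling of the epsilon label


def prune(costs, beam):
    """Keep hypotheses whose cost is within `beam` of the best (lowest) cost.

    `costs` maps a hypothesis name to its cost (negative log probability).
    """
    cutoff = min(costs.values()) + beam
    return {hyp: cost for hyp, cost in costs.items() if cost <= cutoff}


def is_deterministic(arcs):
    """Check the determinism definition on a list of arcs.

    Each arc is (src_state, input_label, output_label, dst_state).
    """
    seen = set()
    for src, ilabel, _olabel, _dst in arcs:
        if ilabel == EPS or (src, ilabel) in seen:
            return False
        seen.add((src, ilabel))
    return True
```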
Questions
- Why use negative log probabilities: For numerical stability. Products of many small probabilities underflow, sums of their logarithms do not.
- What's the difference between WFSA and WFST: Acceptors have a single label on each transition, but transducers have both an input and an output label
- Sometimes a WFSA is implemented as a WFST in which every transition has the same input and output label. This is done so that a WFSA can reuse the normal WFST code without changing the implementation
- What are the input and output labels in a WFST: Inputs are usually phonemes and outputs are words. Along the path of phonemes the output is usually just empty (epsilon), except at the final node
- OxinaBox Kaldi-Notes Train
- VpanaYotov: Decoding graph construction in Kaldi: A visual walkthrough
- Jonathan-Hui Medium: Speech Recognition GMM-HMMl
- Mehryar Mohri: Weighted finite-state transducers in speech recognition
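To illustrate the numerical stability point from the Questions list, here is a small sketch with made-up numbers: multiplying 500 probabilities of 0.01 underflows to zero in double precision, while the sum of their negative logarithms stays a perfectly ordinary number.

```python
import math

# Hypothetical per-frame probabilities (illustrative values only).
frame_probs = [0.01] * 500

# Direct product: underflows to exactly 0.0 in double precision.
product = 1.0
for p in frame_probs:
    product *= p

# Negative log domain: the same quantity as a sum of costs.
total_cost = sum(-math.log(p) for p in frame_probs)
```

Two hypotheses that both underflow to 0.0 can no longer be compared, while their costs still can.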
Voltage Feedback Amplifier (VFB) or Current Feedback Amplifier (CFB), that is the question!
This document summarizes the differences.
Current Feedback Amplifier
Advantage
- No fixed gain-bandwidth product (high gain and high bandwidth at the same time!)
- Very high slew rate (the inverting configuration maximizes input slew rate)
Disadvantage
- Only the non-inverting input is high impedance
- The feedback resistor plays a large role in amplifier stability, which can also limit the value of the gain-set resistor
- The signal bandwidth is determined by RF (the feedback resistor) and not by the circuit gain
- Limited DC gain
- A larger headroom is needed for the output
Notes
- CFB circuits must never include a direct capacitance between the output and inverting input pins, as this often leads to oscillation
Voltage Feedback Amplifier
Advantage
- Easier to design, no sensitivity to the values of the feedback resistors
- High DC gain
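The gain-bandwidth trade-off can be sketched with first-order, idealized formulas. The part values below are hypothetical, and real devices deviate from this model: for a VFB the closed-loop bandwidth is roughly the gain-bandwidth product divided by the gain, while for a CFB the feedback resistor RF is kept at the recommended value and the gain is set through RG in the usual non-inverting relation G = 1 + RF/RG.

```python
def vfb_bandwidth_hz(gbw_hz, gain):
    """Idealized VFB: closed-loop bandwidth shrinks as gain grows."""
    return gbw_hz / gain


def noninverting_gain(rf_ohms, rg_ohms):
    """Non-inverting closed-loop gain, G = 1 + RF/RG."""
    return 1 + rf_ohms / rg_ohms


def cfb_gain_resistor_ohms(rf_ohms, gain):
    """CFB: keep RF fixed (it sets bandwidth) and solve G = 1 + RF/RG for RG."""
    return rf_ohms / (gain - 1)


# Hypothetical numbers: a 100 MHz GBW VFB at gain 10, a CFB with RF = 1 kOhm.
vfb_bw = vfb_bandwidth_hz(100e6, 10)
cfb_rg = cfb_gain_resistor_ohms(1000, 11)
```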
Published on 2024-01-27 in
Android
Preloader
: initial bootloader code that runs when the device is powered on.
Vbmeta
: Verified Boot metadata, which is used to verify the integrity of the boot image.
Vbmeta_system
: Verified Boot metadata for the system partition.
Vbmeta_vendor
: Verified Boot metadata for the vendor partition.
Spmfw
: System Power Manager firmware (MediaTek).
Lk
: Little Kernel bootloader.
Boot
: kernel and ramdisk images used to boot the device.
Dtbo
: Device Tree Blob for Overlay.
Tee
: Trusted Execution Environment.
Efuse
: MediaTek Specific Data for RF Parameters and other chip-specific properties.
Super
: container for all dynamic partitions on the device (such as system and vendor) together with their metadata.
Cust
: A partition that contains customer-specific data.
Rescue
: A partition that contains a recovery image that can be used to restore the device to its factory state.
Userdata
: user’s data, such as apps, photos, and documents.
Published on 2024-01-27 in
Android,
Software
Scatter File
To use SP Flash Tool you need a scatter file. One easy way to find one is to look for other devices that use the same chipset and whose manufacturer releases firmware that includes the scatter file. Xiaomi is one such manufacturer, but you may find others as well.
Samsung Galaxy A34 uses Dimensity 1080[MT6877v] and here is the list of other devices that use this chip as well:
UFS Or eMMC
Devices come with one of two kinds of storage, eMMC or UFS. eMMC (embedded MultiMediaCard) is essentially a memory card packaged as a chip and soldered onto the board. UFS is the newer and faster standard. Both are NAND flash underneath. You can determine your device type by looking at the specs. For the A34 that is UFS 2.2.
Partition Starting Address
There’s a PIT (Partition Information Table) file inside all Samsung firmware. This file includes all partition starting addresses. I don’t know how to read it yet, though.
Small Notes
- If you are creating a scatter file from scratch, know that you should put every partition in it, each with the correct start address. Every time you flash, even a single partition, SP Flash Tool updates the device GPT partition table based on the scatter file you supplied. So be careful, or back up the ptable before you start messing around
- It would be awesome if anyone knows a tool that can read the ptable. Currently I use mtkclient, but the support is not that great
- The SP Flash Tool
By SRAM
, By DRAM
options choose where the file is first copied on the device before it is written to its actual location. Both should work under normal conditions. SRAM is used if DRAM has issues, which mostly matters during R&D.
Published on 2024-01-23 in
Android,
Software
Many Samsung devices come with a MediaTek chip. These chips come with a special mode called BROM (boot ROM) or emergency mode. It is not enabled by default, but if the device ends up in a broken state it activates so that the device can be flashed without the need for a JTAG connection.
There’s a tool called Android Utility Tool that comes with very poor support, website, and documentation. In my journey, I thought I would give it a shot, as the phone I was playing with wasn’t so important to me. Unfortunately, I used the tool to put my device into BROM mode and the device got bricked, with a black screen and no reaction whatsoever.
The solution was easy: get a stock ROM, extract the bootloader files, decompress the LZ4 files, and then use the write boot_section option to write the preloader file to the device. Your device will then come back to life.
Final note: MediaTek mode stays enabled for only a few seconds after you reboot the device, so each time you want to execute an action you have to keep holding the power button, or some button combination, for a few seconds or more.
Published on 2024-01-20 in
Android
If you try to downgrade a new Samsung phone to an older firmware using Odin, you are going to get SW REV. CHECK FAIL
. Fortunately, there’s a fix for this, but it takes a little bit of patience.
Quick Guide
- Download and install 7-Zip
- Download the required tool.
- Extract the
Ap_<Version>.tar.md5
content in the same place as the required tools.
- Convert
.lz4
files to .img
by dragging and dropping them on lz4.exe
- Run
SignRemover
- Pack the whole dir except
lz4
files and tools to tar with 7-Zip
- Do the same for the bootloader (
BL_<Version>.tar.md5
) and flash normally using Odin
If needed, place vbmeta.img in the AP slot to disable AVB
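The packing step from the Quick Guide can also be scripted instead of done by hand in 7-Zip. This is only a sketch under assumptions: the excluded suffixes stand in for "lz4 files and tools", the function name is hypothetical, and the plain USTAR format is a conservative guess rather than a confirmed Odin requirement.

```python
import tarfile
from pathlib import Path

# Assumed suffixes for the files to leave out (lz4 archives and the tools).
EXCLUDED_SUFFIXES = (".lz4", ".exe", ".bat")


def pack_firmware(src_dir, out_tar, excluded=EXCLUDED_SUFFIXES):
    """Pack every top-level entry of src_dir into a tar, skipping excluded files.

    Returns the list of names that were added.
    """
    src = Path(src_dir)
    out = Path(out_tar).resolve()
    entries = sorted(
        p for p in src.iterdir()
        if p.suffix.lower() not in excluded and p.resolve() != out
    )
    with tarfile.open(out, "w", format=tarfile.USTAR_FORMAT) as tar:
        for entry in entries:
            # arcname keeps paths relative, so the tar has no parent folders.
            tar.add(entry, arcname=entry.name)
    return [entry.name for entry in entries]
```

Calling `pack_firmware("AP_extracted", "AP_custom.tar")` (both names hypothetical) would produce a tar with the converted images at the top level.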
– How to Downgrade Android Version in Samsung Devices if Device is in Higher Binery
Published on 2023-12-19 in
Software
Developing Firefox extensions can be rough, but it shouldn’t be. Here are a few techniques to smooth out the process.
1. Dev Console
Access all extension logs by visiting about:debugging#/runtime/this-firefox
and clicking on the Inspect button to see console.log output.
For background scripts, you can also see the logs in the Browser Console by pressing Ctrl+Shift+J
2. Try WebExtension API Live
The WebExtension API is documented Here, and you can try it out in about:debugging#/runtime/this-firefox
in the same console. As an example, browser.tabs.query({active: true, currentWindow: true})
resolves to an array holding the current tab
3. Terms
A browser action is a button that your extension adds to the browser’s toolbar
4. Installation
If you want to install your add-on you first need to sign it on the Mozilla platform, but this can take time. Instead, you can install Firefox Developer Edition and set xpinstall.signatures.required
to false in about:config to disable signature enforcement, and then install your add-on normally
- Anatomy of Extention
- Windows Firefox Dev Edition 100.0
Published on 2023-12-13 in
Linux,
Software,
Windows
TCP/UDP hole punching, or NAT traversal, works as follows:
A and B are behind NAT and want to communicate, while you have a public relay server, S.
1. A connects to S, B connects to S
2. S sends A's public IP and port to B, and B's public IP and port to A
3. A and B try to connect to each other using the addresses S shared
Note1: For hole punching you don’t need UPnP IGD or port forwarding
Note2: UDP hole punching works more reliably than TCP hole punching, as UDP is connectionless by nature and doesn’t need a SYN packet
Note3: Hole punching isn’t a fully reliable technique, as a router or other firewall may see that B's IP address is different from S's IP address and block the inbound connection
Note4: STUN is a standard protocol that lets a client discover its public IP and port, which is the basis for UDP hole punching, although you can create a custom protocol as well by following the above steps
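The three steps above can be sketched with plain UDP sockets. This is a minimal illustration, not a robust implementation: the function names and the "ip:port" message format are made up, there are no retries or keep-alives, and real NATs may still drop the packets as described in Note3.

```python
import socket


def rendezvous_once(server_sock):
    """S: wait for two peers, then tell each one the other's public address."""
    _, first = server_sock.recvfrom(1024)
    _, second = server_sock.recvfrom(1024)
    server_sock.sendto(f"{second[0]}:{second[1]}".encode(), first)
    server_sock.sendto(f"{first[0]}:{first[1]}".encode(), second)


def register(peer_sock, server_addr):
    """A or B: contact S so that S sees our public (NAT-mapped) address."""
    peer_sock.sendto(b"hello", server_addr)


def receive_peer_addr(peer_sock):
    """A or B: read the other peer's address as sent by S."""
    data, _ = peer_sock.recvfrom(1024)
    host, port = data.decode().rsplit(":", 1)
    return host, int(port)


def punch(peer_sock, peer_addr, payload=b"punch"):
    """A or B: send from the SAME socket used with S, so the NAT mapping is reused."""
    peer_sock.sendto(payload, peer_addr)
```

The important design point is that each peer reuses the socket it registered with, since the NAT mapping that S observed belongs to that socket's local port.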
Published on 2023-10-06 in
Speech Recognition
RNN or CNN, that’s the question. So what should you use?
Let the battle begin
Speech recognition is a very broad task. People use speech recognition to do speech-to-text on videos, which is pre-recorded data where you can go back and forth between past and future and optimize the output a couple of times. On the other hand, voice control is also a speech recognition task, but there you need to do all the speech processing in real time and with low latency.
And now comes the big question: which technology should you use, RNN or CNN? In this post, we’re going to talk about that.
Generally, speech waveform data can be converted to 2D image data by using a spectrogram or MFCC features (optionally normalized with CMVN). From that point on it is basically an image that you can show to people, and people can learn what each word looks like. So the task is basically detecting where exactly a word is happening, and it’s very similar to an object detection task. Very similar, but not quite the same. Why is that? A lot of the time we humans also have trouble detecting words, but we use the grammar of the language in the back of our heads to predict what the next word will be. We combine that with the waveform data and then detect the right word. So if you say a very strange word to people, they will have trouble getting the correct text out of it. But if you teach them a couple of times, so that they know these words can pop up, they will have much less trouble detecting them. The machine learning community uses the same approach.
In modern speech recognition engines, ML engineers first use a CNN to capture the features, or at least to detect how likely each word is, and then run an RNN in the background as a language model to improve the result. In the case of voice control, you don’t care about the language model, because there is no language model. You can say whatever word you want, or at least we give you that freedom, and then you just need the text of the word that you just said.
So in that case there is no use for an RNN, as there is no language model, and a 1D convolutional network is enough. It is the same as localization and object detection in classic machine learning: if you use the same technique as YOLO to slide the convolution layer along the waveform and just pick the maximum confidence score over a window, you can find the exact word happening at that time. The problem is that as the number of words increases, this technique becomes more and more challenging, so you need to develop more mature techniques. And that’s exactly why we introduced HMM. The best technique is to use a hidden Markov model to model how a word is spoken, then slide that word over the signal and find out whether it actually matches that part of the signal or not. By using that we can do the alignment. We can do better forced alignment, use that data, and also feed it back to the HMM to increase the accuracy. Finally, we create this awesome engine with a great amount of accuracy that no one has ever seen before.
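The sliding-window idea can be sketched in a few lines. This is a toy illustration under assumptions: a fixed template stands in for a trained 1D convolution filter, the dot product stands in for the network's confidence score, and the numbers are made up.

```python
def window_scores(signal, template):
    """Slide the template over the signal and score every window (dot product)."""
    n, m = len(signal), len(template)
    return [
        sum(signal[i + k] * template[k] for k in range(m))
        for i in range(n - m + 1)
    ]


def best_window(signal, template):
    """Return (start_index, score) of the window with the maximum score."""
    scores = window_scores(signal, template)
    start = max(range(len(scores)), key=scores.__getitem__)
    return start, scores[start]


# Hypothetical 1D feature track with a bump in the middle, and a matching template.
signal = [0, 0, 1, 2, 1, 0, 0]
template = [1, 2, 1]
start, score = best_window(signal, template)
```

With one template per word this needs a full pass over the signal for every word in the vocabulary, which is the scaling problem mentioned above.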
So wait for it and sleep on it.