Speech Recognition II (Developing Kaldi)

Published on 2021-08-02 in Speech Recognition

Let’s enhance Kaldi. Here are some links along the way. YouTube seems to have progressed a lot during the last couple of years, so this is basically a bunch of random videos that make up my favorite playlist for learning all the cool stuff under Kaldi’s hood.

YouTube

  1. Keith Chugg (USC) – Viterbi Algorithm
  2. Lim Zhi Hao (NTU) – WFST: A Nice Channel On Weighted Finite State Transducers
  3. Dan Povey (JHU) – ICASSP 2011 Kaldi Workshop: Dan Explaining Kaldi Basics
  4. Luis Serrano – The Covariance Matrix: To Understand GMM Acoustic Modeling

Kaldi

  1. Mehryar Mohri (NYU) – Speech Recognition with WFST: A joint work of RWTH and NYU
  2. Mehryar Mohri (NYU), Afshin Rostamizadeh – Foundations of Machine Learning
  3. George Doddington (US DoD) ICASSP 2011 – Human Assisted Speaker Recognition
  4. GitHub Kaldi – TED-LIUM Result: GMM, SGMM, Triple Deltas Comparison
  5. EE Columbia University – Speech Recognition Spring 2016
  6. D. Povey – Generating Lattices in the WFST: For Understanding LatticeFasterDecoder

Notes


Online Kaldi Decoding

Published on 2021-07-28 in Speech Recognition

Thanks to this marvelous framework, a trained model is now at hand with a WER of exactly zero percent on a 10-minute continuous speech file. The final piece of this puzzle is implementing a semi-online decoding tool using GStreamer. As always, here are useful links for further inspection:

  1. GStreamer – Dynamic pipelines
  2. A function that saves lives: gst_caps_to_string(caps)
  3. GStreamer – GstBufferPool
  4. Stack Overflow – Gstreamer gst_buffer_pool_acquire_buffer function is slow on ARM
  5. GitHub – Alumae: GST-Kaldi-NNet2-Online
  6. Stack Overflow – How to create GstBuffer

WDM, WDK, DDK, HDK, SDK and ….

Published on 2021-05-21 in Software, Windows

On the way to developing a driver for the Scarlett Solo Gen 3 to harness the power of the Shure SM57 dynamic microphone.

Useful links to preserve:

  1. Microsoft – Universal Audio Architecture: Guidelines for Sound Cards Without a Proprietary Driver
  2. Microsoft – Introduction to Port Class
  3. Microsoft – AVStream Overview
  4. Microsoft – WDM Audio Terminology
  5. Microsoft – Kernel Streaming
  6. Microsoft – KS Filters

Update 1: Finished developing! Here is the link to the released driver:

GitHub – BijanBina/BAudio Windows 7 x64


HaLseY and TaLoN!

Published on 2021-04-20 in Zest

So the third year has passed. I mostly worked on developing a couple of hardware projects. Halsey’s music was a big passion along the way.

Learning all the cool ML stuff is now one of my top priorities. Combined with the emergence of Talon, a powerful C2 grammar framework by Ryan Hileman, and wav2letter, a game-changing speech recognition engine from the Facebook AI department, it gives me some hope of making distinct progress.
Watching Emily Shea demonstrate how she uses Talon to write Perl showed a big improvement over the past few years. And then there was Ryan’s tweet last week:

Conformer better handles accents as well as fast speech. Here's a demo dictating Vue code
at high speed with the new model, with no errors. Compared to two typists on the same code:
80wpm typist took 1m54s, 120wpm took 53s. It took me 1m15s with voice. I think I could go faster!

— Ryan Hileman (@lunixbochs) April 3, 2021

Also this month, Microsoft bought Nuance for $19.7 billion, which will be Microsoft’s second-biggest deal ever. The industry is now going to see a tremendous change in the SR area.


YouTube – PyGotham 2018: Coding by Voice with Dragonfly

GitHub – AccJoon: MSAA-Based Tool to Access Any Control in Win32

GitHub – Rebound: Control Linux and Windows with remote XBox-One Controller

Google Cloud Platform Podcast – Voice Coding with Emily Shea and Ryan Hileman

TheRegister – Microsoft acquires Nuance—makers of Dragon—for $19.7 billion

YouTube – Halsey: Nightmare (Live From The Armory)


Lost in the Vast Ocean of Speech Recognition

Published on 2020-12-08 in Speech Recognition

Here I am, once more pursuing old-fashioned machine learning. I’ll keep it short and write down the useful links.

Books

  1. Dan Povey – HTK Book
  2. Ian Goodfellow – Deep Learning

Papers

  1. IEEE – Uncertainty Decoding with SPLICE for Noise Robust Speech Recognition

YouTube

  1. Hannes van Lier – Basic Introduction to Speech Recognition (HMM & Neural Networks)
  2. Luis Serrano – A friendly introduction to Bayes Theorem and Hidden Markov Models
  3. Djp3 – Hidden Markov Models, The forward-backward algorithm

Kaldi

  1. Kaldi ASR – FrameWork Wiki
  2. Kaldi Wiki – Kaldi Tutorial #1
  3. Dan Povey’s Homepage: Former Professor at Johns Hopkins University, Co-author of the HTK Book
  4. Kaldi Wiki – Kaldi for Dummies tutorial

Wikipedia

  1. Wikipedia – Dempster–Shafer theory
  2. Wikipedia – Expectation–maximization algorithm
  3. Wikipedia – Cepstral mean and variance normalization
  4. Wikipedia – Baum–Welch algorithm
  5. Wikipedia – Mutual Information

Blog Posts

  1. Medium – Jonathan Hui: ASR Model Training
  2. Medium – Jonathan Hui: Maximum Mutual Information Estimation (MMIE)
  3. Medium – Jonathan Hui: Weighted Finite-State Transducers

Others

  1. KDE Simon – CMU SPHINX Based ASR
  2. SRI Language Modeling Toolkit (SRILM)

Jargons

  1. LVCSR: Large Vocabulary Continuous Speech Recognition

ZC702 FMCOMMS3 PetaLinux Starting Guide

Published on 2019-07-08 in Electrical Engineering, Linux, Xilinx

The combination of FMCOMMS3 and PetaLinux works only with Ubuntu 16.04 LTS, PetaLinux 2018.3, and Vivado 2018.3.

Required Packages:

sudo apt-get install -y gcc git make net-tools libncurses5-dev tftpd zlib1g-dev libssl-dev flex bison libselinux1 gnupg wget diffstat chrpath socat xterm autoconf libtool tar unzip texinfo zlib1g-dev gcc-multilib build-essential libsdl1.2-dev libglib2.0-dev zlib1g:i386 screen pax gzip

Installing PetaLinux

Create a new directory

sudo mkdir -m 755 PetaLinux 
sudo chown bijan ./PetaLinux

Install PetaLinux by running the following command.

./petalinux-v2018.3-final-installer.run .

Building Vivado Project

Clone the Analog Devices HDL and meta-adi repositories

git clone https://github.com/analogdevicesinc/hdl.git
git clone https://github.com/analogdevicesinc/meta-adi.git

Make HDL Project

export PATH="$PATH:/mnt/hdd1/Vivado/Vivado/2018.3/bin"
make fmcomms2.zc702

Creating a New PetaLinux Project:

source ../settings.sh
petalinux-create --type project --template zynq --name fmcomms3_linux

Then change to the newly created project directory and run:

petalinux-config --get-hw-description=<hdf file directory>

Set Subsystem AUTO Hardware Settings -> Advanced bootable images storage setting -> u-boot env partition settings -> image storage media -> primary sd

Then, under Yocto Settings -> User Layers, add the two meta-adi layers (the paths below are from my machine):

/home/bijan/Projects/ADI_Linux/meta-adi/meta-adi-core
/home/bijan/Projects/ADI_Linux/meta-adi/meta-adi-xilinx

Download the following files and save them to meta-adi/meta-adi-xilinx/recipes-bsp/device-tree/files

device-tree.bbappend

pl-delete-nodes-zynq-zc702-adv7511-ad9361-fmcomms2-3.dtsi

zynq-zc702-adv7511-ad9361-fmcomms2-3.dts

Build PetaLinux:

To build PetaLinux, run the following command inside the PetaLinux project directory

petalinux-build

If the build fails, remove the stray -e from the first line of the system-user.dtsi file at build/tmp/work/plnx_zynq7-xilinx-linux-gnueabi/device-tree/xilinx+gitAUTOINC+b7466bbeee-r0/system-user.dtsi
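The one-line fix above can also be scripted, which helps when a clean rebuild regenerates the file. This is a minimal sketch, assuming the stray -e sits at the very start of the first line. The function names are mine and the sample content in the demo is made up, not copied from a real system-user.dtsi.

```python
from pathlib import Path


def strip_stray_flag(text: str) -> str:
    """Drop a stray leading '-e' from the first line, leave the rest alone."""
    first, sep, rest = text.partition("\n")
    stripped = first.lstrip()
    if stripped == "-e" or stripped.startswith("-e "):
        first = stripped[2:].lstrip()
    return first + sep + rest


def fix_file(path: Path) -> bool:
    """Rewrite the file in place; return True if it was changed."""
    original = path.read_text()
    fixed = strip_stray_flag(original)
    if fixed != original:
        path.write_text(fixed)
    return fixed != original


if __name__ == "__main__":
    # Made-up sample content, only to show the effect on the first line.
    sample = '-e /include/ "system-conf.dtsi"\n/ {\n};\n'
    print(strip_stray_flag(sample))
```

Call fix_file() with the system-user.dtsi path from the failing build and then run petalinux-build again.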

Program ZC-702 FPGA Board Through JTAG

Install Digilent Drivers

<Vivado Install Dir>/data/xicom/cable_drivers/lin64/install_script/install_drivers/install_drivers

To program the board over the JTAG interface, first package the kernel with the following command.

petalinux-package --boot --fsbl images/linux/zynq_fsbl.elf --fpga images/linux/system.bit --u-boot --force

Then log in to the root account and run the following commands.

petalinux-package --prebuilt --fpga images/linux/system.bit --force
petalinux-boot --jtag --prebuilt 3 -v
petalinux-boot --jtag --fpga --bitstream images/linux/system.bit

Program ZC-702 FPGA Board Through SD-Card

Set SW16.3 and SW16.4 to ON on the ZC702 board.

Generate the BOOT.BIN file by executing the following command:

petalinux-package --boot --fsbl images/linux/zynq_fsbl.elf --fpga images/linux/system.bit --u-boot --force

Copy image.ub and BOOT.BIN to the SD card.

Customize Username and Password

To change the username and password, open

meta-adi/meta-adi-xilinx/recipes-core/images/petalinux-user-image.bbappend

Change analog to your desired password. If you want to remove the login requirement, comment out EXTRA_USERS_PARAMS and enable debug-tweaks in petalinux-config -c rootfs.

Change UART BaudRate

To change the UART baud rate, run

petalinux-config

and go to Subsystem AUTO Hardware Settings -> Serial Settings -> System stdin/stdout baudrate


Useful Links

Analog Wiki – Building with Petalinux

Analog Wiki – HDL Releases

GitHub – Analog Devices No-OS


Start Microwave Layout In ADS 2015.1

Published on 2018-05-08 in Hardware Design

ADS covers a broad range of tasks, from IC design to RF simulation. Here we explore how to prepare your workspace for the layout phase after schematic design. ADS comes with tons of ready-to-use parts, available at <ADS>/ADS/oalibs/componentLib/. Here I demonstrate how to add the RF_Passive_SMT library and use it in your layout.

Installing Vendor Component Library

  1. In the workspace view, from the menubar click on DesignKits>Unzip Design Kit...
  2. Browse to <ADS>/ADS/oalibs/componentLib/ and select the library file
  3. Continue the process until the library is added to your workspace

Prepare the Layout

Before using the parts you need to set up the substrate file and the technology file.

  1. From the workspace view menubar click on Options>Technology>Technology Setup.
  2. In the opened dialog, under Referenced Libraries, click on the Add Referenced Library... button.
  3. Select ads_standard_layers and click on the OK button.
  4. Close the Technology Setup dialog and set up your substrate file based on the ads_standard_layers library that you imported earlier.

The image below shows the footprint of an ATC cap inserted into the layout.

 

Creating a Footprint

To create a footprint (artwork), you have two options:

1. Create the layout by inserting rectangles, traces, etc. into the board using the layout editor. In the reference links you can find a YouTube video demonstrating that.

2. Write an AEL script to create the artwork for you.

The first option is easy and fast and works out of the box, but it is not scalable. Writing an AEL function is cleaner from a designer’s point of view. Fortunately, the Dr. Mühlhaus company wrote a comprehensive guide (link down below) on how to create artwork with AEL.

 


Useful Links

YouTube – A vs B Modeling and Layout Footprint Generation


Setup ADS Front to Back Design Flow

Published on 2018-03-04 in Hardware Design

One of the great features that comes with the ADS package is the ability to run your design flow in reverse. The need for this becomes more evident when you prefer a layout tool that is more feature-rich than ADS and then use ODB++ or ADFI to import the design into ADS for layout verification. In this tutorial I summarize the many points you should consider to get a working schematic back from your layout.

To start, convert the traces into microstrip transmission lines by right-clicking a trace and selecting Path/Trace > Convert Traces. If you are lucky, the trace changes to the microstrip type without any significant change in its color. If not, you will notice changes in its appearance, such as the trace becoming invisible or moving to a wrong layer. This is illustrated in Fig. 1, where the conversion wipes out the line because the new, wrong layer is an outline type. To solve the issue, follow the steps below. If you did not run into this problem, you can skip them and generate the schematic directly with Schematic > Generate/Update Schematic in the layout window.

Fig. 1. Converting the trace to microstrip changes the layer unintentionally

1. Remove Ports

ADS sometimes has issues with ports and terms. So if you cannot generate a schematic from the layout, they are probably the main cause; remove them and try again.

2. Substrate File

After generating the schematic you’ll notice unconnected components placed vertically on the canvas. If you rearrange the components you get something like Fig. 2, and by now you will have recognized that ADS failed to import the connectivity back into the schematic. Unfortunately there is no automated fix for this issue, which leaves you with no option but to make the connections manually. After that, add an MSub component to the canvas from the Parts panel. If you now check the files in your workspace, you will find a new file named tech.subst. This is an auto-generated stackup file that in many cases contains errors, so it is recommended to compare the newly generated substrate file with your current stackup and apply the required changes.

3. Configure Microstrip Layer

The last step is to configure the layer that the microstrip will land on. To do so, double-click the MSub component and adjust the Cond1 parameter. As for Cond2, ADS describes it as, and I quote:

Layer on which the air bridges will be draws

It is probably the ground layer, but I still have doubts about that. Finally, regenerate the layout by selecting Layout > Generate/Update Layout in the schematic window.

 

Cheers!

 


ZYNQ SD Card and RGMII Length Matching

Published on 2018-02-08 in Hardware Design

Understanding the math behind the length matching of each protocol is key to reaching a well-designed PCB that balances performance, layout area, and other manufacturing constraints. Unfortunately, for some protocols such as SDIO this information remains under NDA and is confidential. Here I examine the ZC702 reference design board, the datasheet of the PHY chip, and the ZYNQ UG933 guide to work out the reasoning behind the design.

SDIO

As I mentioned above, the full SDIO specification is not available to the public. Nevertheless, by examining the trace delays on the ZC702 and inspecting the board’s Constraint Manager, I finally got a grasp of the situation. The ZC702 configuration is simplified in the following diagram. Under the diagram you can also find the delay of each trace, from the ZYNQ AP-SoC chip to the TI voltage translator and on to the SD card connector. The delay of the CLK trace from U1 to U87 is the cumulative sum of the U1 to R81 delay and the R81 to U87 delay; this is done by defining an ESpice model for R81 and creating an XNet in Allegro SI Analysis.

 

 

From the SDIO specification, for proper High Speed Mode operation, the CLK line should be 1 ns longer than CMD and DAT[0:3]. Furthermore, CMD and DAT[0:3] should remain within a 50 ps margin of each other, which is a tighter version of what UG933 recommends:

PCB and package delay skew for SD_DAT[0:3] and SD_CMD relative to SD_CLK
must be between 50–200 ps

But by inspecting the following table, which is extracted from the ZC702 board, you can conclude that all lines are matched to each other without any sign of a 1 ns delay on the CLK line.

 

 

After checking the TXS02612 datasheet, you can see on page 12 that the clock-to-channel skew is roughly 1.5 ns, which can explain why the CLK trace has the same length as the other signals on the PCB.
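To make the SDIO argument concrete, here is a small sketch that checks the 50 ps margin between CMD and DAT[0:3] and reports how far CLK ends up behind them once the translator skew is included. All delay values are hypothetical, not the ZC702 numbers, and the function name is mine. Treating the translator skew as a plain addition to the clock offset is my reading of the reasoning above, not something taken from the datasheet.

```python
from statistics import mean


def sdio_report(clk_ps, others_ps, margin_ps=50.0, translator_skew_ps=0.0):
    """Summarize SDIO matching from per-net delays given in picoseconds.

    others_ps maps the CMD/DAT net names to their delays. translator_skew_ps
    is the extra clock-to-channel skew of a level translator (assumed to
    simply add to the clock offset).
    """
    spread = max(others_ps.values()) - min(others_ps.values())
    trace_offset = clk_ps - mean(others_ps.values())
    return {
        "spread_ps": spread,
        "within_margin": spread <= margin_ps,
        "effective_clk_offset_ps": trace_offset + translator_skew_ps,
    }


# Hypothetical trace delays in ps: CLK is routed like every other line.
others = {"CMD": 510.0, "DAT0": 520.0, "DAT1": 515.0, "DAT2": 530.0, "DAT3": 525.0}
report = sdio_report(clk_ps=522.0, others_ps=others, translator_skew_ps=1500.0)
print(report)
```

With these made-up numbers the traces alone give CLK only 2 ps of offset, and the 1.5 ns comes entirely from the translator, which is the situation the ZC702 table suggests.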

RGMII

The Reduced Gigabit Media Independent Interface, or RGMII, is a low pin count interface between the PHY chip and the controller. Ethernet is inherently a full-duplex, non-synchronized protocol, so the TX and RX signals are completely independent. The differential traces from the RJ45 back to the PHY chip are also completely independent, and only the phase matching within each differential pair needs to be considered. For the RGMII signals, the matching requirements depend heavily on the version of RGMII that the PHY chip supports. Here we assume conformance with RGMII v2 without internal delays. In this case the following traces in each group should be length matched to within a margin of 100 ps of each other, including the ZYNQ package delay.

The average delay of each group should be length matched against its corresponding CLK so that the group is 1.5 ns shorter than the clock. MDIO and MDC operate at a maximum frequency of 2.5 MHz and thus don’t require any length matching. Any other signal that is not mentioned here can be routed at an arbitrary length.
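The two RGMII rules above can be checked with a few lines of arithmetic. The sketch below adds the package delay to each trace delay, measures the spread within the group, and reports how much later the clock arrives than the group average. The net names follow the usual RGMII transmit signals. The delay values and the function name are hypothetical.

```python
from statistics import mean


def rgmii_group_report(trace_ps, package_ps, clk_net, margin_ps=100.0):
    """Check one RGMII group (TX or RX); all delays are in picoseconds."""
    # Total delay per net = PCB trace delay + package (pin) delay.
    totals = {net: trace_ps[net] + package_ps.get(net, 0.0) for net in trace_ps}
    clk_total = totals.pop(clk_net)
    spread = max(totals.values()) - min(totals.values())
    return {
        "spread_ps": spread,
        "within_margin": spread <= margin_ps,
        "clk_minus_group_ps": clk_total - mean(totals.values()),
    }


# Hypothetical numbers for the transmit group.
trace = {"TXD0": 400.0, "TXD1": 420.0, "TXD2": 410.0, "TXD3": 390.0,
         "TX_CTL": 430.0, "TXC": 1900.0}
package = {"TXD0": 100.0, "TXD1": 90.0, "TXD2": 95.0, "TXD3": 110.0,
           "TX_CTL": 85.0, "TXC": 106.0}
print(rgmii_group_report(trace, package, clk_net="TXC"))
```

Here the data lines sit within 15 ps of each other and the clock arrives 1500 ps after the group average, so both rules hold for this made-up group.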


Useful Links

Xilinx – AR# 59999: Design Advisory for Zynq-7000 SoC, eMMC – JEDEC standard

Texas Instruments – TXS02612: SDIO Port Expander With Voltage-Level Translation

IEEE 802 – Specification Gaps and Improvement of MDC/ MDIO Interface


Export Pin Delay from Vivado into Allegro

Published on 2018-01-18 in Hardware Design

I know there are many scattered app notes on how to export pin delay and why you should do it. Nevertheless, I couldn’t find any comprehensive guide that takes you through the process from A to Z, so I decided to write this post. For DDR3 and other sensitive high-speed signals, it is recommended to do the length matching with the package delay taken into account. This requirement only applies to non-high-volume, custom-designed chips such as Xilinx FPGAs and other SoCs available on the market. In other words, the pin delays of well-established ICs like DDR chips have already been matched internally.
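A quick worked example of why the package delay matters: traces that look perfectly matched on the board can still be tens of picoseconds apart once each pin’s internal delay is added. The sketch below computes how much extra trace delay each net needs so that pin delay plus trace delay comes out equal. The net names, the values, and the function name are all hypothetical.

```python
def compensation_ps(trace_ps, pin_ps):
    """Extra trace delay (ps) each net needs to match the slowest total."""
    totals = {net: trace_ps[net] + pin_ps[net] for net in trace_ps}
    target = max(totals.values())
    return {net: target - total for net, total in totals.items()}


# Hypothetical values in ps: the traces alone look perfectly matched...
trace = {"DQ0": 600.0, "DQ1": 600.0, "DQ2": 600.0}
# ...but the package adds a different delay to each pin.
pin = {"DQ0": 120.0, "DQ1": 95.0, "DQ2": 140.0}

print(compensation_ps(trace, pin))
# {'DQ0': 20.0, 'DQ1': 45.0, 'DQ2': 0.0}
```

Importing the pin delays into Allegro lets the constraint manager do this bookkeeping for you, which is what the rest of this post is about.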

Export Pin Delay

1. Open Vivado (my version is 2017.2)

2. In the Tcl command window, execute the following commands

link_design -part xc7z020clg484-3
write_csv pin_delay.csv

Prepare CSV File

Unfortunately, the output file of Vivado does not work out of the box. You need to open the CSV file in an editor (I use one of the most primitive ones, Microsoft Excel) and remove the unneeded rows and columns from it. Also, Vivado exports the package delay as Max Delay and Min Delay, but Allegro only supports a single Pin Delay. So you have to choose one of the two, or do some math and take the average.

1. Remove all rows until only pins or banks remain, without any empty row or heading

2. Remove all columns except Pin Name (like A9) and one of Max Delay or Min Delay

3. Add the unit. Vivado outputs delays in ps, but Allegro may not assume the same unit. To overcome this, append ps to the end of every delay value. In Microsoft Excel, in a new column (we assume C is the new column and the delay values are in column B), write the formula =B1&" ps"

4. Grab the fill handle at the bottom corner of the cell and drag it all the way down

5. Copy column C and press Alt+E, S, V to paste special into column B. (I hate Excel, use bash+sed)

6. Save the file as a CSV file and open Allegro
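If you would rather skip Excel altogether, the steps above can be scripted. This is a minimal sketch, not a tested converter for any particular Vivado version: the column headings are passed in as arguments because I am not certain what your export calls them, and the headings and rows in the demo are placeholders. It drops everything before the header row, keeps only the pin name and one delay value (min, max, or their average), appends the ps unit, and writes rows without a heading.

```python
import csv
import io


def convert_pin_delays(src, dst, pin_col, min_col, max_col, mode="avg"):
    """Write 'pin,<delay> ps' rows to dst; return the number of pins written."""
    if mode not in ("min", "max", "avg"):
        raise ValueError(f"unknown mode: {mode}")
    reader = csv.reader(src)
    header = None
    for row in reader:  # skip the preamble until the header row shows up
        cells = [c.strip() for c in row]
        if pin_col in cells:
            header = cells
            break
    if header is None:
        raise ValueError(f"no header row containing {pin_col!r}")
    p, lo, hi = (header.index(c) for c in (pin_col, min_col, max_col))
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for row in reader:
        try:
            pin = row[p].strip()
            dmin, dmax = float(row[lo]), float(row[hi])
        except (IndexError, ValueError):
            continue  # empty or non-pin row
        if not pin:
            continue
        delay = {"min": dmin, "max": dmax, "avg": (dmin + dmax) / 2}[mode]
        writer.writerow([pin, f"{delay:g} ps"])
        count += 1
    return count


if __name__ == "__main__":
    # Placeholder data; real headings and values come from your own export.
    sample = "some preamble\n,,,\nBank,Pin,Min,Max\n35,A9,80.5,81.5\n,,,\n"
    out = io.StringIO()
    convert_pin_delays(io.StringIO(sample), out, "Pin", "Min", "Max")
    print(out.getvalue())
```

For real files, open pin_delay.csv with open(..., newline="") and pass the file objects instead of the StringIO buffers.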

Import To Allegro

1. Go to File > Import > Pin Delay, select the CSV file and click on your chip.

2. Click the Import button and close the Pin Delay form.

