The command below plays a tone signal through the speaker and records it back through the mic. Measuring the phase difference between the two reveals the round-trip latency.
alsa_delay hw:1,0 hw:0,0 44100 256 2 1 1
Here hw:1,0 refers to the recording device, which can be retrieved from arecord -l, and hw:0,0 refers to the playback device, which again can be retrieved from aplay -l. The 44100 is the sampling rate and 256 is the buffer size. 256 works best for me: lower numbers corrupt the test and higher numbers just bring more latency to the table. I don't know exactly what the nfrags, input and output arguments are, but 2, 1 and 1 respectively work magically for me. I just tinkered around and found these numbers; no other combination works for me.
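To make the phase-difference idea concrete, here is a minimal sketch in Python. It is not how alsa_delay works internally; it only illustrates how a delay can be read off the phase shift of a single tone. The 50 Hz tone, the 617-sample delay and the function names are assumptions chosen for the illustration, and the estimate is only unambiguous while the delay is shorter than one tone period.

```python
import cmath
import math


def tone_phase(samples, freq_hz, rate_hz):
    """Phase of the freq_hz component, computed with a single-bin DFT."""
    acc = sum(
        s * cmath.exp(-2j * math.pi * freq_hz * n / rate_hz)
        for n, s in enumerate(samples)
    )
    return cmath.phase(acc)


def delay_from_phase(sent, received, freq_hz, rate_hz):
    """Delay in seconds; valid only if the delay is shorter than one tone period."""
    diff = tone_phase(sent, freq_hz, rate_hz) - tone_phase(received, freq_hz, rate_hz)
    diff %= 2 * math.pi  # wrap the phase lag into [0, 2*pi)
    return diff / (2 * math.pi * freq_hz)


RATE = 44100          # sampling rate, as in the command above
FREQ = 50.0           # hypothetical test tone (20 ms period)
DELAY_SAMPLES = 617   # hypothetical round-trip delay, about 14 ms
N = 8820              # 0.2 s, a whole number of tone periods

# Simulated "played" tone and the same tone arriving DELAY_SAMPLES later.
sent = [math.sin(2 * math.pi * FREQ * n / RATE) for n in range(N)]
received = [math.sin(2 * math.pi * FREQ * (n - DELAY_SAMPLES) / RATE) for n in range(N)]

estimated_ms = 1000 * delay_from_phase(sent, received, FREQ, RATE)
print(f"estimated delay: {estimated_ms:.2f} ms")
```

A real measurement has to deal with noise and with delays longer than one period, which a single low-frequency tone cannot resolve on its own.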
1. Focusrite Scarlett Solo Latency: 2.5ms
2. Shure SM57 Mic Latency: 2.5ms
3. Overall Delay: 14 ms in non-RT mode
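For a rough sense of how much of that delay the buffering alone could explain, the arithmetic can be sketched as follows. This assumes that the buffer consists of nfrags fragments of 256 frames each, which is a guess about the meaning of the arguments and not something confirmed here; the function name is hypothetical.

```python
def buffer_latency_ms(frames_per_fragment, fragments, rate_hz):
    """Time it takes to play or record the given number of fragments, in milliseconds."""
    return 1000.0 * frames_per_fragment * fragments / rate_hz


# Values taken from the alsa_delay command: 256 frames, 2 fragments, 44100 Hz.
one_fragment_ms = buffer_latency_ms(256, 1, 44100)
two_fragments_ms = buffer_latency_ms(256, 2, 44100)

print(f"one fragment:  {one_fragment_ms:.2f} ms")
print(f"two fragments: {two_fragments_ms:.2f} ms")
```

Under that assumption one fragment is about 5.8 ms and two fragments about 11.6 ms, which also shows why doubling the buffer size roughly doubles this part of the latency.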
You can experiment with the effect of latency using:
pactl load-module module-loopback latency_msec=15
To end the loopback mode:
pactl unload-module module-loopback
As always, useful links:
Arun Raghavan – Beamforming in PulseAudio
Arch Linux Wiki – Professional Audio, Realtime kernel
Let’s enhance Kaldi. Here are some links along the way. It looks like YouTube has progressed a lot over the last couple of years, so basically this is just a bunch of random videos that make up my favorite playlist for learning all the cool stuff under Kaldi’s hood.
LatticeFasterDecoder
Lattices: A more complex form of FSTs. The first decoders were based on FSTs (like faster-decoder and the online decoders). For Minimum Bayes Risk calculation, using lattices gives you a better-paved way.
faster-decoder: The old decoder; very simple, and good for understanding how the decoding process is done.
lattice-faster-decoder: The general decoder; same as faster-decoder, but it outputs lattices instead of FSTs.
DecodableInterface: An interface that connects the decoder to the features. The decoder uses this Decodable object to pull CMVN features from it.
BestPath: An FST constructed from the best path (the path with maximum likelihood) in the decoded FST.
nBestPath: An FST constructed from the top N best paths in the decoded FST.
GetLinearSymbolSequence: The final step in the recognition process. It takes a BestPath FST or Lattice and outputs the recognized words with the path weight. CompactLattices need to be converted first using ConvertLattice.
Strongly Connected Component: A set of states in which every member is reachable from every other member (in both directions).
ProcessEmitting: The step that pulls the loglikelihood from the decodable object.
Thanks to this marvelous framework, a trained model is at my disposal with a WER of absolute zero percent over a 10-minute continuous speech file. The final piece of this puzzle would be implementing a semi-online decoding tool using GStreamer. As always, useful links for further inspection:
gst_caps_to_string(caps)
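Going back to the BestPath and GetLinearSymbolSequence entries in the decoder glossary above, the idea can be sketched on a toy lattice in plain Python. This is not Kaldi's API: the lattice layout, the words, the costs and the best_path function are all made up for the illustration, and costs are treated as negative log-likelihoods, so lower is better.

```python
# Toy lattice: state -> list of (next_state, word, cost). Hypothetical data,
# not Kaldi's Lattice type. States are numbered in topological order.
LATTICE = {
    0: [(1, "turn", 1.0), (1, "burn", 2.5)],
    1: [(2, "the", 0.5), (3, "a", 0.7)],
    2: [(4, "light", 1.2)],
    3: [(4, "light", 1.5), (4, "night", 0.8)],
    4: [],
}
START, FINAL = 0, 4


def best_path(lattice, start, final):
    """Return (total_cost, words) of the cheapest path from start to final.

    Relies on the states being numbered in topological order, so one pass
    over the sorted states is enough.
    """
    best = {start: (0.0, [])}
    for state in sorted(lattice):
        if state not in best:
            continue  # state not reachable from start
        cost, words = best[state]
        for next_state, word, arc_cost in lattice[state]:
            candidate = (cost + arc_cost, words + [word])
            if next_state not in best or candidate[0] < best[next_state][0]:
                best[next_state] = candidate
    return best[final]


# The word list plays the role of the linear symbol sequence,
# and total_cost the role of the path weight.
total_cost, words = best_path(LATTICE, START, FINAL)
print(" ".join(words), total_cost)
```

An n-best variant would keep the N cheapest candidates per state instead of only one.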
Here I am, pursuing once more the old-fashioned machine learning. I’ll keep it short and write down useful links