µSpeech 4.0 coming soon with codebender.cc support

The µSpeech library is almost ready to be released as version 4.0. A number of bugs have been fixed and new features added. Among the new features is a way to store and compare words, and the library is now significantly easier to calibrate. I have also added a video to the documentation showing how to calibrate the latest version of the library. If you are interested in trying it beforehand:


git clone -b 4.0-workingBranch https://github.com/arjo129/uSpeech.git

UPDATE: 4.0 is now the mainstream release; just go to the downloads page to download the latest version.

This is how to calibrate the phoneme recognizer:

The new API docs will be up soon. I am in the process of getting the library updated on codebender.cc, so those of you who use it to program your Arduino can enjoy the benefits of µSpeech.

Complete list of changes:

  • Update codebender support.
  • Fix vowel detection.
  • Package in an Arduino IDE friendly format.
  • Video documentation.
  • Improve ease of use.
  • New API for easy word recognition.

How µSpeech works

It's been over a year since I posted anything. This does not mean that I was not doing anything, but rather that I was working on other stuff (also, the Chinese government has blocked WordPress and I only got a VPN recently). One of the things I had been working on was µSpeech, speech recognition software for the Arduino. This originally seemed crazy, as speech recognition is a very computationally demanding process. Clocked at a mere 16 MHz (on an Uno) and with only 2 KB of RAM, the Arduino cannot afford to use a standard speech recognition algorithm.

Most speech recognition algorithms rely on a process known as the Fast Fourier Transform (FFT). For those who are unaware, the FFT takes a sound and splits it into its constituent frequencies. Now, the FFT is not particularly cheap to compute: despite its name, it is far too slow and memory-hungry for a microcontroller like the Arduino. The innovation in µSpeech is that it bypasses this process, at a cost: µSpeech is essentially only able to differentiate between fricatives, voiced fricatives and vowels. Its ability is therefore limited, but it is good enough to differentiate between commands such as “left”, “right”, “forward” and “backward”.

The thing about fricatives (such as /f/, /s/ and /sh/) is that if you touch your throat while making them, you will notice that the vocal cords play no role in producing these sounds. They are made entirely by the mouth and the air flowing out of it, so they tend to be noise-like and have higher frequencies. If you were to look at a graph of air pressure over time, the sound /s/ gives a very chaotic graph that zigzags a lot, which is not the case for the sound /a/. I found that the following formula captures this difference well:

c = \frac{\sum \left|\frac{df(x)}{dx}\right|}{\sum \left|f(x)\right|}

Here f(x) is the recorded signal (air pressure over time) and the sums run over a short window of samples. Sounds such as /s/ result in a very high value of c, whereas sounds such as /a/ result in a low value. Voiced fricatives such as /v/ fall somewhere in between.
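A minimal sketch of the formula in discrete form, assuming f(x) is a window of signed microphone samples with the DC offset already removed, and that the derivative is approximated by the difference between consecutive samples. The function name `coefficient_c` and the test signals `smooth` and `zigzag` are hypothetical, not part of the µSpeech API; µSpeech itself runs on the Arduino, so this Python version is only for experimenting with the idea.

```python
import math

def coefficient_c(samples):
    """Discrete c: sum of |x[i] - x[i-1]| divided by sum of |x[i]|."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # Approximate |df/dx| by the absolute difference of consecutive samples.
    diff_sum = sum(abs(b - a) for a, b in zip(samples, samples[1:]))
    amp_sum = sum(abs(s) for s in samples)
    if amp_sum == 0:
        return 0.0  # silence: no meaningful ratio
    return diff_sum / amp_sum

# A slowly varying, vowel-like wave versus a rapidly alternating, noise-like one.
smooth = [round(100 * math.sin(2 * math.pi * i / 40)) for i in range(80)]
zigzag = [100 if i % 2 == 0 else -100 for i in range(80)]
print(coefficient_c(smooth), coefficient_c(zigzag))
```

For the alternating signal every step is 200 and every amplitude is 100, so c = (79 × 200) / (80 × 100) = 1.975, while the smooth sine gives a value well below 1. Note that the sampling rate matters too: the same sound sampled faster gives smaller differences between samples, and hence a smaller c.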

I found that the value for c falls within a certain range depending on the letter (and microphone). Thus when you calibrate µSpeech, you are essentially tweaking the threshold values. It generally takes a full afternoon to get them right!
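To make the calibration step concrete, here is a hypothetical sketch of a three-way classifier on c, together with one simple way to pick the thresholds: average c over a few recordings of each sound class and place each threshold halfway between neighbouring class means. The threshold values, the names `classify` and `calibrate`, and the midpoint rule are all illustrative assumptions, not µSpeech's actual defaults or calibration procedure.

```python
from statistics import mean

# Placeholder thresholds, NOT µSpeech's real values: calibrate for your microphone.
DEFAULT_THRESHOLDS = (0.5, 1.2)  # (vowel | voiced fricative, voiced fricative | fricative)

def classify(c, thresholds=DEFAULT_THRESHOLDS):
    """Map a value of c to a broad sound class using two ascending thresholds."""
    low, high = thresholds
    if c < low:
        return "vowel"
    if c < high:
        return "voiced fricative"
    return "fricative"

def calibrate(vowel_cs, voiced_cs, fricative_cs):
    """Hypothetical calibration: thresholds halfway between neighbouring class means."""
    v, vf, f = mean(vowel_cs), mean(voiced_cs), mean(fricative_cs)
    if not v < vf < f:
        raise ValueError("expected mean c: vowel < voiced fricative < fricative")
    return ((v + vf) / 2, (vf + f) / 2)

# Example: c values from a few recordings of /a/, /v/ and /s/ (made-up numbers).
thresholds = calibrate([0.10, 0.20], [0.70, 0.90], [1.80, 2.00])
print(thresholds, classify(0.8, thresholds))
```

A midpoint rule is only a starting point; if two classes overlap for your microphone, you would still have to nudge the thresholds by hand.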

The “Lazy” Keyword

Javascript sharp has learnt to procrastinate. The lazy keyword is a new keyword introduced into JS#. It enables lazy evaluation (i.e. computing the variable is procrastinated): the value is only computed where it is used. An example is probably easier to understand.
var auto n = 9+3
document do write n .

This would compile to:
var n = 9+3
document.write(n);

Whereas using lazy instead of var would look like this:
lazy n = 9+3
document do write n .

This would translate to:

document.write(9+3);

Currently the lazy keyword is buggy and only supports the following syntax: lazy [variable name] = [javascript expression]
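As a rough illustration of what the compiler has to do, here is a hypothetical Python sketch (not the actual JS# compiler) of a pass that removes `lazy name = expression` lines and pastes the expression wherever the name is later used. It assumes the single supported form above, one declaration per line, and whole-word matching. Unlike the example output above, it wraps the expression in parentheses so that something like `lazy n = 1+2` used in `n*3` keeps its meaning. It does not skip occurrences inside string literals.

```python
import re

# One declaration per line: lazy <identifier> = <expression>
LAZY_DECL = re.compile(r"^\s*lazy\s+([A-Za-z_$][\w$]*)\s*=\s*(.+?)\s*$")

def substitute(text, lazy_vars):
    """Replace whole-word uses of each lazy name with its parenthesised expression."""
    for name, expr in lazy_vars.items():
        # Lookarounds stop `n` from matching inside identifiers such as `document`.
        pattern = r"(?<![\w$])" + re.escape(name) + r"(?![\w$])"
        text = re.sub(pattern, lambda _m, e=expr: e, text)
    return text

def expand_lazy(source):
    """Drop `lazy name = expr` lines and inline `(expr)` at every later use of `name`."""
    lazy_vars = {}
    out = []
    for line in source.splitlines():
        m = LAZY_DECL.match(line)
        if m:
            name, expr = m.groups()
            # Expand lazy names defined earlier inside this expression as well.
            lazy_vars[name] = "(" + substitute(expr, lazy_vars) + ")"
            continue
        out.append(substitute(line, lazy_vars))
    return "\n".join(out)

print(expand_lazy("lazy n = 9+3\ndocument do write n ."))
```

Running it on the example above yields `document do write (9+3) .`, with the declaration line gone.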

Personal project ends – Javascript sharp

My personal project has come to an end. I have created my own device for pulse reading. I know I have not posted for some time; this is mainly because WordPress is blocked in China. I am currently using a free VPN to access this blog. I will now be turning my attention to more interesting things, including my new programming language, Javascript sharp, which is available on GitHub. The compiler is written in Python and churns out JavaScript. The language adds an easy-to-use, type-based variable declaration system along with type inference, which I think makes for a beautiful programming language. It has a class keyword with extends:

class foo extends car {

This makes creating complex objects much easier than in plain JavaScript. The language also avoids using too many symbols, which makes it very easy to learn. I have removed parentheses where possible:

if i == 3 {

I have also made it easier to declare a function with the function keyword:

function print x, y, z {

Curly braces still exist because indentation is very imprecise and “end” is just too much typing. Javascript sharp also adds support for public and private variables. In the future changes section you will notice I have listed a large number of potential features to work on. The first one to be implemented will be the lazy keyword.
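The parenthesis-free headers shown above map directly onto JavaScript. Here is a hypothetical sketch of that translation step for `if` and `function` lines, written with regular expressions under the assumption that each header sits on a single line ending in `{`. It is an illustration only, not the real JS# compiler, which may work quite differently; the name `to_js` is made up.

```python
import re

FUNC = re.compile(r"^(\s*)function\s+([A-Za-z_$][\w$]*)\s*(.*?)\s*\{\s*$")
IF = re.compile(r"^(\s*)if\s+(.+?)\s*\{\s*$")

def to_js(line):
    """Translate one parenthesis-free JS# header line into JavaScript."""
    m = FUNC.match(line)
    if m:
        indent, name, params = m.groups()
        # `function print x, y, z {`  ->  `function print(x, y, z) {`
        return f"{indent}function {name}({params}) {{"
    m = IF.match(line)
    if m:
        indent, cond = m.groups()
        # `if i == 3 {`  ->  `if (i == 3) {`
        return f"{indent}if ({cond}) {{"
    return line  # anything else passes through unchanged

for src in ["function print x, y, z {", "if i == 3 {"]:
    print(to_js(src))
```

Because the braces are kept, the translator only has to add parentheses around the condition or parameter list; no block structure has to be inferred from indentation.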

Success

OK, I succeeded in my first pulse measurement using the webcam. I first recorded the pulse by placing the webcam under my thumb, then calculated the mean pixel value of each frame using Python and PIL (with the frames extracted by MPlayer). Afterwards I applied a 0.5 Hz high-pass filter and voilà! Most of the work was done in Python. The video was captured with Photo Booth and plotting was done in Excel. I plan on consolidating all of this into Processing. My design specifications are ready too.
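The post does not say which high-pass filter was used. As one possible choice, here is a sketch of a first-order (single-pole RC) high-pass filter applied to the per-frame mean values, assuming they are already in a Python list and the video frame rate is known (30 frames per second is assumed below). A 0.5 Hz cutoff removes slow drift and the constant brightness level while largely keeping variation in the roughly 1 to 2 Hz range of a pulse (60 to 120 beats per minute). The names `high_pass` and `frame_means` are hypothetical.

```python
import math

def high_pass(samples, sample_rate_hz, cutoff_hz=0.5):
    """First-order RC high-pass filter: y[i] = a * (y[i-1] + x[i] - x[i-1])."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant of the equivalent RC circuit
    dt = 1.0 / sample_rate_hz               # time between frames
    a = rc / (rc + dt)
    out = [0.0]  # start at zero so the constant brightness offset is dropped immediately
    for prev, cur in zip(samples, samples[1:]):
        out.append(a * (out[-1] + cur - prev))
    return out

# Synthetic check: a 1.2 Hz "pulse" (72 bpm) riding on a constant brightness of 150.
fps = 30
frame_means = [150 + 2 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(10 * fps)]
filtered = high_pass(frame_means, fps)
```

In practice `frame_means` would hold the per-frame averages computed with PIL; after filtering, the signal oscillates around zero at the pulse rate, which makes it much easier to plot and inspect.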