Regarding µSpeech 4.1.2+

Dear users of the µSpeech library,
With the release of the 4.2.alpha library I have been receiving a lot of mail about the debug µSpeech program not working. I sincerely apologize for this inconvenience, as I have very little time on hand to address the issues associated with 4.1.2+ and 4.2.alpha. I realized that I had made the error of not flagging 4.1.2 as a pre-release: I have fixed this now. I also realized that the documentation I have on YouTube is out of date; I will be addressing this as soon as possible.

If anyone has the time and is willing to look into why debug_uspeech is giving trouble, please feel free to create a pull request on GitHub. As I am currently going through my school finals, I am unable to devote the time required to address this issue. In the meantime, please revert to µSpeech 4.1.1.

Sincere Apologies,

Arjo Chakravarty

Wiring two Arduinos to do your bidding (using I2C)

Ok, so making rovers with one Arduino is so overdone. What if you had two rovers that worked together to solve problems, or even to turn together? This can be fairly challenging. To achieve communication between two Arduinos, one can use the two-wire interface known as I2C, a method of communicating between the Arduinos. One can connect about a hundred Arduinos all on the same line to make a super-duino, but for now let's keep it down to just two.

Many peripherals can be attached to your Arduino (µC) via this protocol. One Arduino acts as the all-powerful overlord (a.k.a. the master), while the others listen in as slaves.

Schematic of a generic I2C circuit (courtesy Wikipedia)

Let's put this in the context of two Arduinos. Here's how you wire them:

Schematic for your Arduinos; the squiggly lines correspond to resistors (courtesy Instructables).

So one Arduino will be the master and one the slave. It's up to you which one you decide will do what – however, the master is programmed separately from the slave, so you will need two programs: one for the master and one for the slave. On classic boards like the Uno, SDA is pin A4 and SCL is pin A5: connect SDA to SDA, SCL to SCL, and tie the grounds together.

Now for the code. The actual I2C protocol is fairly complex, but the folks at Arduino have simplified the process significantly through the use of a library. A library stores extra code which may be useful, in order to simplify your life. The library in question is known as Wire, so at the top of both programs add:

#include <Wire.h>

Now let's deal with the commands that can be sent: left, right, stop and reverse. We can use something known as preprocessor directives to give these commands names. Again, at the top of both the master and the slave, copy and paste the following:

#define LEFT 0
#define RIGHT 1
#define STOP 2
#define REVERSE 3

Now we have to treat the slave differently. Because I2C supports multiple devices, each slave device has its own address – a number between 1 and 127 that identifies it (in practice stick to 8–119, since the ends of the range are reserved by the protocol). So in the void setup() function the following needs to be placed:

void setup(){
  //some of your code
  Wire.begin(6); // join the I2C bus as a slave with address 6
}

You have declared your slave as having the address 6. This is kind of like the IP address of a computer, or the URL which points you to this website. Next you have to create a function which responds when the master comes around to command the slave.

void setup(){
  //some of your code
  Wire.begin(6);
  Wire.onReceive(followCommand); // call followCommand whenever the master sends data
}

We have not yet defined followCommand, the function that will follow the master's orders, but we shall implement it now (full code for the slave listed below). One detail to note: the argument Wire passes to an onReceive handler is the number of bytes received, so the command itself must be read off the bus with Wire.read().

#include <Wire.h>
#define LEFT 0
#define RIGHT 1
#define STOP 2
#define REVERSE 3

void followCommand(int numBytes){ // Wire passes in the number of bytes received
   int command = Wire.read(); // read the actual command byte off the bus
   //what your robot will do
   if(command == LEFT){
     //Write code for turning your robot left
   }
   if(command == RIGHT){
     //Write code for turning your robot right
   }
   //You get the idea...
}
void setup(){
   //some of your code
   Wire.begin(6);
   Wire.onReceive(followCommand); 
}
void loop(){
}

We have now implemented a slave, but we still need to implement a master. This is easier. The master has no address, so the setup would look like this:

void setup(){
  //Your code...
  Wire.begin();
}

To send a command, the following calls can be used (note: Wire.send() from older IDEs was renamed Wire.write() in Arduino 1.0):

  Wire.beginTransmission(6); // transmit to device #6
  Wire.write(LEFT);          // queue the LEFT command – change to whatever you need
  Wire.endTransmission();    // actually send it

So the master program would look something like this:

#include <Wire.h>
#define LEFT 0
#define RIGHT 1
#define STOP 2
#define REVERSE 3

void setup(){
   //some of your code
   Wire.begin(); 
}
void loop(){
  //Some code
  //Suddenly you want to transmit a message to make the slave turn left
  Wire.beginTransmission(6); // transmit to device #6
  Wire.write(LEFT);          // sends LEFT – change to whatever you need (Wire.send on pre-1.0 IDEs)
  Wire.endTransmission();
  //Some more code
}

Upload the master program to the master Arduino and the slave program to the slave Arduino and you are ready to go!

Creative Coding Quick Reference Sheet

In this page we shall begin to program our robot. To understand how to program, we first need to know how a computer works. A computer understands nothing by itself; it is up to the programmer to tell it what to do. At the heart of the computer is the CPU, which reads machine code and translates it into a series of mathematical operations performed on its inputs. When we program, we write in a structured language such as C++ or Java, which a program known as the compiler translates into machine code for the CPU to execute.

Java is a little special in this respect: when you compile a piece of Java, it gets converted to Java byte code. When a user runs the application, the byte code is interpreted by a program known as the Java Virtual Machine (JVM), which translates it to machine code on the fly.

Key components of a programming language

Programming languages are the way we instruct our computers. In this club we will use Java and C++. For our purposes, Java is used by the Processing application on our desktop and the mobile phone, and C++ is used by the Arduino micro-controller. The two languages are similar in the way we express ourselves in them, and most other programming languages follow a similar pattern.

[Tables: Java and C++ syntax compared side by side]

The semicolon (;) and other grammar

In both Java and C++, one must put a semicolon at the end of every statement. It's basically like a full stop. Both languages also mostly ignore extra spaces and line breaks.
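
For instance, in this made-up Arduino-flavoured snippet, each of the following is one statement; the compiler cares about the semicolons, not the line breaks (pin 13 is just an arbitrary example):

int speed = 100;        // one statement
speed = speed + 20;     // another statement
digitalWrite(13, HIGH); // a third – the semicolons, not the line breaks, separate them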

Functions

Computers are stupid: apart from the keywords in the tables above, the computer understands nothing else, so there are no other magic words that mean something. What programmers do instead is create their own words, known as functions.

In C++ (Arduino) a function looks like this:

void left(){
     //tell your robot how to turn left
}

One would invoke the function like this:

    left();

A function can take an input and give an output:

    int add1(int x){
       return x+1;
    }

One can call this with:

add1(1);

To handle the output, one would create a variable to store the data returned:

int val = add1(1); //val will equal 1+1 = 2

Easy.

Now how do you use these? Well, the Arduino and Processing environments each come with what is known as a library. The libraries provide a list of ready-made functions to make life easier (otherwise you would have to write a whole operating system, or tell the Arduino how to turn on a pin register by register, which is fairly nasty compared to digitalWrite(9, HIGH);).
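
To tie all of this together, here is a small hypothetical sketch. blink() is a word we invented ourselves, built out of the library functions pinMode(), digitalWrite() and delay(); pin 9 is an arbitrary choice:

void blink(){
   digitalWrite(9, HIGH); // turn pin 9 on
   delay(500);            // wait half a second
   digitalWrite(9, LOW);  // turn pin 9 off
   delay(500);
}

void setup(){
   pinMode(9, OUTPUT); // tell the Arduino that pin 9 is an output
}

void loop(){
   blink(); // invoke our own function, just like a library one
}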

Where to go from here

http://processing.org/learning/objects/

http://processing.org/learning/pixels/

http://arduino.cc/en/Tutorial/AnalogReadSerial

http://arduino.cc/en/Tutorial/DigitalReadSerial

Using skewness to perform syllable recognition – Part I (Theory)

Currently the syllable class contains an accumulator for various letters: if these letters occur, the input corresponds to a particular syllable. I think this is fairly limited, because "fish" and "shift" will be interpreted as the same word. Yet µSpeech should be capable of better: we are able to tell when an individual letter has been said. For instance, the "f" in "fish" should occur at the beginning, whereas the "f" in "shift" should occur at the end.

To solve this problem I have been considering two methods: skewness, and a method based on pure calculus. Today I will explore skewness.

Skewness – The theory

Skewness is a measure of how much a distribution leans to one side. Its statistical definition is:

\frac{\mu_3}{\sigma^3}

where \mu_3 is the third central moment, i.e. E[(x-\mu)^3], and \sigma is the standard deviation.

Now, going back to high-school mathematics, expand the cube:

E[(x-\mu)^3] = E[x^3 - 3x^2\mu + 3x\mu^2 - \mu^3] = E[x^3] - 3\mu E[x^2] + 3\mu^2 E[x] - \mu^3

Given that \mu = E[x]:

E[(x-\mu)^3] = E[x^3] - 3\mu (E[x^2] - \mu^2) - \mu^3

Since \sigma^2 = E[x^2]-\mu^2:

E[(x-\mu)^3] = E[x^3] - 3\mu \sigma^2 - \mu^3

So the algorithm needs to keep track of three quantities: E[x^3], \mu and \sigma^2.

Making the algorithm online

Now, given that µSpeech operates under stringent memory constraints, it seems imperative that we devise the algorithm so that:

  • Data is not kept in an array.
  • Computation is minimal.

With these in mind, we need ways of computing the three quantities online, i.e. updating them one sample at a time. To start with, let's tackle the simplest algorithm, the one for \mu.

Given that \mu = E[x] = \sum x_i p_i = \frac{1}{n}\sum x_i, note that i\mu_i = (i-1)\mu_{i-1} + x_i; dividing through by i gives an update rule that needs no stored samples:

\mu_i = \mu_{i-1} + \frac{x_i - \mu_{i-1}}{i}

This can be extended to E[x^3]:

E[x^3]_i = E[x^3]_{i-1} +\frac{x^3_i-E[x^3]_{i-1}}{i}

The third quantity we need to compute is \sigma^2:

\sigma^2 = E[x^2]-E[x]^2

So we need to find E[x^2]:

E[x^2]_i = E[x^2]_{i-1} +\frac{x^2_i-E[x^2]_{i-1}}{i}
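
Putting the three online updates together, a sketch of the accumulator could look like the following. This is a rough illustration of the maths above rather than the actual µSpeech code, and the name SkewAccumulator is my own:

#include <math.h> // for sqrt(); included automatically in Arduino sketches

struct SkewAccumulator {
   float mean;     // running estimate of E[x], i.e. mu
   float meanSq;   // running estimate of E[x^2]
   float meanCube; // running estimate of E[x^3]
   long n;         // number of samples seen so far

   SkewAccumulator() : mean(0), meanSq(0), meanCube(0), n(0) {}

   void add(float x){
      n++;
      // every running expectation E[x^k] obeys the same update rule
      mean     += (x - mean) / n;
      meanSq   += (x*x - meanSq) / n;
      meanCube += (x*x*x - meanCube) / n;
   }

   float skewness(){
      float var = meanSq - mean*mean;                     // sigma^2 = E[x^2] - mu^2
      if(var <= 0) return 0;                              // not enough spread yet
      float m3 = meanCube - 3*mean*var - mean*mean*mean;  // E[(x-mu)^3]
      return m3 / (var * sqrt(var));                      // mu_3 / sigma^3
   }
};

No sample array is kept and each new sample costs only a handful of multiplications, which satisfies both constraints above.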

Part 2 coming soon.

µSpeech 4.0 coming soon with codebender.cc support

The µSpeech library is almost ready to be deployed as version 4.0. A number of bugs have been fixed and new features have been added. Among the new features is a way to store and compare words, and calibrating the library is now significantly easier. Among other things, I have augmented the documentation with a video for calibrating the latest version of the library. If you are interested in trying it beforehand:


git clone -b 4.0-workingBranch https://github.com/arjo129/uSpeech.git

UPDATE: 4.0 is now mainstream; just go to the downloads page to download the latest version.

This is how to calibrate the phoneme recognizer: [video]

The new API docs will be coming up soon. I am also in the process of getting the library updated at codebender.cc, so those of you who use it to program your Arduino can enjoy the benefits of µSpeech.

Complete list of changes:

  • Updated codebender support.
  • Fixed vowel detection.
  • Packaged in an Arduino-IDE-friendly format.
  • Added video documentation.
  • Improved ease of use.
  • Added a new API for easy word recognition.

How µSpeech works

It's been over a year since I posted anything. This does not mean that I was doing nothing, but rather that I was working on other stuff (and also the Chinese government has blocked WordPress, and I only got a VPN recently). One of the things I had been working on was µSpeech, speech-recognition software for the Arduino. This had originally seemed crazy, as speech recognition is a very computationally demanding process. Clocked at a mere 16 MHz and with only kilobytes of RAM, the Arduino cannot afford to use a standard speech recognition algorithm.

Most speech recognition algorithms involve the use of a process known as the Fast Fourier Transform (FFT). For those who are unaware, the FFT is a process which takes a sound and splits it into its constituent frequencies. Now, the FFT is not something that is particularly easy to do: despite its name, it is only "fast" relative to the naive discrete Fourier transform, and on an Arduino it is still far too slow. The innovation in µSpeech is that it bypasses this process – at a cost: µSpeech is only able to differentiate between fricatives and voiced fricatives. Its ability is therefore limited, but it is good enough to differentiate between commands such as "left", "right", "forward" and "backward".

The thing about unvoiced fricatives (such as /f/, /s/, /sh/) is that if you touch your throat while saying them, you can feel that the vocal cords play no role in making these sounds. They are made entirely by the mouth and the air rushing out of it, which gives them an inherent tendency to be noise-like and rich in high frequencies. If you were to look at a graph plotting the air pressure over time, the sound of /s/ has a very chaotic graph that zigzags a lot, which is not the case with the sound of /a/. Thus I found that the following formula works well:

\large{ c = \frac{\sum |\frac{df(x)}{dx}|}{\sum |f(x)|}}

Letters such as /s/ result in a very high value of c, whereas letters such as /a/ result in a low value. Voiced fricatives such as /v/ result in a value somewhere in between.
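
In discrete terms the derivative is simply the difference between consecutive samples, so the coefficient can be computed over a buffer of microphone readings along these lines (an illustration of the formula, not µSpeech's exact implementation):

// compute c = sum|df/dx| / sum|f(x)| over n samples
float complexity(int *f, int n){
   long diffSum = 0;
   long absSum = 0;
   for(int i = 1; i < n; i++){
      diffSum += abs(f[i] - f[i-1]); // |df/dx| as the sample-to-sample change
      absSum  += abs(f[i]);          // |f(x)|
   }
   if(absSum == 0) return 0; // guard against dividing by zero on silence
   return (float) diffSum / absSum;
}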

I found that the value for c falls within a certain range depending on the letter (and microphone). Thus when you calibrate µSpeech, you are essentially tweaking the threshold values. It generally takes a full afternoon to get them right!

The “Lazy” Keyword

JavaScript Sharp (JS#) has learnt to procrastinate. The lazy keyword is a new keyword introduced into JS#. It enables lazy loading (i.e. evaluation of the variable is procrastinated): the value will only be computed when it is used. An example is probably easier to understand.
var auto n = 9+3
document do write n .

This would compile to:
var n = 9+3
document.write(n);

Whereas using lazy instead of var would look like this:
lazy n = 9+3
document do write n .

This would translate to:

document.write(9+3);

Currently the lazy keyword is buggy and only supports the following syntax: lazy [variable name] = [javascript expression].