RPi Voice Recognition and Command

With Jasper and PocketSphinx

Description

Voice recognition and command is a useful module for interacting with robotic systems. Many voice recognition (VR) systems are available, but my requirements are:

  • No Internet connection required,
  • Open source license that permits commercial use,
  • Still being developed,
  • Supports Linux,
  • Able to run on the Raspberry Pi,
  • Balance between speed and accuracy

Out of the possible VR engines, the best candidates seem to be PocketSphinx or Julius running under Jasper. I removed Kaldi as a candidate due to its RAM and CPU requirements. Simon was removed as a candidate since it seems very focused on the desktop.

Next up?

After reading this guide, you may be interested in reading:

Parts List

  • Raspberry Pi 2
  • 16GB (or larger) class 10 MicroSD card
  • USB microphone or webcam, preferably one listed at elinux
    • I am using a Logitech Webcam C525
  • Headphones or powered computer speakers
    • In a separate guide, the sound output will be replaced with an amplifier and passive speakers.

Overview

Start with a Raspberry Pi image. This is an image saved after following the RPi Initial Setup Guide, RPi WiFi Access Point Guide, and RPi Desktop Mods. The image should not be Lite. If you do not have such an image, start with a Raspbian image and follow the aforementioned guides before returning here. This guide assumes you have an appropriate image and are connected to your running pi.

  1. Setup sound I/O
  2. Test sound I/O
  3. Virtual Environments
  4. Install Jasper
  5. Install PocketSphinx and requirements
  6. Connect and test
  7. Conclusion

Procedures

Setup sound I/O

Much of the documentation for setting up a USB microphone refers to Raspbian versions prior to Jessie. Since our base distribution is Jessie or newer, those instructions do not apply. This guide follows the Stack Exchange answer, How do I configure my sound for Jasper on Raspbian Jessie?.

  • Power down your pi and insert your USB webcam/microphone
  • Boot the pi and connect to the cli
  • Check the order in which your sound cards have been loaded with cat /proc/asound/modules. The output should be similar to:

    0 snd_bcm2835
    1 snd_usb_audio
  • Create a file, sudo nano /etc/modprobe.d/alsa-base.conf with these lines

    # This sets the index value of the cards but doesn't reorder.
    options snd_usb_audio index=0
    options snd_bcm2835 index=1
    
    # Does the reordering.
    options snd slots=snd-usb-audio,snd-bcm2835
  • Reboot the pi
  • Check the sound card order, cat /proc/asound/modules
    • The order should have changed
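The order check above can also be scripted. The following is a minimal sketch, assuming the file keeps the "index module" layout shown earlier; the function names card_index and usb_card_is_first are hypothetical helpers, not part of ALSA.

```shell
# Print the index of a sound module as listed in a /proc/asound/modules-style
# file (lines of the form "<index> <module>"). Prints nothing if not found.
# $1 = module name, $2 = file to read (defaults to /proc/asound/modules)
card_index() {
    awk -v mod="$1" '$2 == mod { print $1; exit }' "${2:-/proc/asound/modules}"
}

# Succeed only when the USB audio device is card 0 and the onboard audio is card 1.
# $1 = optional file to read instead of /proc/asound/modules
usb_card_is_first() {
    [ "$(card_index snd_usb_audio "$1")" = "0" ] &&
        [ "$(card_index snd_bcm2835 "$1")" = "1" ]
}

# Example use after the reboot (commented out so the block only defines helpers):
# usb_card_is_first && echo "USB audio is card 0" || echo "order unchanged"
```

If the check fails, re-read /etc/modprobe.d/alsa-base.conf for typos before rebooting again.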

Test sound I/O

Settings may be viewed and changed using amixer, a command-line mixer for the ALSA sound card driver, and alsamixer, an ncurses-based mixer for the same driver.

  • View settings of microphone, amixer -c 0
  • View settings of playback, amixer -c 1
  • Adjust settings using alsamixer -c 0 and alsamixer -c 1

Recording is done with arecord, a command-line sound recorder for the ALSA sound card driver, and playback with aplay, the matching command-line sound player.

  • Record something with (stop recording with CTRL-c) arecord -D plughw:0,0 -f cd test.wav
  • Play it back with aplay test.wav
  • If you receive error messages, or if nothing is recorded or played back, please troubleshoot before continuing
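The record and playback test can be wrapped in a small script so it is easy to repeat while troubleshooting. This is only a sketch: is_wav and record_and_play are hypothetical helper names, the five-second duration is an arbitrary choice, and the header check only looks at the first four bytes of the file.

```shell
# Succeed if the file is non-empty and starts with the "RIFF" marker
# that begins every WAV file.
is_wav() {
    [ -s "$1" ] && [ "$(head -c 4 "$1")" = "RIFF" ]
}

# Record five seconds from card 0, confirm a WAV file was written, then play it.
# $1 = output file name
record_and_play() {
    arecord -D plughw:0,0 -f cd -d 5 "$1" || return 1
    if ! is_wav "$1"; then
        echo "Recording did not produce a WAV file: $1" >&2
        return 1
    fi
    aplay "$1"
}

# Example (commented out so the block only defines helpers):
# record_and_play test.wav
```

The -d 5 option stops arecord after five seconds, so CTRL-c is not needed here.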

Virtual Environments

A virtual environment is a tool that keeps the dependencies required by different projects in separate places by creating isolated Python environments for them. Jasper, a Python project that helps convert recognized speech into commands, has a number of dependencies that may conflict with the Python packages installed by the OS. A virtual environment helps greatly with this. See The Hitchhiker's Guide to Python and the Virtualenv Documentation.

  • Install virtualenv, sudo apt-get install virtualenv
  • Some basic commands are:
    • cd my_project_folder
    • Create the virtual environment, virtualenv venv. Only needs to be created once.
    • Activate the environment, . venv/bin/activate. The prompt will change.
    • Perform tasks as usual
    • When done, deactivate the environment, deactivate
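Besides the changed prompt, the activate script sets the VIRTUAL_ENV variable and deactivate unsets it, which gives scripts a way to tell whether an environment is active. The sketch below uses that; in_venv and venv_pip are hypothetical helper names.

```shell
# Succeed when a virtual environment is active in this shell.
# The activate script sets VIRTUAL_ENV; deactivate unsets it.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Run pip only inside a virtual environment, so packages never land
# in the system-wide Python by accident.
venv_pip() {
    if ! in_venv; then
        echo "No virtual environment active; run '. venv/bin/activate' first." >&2
        return 1
    fi
    pip "$@"
}

# Example (commented out so the block only defines helpers):
# venv_pip install -r client/requirements.txt
```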

Install Jasper

  • Install prerequisites, sudo apt-get install vim git-core python-dev python-pip bison libasound2-dev libportaudio-dev python-pyaudio
  • Add these lines to ~/.bash_profile, nano ~/.bash_profile

    export LD_LIBRARY_PATH="/usr/local/lib"
    source .bashrc
  • Add these lines to ~/.bashrc, nano ~/.bashrc

    LD_LIBRARY_PATH="/usr/local/lib"
    export LD_LIBRARY_PATH
    PATH=$PATH:/usr/local/lib/
    export PATH
  • Clone Jasper, git clone https://github.com/jasperproject/jasper-client.git jasper
  • cd jasper
  • Create virtualenv, virtualenv venv
  • Activate the environment, . venv/bin/activate
  • Install Jasper requirements, pip install -r client/requirements.txt. This might take a while to compile
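After logging in again, it may be worth confirming that the profile edits took effect before building anything against /usr/local/lib. The following is a minimal sketch; path_contains is a hypothetical helper that tests whether a directory appears in a colon-separated list, ignoring a trailing slash.

```shell
# Succeed if directory $2 is an entry of the colon-separated list $1.
# A trailing slash on either side is ignored.
path_contains() {
    case ":$1:" in
        *":${2%/}:"* | *":${2%/}/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Examples (commented out so the block only defines the helper):
# path_contains "$LD_LIBRARY_PATH" /usr/local/lib || echo "LD_LIBRARY_PATH not set up"
# path_contains "$PATH" /usr/local/lib || echo "PATH not set up"
```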

Install PocketSphinx and requirements

This follows the Jasper guide closely.

  • Install PocketSphinx, sudo apt-get install pocketsphinx
  • Installing CMUCLMTK
    • sudo apt-get install subversion autoconf libtool automake gfortran g++
    • svn co https://svn.code.sf.net/p/cmusphinx/code/trunk/cmuclmtk/
    • cd cmuclmtk/
    • ./autogen.sh && make (check for errors before the next step)
    • sudo make install
    • cd ~
  • Installing OpenFST, Phonetisaurus, m2m-aligner and MITLM
    • sudo su -c "echo 'deb http://ftp.debian.org/debian experimental main contrib non-free' > /etc/apt/sources.list.d/experimental.list"
    • sudo apt-get update
    • sudo apt-get -t experimental install phonetisaurus m2m-aligner mitlm libfst-tools
  • Building the Phonetisaurus FST model
    • wget https://www.dropbox.com/s/kfht75czdwucni1/g014b2b.tgz
    • tar -xvf g014b2b.tgz
    • cd g014b2b
    • ./compile-fst.sh
    • cd ..
    • mv ~/g014b2b ~/phonetisaurus
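Since several of these steps build or install command-line tools, a quick check that they ended up on the PATH can save time later. This is a sketch only: missing_tools is a hypothetical helper, and the tool names in the commented example are my assumption of what CMUCLMTK and Phonetisaurus install, so adjust them to match your system.

```shell
# Print each named command that is not found on the PATH.
# Succeeds only if every command was found.
missing_tools() {
    mt_status=0
    for mt_tool in "$@"; do
        if ! command -v "$mt_tool" >/dev/null 2>&1; then
            echo "missing: $mt_tool"
            mt_status=1
        fi
    done
    return "$mt_status"
}

# Example (commented out; the tool names are assumptions, not taken from this guide):
# missing_tools text2wfreq text2idngram idngram2lm phonetisaurus-g2p
```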

References