Student: Jyothirmayee Donthineni
Mentors: Jason Kridner
GSoC: Not Applicable
This project is currently just a proposal.
A "Hello World" application has been created, cross-compiled using the GCC ARM toolchain, and executed using QEMU. A pull request has been generated for the same. Please find the link here.
School: National Institute of Technology Karnataka, Surathkal
Primary language (We have mentors who speak multiple languages): English
Typical work hours (We have mentors in various time zones): 2:30 AM to 11:30 AM UTC
Previous GSoC participation: No previous experience but excited to join the open source community.
About your project
Project name: Modern "Speak & Spell" using PocketBeagle
The goal of this project is to implement an updated "Speak & Spell" using PocketBeagle for present-day preschoolers, with improved games/puzzles and better hardware interfaces. The VFD used previously will be replaced with an OLED display, and speech recognition will be implemented so that the keypad is not needed for every interaction.
The current proposal aims at building an open-source, reproducible Linux application whose code can be downloaded and run anywhere in real time, so that it can also be promoted for commercial use. Since the original Speak & Spell's basic games are not sophisticated enough for present-day preschoolers, the updated puzzles will help improve their spelling skills. The puzzles will be implemented in Python in two levels (using the PyGame library). For text-to-speech generation, a lightweight open-source engine called CMU Flite will be used. Flite is built specifically for embedded systems and has the added advantage of supporting multiple accents.
For the speech recognition feature, I am planning to use the CMUSphinx toolkit because of its better accuracy compared to the TIesr libraries. Within CMUSphinx, I chose Pocketsphinx over Sphinx4 because of its speed and efficiency on embedded systems. The Pocketsphinx library also gives the developer the option to switch between search modes at run time. For this project, I am planning to use two modes: keyword mode and allphone mode. For the voice-triggering part (greetings, instructions to start the game, commands to pause, resume and move to the next level), keyword mode will be used. This mode takes only the listed keywords and discards the rest of the speech, which makes command recognition easy.
For example, suppose users give any of these commands:
- (1) "Hello! Start level-1 game"
- (2) "Hi! Start the first level-1 game"
- (3) "Let us start the level-1"
- In all the above commands, only the keywords such as "start" and "level-1" are considered to invoke the game, which makes recognition easier and more robust.
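To illustrate how keyword mode collapses these varied phrasings into the same trigger, here is a minimal pure-Python sketch. It is a hypothetical stand-in for the decoder, not the Pocketsphinx API itself; on the device the keyword list would live in a kws file, one entry per line in the form `keyword /threshold/`.

```python
# Hypothetical stand-in for keyword spotting: only listed keywords are
# considered; everything else in the utterance is discarded. On the
# device the same list would be a Pocketsphinx kws file with lines like
#   start /1e-20/
KEYWORDS = {"start", "pause", "resume", "next", "level-1", "level-2"}

def spot_keywords(utterance):
    """Return the keywords found in an utterance, ignoring filler words."""
    words = utterance.lower().replace("!", " ").replace(",", " ").split()
    return [w for w in words if w in KEYWORDS]

# All three example commands reduce to the same trigger:
for cmd in ("Hello! Start level-1 game",
            "Hi! Start the first level-1 game",
            "Let us start the level-1"):
    print(spot_keywords(cmd))  # each prints ['start', 'level-1']
```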
After invoking the game, we can switch to allphone mode (at run time itself). This mode recognises phonemes using a phonetic language model and is hence useful for recognising each letter in a word. To improve user interaction, generic dictation will be used, in which the language is restricted to the domain we need (making it appropriate for preschoolers) instead of using all the words in the dictionary for games and recognition. This also reduces memory usage. For the audio system, I am planning to use Bela because it has on-board speaker amps, which are absent in the Bela Mini. To make speech recognition easier, a MAX9814 microphone amplifier with AGC will be used.
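The run-time switch between the two searches can be sketched as below. The actual Decoder calls appear only as comments because they need acoustic models installed on the device, and the search names "kws" and "allphone" are labels chosen for this sketch.

```python
# Sketch of switching Pocketsphinx searches at run time. On the device
# the named searches would be registered and activated roughly like:
#   decoder.set_kws("kws", "keywords.list")   # register keyword search
#   decoder.set_search("kws")                 # activate it
def handle_command(keywords, state):
    """Drive the active search from the spotted keywords."""
    if "start" in keywords:
        state["game_running"] = True
        state["search"] = "allphone"  # decoder.set_search("allphone")
    elif "pause" in keywords or "stop" in keywords:
        state["game_running"] = False
        state["search"] = "kws"       # decoder.set_search("kws")
    return state

state = {"game_running": False, "search": "kws"}
state = handle_command(["start", "level-1"], state)
print(state["search"])  # allphone
```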
Implementing the games:
Level-1 game: This level compares the word spelt out by the player with the correct spelling and gives feedback accordingly.
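A minimal sketch of the Level-1 check, assuming the recognizer delivers the player's attempt as a sequence of letters; the feedback strings are placeholders to be refined later.

```python
def check_spelling(target, attempt):
    """Compare a spelt-out attempt against the target word and report
    which positions (if any) are wrong. Letters arrive one at a time
    from the recognizer, e.g. ["c", "a", "t"]."""
    attempt_word = "".join(attempt).lower()
    if attempt_word == target.lower():
        return "Correct! Well done."
    if len(attempt_word) != len(target):
        return "That has the wrong number of letters. Try again!"
    wrong = [i + 1 for i, (a, b) in enumerate(zip(target.lower(), attempt_word))
             if a != b]
    return "Letters %s are wrong. Try again!" % wrong

print(check_spelling("cat", ["c", "a", "t"]))  # Correct! Well done.
print(check_spelling("cat", ["c", "o", "t"]))  # Letters [2] are wrong. Try again!
```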
Level-2 game: A rough sketch of how this game will be implemented is shown in the picture below.
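Since the picture gives only a rough idea, here is a small sketch of the Level-2 mechanic, filling in missing letters within a time limit; the blanking rule and the 30-second default are assumptions for illustration.

```python
import time

def make_puzzle(word, hidden_positions):
    """Blank out the given letter positions, e.g. 'apple' -> 'a_p_e'."""
    return "".join("_" if i in hidden_positions else ch
                   for i, ch in enumerate(word))

def check_answer(word, hidden_positions, guesses, started, limit=30.0):
    """Accept the guessed letters only if correct and inside the time limit."""
    if time.monotonic() - started > limit:
        return "Time is up!"
    expected = [word[i] for i in sorted(hidden_positions)]
    return "Correct!" if guesses == expected else "Not quite, try again."

started = time.monotonic()
print(make_puzzle("apple", {1, 3}))                        # a_p_e
print(check_answer("apple", {1, 3}, ["p", "l"], started))  # Correct!
```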
This project will be implemented in 3 phases.
- Install the Flite program for text-to-speech generation.
- Install CMUSphinx libraries for speech recognition and voice triggering.
- Improve the chatbot functionality in terms of giving feedback after every puzzle. It will also be able to give suggestions to improve the user's performance. This way it will be developed as a teaching aid.
- Build Level 1 puzzle: Implement the basic 'spell the word' puzzle.
- Build Level 2 puzzle: Fill in the missing letters of a word within a specified time (this will be implemented as a dynamic game with graphics)
- Implement the complete model by adding a USB keyboard, Bela cape, microphone amplifier and an SSD1306-based 128 × 64 pixel OLED display.
- Bug fixing and Documentation of the project.
Community bonding period (2018-04-23 - 2018-05-14):
- Refine the weekly plan with suggestions from the mentor and community.
- Familiarise myself with the PocketBeagle.
Week 1-->Milestone #1(2018-05-14):
- Install the Pocketsphinx-5prealpha release, Sphinxbase and Sphinxtrain libraries.
- Test the installation by decoding the audio files present in the source code.
Week 2-->Milestone #2(2018-05-21):
- Create the grammar files and make sure they are flexible enough to recognise most possible commands. For this, instead of fixed phrases, commonly used words will be listed, allowing any arbitrary order.
- Build a statistical language model and train it with SRILM Toolkit.
- Use this model with pocketsphinx and test the recognition accuracy by giving test samples.
- Calculate the error rate using tools from Sphinxtrain and adjust the sampling rate and language model accordingly.
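As a cross-check on the Sphinxtrain tooling, word error rate can also be computed directly; this sketch uses the standard Levenshtein alignment over words (substitutions, deletions, insertions against the reference length).

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard edit-distance DP over whole words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[-1][-1] / len(ref)

# One dropped word out of a five-word reference -> 20% error rate:
print(word_error_rate("start the level one game", "start level one game"))  # 0.2
```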
Week 3-->Milestone #3(2018-05-28):
- Apply generic dictation and restrict the language models used (by removing domains such as legal transcriptions and medical terms).
- Add the feedback feature to the model by training it with commands depending on the performance level (by keeping a count of the number of words spelt correctly).
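The feedback feature can be sketched as a simple mapping from the running count of correctly spelt words; the thresholds and messages below are placeholders to be tuned with the mentor.

```python
def feedback(correct, attempted):
    """Pick a spoken feedback line from the player's running score.
    On the device the returned text would be handed to Flite."""
    if attempted == 0:
        return "Let's get started!"
    ratio = correct / attempted
    if ratio >= 0.8:
        return "Excellent! You are spelling like a champion."
    if ratio >= 0.5:
        return "Good job! A little more practice and you will ace it."
    return "Keep trying! Let's practise the tricky words again."

print(feedback(9, 10))  # Excellent! You are spelling like a champion.
```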
Week 4-->Milestone #4(2018-06-04):
- Install Flite for text-to-speech generation and check the working of the feedback feature.
- Write the code to switch from keyword-search mode to allphone mode after recognising the command that invokes the games.
- Document the finished code for Phase-1 evaluation
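Flite is driven from the command line, so the Python side only needs to build and run the invocation. A sketch, assuming the `slt` voice and the output filename as examples; on the device `subprocess.run` would execute the command.

```python
import subprocess  # used on the device to run the built command

def flite_command(text, voice="slt", wav_out=None):
    """Build a flite invocation: speak `text`, optionally to a WAV file.
    `flite -voice slt -t "hello"` speaks through the sound card;
    adding `-o out.wav` writes the audio to a file instead."""
    cmd = ["flite", "-voice", voice, "-t", text]
    if wav_out:
        cmd += ["-o", wav_out]
    return cmd

# On the PocketBeagle this would be executed with:
#   subprocess.run(flite_command("Spell the word cat"), check=True)
print(flite_command("Well done!", wav_out="feedback.wav"))
```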
Week 5-->Milestone #5(2018-06-11):
- Improve the code based on feedback after phase-1 evaluation.
- Write the code for the basic "spell the word" game in Python.
Week 6-->Milestone #6(2018-06-18):
- Install the PyGame library and code the level-2 game.
Week 7-->Milestone #7(2018-06-25):
- Add the game resources (graphics, background images and sound effects) to the level-2 game.
Week 8-->Milestone #8(2018-07-02):
- Connect the USB keyboard and the 128 × 64 pixel OLED display, and verify the gaming controls using the keyboard.
- Document the finished work for Phase-2 evaluation
Week 9-->Milestone #9(2018-07-09):
- Improve the Python code and gaming controls based on feedback after phase-2 evaluation
Week 10-->Milestone #10(2018-07-16):
- Make a demo by adding the Bela cape and the MAX9814 microphone amplifier with AGC.
- Add external speakers and a microphone to the module.
Week 11-->Milestone #11(2018-07-23):
- Bug fixing and documentation for the final evaluation (Phase-3).
- Prepare final presentation slides and video.
- Implement speech recognition in other languages by using training sets and models available online. (This way the device recognises commands in these languages and can also give feedback in them, which improves user interaction.)
- (or) Add a level-3 game in which misspelt words are to be detected and corrected from a given set of words.
Post-GSoC: After GSoC, I will make sure that I remain in touch with the community and keep contributing to the organisation the best I can.
Experience and approach
I have done projects on the Raspberry Pi before and will therefore be comfortable implementing this project on a PocketBeagle. I will get hands-on with the PocketBeagle before the project starts, since I have free time after my semester exams. I am quite new to speech recognition, so I have been doing the required background work to implement these functionalities on the PocketBeagle, and I can therefore assure you that I will be able to complete Phase-1 in the planned time. I have only done basic projects in Python before, so I plan to spend most of Phase-2 building the games.
In case my mentor becomes unavailable, I will keep the probability of delays low by staying in regular touch with him and making sure I know about any unavailability beforehand. I will also check with the community if they have a backup-mentor provision. I will communicate my coding issues to the community or to students working on similar projects to see if someone can help; otherwise I will contact the organisation administrator to discuss the issue, and I will make sure the work is not delayed in the meantime by documenting the finished work.
"Speak & Spell" was more than just a popular kids' toy; it is better described as a blueprint for the devices we use today. Until its invention, real-time speech synthesis was considered impossible. Implementing and updating its functionality for present-day kids is a perfect way to celebrate its importance in the development of today's signal processing technology.
<erik.welsh> "Basically, the idea was to re-brain a Speak-and-Spell with a PocketBeagle to celebrate the 40th anniversary. Speak-and-Spell was a great educational tool and bringing it back into the public. Generate open-source code around the Speak-and-Spell functionality. Plus we can get one of the creators (Gene Frantz) to then promote it."
Feb 02 23:21:50 <erik.welsh> "Agreed. I was just outlining the grand vision: from a HW perspective, it would be getting one of the OLED / Display Clicks working and interfacing with an I2C GPIO expander. Not really that much. The bigger piece would be doing text to speech and the games / puzzles."
<jkridner> "USB audio and Bela Mini are reasonable audio options for PocketBeagle..."
I am currently pursuing my Bachelor's degree in Electronics and Communication Engineering, and I am proficient in C and Python programming. I have no other commitments this summer and can dedicate my entire time to the project.