ECE434 Project - Mandroid

From eLinux.org

Revision as of 13:42, 2 November 2020

Team members: Dylan Turner

Executive Summary


A humanoid talking robot head.

The head listens to you with a microphone, formulates a response, and then replies using speech synthesis while moving its mouth.

Currently, PWM and servo control on the BeagleBone (with a driver circuit for a high-torque servo), a rudimentary speech synthesizer, speech recognition, the mouth, and a rudimentary chat bot are all working!

An example of a much more complex version of what I'm aiming for can be found here. At a minimum, this robot will respond to some speech input, move its mouth a bit, and output some speech output. If I can't build a complex chat bot in time, that's okay with me. I plan to focus on the other three parts more than anything else.


Everything is placed inside the mask, which is held up by a support frame of wooden dowels joined with hot glue.

The base is connected to a thick wooden dowel, which has a second platform on top. On this second platform rests the BeagleBone Black, and on top of the BeagleBone is a tiny breadboard with a circuit to control the servo.

The servo is attached to the support structure within the mask whereas all other components are attached to the primary wooden dowel and simply *covered* by the mask. The mask itself (with servo) is completely removable to allow tinkering with the internals.

Installation Instructions

External Hardware

Obviously, there's a BeagleBone Black.

The dowels and wooden base are used to make a framework for holding up the mask as well as holding the beaglebone. A T-shaped structure created from thin dowels attaches at the mask's ears and nose, and the motor rests under their joint, attached with hot-glue.

Another dowel is attached to a servo. I used a 5V 20kg servo because of the weight. This central dowel is connected to the jaw of the head, which has been cut where it meets the chin and on the sides, so the mouth can open more easily.

There is a small circuit to provide a large current to the servo without voltage drops. The servo's control input can be driven with a 3.3V signal, which the BeagleBone can provide, but its supply requires 4.8-6.8V at a decent current. This circuit is dead simple, so I've drawn it in ASCII art below. The two transistors are BC547 NPN.

/| 5V
 |                   |
 Z 1kOhm             Z  1kOhm
 Z                   Z
 |                   |
 |                   *---------- Servo Control Pin
 *-----------      |/
 |          |______|
  \|               |\    
   |--- P9_22        V
  /|                 |
 V                   |
 |                  ---
---                  -

Attached to the central pole are a downward-facing Rock Band microphone and an outward-facing mini speaker. The microphone is a USB mic, and the speaker attaches to a 3.5mm cable which then goes into a USB adapter. Both USB devices are plugged into a hub which hangs down from the BeagleBone.


First, make sure you have Python 3.7.

If you don't, you can install its development package with:

sudo apt install libpython3.7-dev

If Python 3.7 is no longer available from your package repositories, you can either update the version in the Makefile (replace every instance of 3.7 with the version you have) or build Python 3.7 from source using the instructions here.

Afterwards, you're ready to install the program.

Here are the commands for installing the Mandroid software:

sudo apt install -y libsdl2-dev libsdl2-mixer-dev python3-pyaudio pybind11-dev flac
pip3 install PyAudio
pip3 install SpeechRecognition
git clone https://github.com/blueOkiris/python-duckduckgo
cd python-duckduckgo
sudo python3 setup.py install
cd ..
git clone https://github.com/blueOkiris/man-droid
cd man-droid
sudo make install

Here's the explanation:

  • Install Dependencies:
    • SDL2_mixer is required for speech synthesis: `libsdl2-dev libsdl2-mixer-dev`
    • The Python pip libraries `PyAudio` and `SpeechRecognition` are required for speech recognition. `PyAudio` relies on: `python3-pyaudio`
    • The python speech recognition library is called in C++ using pybind: `pybind11-dev`
    • Flac for audio input
    • python-duckduckgo
      • Download custom duckduckgo library (for search)
      • Go into the directory
      • Install it
      • Leave the directory
  • Download main project from git
  • Go into the project folder
  • Build it with make
  • Install system service for running at start
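Once the steps above have finished, a quick way to confirm that the Python side of the install worked is to check that the modules can be found. The following is only a minimal sketch and is not part of the Mandroid repository. `speech_recognition` and `pyaudio` are the import names of the pip packages listed above; `duckduckgo` is an assumed import name for the python-duckduckgo library, so adjust it if the fork uses a different one. The helper name `missing_modules` is hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "duckduckgo" is an assumed import name for python-duckduckgo.
    required = ["speech_recognition", "pyaudio", "duckduckgo"]
    missing = missing_modules(required)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All Python dependencies were found.")
```

Run it with `python3` after the install commands. It only checks that the modules are discoverable, not that the microphone or speaker actually work.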

User Instructions

After you run `sudo make install`, the program should start automatically on every reboot.

Otherwise, start it with:

cd <Location of Repo>

Once running, you can talk to the robot and it will respond.

Currently, only two commands are recognized: say "bye" or "goodbye" to end the program, or "tell me about `x`" to get web info on `x`.



Theory of Operation

At the top level, there is the Mandroid object (Brain.hpp/.cpp). It is an abstract class in C++ and has instances of two other abstract classes: a SpeechRecognizer as its ears (Listen.hpp/.cpp) and a SpeechSynthesizer as its mouth (Speech.hpp/.cpp).

The created instance of a Mandroid is currently a child called IfElseBot (Brain.hpp/.cpp). This implementation is based on if and if-else statements, the most barebones way to program a chat bot. Another implementation could utilize a natural language processing library or machine learning to be more "real," but as it stands, the only implementation is the IfElseBot.

As an implementation of the Mandroid class, the IfElseBot uses a SpeechRecognizer and a SpeechSynthesizer. The specific children of these abstract classes are a PythonSpeechRecognizer (Listen.hpp/.cpp), which calls a Python function from the C++ code to turn speech into an std::string, and a ClipBasedSpeechSynthesizer (Speech.hpp/.cpp), which loads audio clips and pieces them together to produce sound.
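To illustrate the clip-based idea, here is a minimal Python sketch of turning an IPA string into an ordered list of audio clips. The real synthesizer is written in C++ and its symbol table and file names are not shown on this page, so `CLIP_FOR_SYMBOL`, the `.wav` names, and `ipa_to_clips` are all hypothetical. Skipping unknown symbols is also an assumption, not necessarily what the project does.

```python
from typing import List

# Hypothetical table from IPA symbols to recorded clips.
CLIP_FOR_SYMBOL = {
    "h": "h.wav",
    "ɛ": "eh.wav",
    "l": "l.wav",
    "oʊ": "oh.wav",  # a two-character symbol (diphthong)
}


def ipa_to_clips(ipa: str) -> List[str]:
    """Split an IPA string into known symbols, trying longer symbols first,
    and return the clip to play for each. Unknown characters are skipped."""
    symbols = sorted(CLIP_FOR_SYMBOL, key=len, reverse=True)
    clips = []
    i = 0
    while i < len(ipa):
        for symbol in symbols:
            if ipa.startswith(symbol, i):
                clips.append(CLIP_FOR_SYMBOL[symbol])
                i += len(symbol)
                break
        else:
            i += 1  # no clip recorded for this character
    return clips
```

For example, `ipa_to_clips("hɛloʊ")` gives the four clips for "hello" in order. Playing them back to back (in the project, through SDL2_mixer) is what produces the speech.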

The ClipBasedSpeechSynthesizer also makes use of a Servo (Servo.hpp/.cpp) to physically move the mouth. This Servo makes a system call to launch a Python program that initializes the PWM pin (for some reason, that was the only way to get it to work). It then uses the sysfs interface to set the duty cycle of the signal driving the physical servo.
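As a rough illustration of the sysfs step, the sketch below converts a jaw angle to a pulse width and writes it to a PWM channel's `duty_cycle` file, which the Linux PWM sysfs interface expects in nanoseconds. The timing constants are assumptions (a typical hobby servo with a 20 ms frame and 1-2 ms pulses), not values taken from the project, and the function names are hypothetical. It also assumes the period and enable files were already set up by the initialization program.

```python
from pathlib import Path

# Assumed timing for a typical hobby servo; check the servo's datasheet.
MIN_PULSE_NS = 1_000_000   # pulse width at 0 degrees (1 ms)
MAX_PULSE_NS = 2_000_000   # pulse width at 180 degrees (2 ms)


def angle_to_pulse_ns(angle_deg: float) -> int:
    """Map an angle, clamped to [0, 180] degrees, to a pulse width in ns."""
    clamped = max(0.0, min(180.0, angle_deg))
    span = MAX_PULSE_NS - MIN_PULSE_NS
    return MIN_PULSE_NS + round(span * clamped / 180.0)


def set_servo_angle(pwm_dir: Path, angle_deg: float) -> int:
    """Write the pulse width for angle_deg to <pwm_dir>/duty_cycle.

    pwm_dir is the sysfs directory of an already exported and enabled PWM
    channel; its exact path depends on the board and kernel.
    """
    pulse = angle_to_pulse_ns(angle_deg)
    (pwm_dir / "duty_cycle").write_text(str(pulse))
    return pulse
```

With these assumed constants, 90 degrees maps to a 1.5 ms pulse. Opening and closing the mouth then comes down to alternating between two angles while a clip plays.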

Back at the top level, now that it can speak and hear, the IfElseBot can process speech and produce a response. One of its operations uses a Python library that grabs information about a topic from DuckDuckGo. The other operation exits the program.
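The if/else approach can be sketched in a few lines. This is a Python illustration only (the actual IfElseBot is C++), and `parse_command` and its return values are hypothetical names. It recognizes the two commands the bot currently supports and reports everything else as unknown.

```python
import re
from typing import Optional, Tuple


def parse_command(heard: str) -> Tuple[str, Optional[str]]:
    """Classify recognized speech as ("quit", None), ("lookup", topic),
    or ("unknown", None)."""
    text = heard.strip().lower()
    if text in ("bye", "goodbye"):
        return ("quit", None)
    match = re.match(r"tell me about\s+(.+)", text)
    if match:
        return ("lookup", match.group(1))
    return ("unknown", None)
```

A "lookup" result would then be passed to the DuckDuckGo library, and a "quit" result would end the main loop.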

Different implementations of speech are possible as long as they provide methods for producing sound from IPA and for converting English text to IPA. Different implementations for listening are possible as long as they have a listen method that returns a string representing the heard speech. Different implementations of the Brain are possible as long as they have a respond function which returns a boolean indicating whether the program should quit.
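The three contracts described above can be summarized with a small sketch. It is written in Python with `abc` for brevity, whereas the project's classes are C++ abstract classes, and the method names `english_to_ipa` and `say` are assumptions (only `listen` and `respond` are named on this page). `ScriptedEars`, `RecordingMouth`, and `EchoBot` are hypothetical stand-ins that make the sketch runnable without hardware.

```python
from abc import ABC, abstractmethod
from typing import List


class SpeechRecognizer(ABC):
    @abstractmethod
    def listen(self) -> str:
        """Return the heard speech as text."""


class SpeechSynthesizer(ABC):
    @abstractmethod
    def english_to_ipa(self, text: str) -> str:
        """Convert English text to IPA."""

    @abstractmethod
    def say(self, ipa: str) -> None:
        """Produce sound from IPA."""


class Mandroid(ABC):
    def __init__(self, ears: SpeechRecognizer, mouth: SpeechSynthesizer):
        self.ears = ears
        self.mouth = mouth

    @abstractmethod
    def respond(self) -> bool:
        """Handle one exchange; return True if the program should quit."""


class ScriptedEars(SpeechRecognizer):
    """Stand-in recognizer that replays a fixed list of phrases."""
    def __init__(self, phrases: List[str]):
        self.phrases = list(phrases)

    def listen(self) -> str:
        return self.phrases.pop(0)


class RecordingMouth(SpeechSynthesizer):
    """Stand-in synthesizer that records what it was asked to say."""
    def __init__(self):
        self.spoken: List[str] = []

    def english_to_ipa(self, text: str) -> str:
        return text  # placeholder: no real conversion

    def say(self, ipa: str) -> None:
        self.spoken.append(ipa)


class EchoBot(Mandroid):
    """Stand-in brain: repeats what it hears and quits on "bye"."""
    def respond(self) -> bool:
        heard = self.ears.listen()
        if heard == "bye":
            return True
        self.mouth.say(self.mouth.english_to_ipa(heard))
        return False


def run(bot: Mandroid) -> int:
    """Call respond() until it asks to quit; return the exchange count."""
    count = 1
    while not bot.respond():
        count += 1
    return count
```

Because the brain only depends on the two abstract interfaces, swapping in a different recognizer, synthesizer, or chat bot does not require changes to the other parts.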

There is also a set of test functions (Tests.hpp) which go through the various functionalities.



Work Breakdown

As the only team member, I did all of the work.

The project can be broken up into four main sections with subsections.

  • Servo/Head control
    • PWM Control
  • Speech Recognition
  • Speech Synthesis
    • Recording sound files
    • IPA map to sound files
    • Synthesis object with instance of Servo
  • Brain (Chat-bot)
    • Tie it all together
    • Process inputs and produce sentences in IPA as response

The only thing really unfinished is the brain, though the synthesis can also be improved.

Future Work

The biggest improvement is in the chat bot, giving it natural language processing and more commands it can do.

After that, the speech synthesis needs improvement. It sounds like a Speak & Spell, and its dictionary of known pronunciations is small, so it often has to guess pronunciations.

Both of these are based on abstract classes, so both additions could integrate well with the system.

Beyond that, the power circuitry can be improved so there are fewer power cords, a USB Wi-Fi adapter would make it simpler to initialize, the hardware could be better hidden, the physical structure could be more robust, and more motion could be added to the face, like moving eyes and multiple "muscles" for better facial movement.

