ECE434 Project - Mandroid

From eLinux.org

Latest revision as of 21:27, 15 November 2020


Team members: Dylan Turner

Executive Summary

MandroidHead.jpg

A humanoid talking robot head.

The head listens to you with a microphone, formulates a response, and then replies using speech synthesis while moving its mouth.

Currently, PWM on the BeagleBone, servo control (with a power circuit for a high-torque servo), a rudimentary speech synthesizer, speech recognition, the mouth mechanism, and a rudimentary chat bot are all working!

An example of a much more complex version of what I'm aiming for can be found here. At a minimum, this robot will respond to some speech input, move its mouth a bit, and produce some speech output. If I can't build a complex chat bot in time, that's okay with me. I plan to focus on the other three parts more than anything else.

Packaging

Everything is housed inside the mask, held up by wooden dowels joined with hot glue.

The base is connected to a thick wooden dowel which has a second platform on its top. On this secondary platform rests the BeagleBone Black, and on top of the board is a tiny breadboard with a circuit to control the servo.

The servo is attached to the support structure within the mask, whereas all other components are attached to the primary wooden dowel and simply covered by the mask. The mask itself (with servo) is completely removable to allow tinkering with the internals.

Images:

  • Mandroid-usb-hardware.jpg — the central dowel with the attached USB microphone and speaker
  • Mandroid-beagle-brain.jpg — the BeagleBone with the servo power circuit on top, inside the head of the Mandroid
  • Mandroid-t-bar.jpg — the T-shaped dowel structure with attached servo, inside the mask of the Mandroid

Installation Instructions

External Hardware

Obviously, there's a BeagleBone.

The dowels and wooden base are used to make a framework for holding up the mask as well as holding the beaglebone. A T-shaped structure created from thin dowels attaches at the mask's ears and nose, and the motor rests under their joint, attached with hot-glue.

The T-shaped dowel structure is attached to a servo at the corner of the T. I used a 5 V 20 kg servo because of the weight of the jaw. This central dowel is connected to the jaw of the head, which has been cut where it meets the chin and on the sides, so the mouth can open more easily.

There is a small circuit to provide a large current to the servo without voltage drops (the servo can be driven with a 3.3 V signal, which the BeagleBone can provide, but it requires 4.8-6.8 V with a decent current at its power input). This circuit is dead simple, and it's described in the Fritzing diagram below. The transistors used are BC547 NPNs.

Attached to the central pole is a downward-facing Rock Band microphone and an outward-facing mini-speaker. The microphone is a USB mic, and the speaker attaches to a 3.5 mm cable which then goes into a USB adapter. Both USB devices are plugged into a hub which hangs down from the BeagleBone.

To see how it physically attaches, view the images in the above section.

Fritzing diagram:

Mandroid-diagram bb.png

Software

First, make sure you have Python 3.7 along with its development headers.

If you don't, you can install them with:

sudo apt install libpython3.7-dev

If it gets deprecated, you can update the versions in the Makefile (all instances of 3.7 become 3.x or whatever), or build it from source using the instructions here.
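
Before building, you can sanity-check the interpreter version with a quick helper (a small standalone snippet, not part of the Mandroid sources):

```python
import sys

def version_ok(info=sys.version_info, minimum=(3, 7)):
    """Return True when the running interpreter meets the minimum version."""
    return tuple(info[:2]) >= minimum

if __name__ == "__main__":
    print("Python OK" if version_ok() else "Install Python 3.7 or newer")
```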

Afterwards, you're ready to install the program.

Installing from Package (easier, but not recommended)

Base install:

wget https://github.com/blueOkiris/man-droid/releases/download/1.0/mandroid-arm.deb
sudo dpkg -i mandroid-arm.deb

Then, to make it easier to access (i.e. run it by calling "mandroid" in a terminal):

sudo ln -s /opt/mandroid/mandroid /usr/bin/mandroid

And finally, to startup on boot:

sudo systemctl enable mandroid

Building from Source

Here are the installation commands for building the Mandroid software from source:

sudo apt install -y libsdl2-dev libsdl2-mixer-dev python3-pyaudio pybind11-dev flac
pip3 install PyAudio
pip3 install SpeechRecognition
git clone https://github.com/blueOkiris/python-duckduckgo
cd python-duckduckgo
sudo python3 setup.py install
cd ..
git clone https://github.com/blueOkiris/man-droid
cd man-droid
make
sudo make install

Here's the explanation:

  • Install Dependencies:
    • SDL2_mixer is required for speech synthesis: `libsdl2-dev libsdl2-mixer-dev`
    • The Python pip libraries `PyAudio` and `SpeechRecognition` are required for speech recognition. They rely on: `python3-pyaudio`
    • The Python speech recognition library is called from C++ using pybind: `pybind11-dev`
    • FLAC is required for audio input: `flac`
    • python-duckduckgo
      • Download custom duckduckgo library (for search)
      • Go into the directory
      • Install it
      • Leave the directory
  • Download main project from git
  • Go into the project folder
  • Build it with make
  • Install system service for running at start

User Instructions

When you do

sudo make install

after building, the program should autostart upon reboot.

If you opt out of that, you can start it manually with:

cd <Location of Repo>
./mandroid

Once running, you can talk to the robot and it will respond.

Currently, you can only say "bye" or "goodbye" to end the program and "tell me about `x`" to get web info on `x`.

Highlights

At this moment in time, Jorge the Mandroid can do three things:

  • Respond to "hello"
  • Respond to "goodbye" and quit the program
  • Respond to "tell me about _____" which will cause him to duckduckgo search a small paragraph about a topic

Here is a YouTube video demonstrating these features: https://www.youtube.com/watch?v=8WY1pg1DUaE

Theory of Operation

At the top level, there is the Mandroid object (Brain.hpp/.cpp). It is an abstract class in C++ and has instances of two other abstract classes: a SpeechRecognizer as its ears (Listen.hpp/.cpp) and a SpeechSynthesizer as its mouth (Speech.hpp/.cpp).

The created instance of a Mandroid is currently a child called IfElseBot (Brain.hpp/.cpp). This implementation is based on if and if-else statements, the most barebones way to program a chat bot. Another implementation could utilize a natural language processing library or machine learning to be more "real," but as it stands, the only implementation is the IfElseBot.

As an implementation of the Mandroid class, the IfElseBot utilizes a SpeechRecognizer and a SpeechSynthesizer. The specific children of these abstract classes utilized by the IfElseBot are a PythonSpeechRecognizer (Listen.hpp/.cpp) which calls a python function from the C++ code to process language into an std::string and a ClipBasedSpeechSynthesizer (Speech.hpp/.cpp) which loads audio clips and pieces them together to produce sound.
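
The clip-stitching idea can be sketched like this (a toy Python illustration, not the actual ClipBasedSpeechSynthesizer code; the clip filenames and the longest-match rule are assumptions about how such a synthesizer might work):

```python
# Hypothetical map from IPA symbols to pre-recorded clip files.
CLIP_MAP = {
    "tʃ": "clips/ch.wav",
    "t": "clips/t.wav",
    "ʃ": "clips/sh.wav",
    "æ": "clips/ae.wav",
}

def ipa_to_clips(ipa: str) -> list:
    """Split an IPA string into a playlist of clips, skipping unknown symbols."""
    clips, i = [], 0
    # Try longer symbols first so digraphs like "tʃ" win over "t" + "ʃ".
    symbols = sorted(CLIP_MAP, key=len, reverse=True)
    while i < len(ipa):
        for sym in symbols:
            if ipa.startswith(sym, i):
                clips.append(CLIP_MAP[sym])
                i += len(sym)
                break
        else:
            i += 1  # unknown symbol: move on rather than crash
    return clips
```

The resulting playlist would then be handed to the audio mixer (SDL2_mixer in the real project) to be played back to back.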

The ClipBasedSpeechSynthesizer also makes use of a Servo (Servo.hpp/.cpp) to physically move a mouth. This Servo makes a system call to launch a python program that initializes the PWM pin (for some reason it was the only way to make it work). It then uses the sysfs interface to control the duty cycle driven into the physical servo.
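
The sysfs side of that servo control can be sketched as follows (an illustrative fragment only; the pwmchip path is a board-specific guess, and the 1-2 ms pulse in a 20 ms period is the typical hobby-servo convention, not a measured value from this build):

```python
PERIOD_NS = 20_000_000  # 20 ms frame → 50 Hz, typical for hobby servos

def angle_to_duty_ns(angle_deg, min_pulse_ns=1_000_000, max_pulse_ns=2_000_000):
    """Map a jaw angle in [0, 180] degrees to a PWM duty cycle in nanoseconds."""
    angle_deg = max(0.0, min(180.0, angle_deg))  # clamp rather than crash
    span = max_pulse_ns - min_pulse_ns
    return int(min_pulse_ns + span * angle_deg / 180.0)

def write_duty(duty_ns, pwm_dir="/sys/class/pwm/pwmchip0/pwm0"):
    """Write the duty cycle via sysfs; the path is a guess and board-specific."""
    try:
        with open(f"{pwm_dir}/duty_cycle", "w") as f:
            f.write(str(duty_ns))
    except OSError:
        pass  # keep going if the PWM pin isn't set up (never crash)
```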

Back at the top level, now that it can speak and hear, the IfElseBot is able to process speech and produce a result. One of its operations also makes use of a Python library that grabs information about a topic from DuckDuckGo. The other operation exits the program.

Different implementations of speech are possible as long as they provide methods for producing sound from IPA and converting English text to IPA. Different implementations for listening are possible as long as they have a listen method that produces a string representing heard speech. Different implementations of the Brain are possible as long as they have a respond function that returns a boolean indicating whether the program should quit.
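
That contract can be illustrated with a minimal if/else brain (a toy Python sketch of the pattern, not the actual C++ IfElseBot; the web search step is stubbed out):

```python
def respond(heard: str, say=print, search=lambda topic: f"(summary of {topic})"):
    """Toy if/else brain: returns True when the program should quit."""
    heard = heard.lower().strip()
    if heard in ("bye", "goodbye"):
        say("Goodbye!")
        return True           # signal the main loop to exit
    elif heard == "hello":
        say("Hello!")
    elif heard.startswith("tell me about "):
        topic = heard[len("tell me about "):]
        say(search(topic))    # stubbed web lookup
    else:
        say("I didn't catch that.")
    return False              # keep listening
```

The main loop then just repeats listen → respond → speak until respond returns True.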

There is also a set of test functions (Tests.hpp) which go through the various functionalities.

Hierarchy:

Bot.png

It should be noted that I've made an attempt to never let the program crash, and to keep retrying or simply move on if something fails.
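
That keep-retrying behavior might look something like this (a generic sketch of the pattern, not code from the repository):

```python
import time

def keep_trying(action, attempts=3, delay_s=0.0, fallback=None):
    """Run action(); on failure retry, then fall back instead of crashing."""
    for _ in range(attempts):
        try:
            return action()
        except Exception:
            time.sleep(delay_s)  # brief pause before retrying
    return fallback              # give up gracefully and move on
```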

In cases of failure or dysfunction, when installed as a service, the program logs its output to /var/mandroid.log.

Speaking of the service, the install step places the program and necessary files in /opt/mandroid/ and creates a symlink at /usr/bin/mandroid.

Work Breakdown

As the only team member, I did all of the work.

The project can be broken up into four main sections with subsections.

  • Servo/Head control
    • PWM Control
    • Building structure to support jaw (T-structure)
    • Attaching motor to jaw and to support structure
  • Speech Recognition
    • Link to Python speech library
    • Attach USB microphone
  • Speech Synthesis
    • Recording sound files
    • IPA map to sound files
    • Synthesis object with instance of Servo
    • Attach speaker through USB
  • Brain (Chat-bot)
    • Tie it all together
    • Process inputs and produce sentences in IPA as response
    • Create main dowel "tower" to place brain on top and rest mask over parts

The main areas for improvement are the sound of the speech synthesis and the complexity of the brain.

Future Work

The biggest opportunity for improvement is the chat bot: giving it natural language processing and more commands.

Afterwards, you'd want to improve the speech synthesis. It sounds like a Speak & Spell, and the dictionary of known pronunciations is small, so it has to guess pronunciations a lot.

Both of these are based on abstract classes, so both additions could integrate well with the system.

Beyond that, the power circuitry could be improved so there are fewer power cords; a USB Wi-Fi adapter would make it simpler to set up; the hardware could be better hidden; the physical structure could be more robust; and more motion could be added to the face, like moving eyes and multiple "muscles" for better facial movement.

Also, it may be possible to power the BeagleBone from the same power source as the servo.

Conclusions

The project isn't amazing. It definitely is a good framework, but pretty much all aspects could be improved to some degree. I definitely feel like I've created a system that can be easily expanded upon in the future with more time.

As far as future additions go: more typical assistant tools like weather, a calendar, and music playing. Also, maybe a hand or body for moving and grabbing things, playing further with the idea of an android.

Overall I'm happy with how it turned out, but recognize it isn't wholly what I set out to achieve.