ECE597 Project pyWikiReader


Project Overview

The goal of this project is a program that uses voice recognition to identify sections of a Wikipedia page and read them aloud to the user. It is designed for an embedded system, but can be ported to a desktop environment with ease.

Team Members

Yannick Polius

Guide to Project Setup

This is a quick guide to setting up the software so the project can be compiled for both x86 and ARM Linux.

Programming Environment Configuration

    Download Netbeans

    Download the TIESR project source code:

     svn checkout --username developername 

    For the beagleboard a few extra steps need to be taken:

    • Extract the toolchain to the Tools/ARM directory in the TIESR root directory (trunk)
    • The documentation says to create an include and lib link inside of Tools/ARM, but this causes problems if they point at the x86 include files. To work around this, I created an include folder inside the ARM directory and copied the ALSA headers from my /usr/include directory into it. For the lib directory, you need to put the ALSA library into a lib folder you create there. The ALSA library that ships with x86 Ubuntu caused problems, so I resorted to grabbing the library from the beagleboard itself and putting it there instead. Also, if the linker complains that it cannot find the library, remove the .2 at the end of the library's file name.
    • The last major setup task is to create a new set of build tools in Netbeans. To do this, go to Tools -> Options, select the C/C++ tab, add a new tool collection, and point the C, C++, and Assembler directories at the ARM/bin directory.
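The include/lib workaround above boils down to a few shell commands. In this sketch, TIESR_ROOT stands for your checkout location, libasound.so.2 is the usual ALSA library soname, and the touch line stands in for copying the real library off the beagleboard; substitute your actual paths:

```shell
# Assumptions: the TIESR checkout lives under $TIESR_ROOT (defaulting to ./trunk),
# and the ALSA library copied off the beagleboard sits in the current directory.
TIESR_ROOT="${TIESR_ROOT:-$PWD/trunk}"
mkdir -p "$TIESR_ROOT/Tools/ARM/include" "$TIESR_ROOT/Tools/ARM/lib"

# Copy (do not symlink) the ALSA headers, so the ARM build never picks up the
# x86 ones; on a real host this would be: cp -r /usr/include/alsa .../include/
touch libasound.so.2   # stand-in here for the library taken from the beagleboard
cp libasound.so.2 "$TIESR_ROOT/Tools/ARM/lib/"

# If the linker cannot find the library, drop the trailing .2 from its name:
cp "$TIESR_ROOT/Tools/ARM/lib/libasound.so.2" "$TIESR_ROOT/Tools/ARM/lib/libasound.so"
```

Copying rather than symlinking costs a little disk space but guarantees the ARM toolchain never resolves into the host's x86 headers.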

    Now that the toolchain is set up, you can open the projects in Netbeans. The project folders must be imported manually. Once all the projects are imported, the build order is TIESRFlex --> TestTIESRFlex, then TIESRSI --> TestTIESRSI; the two test programs depend on the libraries built by the preceding projects. To verify that the libraries are included, right-click a project name and open Properties to confirm the libraries are being passed to the linker. Once this setup is done, switching between the x86 and ARM configurations is just a matter of right-clicking the project name and setting the configuration.

Python Scripting

After the projects have been built, it is time to integrate the speech recognition with the python script. Due to licensing restrictions, the published script has the speech recognition and grammar-building commands redacted, but the surrounding functionality is intact, and any recognizer with equivalent output can be substituted.
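Although the recognition command itself is redacted, the script below assumes the recognizer prints a line of the form "Recognized: &lt;word&gt;" on its standard output. Any substitute recognizer only needs to match that output format; extracting the word then reduces to a small helper like this (a sketch, with the marker string taken from the script below):

```python
def parse_recognizer_output(output):
    """Pull the recognized word out of the recognizer's stdout.

    Assumes the recognizer prints a line like 'Recognized: History'.
    Returns '' if no such line is present.
    """
    marker = 'Recognized:'
    start = output.find(marker)
    if start == -1:
        return ''
    # the recognized word runs from the end of the marker to the end of the line
    end = output.find('\n', start)
    if end == -1:
        end = len(output)
    return output[start + len(marker):end].strip()

print(parse_recognizer_output('engine ready\nRecognized: History\ndone'))
```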

Copyright © 2005 Free Software Foundation, Inc.     
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.

# -*- coding: utf-8 -*-
import commands
import HTMLParser
import re
import sys
#this is the parser for the wikireader, designed to parse the wiki page
#into a dictionary and clean up the text
class CustomParser(HTMLParser.HTMLParser):
    def __init__(self, *args, **kwargs):
        #HTMLParser is an old-style class in Python 2, so call its __init__ directly
        HTMLParser.HTMLParser.__init__(self, *args, **kwargs)
        #per-instance state: current section title, read mode, parsed sections, text pieces
        self.section = 'intro'
        self.reading = ""
        self.dictionary = {}
        self.stack = []
    #gets called when the parser encounters the start of an html tag, eg <p>
    def handle_starttag(self,  tag,  attrs):
        if tag.lower() == 'p':
            self.reading = 'text'
        if tag.lower() == 'h2':
            if(self.section == 'intro'):
                #save the introduction text before the first section title replaces it
                self.dictionary['intro'] = "".join(self.stack)
                self.stack = []
            if(self.section != 'Contents' and self.section != 'intro'):
                clean_title = re.sub('\[[a-zA-Z0-9]\]','', self.section)
                clean_title = re.sub('\[edit\]','', clean_title)
                clean_title = re.sub('\.','', clean_title)
                self.dictionary[clean_title.strip()]= "".join(self.stack)
                self.stack = []
            self.reading = 'title'
            self.section = ""
        if(self.section == 'Contents'):
            if(tag.lower() == 'span'):
                self.reading = 'text'
    #gets called when the parser encounters the end of an html tag, eg </p>
    def handle_endtag(self,  tag):
        if tag.lower() == 'p':
            self.reading = ""
        if tag.lower() == 'h2':
            self.reading = ""
            if(self.section == 'Contents'):
                self.dictionary['Contents'] = []
        if tag.lower() == 'table':  
            if self.section == 'Contents':
                self.reading = ""  
    #        if self.section == 'intro':                
       #         self.reading = 'text'
        if tag.lower() == 'span':            
            if self.section == 'Contents':
                self.reading = ""  
                self.stack = []
    #called when the parser encounters data between a tag, eg <p> DATA </p>
    def handle_data(self,  data):
        if self.reading == 'text':
            #accumulate body text; Contents entries also go into their own list
            self.stack.append(data)
            if self.section == 'Contents':
                self.dictionary['Contents'].append(data)
        if self.reading == 'title':
            self.section += data
#called to activate the speech recognizer
def Listen():
    command  =  [REDACTED]
    output = commands.getoutput(command)
    print output
    index = output.find('Recognized:')
    if index == -1:
        return ''
    #the recognized word runs from the end of the marker to the end of that line
    index_end = output.find('\n', index)
    if index_end == -1:
        index_end = len(output)
    return output[index + len('Recognized:'):index_end].strip()
def BuildGrammar(grammar):    
    #build grammar using speech recognition API
    command =  [REDACTED]
    print commands.getoutput(command)
#converts the corresponding section text to a wav file and plays it
def PlaySection(section):
    command = 'flite -f ' + 'output/' + section + '.txt ' + '-o ' + 'output/' + section + '.wav'
    commands.getoutput(command)
    commands.getoutput('aplay ' + 'output/' + section + '.wav')
#get user input
topic = raw_input('Enter a wikipedia topic you would like to hear about\n').replace(' ',  '+')
#search Wikipedia; quote the URL so the shell does not treat & as a control operator
commands.getstatusoutput('wget --output-document=result.html "http://en.wikipedia.org/w/index.php?search=' + topic + '&go=Go"')
#parse the html document
parser = CustomParser()
html = open('result.html',  'r')
parser.feed(html.read())
html.close()
#cleanup files to remove citations
for k, v in parser.dictionary.iteritems():
    if k != 'Contents':
        parser.dictionary[k] = re.sub('\[[0-9]*\]', '',  v)
#create the grammar by building on all the found sections
grammar = "start(_S). _S ---> "
for k, v in parser.dictionary.iteritems():
    grammar = grammar + k + ' | '
grammar += "stop | "
grammar += "continue | "
grammar += "skip | "
grammar += "_SIL."
grammar = '\"' + grammar + '\"'
print grammar
commands.getstatusoutput('mkdir output')
#begin to output the text files that are available, starting with the intro
output = open('output/intro.txt',  'w')
output.write(parser.dictionary.get('intro', ''))
output.close()
navigable = []
#odd indices in the Contents list hold the section titles (even ones hold the numbers)
for k in range(len(parser.dictionary['Contents'])):
    if k % 2 != 0:
        section = parser.dictionary['Contents'][k]
        clean_title = re.sub('\[[a-zA-Z0-9]\]','', section)
        clean_title = re.sub('\[edit\]','', clean_title)
        clean_title = re.sub('\.','', clean_title)
        try:
            #write out the section text; a missing key means the page had no such section
            text = parser.dictionary[clean_title]
            output = open('output/' + clean_title + '.txt', 'w')
            output.write(clean_title + '\n')
            output.write(text)
            output.close()
            navigable.append(clean_title)
        except KeyError:
            print 'no section: ' + section
#show all the available choices for speech recognition
print 'Navigation Options:'
for places in navigable:
    print places  + ' ',  
print '\n'
#build the recognition grammar, then loop until a stop command is heard
BuildGrammar(grammar)
print 'speak now'
word = Listen()
while(word != 'stop'):
    print 'found: ' + word
    if word in navigable or word == 'intro':
        PlaySection(word)
    print 'speak now'
    word = Listen()
commands.getoutput('rm output/*')
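A note on porting: the commands module used throughout the script was deprecated long ago and removed in Python 3; subprocess replaces it. Building argument lists instead of shell strings also survives section names that contain spaces. A sketch of the flite/aplay step in that style (the run parameter is my own hook for exercising the function without audio hardware; it is not part of the original script):

```python
import subprocess

def play_section(section, run=subprocess.call):
    """Synthesize output/<section>.txt with flite and play it with aplay."""
    txt = 'output/' + section + '.txt'
    wav = 'output/' + section + '.wav'
    synth = ['flite', '-f', txt, '-o', wav]  # -f: input text file, -o: output wav
    play = ['aplay', wav]
    run(synth)
    run(play)
    return synth, play
```

Passing something like run=calls.append in place of subprocess.call lets you inspect the exact commands that would be executed without invoking flite or aplay.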

Once the python script is integrated with a speech recognition system, you will be able to run pyWikiReader and have it read out the available sections of wiki content.
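For anyone porting to Python 3, where HTMLParser lives in the html.parser module, the heart of the sectioning logic above can be sketched like this (a deliberately simplified illustration of the h2-splitting approach, not the full CustomParser):

```python
from html.parser import HTMLParser

class SectionParser(HTMLParser):
    """Split a page into {h2 title: text} (simplified sketch of CustomParser)."""
    def __init__(self):
        super().__init__()
        self.sections = {}    # finished sections
        self.title = 'intro'  # current section title
        self.chunks = []      # text pieces of the current section
        self.mode = None      # 'text', 'title', or None

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self.mode = 'text'
        elif tag == 'h2':
            # an h2 closes the previous section and starts collecting a new title
            self.sections[self.title] = ''.join(self.chunks).strip()
            self.title, self.chunks, self.mode = '', [], 'title'

    def handle_endtag(self, tag):
        if tag in ('p', 'h2'):
            self.mode = None

    def handle_data(self, data):
        if self.mode == 'text':
            self.chunks.append(data)
        elif self.mode == 'title':
            self.title += data

p = SectionParser()
p.feed('<p>Lead.</p><h2>History</h2><p>Old kingdoms.</p>')
p.close()
p.sections[p.title] = ''.join(p.chunks).strip()  # flush the final section
print(p.sections)
```

The same push-parser pattern carries over from the Python 2 version: the callbacks fire in document order, so a small mode flag plus a chunk list is all the state needed.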