Department of Computer Science and Engineering | CSE 134A
University of California at San Diego | Fall 2002
The task for this team project is to build a dynamic repository for news stories relevant to a fixed set of themes. This software application has two parts: a content retriever and a content browser.
You should design the content retriever with a sensible update policy that balances data freshness against the burden imposed on the news sources (so-called "netiquette"). The content browser should use only the information already in the database at the time of each request. You should not implement any real-time retrieval of news.
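As one illustration of an update policy, the sketch below keeps a minimum interval between fetches for each source, so that no site is contacted more often than its interval allows. The class name, function names, and interval values are hypothetical and not part of the assignment; choosing suitable intervals for real sources is left to each team.

```python
from dataclasses import dataclass


@dataclass
class SourcePolicy:
    """Hypothetical per-source settings; the values used are illustrative."""
    name: str
    min_interval_s: float                # never contact the source more often than this
    last_fetch_s: float = float("-inf")  # time of the previous fetch, in seconds


def due_sources(policies, now_s):
    """Return the names of sources whose minimum interval has elapsed."""
    return [p.name for p in policies if now_s - p.last_fetch_s >= p.min_interval_s]


def record_fetch(policy, now_s):
    """Remember when a source was last contacted."""
    policy.last_fetch_s = now_s
```

A retriever run periodically (for example from a scheduled job) could call `due_sources` with the current time, fetch only the sources it returns, and call `record_fetch` after each one. The content browser never triggers a fetch.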
Content sources. Many web sites provide news content. These are designed primarily to be viewed directly by humans, but several have fairly simple and regular structures from which news stories can be parsed using regular expressions (screen scraping). Here are some examples of such sites:
Each news story should be parsed, then stored in a MySQL database in its parsed form. That is, the fields of information (author, city, date, time, title, etc.) should be extracted by parsing with regular expressions, and each field should then be stored in its own database column. Do not use XML for this project. Do choose a type of news that the source web site updates frequently, and that is reasonably interesting and varied. State lottery numbers are too simple.
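A minimal sketch of this parse-then-store step follows. The HTML markup, the table name `stories`, and the column names are all assumptions made for illustration; a real source's page structure has to be inspected by hand, and its regular expression written to match. The `%s` placeholders follow the parameter style used by common Python MySQL drivers; the database connection itself is omitted.

```python
import re

# Hypothetical page markup; each real news site needs its own pattern.
STORY_RE = re.compile(
    r'<h2 class="title">(?P<title>.*?)</h2>\s*'
    r'<span class="byline">By (?P<author>.*?)</span>\s*'
    r'<span class="dateline">(?P<city>.*?), '
    r'(?P<pub_date>\d{4}-\d{2}-\d{2}) (?P<pub_time>\d{2}:\d{2})</span>',
    re.DOTALL,
)

# One database column per extracted field (names are assumptions).
COLUMNS = ("title", "author", "city", "pub_date", "pub_time")


def parse_story(html):
    """Extract the story fields from a page fragment, or None if no match."""
    match = STORY_RE.search(html)
    if match is None:
        return None
    return {column: match.group(column).strip() for column in COLUMNS}


def insert_statement(table="stories"):
    """Build a parameterized INSERT with one placeholder per column."""
    columns = ", ".join(COLUMNS)
    marks = ", ".join(["%s"] * len(COLUMNS))
    return f"INSERT INTO {table} ({columns}) VALUES ({marks})"
```

The dictionary returned by `parse_story` supplies the values, in `COLUMNS` order, for the statement built by `insert_statement`. Passing the values as parameters rather than pasting them into the SQL string avoids quoting problems with scraped text.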
The main focus of the project is a voice interface, but you should also implement a simple web interface that is similar to the voice interface. You will probably find it easier to implement the web interface first. Both interfaces should provide roughly similar functionality, but they should have different designs that take into account the different features of each medium. What humans find usable is very different for web and voice interfaces!
VoiceXML and TellMe Studio. Here is a tutorial on how to use VoiceXML inside TellMe Studio. You can use this as a guide when implementing your content browser. Other tutorials are also available on the web. Your voice interface should allow the listener to provide a value or values for one or more story identifiers, and then read out the most appropriate, most recent story or other news information. TellMe Studio has several advanced features, including voice recognition, that you can use to enhance the usability of your system.
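The selection logic behind "most appropriate, most recent story" can be sketched independently of the interface. The example below is only an illustration: the function name, the `published` field, and the in-memory list of dictionaries are assumptions, and in the actual project the same selection would more naturally be a database query ordered by publication time.

```python
def latest_matching(stories, **wanted):
    """Return the most recent story whose fields equal all requested values.

    Matching is case-insensitive, since spoken or typed input may differ
    in capitalization from the stored text.
    """
    matches = [
        story for story in stories
        if all(
            str(story.get(field, "")).lower() == str(value).lower()
            for field, value in wanted.items()
        )
    ]
    # "YYYY-MM-DD HH:MM" strings sort chronologically when compared as text.
    return max(matches, key=lambda story: story["published"], default=None)
```

For example, `latest_matching(stories, city="San Diego")` would return the newest stored story for that city, or `None` when nothing matches, in which case the interface can say that no story is available.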
As with previous projects, this project will be graded as follows:
We will provide instructions for how to submit your software and report, and how we will test your system. Don't forget to attach a copy of the team self-evaluation form, and check the class web page and Discus regularly for updates to this project description. Be sure to follow all the rules and guidelines explained in the general course description. Complete academic honesty is always required.