Department of Computer Science and Engineering CSE 134A
University of California at San Diego Fall 2002

Project 3

DUE FRIDAY, NOVEMBER 15, 2002, AT 5 PM.



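The task for this team project is to build a dynamic repository for news stories relevant to a fixed set of themes.  This software application has two parts: a content retriever, which populates and refreshes a database of news stories, and a content browser, which queries that database.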

The two parts should function independently and asynchronously.  The retriever should populate and refresh the database independently of the queries that the content browser makes against it.  A simple way to achieve this is to implement the content retriever as a PHP script and schedule periodic executions of the script with the Unix crontab facility.

You should design the content retriever with a sensible update policy, considering issues like data freshness and not imposing too much burden on the news sources (so-called "netiquette").  The content browser should use the information already in the database at the time of each request.  You should not implement any real-time retrieval of news.
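
One possible update policy, as a minimal sketch: each scheduled run asks, per source, whether a fetch is warranted.  The sketch is written in Python for illustration only (the assignment itself suggests PHP), and the function name `should_fetch` and both threshold values are assumptions chosen for the example, not requirements of the project.

```python
from datetime import datetime, timedelta

# Hypothetical policy values; tune them for each news source.
MIN_INTERVAL = timedelta(minutes=30)   # never contact a source more often than this
MAX_STALENESS = timedelta(hours=6)     # always refresh once stored data is this old


def should_fetch(last_fetch, last_change, now):
    """Decide whether the retriever should contact a source on this run.

    last_fetch  -- when the source was last contacted (None if never)
    last_change -- when its content last differed from the stored copy
                   (None if unknown)
    now         -- the current time
    """
    if last_fetch is None:
        return True                      # nothing stored yet
    age = now - last_fetch
    if age < MIN_INTERVAL:
        return False                     # too soon: be polite to the source
    if age >= MAX_STALENESS:
        return True                      # stored data is too old in any case
    # In between: refetch only sources whose content changed recently.
    return last_change is not None and now - last_change < MAX_STALENESS
```

With a policy like this, the scheduled job can run often while each individual source is contacted only when the stored copy is stale or the source has been changing recently.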

Content sources.  Many web sites provide news content.  These are designed primarily to be viewed directly by humans, but several have fairly simple and regular structures from which news stories can be parsed using regular expressions (screen scraping).  Here are some examples of such sites:

Using the organization of the web page and the content of the articles, you should pick some interesting class of identifiers referenced in each story (e.g. state or country, or stock symbol) to use as a key for populating the database with labeled content.  When choosing an identification scheme, keep in mind that your choice will affect the voice-based interface you must also implement.  Also consider the total size of the content stored in your database, and when particular content should be purged.
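
As an illustration of labeling a story with identifiers, the sketch below matches a fixed list of country names against the story text.  It is written in Python for illustration only (the assignment itself suggests PHP); the list `KNOWN_COUNTRIES` and the function `find_identifiers` are hypothetical names made up for this example.

```python
import re

# Hypothetical fixed set of identifiers; countries are used here as an example.
KNOWN_COUNTRIES = ["France", "Japan", "Brazil", "South Africa"]


def find_identifiers(text, identifiers=KNOWN_COUNTRIES):
    """Return the identifiers mentioned in text, in the order of the list."""
    found = []
    for name in identifiers:
        # Word boundaries keep "Brazil" from matching inside "Brazilian".
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found
```

A small, closed set of identifiers like this one is also easier for a voice interface to recognize than free-form input.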

Each news story should be parsed, then stored in a MySQL database in its parsed form.  That is, fields such as author, city, date, time, and title should be extracted with regular expressions, and each field should then be stored in its own database column.  Do not use XML for this project.  Choose a type of news that the source web site updates frequently, and that is reasonably interesting and varied.  State lottery numbers, for example, are too simple.
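
The parse-then-store step can be sketched as follows.  This is a minimal illustration under several assumptions: it is written in Python rather than the PHP the assignment suggests, it uses an in-memory SQLite database as a stand-in for MySQL so that the example is self-contained, and the HTML layout, the table name `stories`, and the column names are all invented for the example.  A real news site will need its own regular expression.

```python
import re
import sqlite3

# Assumed page layout: a headline followed by a byline of the form
# "By AUTHOR, CITY, YYYY-MM-DD HH:MM".
STORY_RE = re.compile(
    r'<h1>(?P<title>.*?)</h1>\s*'
    r'<p class="byline">By (?P<author>[^,]+), (?P<city>[^,]+), '
    r'(?P<pub_date>\d{4}-\d{2}-\d{2}) (?P<pub_time>\d{2}:\d{2})</p>',
    re.DOTALL,
)


def parse_story(html):
    """Extract the story fields as a dict, or return None if nothing matches."""
    match = STORY_RE.search(html)
    if match is None:
        return None
    return {field: value.strip() for field, value in match.groupdict().items()}


def open_database():
    """Create the example table, with one column per parsed field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stories ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT, "
        "city TEXT, pub_date TEXT, pub_time TEXT)"
    )
    return conn


def store_story(conn, story):
    """Insert one parsed story; placeholders avoid SQL injection."""
    conn.execute(
        "INSERT INTO stories (title, author, city, pub_date, pub_time) "
        "VALUES (:title, :author, :city, :pub_date, :pub_time)",
        story,
    )
    conn.commit()
```

Returning `None` on a failed match lets the retriever skip pages whose layout has changed instead of storing partial rows.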

The main focus of the project is a voice interface, but you should also implement a simple web interface that is similar to the voice interface.  You will probably find it easier to implement the web interface first.  Both interfaces should provide roughly similar functionality, but they should have different designs that take into account the different features of each medium.  What humans find usable is very different for web and voice interfaces!

VoiceXML and TellMe Studio.  Here is a tutorial on how to use VoiceXML inside TellMe Studio.  You can use this as a guide when implementing your content browser.  Other tutorials are also available on the web.  Your voice interface should allow the listener to provide a value or values for one or more story identifiers, and then read out the most appropriate, most recent story or other news information.  TellMe Studio has several advanced features, including voice recognition, that you can use to enhance the usability of your system.
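
To give a rough idea of the read-out step, the sketch below wraps a stored story in a minimal VoiceXML document.  It is written in Python for illustration only (the assignment itself suggests PHP), the function name `story_to_vxml` is hypothetical, and the markup assumes the basic VoiceXML 2.0 `form`, `block`, and `prompt` elements.  A real voice platform may require additional attributes, such as a namespace declaration, so check the tutorial for what TellMe Studio expects.

```python
from xml.sax.saxutils import escape


def story_to_vxml(title, body):
    """Return a minimal VoiceXML document that reads one story aloud."""
    # escape() protects &, < and > so story text cannot break the markup.
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0">\n'
        '  <form>\n'
        '    <block>\n'
        f'      <prompt>{escape(title)}. {escape(body)}</prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )
```

Escaping matters here because scraped headlines often contain characters such as `&` that would otherwise make the generated document ill-formed.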
 

Project organization

You should describe all your work in a well-written report of length at most six pages, single-spaced.  The report should provide documentation for your software, and also explain all the important design decisions that you made for the project.  As before, three of the most important design issues that you must address during this project are usability, scalability, and security.  Other important design issues include, but are not limited to, portability and modularity.  Your report should explain all the important decisions you make concerning each of these issues.

Similar to previous projects, this project will be graded as follows:

Your software should be commented to a professional standard.  In all, your documentation should be sufficient for another software engineer to maintain the program easily.  Remember that good documentation is necessary but not sufficient.  Comments and user instructions cannot compensate for bad engineering.

We will provide instructions for how to submit your software and report, and how we will test your system.  Don't forget to attach a copy of the team self-evaluation form, and check the class web page and Discus regularly for updates to this project description.  Be sure to follow all the rules and guidelines explained in the general course description.  Complete academic honesty is always required.