Department of Computer Science and Engineering CSE 130
University of California at San Diego Fall 1999

Team Project 4

DUE WEDNESDAY DECEMBER 8, 1999, BEFORE 6:50 PM.





The purpose of this project is to study some important features of scripting languages through the experience of solving the same problem in awk and in Java.  (You may use Perl or Python instead of awk if you prefer, but awk is recommended because it is a simpler language.)  Your report should discuss how scripting languages and object-oriented languages are "high-level" in different ways.

The assignment is to find a good investment strategy based on the daily prices of various mutual funds. Given the change in price from the previous day to today, how likely is it that the trend will continue tomorrow? Or will tomorrow's closing price be affected by what happens in other financial markets today?

Linear regression is an algorithm for finding the straight-line relationship that best fits two vectors of data X and Y. For more information on linear regression, see any good statistics textbook.
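As an illustration, the linear-regression computation can be sketched in Java as follows. This is the textbook least-squares fit of y = intercept + slope * x, with r² computed as the square of the Pearson correlation coefficient. It is not a transcription of the provided linreg.awk, and the class name LinReg and the {slope, intercept, rSquared} return layout are hypothetical.

```java
/** Least-squares fit of y = intercept + slope * x, plus r squared.
 *  Hypothetical helper: returns {slope, intercept, rSquared}. */
final class LinReg {
    private LinReg() {}

    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length series with at least 2 points");
        }
        int n = x.length;

        // Means of both series.
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Centered sums of squares and cross-products.
        double sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("x is constant; slope is undefined");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        // r^2 = sxy^2 / (sxx * syy); when y is constant we report 0 by convention.
        double rSquared = (syy == 0) ? 0.0 : (sxy * sxy) / (sxx * syy);
        return new double[] {slope, intercept, rSquared};
    }
}
```

Because r² is symmetric in its two arguments, the score does not depend on which series is treated as the predictor (the slope does). To predict today's change from yesterday's, one would call LinReg.fit(Y, X).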

For this assignment you should compute the linear regression for each of the following two pairs of vectors (also called time series):
(1) X = {percentage change of price of some mutual fund from yesterday to today}
    Y = {percentage change of price of the same mutual fund from two days ago to yesterday}
(2) X = {percentage change of price of some mutual fund from yesterday to today}
    Y = {percentage change of closing price of the S&P500 from two days ago to yesterday}

In each case "yesterday" means the previous business day, i.e. the previous day for which a closing price is available. Case (1) estimates the accuracy of trend-following, while case (2) estimates how much a mutual fund's price change on one day is influenced by the change in the S&P500 on the previous day.
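Building the paired series from closing prices might look like the Java sketch below. Assumptions: prices are already in chronological order, and for case (2) both series are keyed by date strings that sort chronologically (for example "1998-01-02"), keeping only dates present in both. The class Series and its method names are hypothetical, and the code uses generics, which postdate JDK 1.2.1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

final class Series {
    private Series() {}

    /** Percentage change between consecutive closes: 100 * (p[i+1] - p[i]) / p[i]. */
    static double[] percentChanges(double[] closes) {
        double[] pct = new double[Math.max(0, closes.length - 1)];
        for (int i = 0; i < pct.length; i++) {
            pct[i] = 100.0 * (closes[i + 1] - closes[i]) / closes[i];
        }
        return pct;
    }

    /** Case (1): returns {X, Y} where X[k] is "today's" change and Y[k]
     *  is the same fund's change on the previous business day. */
    static double[][] autocorrelationPairs(double[] closes) {
        double[] pct = percentChanges(closes);
        int n = Math.max(0, pct.length - 1);
        double[] today = new double[n];
        double[] yesterday = new double[n];
        for (int k = 0; k < n; k++) {
            today[k] = pct[k + 1];
            yesterday[k] = pct[k];
        }
        return new double[][] {today, yesterday};
    }

    /** Case (2): keeps only dates present in both maps (ASSUMPTION: keys sort
     *  chronologically), then pairs the fund's change today with the
     *  S&P500's change on the previous common date. */
    static double[][] marketPairs(SortedMap<String, Double> fund, SortedMap<String, Double> spx) {
        List<String> dates = new ArrayList<>();
        for (String d : fund.keySet()) {          // iterates in sorted (date) order
            if (spx.containsKey(d)) {
                dates.add(d);
            }
        }
        double[] f = new double[dates.size()];
        double[] s = new double[dates.size()];
        for (int i = 0; i < dates.size(); i++) {
            f[i] = fund.get(dates.get(i));
            s[i] = spx.get(dates.get(i));
        }
        double[] fPct = percentChanges(f);
        double[] sPct = percentChanges(s);
        int n = Math.max(0, fPct.length - 1);
        double[] today = new double[n];
        double[] yesterday = new double[n];
        for (int k = 0; k < n; k++) {
            today[k] = fPct[k + 1];
            yesterday[k] = sPct[k];
        }
        return new double[][] {today, yesterday};
    }
}
```

Restricting case (2) to common dates is one simple way to honor the "previous business day" rule when a fund and the index have slightly different trading calendars; other alignment policies are possible.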

Your program will take one command-line argument: the name of a file containing a list of mutual fund indices. A mutual fund index is a string of five uppercase letters ending with X, for example JAOSX. Your program should obtain fund price data from Yahoo! Finance. However, design the program so that it caches the data in a local file named data/<index>. For example, given the index ZZZZX, first look for the cache file data/ZZZZX. If this file does not exist, obtain the data from Yahoo! and store it in the cache file. When creating the cache file, you can assume that the directory data already exists. The data should cover the range from 1/1/1998 to 12/31/1998.
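Reading the command-line index file and checking the five-letter format could be sketched in Java as follows. It assumes one index per line (the layout of the index file is not specified here); the class FundIndexFile and its methods are hypothetical, and the code uses generics and try-with-resources, which postdate JDK 1.2.1.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class FundIndexFile {
    private FundIndexFile() {}

    /** Five uppercase letters, the last of which is X (e.g. JAOSX). */
    static boolean isFundIndex(String s) {
        return s.matches("[A-Z]{4}X");
    }

    /** Reads one index per line (ASSUMPTION), skipping blank lines and
     *  reporting malformed ones on standard error. */
    static List<String> read(String fileName) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = in.readLine()) != null) {
                String s = line.trim();
                if (s.isEmpty()) {
                    continue;
                }
                if (isFundIndex(s)) {
                    result.add(s);
                } else {
                    System.err.println("skipping malformed index: " + s);
                }
            }
        }
        return result;
    }

    /** Cache file location required by the assignment: data/<index>. */
    static File cacheFile(String index) {
        return new File("data", index);
    }
}
```

The S&P500 symbol ^spc does not match this pattern, so in this design it would be fetched separately rather than read from the index file.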

To download data from the web in awk, call the Unix text-mode web browser lynx with the -dump option, using the awk built-in function system. In the Java program, use the java.net.URL class to create an input stream.
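The cache-then-download logic might look like this in Java, opening the input stream with java.net.URL.openStream(). Assumptions: the caller passes the complete download URL, and a temporary data/<index>.part file (a hypothetical name) is used so that an interrupted download does not leave a truncated cache file. CachedFetch is a hypothetical class, and java.nio.file and try-with-resources require a much newer JDK than 1.2.1; there the same logic would use FileOutputStream and explicit close() calls.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

final class CachedFetch {
    private CachedFetch() {}

    /** Returns the contents of data/<index>, downloading from url first
     *  if the cache file does not exist yet. */
    static String load(String index, String url) throws IOException {
        File cache = new File("data", index);
        if (!cache.exists()) {
            // Cache miss: copy the URL's input stream into a temporary file,
            // then move it into place once the download has completed.
            File tmp = new File("data", index + ".part");
            try (InputStream in = new URL(url).openStream()) {
                Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
            Files.move(tmp.toPath(), cache.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
        // ISO-8859-1 maps every byte to a char, so no input is lost.
        return new String(Files.readAllBytes(cache.toPath()), StandardCharsets.ISO_8859_1);
    }
}
```

On a cache hit the URL is never opened, so repeated runs over the same fund list do not touch the network.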

Use /usr/bin/nawk as the awk interpreter and JDK 1.2.1 as the Java compiler.

Your program should output the indices and scores of the top five mutual funds for each of cases (1) and (2), in this format:

#Autocorrelation
AAAAX 0.8100
BBBBX 0.8005
CCCCX 0.7993
...
#Correlation with S&P500
ZZZZX 0.5567
YYYYX 0.5234
...
In each case, the top five mutual funds are those with the highest r² (the square of the correlation coefficient). This score ranges between 0.0 and 1.0 and is 0.0 when the two series are uncorrelated (as they are when independent), that is, when the prediction is no better than a random guess.
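Selecting and printing the top five scores amounts to sorting by r² in descending order, as in the following sketch. The class Ranking, the map from index to score, and the tie-breaking rule (by index name, for deterministic output) are assumptions; lambdas and String.format require a much newer JDK than 1.2.1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class Ranking {
    private Ranking() {}

    /** Formats the k funds with the highest scores, one "INDEX score" line
     *  each, under the given header (e.g. "#Autocorrelation"). */
    static String topK(String header, Map<String, Double> scores, int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        // Highest score first; ties broken by index name.
        entries.sort((a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        StringBuilder out = new StringBuilder(header).append('\n');
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            Map.Entry<String, Double> e = entries.get(i);
            // Locale.US guarantees a '.' decimal separator; four decimals as in the sample.
            out.append(String.format(Locale.US, "%s %.4f\n", e.getKey(), e.getValue()));
        }
        return out.toString();
    }
}
```

Calling topK("#Autocorrelation", scores, 5) and then topK("#Correlation with S&P500", otherScores, 5) would produce the two sections in the layout shown above.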
 

WHAT WE WILL PROVIDE

We will provide the following:
1. An awk function that computes linear regressions is in ../public/project4/linreg.awk. You must implement the corresponding Java function yourself.
2. An example file containing a list of indices of mutual funds is in  ../public/project4/fundindex.
3. Example awk code that extracts only the indices from ../public/project4/namelist, which was obtained from http://www.fundalarm.com/sort_x.htm.
4. The S&P500 index is named ^spc.
5. Given an index name, say YAFFX, the following URL returns the closing price of YAFFX from 1/2/1999 to 3/4/1999 in comma-separated format, unfortunately in reverse chronological order (most recent day first):
http://chart.yahoo.com/table.csv?s=YAFFX&a=1&b=2&c=1999&d=3&e=4&f=1999&g=d&q=q&y=0&z=YAFFX&x=.csv
You may find other helpful material under ../public/project4/.
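Because the rows arrive newest first, a parser has to reverse them before computing day-to-day changes. The Java sketch below assumes a header line, the date in column 0, and a caller-supplied zero-based column index for the closing price; inspect an actual download to confirm the layout. YahooCsv and Row are hypothetical names, and the code uses generics, which postdate JDK 1.2.1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class YahooCsv {
    private YahooCsv() {}

    /** One parsed row: the date field as text and the closing price. */
    static final class Row {
        final String date;
        final double close;

        Row(String date, double close) {
            this.date = date;
            this.close = close;
        }
    }

    /** Parses newest-first comma-separated rows and returns them oldest-first.
     *  ASSUMPTIONS: line 0 is a header, column 0 is the date, and
     *  closeColumn is the zero-based column of the closing price.
     *  Lines that do not parse (blank lines, trailing text) are skipped. */
    static List<Row> parse(String csv, int closeColumn) {
        List<Row> rows = new ArrayList<>();
        String[] lines = csv.split("\r?\n");
        for (int i = 1; i < lines.length; i++) {   // i = 1 skips the header
            String[] f = lines[i].split(",");
            if (f.length <= closeColumn) {
                continue;
            }
            try {
                rows.add(new Row(f[0].trim(), Double.parseDouble(f[closeColumn].trim())));
            } catch (NumberFormatException e) {
                // Not a data row; ignore it.
            }
        }
        Collections.reverse(rows);                 // newest-first -> oldest-first
        return rows;
    }
}
```

If the header turns out to read Date,Open,High,Low,Close,Volume, the closing price would be column 4.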
 

WHAT TO SUBMIT BY COMPUTER

You should create two files: one tar file java.tar for the Java program and one file money.script for the program written in awk (or Python or Perl).  The tar file java.tar should contain all the source files and all the corresponding class files. The class containing main should be called Money. The other file money.script should be a single file that is directly executable under Unix by making use of the Unix #! convention.

To turn in your work, cd to the directory containing the two files and type the Unix command bundleP4. The bundleP4 script knows which files to look for and submit. Remember to check the class web page at http://www-cse.ucsd.edu/classes/fa99/cse130 and the message board for any changes to these directions.
 

WHAT TO SUBMIT ON PAPER

You should describe all your work in a well-written report of at most ten pages. The report should have a table of contents and a list of references, with references cited explicitly in the text of the report at every place where you use them. References may include web pages and books. The GNU Awk User's Guide is a good place to start. However, since the awk interpreter that we will be using is slightly different from GNU Awk, be careful when you cite this "User's Guide".

Features of awk and relevant issues that should be discussed in your report include but are not limited to the following:

For this report, it may be difficult to compare the individual features of the two languages pairwise, since they differ so radically. Try to focus on higher-level concepts when you make comparisons. How can both languages be "high-level" despite their differences? What are the tradeoffs between a general-purpose language and a task-specific language?

If you use Perl or Python instead of awk for the programming part of this project, then your report should discuss awk and also the language that you use.

As an appendix to your report, you should submit a printout of your software, with comments and documentation of professional quality. This documentation should be sufficient for another software engineer to maintain the program. Remember that good documentation is necessary but not sufficient. Comments and user instructions cannot alleviate bad engineering.

Also, don't forget to attach a copy of the team self-evaluation form.  The form is required for all four projects, including this one.  Be sure to follow all the rules and guidelines explained in the CSE 130 course description.  Complete academic honesty is again required.