We have published the scores for the first project on gradesource.com. If you have not received your secret number by email, contact Victor Gidofalvi.
Some students are upset about losing points for not providing a working URL. We did request these through announcements in class and on the web. Now we have to be fair to the great majority of students, who did the project on time and followed directions.
For the current project, be sure to put your URL, the names of your team members, etc. in your report. Also be sure that all your file permissions are set correctly, etc. We will not log in to your account, or fix things for you. You must carefully take care of all details in your project. The report is your opportunity to explain anything that might be unclear.
The deadline for the current project is next Monday, October 22.
We've canceled the part where you build your own server for checking spelling
is canceled. We'll revisit that topic later in the quarter.
select c1, c2 from t1, t2 where c1 = c2*2When indexing a text column, make the index be on a string prefix only. For example to make an index on the first ten characters only of an author's name, use the command alter table messages add index ax (author(10))
Indexes are less useful for values that are repeated, most useful for unique values.
Always do performance experiments when you are not 100% sure about the value of an index. Note that the same query can be much more than ten times slower when the database is ten times larger. This can be true even if the smaller database is actually the larger one narrowed done with a where clause.
The same query can be ten times faster when you run it a second time, because the tables have been loaded into the server's in-memory cache.
It is possible to use data from the index directly, instead of using the index to guide a read from the table. Consider the query
select author, zip from messages, address where messages.author = address.nameNow we are taking the zip value from the table address, for each author in messages. Suppose we have an index on (name,zip) in the address table. Then the additional values can be taken straight from the index, so the table itself is never accessed at all. This can be a make-or-break optimization.
mysql> explain select distinct ticker from messages where date>'2001-09-23';Which index is used is shown by the key column. Which value from another table will be looked up in this index is shown by the ref column. How the index is used is shown by the type column. The least efficient value for type is ALL, then index, then range, ref, eq_ref.
+----------+-------+---------------+------+---------+------+-------+-------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+----------+-------+---------------+------+---------+------+-------+-------+
| messages | range | date | date | NULL | NULL | 11321 | |
+----------+-------+---------------+------+---------+------+-------+-------+
The product of rows from each table is an estimate of the number of combinations to be processed.
If only index is shown under Extra, then the the actual table does not need to be accessed at all, which is the major efficiency bonus mentioned above.
Here's how an index is used for an equijoin:
mysql> explain select distinct ticker from messages, sorted where messages.id=sorted.k;And here's how both are used together:
+----------+--------+---------------+---------+---------+----------+--------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+----------+--------+---------------+---------+---------+----------+--------+-------------+
| sorted | index | k | k | 4 | NULL | 124113 | Using index |
| messages | eq_ref | PRIMARY | PRIMARY | 4 | sorted.k | 1 | |
+----------+--------+---------------+---------+---------+----------+--------+-------------+
mysql> explain select distinct ticker from messages, sorted
where messages.id=sorted.k and date>'2001-04-23' order by ticker;
+----------+-------+---------------+------+---------+-------------+-------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+----------+-------+---------------+------+---------+-------------+-------+-------------+
| messages | range | PRIMARY,date | date | NULL | NULL | 11321 | |
| sorted | ref | k | k | 4 | messages.id | 10 | Using index |
+----------+-------+---------------+------+---------+-------------+-------+-------------+
The most important guideline is to avoid redundancy. Consider this schema:
messages: first name, last name, address, zip, title, bodyTo reduce redundancy, we should have two tables. We still have redundancy in the names, so we should introduce unique ids for authors:
messages: aid, title, bodyNote that we will often want unique ids for many different entities, e.g. for authors and separately for messages. That's why the new field is named aid and not just id.
person: aid, first name, last name, address, zip
Why is redundancy bad? There are at least four reasons:
Different database servers can enforce more or fewer types of constraint. For example, MySQL can enforce the first constraint by saying that the id field is not null in the messages table. The fourth constraint can be enforced with a primary key declaration on the pair of fields (name,address).
The second constraint is accommodated by not declaring id to be unique for the messages table. The third constraint is accommodated by making the messages and person tables separate. This constraint would be violated if we used one big table.
The last constraint is an example of a functional dependency. Unlike more sophisticated database systems, MySQL has no features to enforce functional dependencies.
Guideline: Before choosing a database design, write down what constraints
you know should be true. Then select a design that enforces these
constraints.