CSE134A LECTURE NOTES

October 8, 2001
 
 

ANNOUNCEMENTS

The deadline for the second project is Monday October 22, 4:30pm.  I'm handing out the joint team self-evaluation form now.

Teamwork skills are important for graduate school, for software jobs with top companies, and for project leader and management jobs in all organizations.
 
 

INDEXES

Indexes are additional data structures that the db server can use to execute queries faster.  As a db user you need to know which indexes exist, but not how they are implemented exactly.

A simple index is on exactly one column of a table.  Fundamentally, it allows the table to be scanned efficiently with records sorted according to the value in that column.

To see why indexes are necessary, suppose that you want to scan messages sometimes by author in alphabetical order, and sometimes by date in time order.  You could do either of these if the table was stored in the right sequence, but not both.

The general way to create an index is with the alter table command, whose syntax is rather verbose.  The column you want the index on is specified inside parentheses.  For example:

alter table messages add index dx (date)
Now queries with order by date or group by date can be efficient.
 
 

INDEXES AND JOIN QUERIES

Consider the query
select c1, c2 from t1, t2 where c1=c2
With an index on tablet2 this query can be executed by one scan of table t1.

Doing direct accesses to each row of a table is still much slower than scanning the table sequentially.  Reason: for each row, one block must be read from disk.  When scanning, many rows are processed per read of a block.

EFFICIENCY WITH AND WITHOUT INDEXES

We discussed the big-O time efficiency (linear, n log n, quadratic) of select queries with and without indexes in detail.  Be sure to read carefully the relevant sections of the MySQL book, especially in the chapter entitled Query Optimization.

We also discussed the definition of scalability from the point of view of a user: constant or near-constant response time despite increases in task size, e.g. database size, number of users, number of simultaneous users.

Every index causes additional slowdown for inserts and updates, and uses space proportional to the number of rows in the table.  So, only create an index if it is necessary.
 
 

MORE COMPLEX INDEXES

An index can be on two or more columns, for example
alter table messages add index dax (date,author)
This would make order by date, author efficient but not order by author, date

An index on multiple columns, e.g. (state, city, zip), is also an index on its prefixes, i.e. (state, city) and (state).

The keyword unique before the keyword index adds the constraint that every tuple must have a different value for the column(s) named.  You might make the index on (date,author) be unique if dates include times down to the second, but you wouldn't want the index on just date to be unique.

A special alternative to unique is primary key.  Each table can have many different unique indexes, but only one primary key.  Often the table is actually stored sorted by its primary key.
 
 



Copyright (c) by Charles Elkan, 2001.