CSE134A LECTURE NOTES

April 23, 2001
 
 

ANNOUNCEMENTS

The deadline for the first project is now 4pm Wednesday.  I'm handing out the second project description.

Good design is very important in all the 134A projects.  Four design issues that always come up are usability, scalability, security, and modularity.
 
 

NULL VALUES AND OTHER MODIFIERS

What happens when we create a new tuple and only provide values for some of its fields?  Then the database server fills in default values for the remaining fields.  The most common default value is NULL.

As we saw on Monday, in MySQL and in all other relational databases, columns have fixed, predeclared types.  Unlike in most programming languages, the basic types also have modifiers available.  These include:

The syntax is inspired by Cobol, a 1960s business-oriented language.  Upper/lower case is not significant, and some keywords contain spaces, e.g. NOT NULL.

What happens when an insert command fails to specify a value for a field that is supposed to be NOT NULL?  I couldn't find the answer in our MySQL book!
 
 

INDEXES

Indexes are additional data structures that the db server can use to execute queries faster.  As a db user you need to know which indexes exist, but not how they are implemented exactly.

A simple index is on exactly one column of a table.  Fundamentally, it allows the table to be scanned efficiently with records sorted according to the value in that column.

To see why indexes are necessary, suppose that you want to scan postings sometimes by author in alphabetical order, and sometimes by date in time order.  You could do either of these if the table was stored in the right sequence, but not both.

The general way to create an index is with the alter table command, whose syntax is rather verbose.  The column you want the index on is specified inside parentheses.  For example:

alter table postings add index dx (date)
Now queries with order by date or group by date can be efficient.

An index can be on two or more columns, for example

alter table postings add index dax (date,author)
This would make order by date, author efficient but not order by author, date

The keyword unique before the keyword index adds the constraint that every tuple must have a different value for the column(s) named.  You might make the index on (date,author) be unique if dates include times down to the second, but you wouldn't want the index on just date to be unique.

A special alternative to unique is primary key.  Each table can have many different unique indexes, but only one primary key.  Often the table is actuallty stored sorted by its primary key.
 
 

UNIQUE IDENTIFIERS

Often it is convenient to have an id number for each record in a table.  You can also specify primary key as a modifier in a field declaration.  For example:
create table postings (
    ...
    id    integer unsigned not null auto_increment primary key
    ... )
Now every record should be inserted with a NULL value for the field named id.  The server will then automatically assign a value to this field that is one larger than the largest value ever assigned previously to a record for this table.
 
 

INDEXES AND JOIN QUERIES

Consider the query
select c1, c2 from t1, t2 where c1=c2
With an index on tables t2 and t3 this query can be executed by one scan of table t1.

Doing direct accesses to each row of a table is still much slower than scanning the table sequentially.  Reason: for each row, one block must be read from disk.  When scanning, many rows are processed per read of a block.
 
 



Copyright (c) by Charles Elkan, 2001.