Midterm Grading Criteria and Comments
 
Below are the criteria that were used to grade the midterm. We also mention some common mistakes that people made.



Question 1

Graded by Greg Hamerly.  Email him with questions.

General Comments:
This question asked for PHP functions that used regular expressions to extract identifiers from strings. These functions should be efficient and concise. Each problem can be solved with 2-3 lines of code.

See the problem-specific comments for further details. Also, carefully read the documentation on split, explode, ereg, and other similar functions in PHP if you are confused about how they are used.

(a) [5 points]
Carefully write a PHP function using regular expressions that returns the first identifier present anywhere in a long string.

ANSWER:

<?
function firstid($string) {
    if (ereg("[[:alpha:]][[:alnum:]]*", $string, $result)) {
        return $result[0];
    }
    return null;
}
?>
COMMON PROBLEMS: (b) [5 points]
Carefully write a PHP function that returns all the identifiers in a string.

ANSWER:

<?
function allids($string) {
    $result = split("[^[:alnum:]]+", $string);
    $arr = [];
    $c = count($result);
    for ($i = 0; $i < $c; $i++) {
        if (ereg("[[:alpha:]][[:alnum:]]*", $result[$i], $out)) {
            $arr[] = $out[0];
        }
    }
    return $arr;
}
?>
The following simpler function incorrectly allows identifiers that start with digits, but points were not taken off for this mistake.
<?
function allids($string) {
    $result = split("[^[:alnum:]]+", $string);
    return $result;
}
?>
COMMON PROBLEMS: (c) [5 points]
Carefully write a PHP function that returns the last identifier in a string. (For efficiency, this function must not be based on extracting all the identifiers in the string).
<?
function lastid($string) {
    if (ereg("([[:alpha:]][[:alnum:]]*)[^[:alnum:]]*$", $string, $result)) {
        return $result[1];
    }
    return $null;
}
?>


COMMON PROBLEMS:


Question 2

Graded by Victor Gidofalvi.  Email him with questions.
(a) [2 points]
Write an SQL query that returns the names of presidents who are still alive.

ANSWER:

SELECT last_name, first_name
FROM president
WHERE death is null;

COMMON PROBLEMS:
 

(b) [3 points]
Write an SQL query that finds the state in which the greatest number of presidents have been born.

ANSWER:

SELECT state,
count(*) AS times
FROM president
GROUP BY
state
ORDER BY
times DESC
LIMIT 1;

COMMON PROBLEMS:
 

(c) [4 points]
Can you write a query that finds the states in which no presidents have been born? Explain your answer.

ANSWER:

NO. The entries in the president table only include states where presidents have been born. To list states where no presidents have been born we would need to have an additional table listing all states.

COMMON PROBLEMS:

Almost everyone gave a correct answer to this question. Students who answered YES with a reasonable explanation about what is needed to achieve this [an additional table with all the states] received also full credit.
 

(d) [4 points]
Write an SQL query that finds which presidents have the same
last name.
 

ANSWER:

SELECT last_name
FROM president
GROUP BY last_name
HAVING count(*) > 1;

An almost correct answer would have been:

SELECT p1.first_name,
p1.last_name,
p2.first_name,
p2.last_name
FROM president p1,
president p2
WHERE p1.last_name = p2.last_name AND
p1.first_name > p2.first_name;

However this query produces many duplicates. Consider a scenario, where there are four presidents with the same last name and distinct first names. Even though the last comparison of first name enforces a unique ordering of the presidents, the output of this query will contain 3 + 2 + 1 = 6 duplicates.

COMMON PROBLEMS:

(e) [4 points]
Why is the following query likely to be slow, regardless of what indexes are available?

SELECT * FROM president WHERE 2001 - year(BIRTH) < 75;

Explain how to write a different query that computes the same result but is much faster.

ANSWER:

In order for an index to work, the indexed field should appear alone (without any arithmetic operations or functions like year() or other user-defined functions) on one side of the comparison.

The following are examples of queries that are not taking advantage of indexes:

SELECT * FROM t1 WHERE a - 2;
SELECT * FROM t1 WHERE abs(a) = 3;
SELECT * FROM t1 WHERE abs(a) > 3;
SELECT * FROM t1 WHERE user_defined_function(a) < 100;

The following are examples of queries that are speeded up because of proper use of indexes:

SELECT * FROM t1 WHERE a > 3;
SELECT * FROM t1 WHERE a > abs(-3);
SELECT * FROM t1 WHERE a > year('19930104');
SELECT * FROM t1 WHERE a > year('19930104') - 1995;

This is the correct way to do use the index, and therefore speed the query:

# In this query, the date format is 'YYYYMMDD'
SELECT * FROM president WHERE birth > '19260101';

COMMON PROBLEMS:

(f) [3 points]
When you are testing the speed of an SQL query with an index and without an index, it is important to run each version of the query several times. Explain why.

ANSWER:

For both versions the table and/or the index will likely be copied to the cache, making subsequent queries faster. By running both versions of the query several times, we make sure that the cached versions are compared. This way we know better whether  the unindexed version is really slower than the indexed version in production use.

COMMON PROBLEMS:


Question 3

Graded by Joe Drish.  Email him with questions.

(a)  Scalability

Option A (database)

Important Advantage


Important Disadvantage
 


Option B (real-time web information retrieval, no data base)
 

Important advantage
 


Important disadvantage
 


Those were the main advantages and disadvantages.  If you mentioned anything along those lines you were given 3 points.  If you mentioned other, less important advantages or disadvantages you were also given credit.  If you said something that was not necessarily true then you may have been docked points, depending on the degree to which the statement was incorrect.
 

(b)  Convenience of presenting media
 

Option A (database)
 

Important Advantage
 


Important Disadvantage
 


Option B (real-time web information retrieval, no data base)
 

Important advantage


Important disadvantage


Those were the main advantages and disadvantages.  If you mentioned anything along those lines you were given 3 points.  If you mentioned other, less important advantages or disadvantages you were also given credit.  If you said something that was not necessarily true then you may have been docked points, depending on the degree to which the statement was incorrect.
 

(c)  Security
 

Option A (database)
 

Important Advantage
 


Important Disadvantage
 


Option B (real-time web information retrieval, no data base)
 

Important advantage
 


Important disadvantage


Those were the main advantages and disadvantages.  If you mentioned anything along those lines you were given 3 points.  If you mentioned other, less important advantages or disadvantages you were also given credit.  If you said something that was not necessarily true then you may have been docked points, depending on the degree to which the statement was incorrect.

(d)  User perception

Option A (database)

Option B (real-time web information retrieval, no data base) If you mentioned something about the speed of which data was obtained by using either option, then you were given 3 points.  If you mentioned something about how up-to-date the information was for either option, you were given an additional 3 points.  You were not given any points if you talked about how difficult it was to implement either option.
 

Common problems for Question 3:

Some people listed advantages for the data base option when they really applied to the no data base option.  You may have been deducted a point for this.  It is better to be clear and simple rather than wordy when writing your explanations.  Scalability is defined as how well a solution to a problem will work when the size of the problem increases.  Size in this case is users and data.  Please keep this definition in mind when thinking about scalability.