Remember that the midterm is next Wednesday, October 31. Here are the Spring 2001 midterm and final examinations. Sample solutions are not currently available.
The midterm will count for 1/6 of your overall grade.
The PHP function ereg($pattern,$input) returns TRUE if the pattern is found in the string. Remember to use ^ and $ if you want the pattern to match only if it matches the whole string. The function eregi() is similar except it ignores upper case/lower case differences.
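The effect of anchoring can be seen in a small sketch. It assumes a PHP version where ereg() is unavailable (it was removed in PHP 7), so preg_match() is used as a stand-in, with the i modifier playing the role of eregi(). The sample string is made up for illustration.

```php
<?php
// Sketch only: preg_match() stands in for ereg(); the input is invented.
$input = "order 12345 shipped";

// Unanchored: succeeds because five digits occur somewhere in the string.
$found_anywhere = preg_match('/[0-9]{5}/', $input) === 1;

// Anchored with ^ and $: fails because the whole string is not five digits.
$whole_string = preg_match('/^[0-9]{5}$/', $input) === 1;

// The i modifier ignores case, as eregi() does.
$ignore_case = preg_match('/SHIPPED/i', $input) === 1;

var_dump($found_anywhere, $whole_string, $ignore_case);
```

Only the anchored test fails here, because the digits are surrounded by other text.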
The function split($pattern,$input) returns an array of strings that is the result of dividing up its second argument into pieces, using matches to the first argument as a delimiter. For example:
list($month, $day, $year) = split('[/.-]', $date);
Note that in this example the slash, period, and hyphen are not special characters: inside the brackets they stand for themselves. The list() construct assigns the elements of the returned array to the named variables.
If the delimiter is not found or the delimiter is empty, then the first element of the array (with subscript 0) gets the whole input. If the delimiter occurs twice in a row, the array will include an empty item between the two occurrences.
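The two non-empty cases can be checked with a short sketch. It uses preg_split() as a stand-in for split(), on the assumption that split() is not available, and the inputs are invented. The empty-delimiter case is left out because preg_split() treats an empty pattern differently.

```php
<?php
// Sketch only: preg_split() stands in for split(); inputs are invented.

// Delimiter not found: the whole input lands in element 0.
$none = preg_split('/;/', "a,b,c");

// Consecutive delimiters: an empty string appears between them.
$repeated = preg_split('/,/', "a,,b");

print_r($none);      // one element: "a,b,c"
print_r($repeated);  // three elements: "a", "", "b"
```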
An optional third argument limits how many items are returned. The last item then contains the whole remainder of the string. This is useful for writing before() and after() functions that extract a specific part of a web page. These functions are useful for information extraction, for example:
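function after($pattern, $text) {
    // An empty pattern leaves nothing to skip, so return the text unchanged.
    if ($pattern == "") return $text;
    // With a limit of 2, $s[0] is the text before the first match
    // and $s[1] is everything after it.
    $s = split($pattern, $text, 2);
    return $s[1];
}
Note that this function returns all of $text after the first occurrence of the pattern.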
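A companion before() can be sketched along the same lines as after(). This is a hypothetical version, not the course's own: it uses preg_split() in place of split(), assumes the pattern does not contain the "~" character (used here as the PCRE delimiter), and assumes that an empty pattern should return the text unchanged.

```php
<?php
// Hypothetical before(), mirroring after(). Assumes $pattern is a regular
// expression that does not contain "~", which serves as the PCRE delimiter.
function before($pattern, $text) {
    // Assumption: an empty pattern returns the text unchanged, as after() does.
    if ($pattern == "") return $text;
    // Limit 2: element 0 holds everything before the first match.
    // If the pattern never matches, element 0 is the whole text.
    $s = preg_split('~' . $pattern . '~', $text, 2);
    return $s[0];
}

echo before("</title>", "Course notes</title></html>"), "\n";  // Course notes
```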
| Pattern | Match |
| [[:alpha:]]+ | Greetings |
| [[:alpha:]]* | Greetings |
| n[et]* | n in Greetings |
| n[et]+ | net in planet |
| G.*t | Greetings, planet Eart |
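These matches can be reproduced with a short sketch. The sample sentence is an assumption chosen to fit the table rows, and preg_match() stands in for ereg(). PCRE matching is leftmost and greedy rather than strictly leftmost-longest, but for these particular patterns the two rules pick the same text.

```php
<?php
// Sketch only: the sample sentence is assumed, not taken from the notes.
$text = "Greetings, planet Earth";
$patterns = ['[[:alpha:]]+', '[[:alpha:]]*', 'n[et]*', 'n[et]+', 'G.*t'];

$matches = [];
foreach ($patterns as $p) {
    // $m[0] receives the text of the match that was found.
    if (preg_match('/' . $p . '/', $text, $m)) {
        $matches[$p] = $m[0];
        echo str_pad($p, 14), '=> "', $m[0], "\"\n";
    }
}
```

The pattern n[et]* stops at the n in Greetings because that n is the leftmost candidate, even though a longer match (net) exists later in planet.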
Leftmost, longest behavior means that to match the first string delimited by single quotes you must write the pattern '[^']*'. This pattern explicitly says that quote characters are not allowed inside the match.
Note that the top priority for the match found is "leftmost."
"Longest" is only the second priority.
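The difference between the two ways of writing a quoted-string pattern shows up in a small sketch. The input is invented, and preg_match() is used as a stand-in for ereg(); its greedy .* behaves like the longest-match rule in this case.

```php
<?php
// Sketch only: made-up input; preg_match() stands in for ereg().
$text = "say 'one' and then 'two'";

// .* runs to the LAST quote, so the match spans both quoted strings.
preg_match("/'.*'/", $text, $greedy);

// Excluding quotes inside the match stops it at the first closing quote.
preg_match("/'[^']*'/", $text, $first);

echo $greedy[0], "\n";  // 'one' and then 'two'
echo $first[0], "\n";   // 'one'
```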
Especially avoid expressions that can match in more than one way. For example, do not write .*<big><b>.*</b></big>.* This is bad for several reasons. Instead, extract the text in two steps:
$text = after("<big><b>",$text);
$title = before("</b></big>",$text);
When possible, use plain strings instead of regular expressions. The explode() function has the same effect as split(), but the delimiter is an ordinary string, not a regular expression. Therefore explode() is much more efficient.
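As an illustration, the two-step title extraction can be written with explode() alone. The helper names plain_before() and plain_after() and the sample page are hypothetical. Returning an empty string when the delimiter is missing is an assumption of this sketch.

```php
<?php
// Hypothetical plain-string helpers; names and sample page are illustrative.
function plain_after($delimiter, $text) {
    if ($delimiter === "") return $text;        // explode() rejects an empty delimiter
    $parts = explode($delimiter, $text, 2);     // at most two pieces
    return isset($parts[1]) ? $parts[1] : "";   // assumption: "" when not found
}

function plain_before($delimiter, $text) {
    if ($delimiter === "") return $text;
    $parts = explode($delimiter, $text, 2);
    return $parts[0];                           // whole text when not found
}

$page  = "<p>intro</p><big><b>Lecture 9</b></big><p>body</p>";
$title = plain_before("</b></big>", plain_after("<big><b>", $page));
echo $title, "\n";  // Lecture 9
```

Because the delimiters are ordinary strings, characters such as the period or asterisk need no escaping.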
For the new project and in general, it is important that you adopt a
well-organized and efficient approach for doing information extraction.
Do not just use regular expressions developed by trial and error.
One part of your report should be an explanation of your approach,
which should be as clear and simple as possible. In the report, describe
the capabilities and limitations of your strategy. Which changes
in the data sources could you handle, and which changes would break your
strategy?
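When ereg() is called with a third argument, as in ereg($pattern, $input, $regs), the array $regs receives the matched text: $regs[0] holds the whole match, and $regs[1], $regs[2], and so on hold the substrings matched by the parenthesized parts of the pattern. If ereg() finds any matches at all, then $regs is filled with exactly ten elements, even though more or fewer than ten parenthesized substrings may actually match. If no matches are found, then $regs is not altered.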
For example, to convert a date from ISO format (YYYY-MM-DD) to DD.MM.YYYY format:
if (ereg("([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})", $date, $regs)) {
    echo "$regs[3].$regs[2].$regs[1]";
} else {
    echo "Invalid date format: $date";
}
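The same conversion can be sketched as a reusable function. The name iso_to_dmy() is hypothetical, preg_match() stands in for ereg(), and returning false for an invalid date is an assumption of this sketch.

```php
<?php
// Hypothetical helper; preg_match() stands in for ereg().
function iso_to_dmy($date) {
    // $regs[0] is the whole match; $regs[1], $regs[2], $regs[3] are
    // the year, month, and day.
    if (preg_match('/([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})/', $date, $regs)) {
        return "$regs[3].$regs[2].$regs[1]";
    }
    return false;  // assumption: signal an invalid date with false
}

echo iso_to_dmy("2001-10-31"), "\n";  // 31.10.2001
var_dump(iso_to_dmy("Oct 31"));       // bool(false)
```

The pattern is not anchored, so a valid date embedded in a longer string is also accepted. Add ^ and $ if the whole string must be a date.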
For example, suppose we want to separate all the words in a string by commas:
ereg_replace("[ \n\r\t]+", ",", trim($str));
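In the replacement string, \\1 and \\2 refer to the text matched by the first and second parenthesized parts of the pattern. For example:
$string = ereg_replace("([a-z])([A-Z])", "\\1xxx\\2", "FieldNamePlus");
This sets $string to FieldxxxNamexxxPlus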
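Both replacements can be tried with preg_replace(), used here as a stand-in for ereg_replace(). The first input string is invented. In preg_replace() the references to parenthesized parts can be written $1 and $2.

```php
<?php
// Sketch only: preg_replace() stands in for ereg_replace().
$str = "  alpha beta\tgamma\n";

// Each run of whitespace between words becomes a single comma.
$csv = preg_replace('/[ \n\r\t]+/', ',', trim($str));

// $1 and $2 are the first and second parenthesized parts of the pattern.
$split_words = preg_replace('/([a-z])([A-Z])/', '$1xxx$2', "FieldNamePlus");

echo $csv, "\n";          // alpha,beta,gamma
echo $split_words, "\n";  // FieldxxxNamexxxPlus
```

The call to trim() matters: without it, leading and trailing whitespace would turn into commas at the ends of the result.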