CSE 134A Discussion Section

Sockets and Spell Checking

Friday, 2002-10-18
TA: Dana Dahlstrom

Sockets

Sockets are covered in Chapter 13 of Professional PHP4, in the section beginning on page 457.

The following section (taken from http://joplin.ucsd.edu/~ghamerly/134A_f02/sockets/ by Greg Hamerly) concisely explains what you need to know about using sockets in PHP4 to communicate with the spelling server:

Using sockets in a client PHP script

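PHP makes it very easy to do client socket programming. To get basic functionality, you should use the functions fsockopen() (open a connection), fputs() (send data), fgets() (read one line of the reply), and fclose() (close the connection).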

You can test your spell checker using code similar to the following:

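<?php
    // Open a TCP connection to the spelling server.
    $fp = fsockopen("ai.ucsd.edu", 8518, $errno, $errstr);
    if (!$fp) {
        die("could not connect: $errstr ($errno)\n");
    }
    // Send one line of text; the server replies with one misspelled word per line.
    fputs($fp, "this stirng has speling misteaks\n");
    // Read until the "$$$" terminator line, or until the connection closes.
    while (($misspelled = fgets($fp, 256)) !== false && $misspelled != "$$$\n") {
        print("you misspelled $misspelled");
    }
    fclose($fp);
?>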

See more on PHP networking functions here: http://www.php.net/manual/en/ref.network.php . Note that you should not use the persistent networking calls (e.g. pfsockopen) for this project. See more on PHP file functions here: http://www.php.net/manual/en/ref.filesystem.php.

Spell checking

The above code segment sends a sentence to the spell-checking server and retrieves its response. The output it prints is:

you misspelled stirng
you misspelled speling
you misspelled misteaks

To look under the hood, so to speak, and see what the spell-checking server's protocol looks like, you can use telnet like this:

$ telnet ai.ucsd.edu 8518
Trying 132.239.95.171...
Connected to joplin.ucsd.edu.
Escape character is '^]'.
this stirng has speling misteaks
stirng
speling
misteaks
$$$
^]
telnet> quit
Connection closed.

In this dialog, the line "this stirng has speling misteaks" is typed as input (as are the ^] escape and the quit command), and the rest is output. Notice the server returns the misspelled words, one per line, followed by a line containing three dollar signs ($$$). Keep in mind each line is terminated by a line terminator (the '\n' character in PHP4).
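The reply format described above can be read with a small loop. The following is a minimal sketch, not part of the original handout: read_misspelled() is a hypothetical helper name, and it assumes $fp is a stream that has already been opened (for example with fsockopen()) and that a request has already been sent.

```php
<?php
// Hypothetical helper: collect one server reply into an array of words.
// Reads lines from an open stream until the "$$$" terminator line
// or until the stream ends, whichever comes first.
function read_misspelled($fp)
{
    $words = array();
    while (($line = fgets($fp, 256)) !== false) {
        $line = rtrim($line, "\r\n");   // drop the line terminator
        if ($line === '$$$') {
            break;                      // end of this reply
        }
        if ($line !== '') {
            $words[] = $line;           // one misspelled word per line
        }
    }
    return $words;
}
```

Checking fgets() for false keeps the loop from spinning forever if the server closes the connection before sending $$$.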

Two ways to check spelling

There are at least two ways to use the spell-checking server; we'll call them batch mode and word-by-word mode. In batch mode, as in the example above, you send a block of text and then collect the list of misspelled words. In word-by-word mode, you send one word on a line, check to see whether the server flags it as misspelled, then send another word. In the session below, the first line of each exchange is typed input; the server echoes the word back only if it is misspelled, and always finishes its reply with $$$:

$ telnet ai.ucsd.edu 8518
Trying 132.239.95.171...
Connected to joplin.ucsd.edu.
Escape character is '^]'.
this
$$$
stirng
stirng
$$$
has
$$$
speling
speling
$$$
misteaks
misteaks
$$$
^]
telnet> quit
Connection closed.
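Word-by-word mode, as in the session above, might be wrapped in a function like the one below. This is only a sketch under assumptions: is_misspelled() is a hypothetical name, $fp is assumed to be a connection already opened with fsockopen(), and each word is assumed to contain no whitespace.

```php
<?php
// Hypothetical helper for word-by-word mode: returns true if the server
// flags $word as misspelled. $fp is an already-open connection.
function is_misspelled($fp, $word)
{
    fputs($fp, $word . "\n");           // send one word on a line
    $flagged = false;
    while (($line = fgets($fp, 256)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '$$$') {
            break;                      // reply complete
        }
        if ($line !== '') {
            $flagged = true;            // the server echoed the word back
        }
    }
    return $flagged;
}
```

A caller could open the connection once, call is_misspelled() for each word of the text in turn, and close the connection at the end.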

You're free to use either mode, but the one you choose will affect how you highlight the misspelled words in the preview. In word-by-word mode, you can loop over the words, sending each one to the server and then printing it either highlighted or not depending on the result. Batch mode is faster because it doesn't require a round trip to the server for every word, but it requires a more sophisticated approach, such as pattern replacement (perhaps ereg_replace() or preg_replace()) or a nested loop that checks each word against the list of misspelled words.
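For batch mode, the nested-loop idea could look roughly like this. It is a sketch, not the required solution: highlight_misspelled() is a hypothetical name, and the code assumes words are separated by single spaces and that the server returns each misspelled word exactly as it was sent (punctuation handling is left out).

```php
<?php
// Hypothetical helper for batch mode: wrap each misspelled word in <STRONG>.
// $text is the original text; $misspelled is the array of words returned
// by the server. Assumes words are separated by single spaces.
function highlight_misspelled($text, $misspelled)
{
    $out = array();
    foreach (explode(' ', $text) as $word) {
        $safe = htmlspecialchars($word);    // escape <, > and & for the preview
        if (in_array($word, $misspelled, true)) {
            $out[] = '<STRONG>' . $safe . '</STRONG>';
        } else {
            $out[] = $safe;
        }
    }
    return implode(' ', $out);
}
```

Called with "this stirng has speling misteaks" and array("stirng", "speling", "misteaks"), it returns: this <STRONG>stirng</STRONG> has <STRONG>speling</STRONG> <STRONG>misteaks</STRONG>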

By the way, when we say misspelled words should be "highlighted" in the preview, you should accomplish this with a simple HTML tag like <STRONG> as a lowest common denominator to accommodate as many viewing modes as possible. If you know how to use cascading style sheets (CSS), feel free to change the colors, perhaps with a special <STRONG> class.