CSE 134A LECTURE NOTES

October 23, 2002
 
 

ANNOUNCEMENTS

Remember that the midterm is in two weeks, on Wednesday, November 6.  Here are the Spring 2001 midterm and final examination.  Sample solutions are not available currently.  The midterm will count for 1/6 of your overall grade.
 
 

SESSION MANAGEMENT

The information here is taken from "Session Handling with PHP 4" by Tobias Ratschiller.

PHP offers the key characteristics required of a session management library:

Session ending is not automatic, because it is difficult for the system to tell when the user is finished with his/her session.  The programmer can force a session end with the command session_destroy().  The default cookie lifetime is 0, meaning that the cookie is deleted when the user closes the browser.
 
 

SAVING SESSION DATA

To read and save session data, PHP uses storage modules. There are currently three storage modules available: files (the default), mm (shared memory), and user (user-defined callbacks). With callbacks you can store sessions however you want: in a database like MySQL, in XML, or on a remote server.  To do this you need to provide PHP implementations of six functions.  To register these callback functions, you use session_set_save_handler().

The six functions are:

bool open (string save_path, string sess_name);

This function is executed on the initialization of a session; you should use it to prepare your functions, to initialize variables, or the like. It takes two strings as arguments. The first is the path where sessions should be saved, which is set in php.ini or by the session_save_path() function.  You can use this variable for module-specific configuration. The second argument is the session's name, by default PHPSESSID.
bool close ();
This function with no arguments is executed on shutdown of a session. Use it to free memory or to destroy your variables.
mixed read (string sess_id);
This function is called whenever a session is re-started. It must read out the data of the session identified with sess_id and return it as a serialized string. If there's no session with this ID, an empty string "" is returned.
bool write (string sess_id, string value);
When the session needs to be saved, this function is invoked. The first argument is a string containing the session ID; the second argument is the serialized representation of the session variables.
bool destroy (string sess_id);
When a script calls session_destroy(), this function is executed.  It should delete all data associated with the session ID sess_id.
bool gc (int max_lifetime);
This function is called on a session's start-up with the probability specified in gc_probability. It's used for garbage collection; that is, to remove sessions that weren't updated for more than gc_maxlifetime seconds.
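
To make the division of labor among the six callbacks concrete, here is a minimal sketch written in Python rather than PHP. It is a toy in-memory store, not PHP's actual API: the class name InMemorySessionStore and the optional "now" parameter (added so the clock can be controlled in examples) are assumptions made for illustration.

```python
import time


class InMemorySessionStore:
    """Toy stand-in for the six session callbacks (not PHP's actual API)."""

    def __init__(self):
        self.save_path = None
        self.sess_name = None
        self.data = {}  # sess_id -> (serialized value, last-update timestamp)

    def open(self, save_path, sess_name):
        # Called when a session is initialized; remember the configuration.
        self.save_path = save_path
        self.sess_name = sess_name
        return True

    def close(self):
        # Called on session shutdown; nothing to release in this toy store.
        return True

    def read(self, sess_id):
        # Return the serialized data, or "" if the session is unknown.
        value, _ = self.data.get(sess_id, ("", 0.0))
        return value

    def write(self, sess_id, value, now=None):
        # Save the serialized session variables and record the update time.
        # "now" is a hypothetical extra parameter used only for testing.
        self.data[sess_id] = (value, time.time() if now is None else now)
        return True

    def destroy(self, sess_id):
        # Delete everything associated with this session ID.
        self.data.pop(sess_id, None)
        return True

    def gc(self, max_lifetime, now=None):
        # Remove sessions not updated for more than max_lifetime seconds.
        now = time.time() if now is None else now
        stale = [sid for sid, (_, ts) in self.data.items()
                 if now - ts > max_lifetime]
        for sid in stale:
            del self.data[sid]
        return True
```

A real handler would replace the dictionary with database queries or file operations, but the contract of each function would stay the same: read returns a serialized string (or ""), and the others report success with a boolean.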
 
 

GARBAGE COLLECTION

The gc_maxlifetime configuration directive specifies how many seconds after the last access to a session its data is considered garbage and may be destroyed.  This happens even if the cookie still exists on the client side.

Cleaning up old sessions (called "garbage collection") on every page request would cause considerable overhead.  Therefore, with gc_maxlifetime, you should use gc_probability.  This specifies with what probability the garbage collection routine is invoked. If gc_probability = 100, the cleanup is performed on every request.  By default gc_probability = 1, meaning old sessions will be removed with a probability of 1% per request.
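
The probabilistic trigger can be sketched in a few lines. This is an illustration in Python, not PHP's internal implementation; the function names should_run_gc and count_gc_runs are invented for this example, and gc_probability is treated as a percentage as described above.

```python
import random


def should_run_gc(gc_probability, rng=random):
    """Return True with a chance of gc_probability percent."""
    # rng.random() is uniform on [0, 1), so scaling by 100 gives [0, 100).
    return rng.random() * 100 < gc_probability


def count_gc_runs(requests, gc_probability, seed=0):
    """Simulate page requests and count how many trigger a cleanup."""
    rng = random.Random(seed)
    return sum(should_run_gc(gc_probability, rng) for _ in range(requests))
```

With gc_probability = 100 every simulated request triggers a cleanup, with 0 none does, and with 1 roughly one request in a hundred does. Note that the decision uses no state carried over from earlier requests.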

If you pass the session ID via GET or POST instead of in a cookie, you need to pay special attention to the garbage collection routines.  Users might bookmark URLs containing the session ID, so you need to make sure that sessions are cleared frequently.  If the session data still exists when the user accesses the page with the session ID at a later time, PHP simply resumes the previous session instead of starting a new one, which may not be your intention.  A value of 10 to 20 for gc_probability would better fit this scenario than the default value of 1.
 

You might ask yourself why PHP allows you to specify a probability (gc_probability) that determines when garbage collection will occur, rather than a counter that triggers a cleanup every n requests.  Using a probability means that the server does not have to store and update a global counter, which translates to faster execution.
 
 

SCREEN SCRAPING

"Screen scraping" means using software to collect information that was designed to be read by humans.  It has been used for decades to create new interfaces to old software, and it is now quite common with web sites.  Writing code to extract nuggets of information from an HTML page, e.g. the price and availability of a book, is usually quite possible although not trivial.
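
As a rough illustration of extracting a book's price and availability, the following Python sketch uses the standard library's html.parser module. The sample page, its class names ("price", "availability"), and the names BookInfoScraper and scrape_book_info are all made up for this example; a real site's markup would differ and would have to be inspected by hand.

```python
from html.parser import HTMLParser


class BookInfoScraper(HTMLParser):
    """Collect text inside elements whose class is 'price' or 'availability'."""

    WANTED = ("price", "availability")

    def __init__(self):
        super().__init__()
        self.current = None  # field currently being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Start capturing when an element carries one of the wanted classes.
        css_class = dict(attrs).get("class")
        if css_class in self.WANTED:
            self.current = css_class

    def handle_data(self, data):
        # Store the first piece of text that follows a wanted start tag.
        if self.current is not None:
            self.fields[self.current] = data.strip()
            self.current = None


# Hypothetical page fragment, invented for this example.
SAMPLE_PAGE = """
<html><body>
<h1>Example Book</h1>
<p>Our price: <span class="price">$31.47</span></p>
<p><b class="availability">Usually ships in 24 hours</b></p>
</body></html>
"""


def scrape_book_info(html):
    scraper = BookInfoScraper()
    scraper.feed(html)
    return scraper.fields
```

Calling scrape_book_info(SAMPLE_PAGE) returns a dictionary with the two extracted strings. The sketch also hints at why scraping is "not trivial": the code depends entirely on the page's layout, so it breaks whenever the site changes its markup.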

Companies like Yodlee (founded by UCSD CSE professor Venkat Rangan) use screen scraping to collect information from financial sites about bills and accounts; see http://www.time.com/time/digital/reports/banking/screen.html .

Yahoo Shopping uses screen scraping to build its databases of items and prices offered by merchants.  Price comparison sites like Addall.com use screen scraping to compile similar information.


 




Copyright (c) by Charles Elkan, 2002.