CSE 134A LECTURE NOTES

September 24, 2001
 
 

WELCOME

This is CSE 134A.  Please see today's introductory handout.  Points to note: Class will start at 4:40pm sharp and end at 6pm sharp.
 
 

OVERVIEW OF THE COURSE

We will study modern software that is in widespread use: the PHP scripting language (version 4) and the MySQL relational database client/server software.

CSE 134A teaches how to develop web-based applications that deliver information dynamically, either to human users or to other applications (as "web services").

The course will cover architectures for server-side web-based systems that scale to thousands and even millions of users.  The topics discussed will be related to Windows DNA, Microsoft .NET, and Sun ONE, but will not be vendor-specific.  Standards for information exchange over the Internet, including XML and SOAP, will be covered.

The course may also cover algorithms for organizing and searching web-based information, such as those used by the Google search engine.

Topics that will be mentioned in 134A but not covered systematically include IP networking, relational database systems in general, web server design, HTML, user interface design, and text processing using regular expressions.

We chose PHP and MySQL because they are open-source and free.  You are encouraged to install your own copies of the software on your own Linux machine.
 
 

PHP IS A SCRIPTING LANGUAGE

The earliest web servers, in the early 1990s, provided only static web pages, written in HTML.  Scripting languages such as PHP let the server generate pages dynamically instead.

A PHP-based web page looks like HTML interleaved with programming commands in a language that resembles C and Perl.  Other software technologies that are similar include Microsoft ASP, ColdFusion, Java, JavaServer Pages, and Java servlets.

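One main advantage of PHP version 4 is the Zend engine, which includes a byte-code compiler that is invoked automatically.  Each script is compiled to byte code before it is executed, which makes PHP pages run much faster than under PHP version 3, and is similar to how Java is implemented.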
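
The compile-then-execute idea can be illustrated by analogy.  The sketch below uses Python's built-in compile and exec functions, not PHP's own compiler, so it shows only the general principle: the source text is parsed and translated to byte code in one step, and the byte code is what actually runs.  The script text and all names in it are made up for illustration.

```python
# A tiny "script", held as source text (hypothetical example).
SCRIPT = """
total = 0
for item_price in prices:
    total += item_price
"""

# Step 1: parse and compile the source text to a byte-code object.
code = compile(SCRIPT, "<page-script>", "exec")


def run(prices):
    """Step 2: execute the already-compiled byte code against fresh input."""
    namespace = {"prices": prices}
    exec(code, namespace)
    return namespace["total"]


if __name__ == "__main__":
    print(run([1, 2, 3]))
    print(run([10, 20]))
```

Note that the loop body is parsed only once, at compile time, no matter how many times it executes.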

A PHP program (called a script) is executed on the web server computer, unlike JavaScript, which is executed by the client browser.

PHP commands have access to information about the browser and to the values of fields in a form on the browser, and can retrieve and/or update information in a database.
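
The way a server-side script uses form values and browser information can be sketched in a few lines.  The example is in Python rather than PHP, purely so that it runs without a web server; the function name render_greeting, the form field "name", and the browser string are assumptions made for illustration.  It parses a submitted query string, reads one form field, and builds the HTML that the script would send back to the browser.

```python
from html import escape
from urllib.parse import parse_qs


def render_greeting(query_string, user_agent):
    """Build an HTML page from submitted form data and browser information."""
    fields = parse_qs(query_string)
    # parse_qs maps each field name to a list of values; take the first one,
    # or fall back to a default when the field is missing or empty.
    name = fields.get("name", ["stranger"])[0]
    # Escape everything that came from the client before putting it in HTML.
    return (
        "<html><body>"
        f"<p>Hello, {escape(name)}!</p>"
        f"<p>Your browser identifies itself as: {escape(user_agent)}</p>"
        "</body></html>"
    )


if __name__ == "__main__":
    print(render_greeting("name=Ada+Lovelace", "ExampleBrowser/1.0"))
```

All of this work happens on the server; the browser receives only the finished HTML.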

General applications: Responding to forms, displaying user-customized content, rotating content.  Examples: Slashdot, bug tracking, shopping cart.
 
 

DESIGN OBJECTIVES FOR WEB SITES

Most web-based information services have a similar high-level design and similar objectives.

Design: Information is stored in a database, usually a relational one such as MySQL or Oracle.  Scripts (i.e. programs) running on the web server query the database and then add formatting etc. to create an HTML page "on the fly" that is sent to the user's browser.
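
A minimal sketch of this design follows.  It is written in Python, with the standard-library sqlite3 module standing in for MySQL so that the example is self-contained.  The table "products", its columns, the sample rows, and the function names are all hypothetical, chosen only for illustration.

```python
import sqlite3
from html import escape


def make_demo_database():
    """Create an in-memory database with a small, made-up products table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price_cents INTEGER)")
    conn.executemany(
        "INSERT INTO products (name, price_cents) VALUES (?, ?)",
        [("Widget", 250), ("Gadget <deluxe>", 1999)],
    )
    conn.commit()
    return conn


def render_product_page(conn, max_price_cents):
    """Query the database, then wrap the result rows in HTML formatting."""
    rows = conn.execute(
        # The ? placeholder keeps user-supplied values out of the SQL text.
        "SELECT name, price_cents FROM products "
        "WHERE price_cents <= ? ORDER BY name",
        (max_price_cents,),
    ).fetchall()
    lines = ["<html><body><table>"]
    for name, price_cents in rows:
        price = f"${price_cents // 100}.{price_cents % 100:02d}"
        lines.append(f"<tr><td>{escape(name)}</td><td>{price}</td></tr>")
    lines.append("</table></body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_product_page(make_demo_database(), 2000))
```

The page is built fresh on every request, so a change to the database shows up in the next page served without anyone editing HTML by hand.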

Design objectives include:

SPECIAL GOOGLE SEMINAR

Date: Wednesday, September 26, 2001
Time: 11:00am
Place: AP&M Room 4301
Speaker: Jeff Dean, Google, Inc.
Title:  Google: Finding Needles in a 10 TB Haystack, 140M Times/Day

Abstract:  Search is one of the most ubiquitous and important applications used on the internet, but it is also one of the hardest applications to do well.   Google is a search engine company that began as a research project at Stanford University, and has evolved into the world's largest and most trafficked search engine in just under three years.  Three main characteristics have driven this growth: search quality, index size, and speed.  Addressing these issues has required tackling problems in a range of computer science disciplines, including algorithm and data structure design, networking, operating systems, distributed and fault-tolerant computing, information retrieval, and user interface design.  In this talk, I'll focus on Google's unique hardware platform of 10,000 commodity PCs running Linux, and some of the challenges and benefits presented by this platform.  I'll also describe some of the interesting problems that arise in crawling and indexing more than a billion web pages, and performing 140 million queries per day on this index.  Finally, I'll describe some of the challenges facing search engines in the future.
 
 



Copyright (c) by Charles Elkan, 2001.