CSE 134A teaches how to develop web-based applications that deliver information dynamically, either to human users or, as "web services", to other applications.
The course will cover architectures for server-side web-based systems that scale to thousands and even millions of users. The topics discussed will be related to Windows DNA, Microsoft .NET, and Sun ONE, but will not be vendor-specific. Standards for information exchange over the Internet including XML and SOAP will be covered.
The course may also cover algorithms for organizing and searching web-based information, such as those used by the Google search engine.
Topics that will be mentioned in 134A but not covered systematically include IP networking, relational database systems in general, web server design, HTML, user interface design, and text processing using regular expressions.
We chose PHP and MySQL because they are open-source and free.
You are encouraged to install your own copies of the software on your own Linux machine.
A PHP-based web page looks like HTML interleaved with programming commands in a language that resembles C and Perl. Other software technologies that are similar include Microsoft ASP, Cold Fusion, Java, Java Server Pages, and Java servlets.
One main advantage of PHP version 4 is Zend, a byte-code compiler that is invoked automatically. This makes PHP pages run much faster, and is similar to how Java is implemented.
A PHP program (called a script) is executed on the web server computer, unlike JavaScript, which is executed by the client browser.
PHP commands have access to information about the browser and to the values of fields in a form on the browser, and they can retrieve and/or update information in a database.
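As a rough illustration of a server-side script reading form fields and browser information, here is a minimal sketch. It is written in Python rather than PHP purely for illustration, and the function name handle_form, the field name "name", and the sample browser string are all hypothetical.

```python
from html import escape
from urllib.parse import parse_qs


def handle_form(query_string, user_agent):
    """Build an HTML fragment from submitted form fields and browser info.

    query_string: the URL-encoded form data, e.g. "name=Ada+Lovelace"
    user_agent:   the browser identification string sent with the request
    Both parameter names are assumptions made for this sketch.
    """
    # parse_qs maps each field name to a list of submitted values.
    fields = parse_qs(query_string)
    # Fall back to a default when the (hypothetical) "name" field is absent.
    name = fields.get("name", ["stranger"])[0]
    # Escape user-supplied text before placing it into HTML.
    return (
        f"<p>Hello, {escape(name)}! "
        f"Your browser reports: {escape(user_agent)}</p>"
    )


if __name__ == "__main__":
    print(handle_form("name=Ada+Lovelace", "ExampleBrowser/1.0"))
```

In PHP the same information is made available to the script by the language itself; the sketch only shows the general shape of the idea.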
General applications: Responding to forms, displaying user-customized content, rotating content. Examples: Slashdot, bug tracking, shopping cart.
Design: Information is stored in a database, usually relational such as MySQL or Oracle. Scripts (i.e. programs) running on the web server query the database and then add formatting etc. to create an HTML page "on the fly" that is sent to the user's browser.
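The design described above (query the database, then add formatting to produce an HTML page) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database instead of PHP and MySQL, and the table bugs, its columns, and the function names are hypothetical.

```python
import sqlite3
from html import escape


def make_demo_db():
    """Create a small in-memory database (a stand-in for MySQL or Oracle)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bugs (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO bugs (title) VALUES (?)",
        [("Login fails",), ("Cart total <wrong>",)],
    )
    conn.commit()
    return conn


def render_bug_list(conn):
    """Query the database and build an HTML page "on the fly"."""
    rows = conn.execute("SELECT id, title FROM bugs ORDER BY id").fetchall()
    # Escape stored text so it cannot be interpreted as HTML markup.
    items = "\n".join(
        f"<li>#{bug_id}: {escape(title)}</li>" for bug_id, title in rows
    )
    return (
        "<html><body>\n<h1>Open bugs</h1>\n<ul>\n"
        f"{items}\n"
        "</ul>\n</body></html>"
    )


if __name__ == "__main__":
    # On a real server this string would be sent to the user's browser.
    print(render_bug_list(make_demo_db()))
```

The data lives only in the database and the markup is generated per request, so changing a row changes the page without editing any HTML file.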
Design objectives include:
Abstract: Search is one of the most ubiquitous and important applications used on the internet, but it is also one of the hardest applications to do well. Google is a search engine company that began as a research project at Stanford University, and has evolved into the world's largest and most trafficked search engine in just under three years. Three main characteristics have driven this growth: search quality, index size, and speed. Addressing these issues has required tackling problems in a range of computer science disciplines, including algorithm and data structure design, networking, operating systems, distributed and fault-tolerant computing, information retrieval, and user interface design. In this talk, I'll focus on Google's unique hardware platform of 10,000 commodity PCs running Linux, and some of the challenges and benefits presented by this platform. I'll also describe some of the interesting problems that arise in crawling and indexing more than a billion web pages, and performing 140 million queries per day on this index. Finally, I'll describe some of the challenges facing search engines in the future.