CSE134A LECTURE NOTES

November 5, 2001
 
 

ANNOUNCEMENTS

On Friday we handed back grade slips for the second project.
 
 

WEB SITE ARCHITECTURAL OBJECTIVES

Today's lecture is based on "A Blueprint for Building Web Sites Using the Microsoft Windows DNA Platform," Draft Version 0.9, Microsoft Corporation, January 2000.  (The link is to a local PDF copy of this document.)

This document explains a standard architecture that is successful in achieving several important objectives.  The article is an example of what is called a "white paper" in the computer industry.

Important design objectives for any large web site include:

The standard overall design that meets these goals consists of loosely connected tiers (i.e., levels) of replicated, task-focused servers.  Application complexity is managed by specialization: different servers perform different functions.  Manageability and scalability are often achieved in part by outsourcing, i.e., remote hosting, where a specialized company installs the servers near a main Internet access point.

Usability issue: provide limited functionality even if some backends are unavailable, e.g. let users send mail even if they cannot read messages; let users see the catalog even if they cannot place orders.
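The idea of limited functionality can be sketched in a few lines.  This is only an illustration, not taken from the white paper: the names BackendUnavailable, fetch_catalog, place_order, and render_shop_page are all hypothetical, and the two backend calls are stand-ins for real network requests.  The point is that each backend is called separately, so one failure degrades only one part of the page.

```python
class BackendUnavailable(Exception):
    """Raised by a (hypothetical) backend client when its server is down."""


def fetch_catalog():
    # Stand-in for a call to a catalog backend that is currently healthy.
    return ["book", "lamp", "chair"]


def place_order(item):
    # Stand-in for a call to an order backend that is currently down.
    raise BackendUnavailable("order service is not responding")


def render_shop_page(item_to_order=None):
    """Build a page from whichever backends respond; never fail outright."""
    page = {}

    # The catalog is fetched independently of ordering.
    try:
        page["catalog"] = fetch_catalog()
    except BackendUnavailable:
        page["catalog"] = None

    # An ordering failure produces a message, not a broken page.
    if item_to_order is not None:
        try:
            page["order_status"] = place_order(item_to_order)
        except BackendUnavailable:
            page["order_status"] = "Ordering is temporarily unavailable."

    return page


page = render_shop_page(item_to_order="lamp")
print(page["catalog"])
print(page["order_status"])
```

Here the user still sees the catalog even though the order cannot be placed.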

Availability/security issues: prevent a script failure from crashing the web server, and prevent a failure on one server from being repeated on identical servers.
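One common way to keep a script failure from crashing the server is to run the script in a separate process.  The sketch below assumes this approach; the white paper may describe a different mechanism.  The function name run_script_isolated and the status codes it returns are hypothetical, and the "script" is just a string of Python source for the sake of a self-contained example.

```python
import subprocess
import sys


def run_script_isolated(source, timeout_seconds=5):
    """Run script source in a child process and return (status, body).

    A crash or hang in the child cannot take down the calling process;
    it is reported as an error status instead.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return 500, "script timed out"

    if result.returncode != 0:
        return 500, "script failed"
    return 200, result.stdout


# A well-behaved script, then one that aborts its own process.
print(run_script_isolated("print('hello')"))
print(run_script_isolated("import os; os.abort()")[0])
```

The second call kills only the child process; the caller keeps running and can return an error page.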
 
 

SECURITY DOMAINS

Security is a vital issue with at least two aspects: protecting servers from attack, and protecting data from theft.  Security is based on multiple separate security zones.  Each zone is protected by a firewall, i.e., a network packet filter.  Typically there are at least three zones: the public Internet, a public-facing so-called "demilitarized zone," abbreviated DMZ, and a private zone with sensitive data.

A security mechanism, for example the use of security domains, has three major objectives:

Security domains are regions with restricted and monitored communication.  Domains may be defined geographically, organizationally, by server type, or by data type.  Domains may be nested but preferably should not overlap.

A "firewall" is a device that inspects every packet of data coming into (or out of) a domain.  Typically suspicious packets are simply dropped.  Simple firewalls, known as packet filters, just look at the IP addresses and port numbers mentioned in packets.
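A simple packet filter of this kind can be sketched as follows.  This is a toy model under stated assumptions: the Packet and Rule types, the rule list, and the default-deny policy are illustrative choices, not taken from the white paper, and the addresses come from ranges reserved for documentation.  Only IP addresses and the destination port are examined; packets that match no rule are dropped.

```python
from ipaddress import ip_address, ip_network
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    dst_port: int


class Rule(NamedTuple):
    src_net: str   # allowed source range in CIDR form
    dst_net: str   # allowed destination range in CIDR form
    dst_port: int  # allowed destination port


# Hypothetical policy: anyone may reach the DMZ web servers on ports 80/443.
ALLOW_RULES = [
    Rule("0.0.0.0/0", "192.0.2.0/24", 80),
    Rule("0.0.0.0/0", "192.0.2.0/24", 443),
]


def permits(rule, packet):
    """True if the packet's addresses and port all match this rule."""
    return (
        ip_address(packet.src_ip) in ip_network(rule.src_net)
        and ip_address(packet.dst_ip) in ip_network(rule.dst_net)
        and packet.dst_port == rule.dst_port
    )


def filter_packet(packet, rules=ALLOW_RULES):
    """Return True to forward the packet, False to drop it (default deny)."""
    return any(permits(rule, packet) for rule in rules)


print(filter_packet(Packet("203.0.113.7", "192.0.2.10", 80)))  # web request
print(filter_packet(Packet("203.0.113.7", "192.0.2.10", 23)))  # telnet attempt
```

The first packet is forwarded and the second is dropped, because no rule allows port 23.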

Each frontend server in the DMZ has an operating system hardened for security.  Firewalls separate it from the Internet and also from the internal network.
 
 

NETWORKING

A large web site will have several separate local-area networks (LANs).  Each security domain often has a separate management network that overlays the other networks.  Management involves consoles, human monitors, and automated monitoring software.  With a physically separate management network, each host has at least two network interface cards.

Monitoring is based on logging, which can impose a heavy network and storage load: up to 4000 gigabytes per day at Yahoo.
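To get a feel for this figure, it helps to convert it to a sustained rate.  The calculation below assumes decimal units (1 gigabyte = 1000 megabytes) and assumes the load is spread evenly over the day; real log traffic is bursty, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_rate(gigabytes_per_day):
    """Convert a daily log volume to average MB/s and Mbit/s."""
    megabytes_per_second = gigabytes_per_day * 1000 / SECONDS_PER_DAY
    megabits_per_second = megabytes_per_second * 8
    return megabytes_per_second, megabits_per_second


mb_s, mbit_s = sustained_rate(4000)
print(f"{mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s")  # prints "46.3 MB/s, 370 Mbit/s"
```

An average of roughly 370 megabits per second is more than a single 100 Mbit/s Ethernet link can carry, which is one reason logging traffic matters in the network design.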

Clients benefit from multiple ISPs through a feature of standard Internet Domain Name System (DNS) servers: round-robin behavior.  If an IP address does not respond, the user just has to hit "reload."
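Round-robin behavior can be simulated in a few lines.  This is a simplified model, not real DNS: the class RoundRobinNameServer and the function fetch are hypothetical, the addresses come from documentation ranges, and caching by resolvers and browsers is ignored.  The name server rotates its list of addresses on every query, and the client always tries the first address in the answer.

```python
from collections import deque


class RoundRobinNameServer:
    """Toy name server that rotates its address list on each query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # next query starts with the next address
        return answer


def fetch(name_server, reachable, max_reloads=3):
    """Try the first address of each answer; a retry models hitting reload.

    Returns (address, attempts), or (None, max_reloads) if nothing responds.
    """
    for attempt in range(1, max_reloads + 1):
        address = name_server.resolve()[0]
        if address in reachable:
            return address, attempt
    return None, max_reloads


# One address per ISP; assume the link through the first ISP is down.
dns = RoundRobinNameServer(["198.51.100.1", "203.0.113.1"])
print(fetch(dns, reachable={"203.0.113.1"}))  # prints ('203.0.113.1', 2)
```

The first attempt gets the unreachable address; the reload gets the rotated list and succeeds through the second ISP.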

All frontends at one ISP respond to the same IP address, which is handled by a load-balancer.
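The load-balancer arrangement can be sketched as below.  The notes do not say which balancing policy is used, so simple round-robin is assumed here; the class name LoadBalancer, the frontend names, and the virtual IP address are all hypothetical.

```python
from itertools import cycle


class LoadBalancer:
    """Toy load balancer: one public address in front of many frontends."""

    def __init__(self, virtual_ip, frontends):
        self.virtual_ip = virtual_ip          # the only address clients see
        self._next_frontend = cycle(frontends)

    def route(self, request):
        """Pick the frontend that will handle this request (round-robin)."""
        return next(self._next_frontend)


lb = LoadBalancer("192.0.2.1", ["frontend-1", "frontend-2", "frontend-3"])
for i in range(4):
    print(lb.route(f"GET /page{i}"))
```

Four requests to the same address go to frontend-1, frontend-2, frontend-3, and then frontend-1 again.  Real load balancers typically also take server health and current load into account.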
 
 



Copyright (c) by Charles Elkan, 2001.