CSE134A LECTURE NOTES

December 2, 2002
 
 

ANNOUNCEMENTS

Wednesday and Friday will be review sessions.
 
 

WEB SERVICES

The SOAP standard does not provide any of the following: Most of these are necessary for all non-trivial distributed applications, including Internet-based web services.

Dilemma:  There is little pressure on developers to implement all parts of a so-called open standard.  Open-source developers get satisfaction and praise for the initial version of an implementation, not for bug fixes and extensions.  Commercial developers have strategic reasons to do partial implementations.
 
 

THE FUTURE OF SOAP

Microsoft is pushing SOAP as part of its .NET initiative.  To support .NET, and all its programming standards, Microsoft provides software development environments and large, useful libraries of reusable code.  Critically, Microsoft provides easy-to-use GUI-based programming tools, so it can attract a much larger number of developers than vendors who do not provide visual tools.  Microsoft also provides excellent documentation and guidance for programmers at all levels of expertise.

Microsoft's strategy is always embrace, extend, dominate.  Because so many services are needed on top of SOAP by applications, this strategy should work well with SOAP.

Most SOAP services will be private, not public for end-users.  End-users need attractive, easy-to-use graphical services, as provided by HTML, not remote-procedure-call interfaces.

The era of free web services subsidized by over-optimistic investors is over.  How will SOAP services be paid for?  There may be an initialization problem: we need reliable, useful micropayment services in order to support the development of other SOAP services.
 
 

APPLICATION SERVERS

What hardware and software architectures will be used to deploy web services?

Web servers have evolved into what are now called "application servers."  These software platforms provide runtime environments for executing complex software in response to requests from clients.

There are two main types of app server: page-based and component-based.  PHP is page-based, while EJB (Enterprise JavaBeans) is the most common component-based standard.  For an overview see a May 2001 article by Timothy Dyck.

PAGE-BASED APP SERVERS

In addition to PHP, the main types are Microsoft ASP, ColdFusion, and JavaServer Pages (JSP).

Code for generating displays is mixed with HTML output code.  This reduces modularity and makes developing multiple alternative interfaces difficult.  On the other hand, prototyping is easy.
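To illustrate the modularity point, here is a minimal Python sketch (not PHP or ASP) that contrasts page-style code with a version that keeps logic apart from output.  The product data, the bargain rule, and all function names are hypothetical and chosen only for this example.

```python
from html import escape

BARGAIN_LIMIT_CENTS = 1000  # hypothetical business rule for this sketch


def page_mixed(products):
    """Page-style code: the bargain rule and the HTML output are interleaved."""
    out = ["<ul>"]
    for name, price_cents in products:
        if price_cents < BARGAIN_LIMIT_CENTS:
            out.append("<li>%s (bargain)</li>" % escape(name))
        else:
            out.append("<li>%s</li>" % escape(name))
    out.append("</ul>")
    return "".join(out)


def label_products(products):
    """Logic only: return (name, is_bargain) pairs with no markup."""
    return [(name, price_cents < BARGAIN_LIMIT_CENTS)
            for name, price_cents in products]


def render_html(labelled):
    """One interface: an HTML list."""
    items = ["<li>%s%s</li>" % (escape(name), " (bargain)" if bargain else "")
             for name, bargain in labelled]
    return "<ul>" + "".join(items) + "</ul>"


def render_text(labelled):
    """An alternative interface: plain text, reusing the same logic."""
    return "\n".join(name + (" (bargain)" if bargain else "")
                     for name, bargain in labelled)
```

With the mixed version, adding a plain-text interface means copying the bargain rule into a second page.  With the separated version, only a new renderer is needed.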

ColdFusion has fail-over and load-balancing between servers.  PHP version 4 and ASP.NET have compilation to byte-code.  Other performance features include caching.  Early implementations had poor performance and scalability, but the latest versions can be very good.

Typically, page-based app servers do not provide script-level multithreading or Java support.  They typically do have the ability to invoke other servers, e.g. via sockets, CORBA (common object request broker architecture), Microsoft COM (component object model), SOAP (simple object access protocol), and HTTP.
 
 

COMPONENT-BASED

These typically use Java as their programming language, together with a distributed object standard called Enterprise JavaBeans (EJB).  EJB is part of the general Java 2 Enterprise Edition (J2EE) standard, along with a database API known as JDBC (supposedly not an acronym for Java Data Base Connectivity).

Pricing is $795 to $35,000 per CPU for IBM WebSphere.  Microsoft products are much cheaper.

Despite using Java and the so-called EJB standard, you are typically locked in to one platform vendor due to incompatibilities.

The platforms provide

These last three features have historically been provided by so-called transaction monitors.

New modules of code can encapsulate existing databases and other servers.  Different modules can provide different outputs for different display devices.

The importance of Java may decline because Sun and Microsoft both support SOAP and XML.

The scalability of component-based servers is not as good as suggested by the hype from vendors.  One concrete case study reports 4000 users at 177 pages per second on six servers: two web, three application, one Oracle.  Extreme scalability requires custom architecture and software, and traditional computer science design principles.  Note that Yahoo is migrating to PHP and MySQL.
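As a rough check on the case-study figures above, the following Python sketch works out the per-user and per-server load.  It assumes the load is spread evenly across the servers in a tier, which the case study does not state.

```python
# Figures quoted in the case study above.
USERS = 4000
PAGES_PER_SECOND = 177
APP_SERVERS = 3


def seconds_between_pages_per_user(users, pages_per_second):
    """Average time between page requests from a single user."""
    return users / pages_per_second


def pages_per_second_per_server(pages_per_second, servers):
    """Per-server page rate, assuming an even split (an assumption)."""
    return pages_per_second / servers


if __name__ == "__main__":
    # About 22.6 seconds between pages for each user.
    print(round(seconds_between_pages_per_user(USERS, PAGES_PER_SECOND), 1))
    # 59.0 pages per second on each of the three application servers.
    print(pages_per_second_per_server(PAGES_PER_SECOND, APP_SERVERS))
```

Under these assumptions each application server handles only about 59 pages per second, which is consistent with the remark that the scalability is modest.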

WEB SITE ARCHITECTURAL OBJECTIVES

These comments are based on A Blueprint for Building Web Sites Using the Microsoft Windows DNA Platform, Draft Version 0.9,  Microsoft Corporation, January 2000.  (The link is to a local PDF copy of this document.)

This document explains a standard architecture that is successful in achieving several important objectives.  The article is an example of what is called a "white paper" in the computer industry.

Important design objectives for any large web site include:

The standard overall design that meets these goals consists of loosely-connected tiers (i.e. levels) of replicated, task-focused servers. Application complexity is managed by specialization: different servers perform different functions.  Manageability and scalability are often achieved in part by outsourcing, i.e. remote hosting.  A specialized company installs the servers near a main Internet access point.

Usability issue: provide limited functionality even if some backends are unavailable, e.g. let users send mail even if they cannot read messages; let users see the catalog even if they cannot place orders.
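A minimal Python sketch of this kind of graceful degradation follows.  The exception class, the two callables standing in for backend clients, and the page text are all hypothetical.

```python
class BackendUnavailable(Exception):
    """Raised by a (hypothetical) backend client when its server is down."""


def render_shop_page(fetch_catalog, check_order_service):
    """Build a page from whichever backends respond.

    fetch_catalog() returns a list of item names; check_order_service()
    returns normally if orders can be placed.  Either may raise
    BackendUnavailable, and each failure only disables its own section.
    """
    sections = []
    try:
        sections.append("Catalog: " + ", ".join(fetch_catalog()))
    except BackendUnavailable:
        sections.append("The catalog is temporarily unavailable.")
    try:
        check_order_service()
        sections.append("Ordering is open.")
    except BackendUnavailable:
        sections.append("Ordering is temporarily unavailable.")
    return "\n".join(sections)
```

The design choice is that each backend call has its own error handler, so the catalog is still shown when the order service is down.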

Availability/security issues: prevent a script failure from crashing the web server, prevent failure on one server from being repeated on identical servers.
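One common way to keep a failing script from taking down the web server is to run it in a separate process with a time limit.  The sketch below is an illustration in Python, not the mechanism described in the Microsoft document; the function name and the status codes returned are assumptions.

```python
import subprocess
import sys


def run_script_isolated(source, timeout_seconds=5.0):
    """Run script source in a child Python process.

    Returns a (status, body) pair.  A crash or a hang in the script is
    reported as an error status instead of propagating into the caller,
    which stands in for the web server process here.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return 504, "script timed out"
    if result.returncode != 0:
        return 500, "script failed"
    return 200, result.stdout
```

For example, run_script_isolated("print('hi')") returns the script's output with status 200, while a script that exits with a nonzero code yields status 500 and the caller keeps running.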
 
 

SECURITY DOMAINS

Security is a vital issue with at least two aspects: protect servers from attacks, and also protect data from theft.  Security is based on multiple separate security zones.  Each zone is protected by a firewall, i.e. network packet filter.  Typically there are at least three zones: the public Internet, a public-facing so-called "demilitarized zone," abbreviated DMZ, and a private zone with sensitive data.

A security mechanism, for example the use of security domains, has three major objectives:

Security domains are regions with restricted and monitored communication.  Domains may be defined geographically, organizationally, by server type, or by data type.  Domains may be nested but preferably should not overlap.

A "firewall" is a device that inspects every packet of data coming into (or out of) a domain.  Typically suspicious packets are simply dropped.  Simple firewalls, known as packet filters, just look at the IP addresses and port numbers mentioned in packets.
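The behavior of a simple packet filter can be sketched in a few lines of Python.  This is a toy model, not a real firewall: the Packet and Rule classes are hypothetical, and the default-deny policy is an assumption of this sketch.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class Rule:
    src_network: str          # CIDR notation, e.g. "0.0.0.0/0" for anywhere
    dst_network: str
    dst_port: Optional[int]   # None matches any port


def matches(rule, packet):
    """True if the packet's addresses and port fall within the rule."""
    src_ok = ipaddress.ip_address(packet.src_ip) in ipaddress.ip_network(rule.src_network)
    dst_ok = ipaddress.ip_address(packet.dst_ip) in ipaddress.ip_network(rule.dst_network)
    port_ok = rule.dst_port is None or rule.dst_port == packet.dst_port
    return src_ok and dst_ok and port_ok


def filter_packet(allow_rules, packet):
    """Return True to forward the packet, False to drop it silently.

    Default-deny: a packet that matches no allow rule is dropped.
    """
    return any(matches(rule, packet) for rule in allow_rules)
```

With a single rule Rule("0.0.0.0/0", "192.0.2.0/24", 80), a packet to port 80 on a DMZ host is forwarded and a packet to port 23 on the same host is dropped.  The addresses are documentation-range examples, not real hosts.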

Each frontend server in the DMZ has an operating system hardened for security.  Firewalls separate it from the Internet and also from the internal network.
 
 

NETWORKING

A large web site will have several separate local-area networks (LAN): Each security domain often has a separate management network that overlays the other networks.  Management involves consoles, human monitors, and automated monitoring software.  With a physically separate management network, each host has two network interface cards (at least).

Monitoring is based on logging, which can be a heavy network and storage load, up to 4000 gigabytes per day at Yahoo.
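To put the logging figure in perspective, this short Python calculation converts a daily volume into a sustained network rate.  It assumes decimal gigabytes (10^9 bytes) and a load spread evenly over the day; real traffic would have peaks above this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_megabits_per_second(gigabytes_per_day):
    """Average rate in Mbit/s needed to move the given daily volume."""
    bytes_per_second = gigabytes_per_day * 1e9 / SECONDS_PER_DAY
    return bytes_per_second * 8 / 1e6


if __name__ == "__main__":
    # 4000 gigabytes per day is roughly 370 Mbit/s, around the clock.
    print(round(sustained_megabits_per_second(4000)))
```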

Clients benefit from multiple ISPs through a feature of the standard Internet Domain Name System (DNS): round-robin behavior.  If an IP address does not respond, the client just has to hit "reload."
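The following Python sketch simulates this behavior without touching the network.  The toy name server class, the reachability callback, and the addresses are hypothetical; a real client would use the system resolver.

```python
class RoundRobinNameServer:
    """Toy name server that rotates its address list on every query."""

    def __init__(self, addresses):
        self._addresses = list(addresses)

    def resolve(self):
        answer = list(self._addresses)
        # Rotate so the next query starts with a different address.
        self._addresses.append(self._addresses.pop(0))
        return answer


def fetch_with_reload(name_server, is_reachable, max_reloads=3):
    """Try the first address of each answer; 'reload' on failure.

    Returns the address that responded, or None if every attempt failed.
    """
    for _ in range(max_reloads):
        address = name_server.resolve()[0]
        if is_reachable(address):
            return address
    return None
```

If the site is listed at one address per ISP and the first ISP is down, the first attempt fails and the reload reaches the second address.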

All frontends at one ISP respond to the same IP address, which is handled by a load-balancer.
 
 

FRONT-END SERVER DESIGN

Front-end servers have no long-term state, so they can be cloned.  Each will have its own copy of the same content, i.e. HTML, PHP, etc.

Load-balancing software and/or hardware spreads requests across multiple front-end servers, and includes failure detection for frontend servers.  Several different load-balancing techniques exist, with different levels of complexity: random, adaptive, stateful, adaptive and stateful.
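The techniques listed above can be sketched in Python as follows.  The interpretation is an assumption of this sketch: "adaptive" is taken to mean choosing the server with the fewest active requests, and "stateful" to mean that the balancer remembers which server each client was sent to.  The class and method names are hypothetical.

```python
import random


class LoadBalancer:
    def __init__(self, servers, rng=None):
        self.healthy = set(servers)
        self.active = {server: 0 for server in servers}  # open requests
        self.sticky = {}                                 # client -> server
        self.rng = rng or random.Random()

    def mark_failed(self, server):
        """Failure detection hook: stop sending requests to this server."""
        self.healthy.discard(server)

    def pick_random(self):
        """Random: any healthy server, ignoring load."""
        return self.rng.choice(sorted(self.healthy))

    def pick_adaptive(self):
        """Adaptive: the healthy server with the fewest active requests."""
        return min(sorted(self.healthy), key=lambda s: self.active[s])

    def pick_stateful(self, client_id):
        """Adaptive and stateful: reuse the client's previous server while
        it is healthy, otherwise choose adaptively and remember the choice."""
        server = self.sticky.get(client_id)
        if server not in self.healthy:
            server = self.pick_adaptive()
            self.sticky[client_id] = server
        return server
```

The sketch does not handle the case where no server is healthy; a real balancer would need an explicit policy for that.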

Session management stores state information in clients and a backend server, not in the frontend servers.  Client state can easily be distributed across multiple state servers.  Typically session information is not highly confidential and may be stored inside the DMZ.
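A minimal Python sketch of this arrangement: the client holds only a session identifier (for example in a cookie), and a hash of that identifier selects the state server.  The classes are hypothetical in-memory stand-ins for real state servers.

```python
import hashlib
import secrets


class StateServer:
    """In-memory stand-in for one backend state server."""

    def __init__(self):
        self.sessions = {}


class SessionStore:
    """Used by any frontend; holds no session data itself."""

    def __init__(self, state_servers):
        self.state_servers = state_servers

    def _server_for(self, session_id):
        # Hash the identifier so sessions spread evenly across servers.
        digest = hashlib.sha256(session_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.state_servers)
        return self.state_servers[index]

    def create(self):
        """Create an empty session and return the identifier for the client."""
        session_id = secrets.token_hex(16)
        self._server_for(session_id).sessions[session_id] = {}
        return session_id

    def save(self, session_id, data):
        self._server_for(session_id).sessions[session_id] = dict(data)

    def load(self, session_id):
        return self._server_for(session_id).sessions.get(session_id)
```

Because the mapping depends only on the session identifier, any frontend can find the state, so consecutive requests from one client may be served by different frontends.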

SSL sessions are segregated from regular HTTP sessions.  SSL servers have hardware for encryption.
 
 

BACK-END SERVER DESIGN

Persistent content is divided across multiple back-end servers.  Fault tolerance is expensive for servers that must maintain state.  Failover clustering assumes that different servers can access the same or replicated disk drives.  A group of servers that share storage is called a partition.

Allocating data to partitions is difficult.  The objective is to avoid hot spots.  We need tools to split and merge partitions.  A large multiprocessor system can replace multiple partitions, but is usually more expensive.
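As an illustration of a split tool, the Python sketch below uses range partitioning on string keys and splits the busiest partition at its median requested key.  Range partitioning, the request log format, and the function names are assumptions made for this example.

```python
import bisect
from collections import Counter


def partition_for(key, boundaries):
    """Index of the partition holding key.

    boundaries is a sorted list of split keys; partition i holds keys
    from boundaries[i-1] (inclusive) up to boundaries[i] (exclusive).
    """
    return bisect.bisect_right(boundaries, key)


def load_per_partition(request_keys, boundaries):
    """Count requests per partition, to reveal hot spots."""
    counts = Counter(partition_for(key, boundaries) for key in request_keys)
    return [counts.get(i, 0) for i in range(len(boundaries) + 1)]


def split_hottest(request_keys, boundaries):
    """Return new boundaries with the busiest partition split in two."""
    loads = load_per_partition(request_keys, boundaries)
    hottest = loads.index(max(loads))
    keys = sorted(k for k in request_keys
                  if partition_for(k, boundaries) == hottest)
    split_key = keys[len(keys) // 2]  # median requested key
    return sorted(set(boundaries) | {split_key})
```

Splitting does not help when a single key receives most of the requests, since that key still lands in one partition.  That case needs replication or caching rather than repartitioning.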

The most complex sites use other applications, encapsulated as objects, e.g. Enterprise JavaBeans.  Other applications include legacy databases, existing enterprise software (e.g. for manufacturing planning), and external ad servers.
 
 



Copyright (c) by Charles Elkan, 2002.