Dilemma: There is little pressure on developers to implement all
parts of a so-called open standard. Open-source developers get satisfaction
and praise for the initial version of an implementation, not for bug fixes
and extensions. Commercial developers have strategic reasons to do
partial implementations.
Microsoft's strategy is always embrace, extend, dominate. Because so many services are needed on top of SOAP by applications, this strategy should work well with SOAP.
Most SOAP services will be private, not public for end-users. End-users need attractive, easy-to-use graphical services, as provided by HTML, not remote-procedure-call interfaces.
The era of free web services subsidized by over-optimistic investors
is over. How will SOAP services be paid for? There may be an
initialization problem: we need reliable, useful micropayment services
in order to support the development of other SOAP services.
Web servers have evolved into what are now called "application servers." These software platforms provide runtime environments for executing complex software in response to requests from clients.
There are two main types of app server: page-based and component-based. PHP is page-based, while EJB (Enterprise JavaBeans) is the most common component-based platform. For an overview see a May 2001 article by Timothy Dyck.
Code for generating displays is mixed with HTML output code. This reduces modularity and makes developing multiple alternative interfaces difficult. On the other hand, prototyping is easy.
ColdFusion has fail-over and load-balancing between servers. PHP version 4 and ASP.NET have compilation to byte-code. Other performance features include caching. Early implementations had poor performance and scalability, but the latest versions can be very good.
Typically, page-based app servers do not provide script-level multithreading
or Java support. They typically do have the ability to invoke other
servers, e.g. via sockets, CORBA (Common Object Request Broker Architecture),
Microsoft COM (Component Object Model), SOAP (Simple Object Access Protocol),
and HTTP.
IBM WebSphere is priced from $795 to $35,000 per CPU. Microsoft products are much cheaper.
Despite using Java and the EJB so-called standard, you are typically locked in to one platform vendor due to incompatibilities.
The platforms provide
New modules of code can encapsulate existing databases and other servers. Different modules can provide different outputs for different display devices.
The importance of Java may decline because Sun and Microsoft both support SOAP and XML.
The scalability of component-based servers is not as good as suggested by the hype from vendors. One concrete case study reports 4000 users at 177 pages per second on six servers: two web, three application, one Oracle. Extreme scalability requires custom architecture and software, and traditional computer science design principles. Note that Yahoo is migrating to PHP and MySQL.
This document explains a standard architecture that is successful in achieving several important objectives. The article is an example of what is called a "white paper" in the computer industry.
Important design objectives for any large web site include:
Usability issue: provide limited functionality even if some backends are unavailable, e.g. let users send mail even if they cannot read messages; let users see the catalog even if they cannot place orders.
Availability/security issues: prevent a script failure from crashing
the web server, prevent failure on one server from being repeated on identical
servers.
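The usability objective above (limited functionality when a backend is down) can be sketched as follows. This is a minimal illustration only: `BackendUnavailable`, `render_shop_page`, and the two fetch callables are hypothetical names, not part of any particular app server.

```python
class BackendUnavailable(Exception):
    """Hypothetical error raised when a backend server cannot be reached."""


def render_shop_page(fetch_catalog, fetch_order_form):
    """Build a page section by section; a failed backend degrades only its own section."""
    sections = []
    for title, fetch in (("Catalog", fetch_catalog), ("Order", fetch_order_form)):
        try:
            sections.append(f"{title}: {fetch()}")
        except BackendUnavailable:
            # Show a placeholder instead of failing the whole page.
            sections.append(f"{title}: temporarily unavailable")
    return "\n".join(sections)
```

With this structure, a user can still browse the catalog while the order backend is down, because each backend call is isolated from the others.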
A security mechanism, for example the use of security domains, has three major objectives:
A "firewall" is a device that inspects every packet of data coming into (or out of) a domain. Typically suspicious packets are simply dropped. Simple firewalls, known as packet filters, just look at the IP addresses and port numbers mentioned in packets.
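As a rough sketch of the packet-filter idea, the code below decides whether to pass or drop a packet using only its source address and destination port. The `Packet` and `Rule` types, the first-match-wins ordering, the default-deny policy, and the example rule set are all assumptions made for illustration; real packet filters differ in detail.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class Rule:
    src_net: str             # CIDR block the source address must fall in
    dst_port: Optional[int]  # None matches any destination port
    allow: bool


def filter_packet(packet: Packet, rules: List[Rule]) -> bool:
    """Return True if the packet may pass. The first matching rule wins."""
    for rule in rules:
        in_net = ip_address(packet.src_ip) in ip_network(rule.src_net)
        port_ok = rule.dst_port is None or rule.dst_port == packet.dst_port
        if in_net and port_ok:
            return rule.allow
    return False  # default deny: unmatched packets are simply dropped


# Hypothetical rule set for a web frontend.
RULES = [
    Rule("10.0.0.0/8", None, False),  # outside packets claiming an internal source
    Rule("0.0.0.0/0", 80, True),      # HTTP from anywhere
    Rule("0.0.0.0/0", 443, True),     # HTTPS from anywhere
]
```

For example, `filter_packet(Packet("203.0.113.7", "192.0.2.10", 80), RULES)` passes the packet, while the same packet sent to port 22 matches no rule and is dropped.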
Each frontend server in the DMZ has an operating system hardened for
security. Firewalls separate it from the Internet and also from the
internal network.
Monitoring is based on logging, which can be a heavy network and storage load, up to 4000 gigabytes per day at Yahoo.
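Clients benefit from multiple ISPs through a feature of the standard Internet Domain Name System (DNS): round-robin behavior, in which successive lookups of the same name return the site's IP addresses in rotating order. If an IP address does not respond, the client just has to hit "reload," and the retry will normally be directed to a different address.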
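The round-robin behavior can be simulated in a few lines. This is a toy model under stated assumptions: `RoundRobinDNS` and `fetch` are hypothetical names, the addresses are documentation examples, and resolver caching (which affects real clients) is ignored.

```python
from collections import deque


class RoundRobinDNS:
    """Toy name server that rotates its address list after every lookup."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # the next lookup starts with the next address
        return answer


def fetch(dns, reachable):
    """Model one page load: the client tries only the first address returned."""
    first = dns.resolve()[0]
    return first if first in reachable else None


dns = RoundRobinDNS(["198.51.100.1", "203.0.113.1"])  # one address per ISP
reachable = {"203.0.113.1"}                            # the first ISP is down
first_try = fetch(dns, reachable)   # hits the dead address and fails
second_try = fetch(dns, reachable)  # "reload" gets the next address
```

Here the first attempt fails and the reload succeeds, because the second lookup puts the working address first.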
All frontends at one ISP respond to the same IP address, which is handled
by a load-balancer.
Load-balancing software and/or hardware spreads requests across multiple frontend servers, and includes failure detection for frontend servers. Several different load-balancing techniques exist, with different levels of complexity: random, adaptive, stateful, and combined adaptive and stateful.
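The sketch below illustrates three of these techniques in one small class. It rests on several assumptions: "adaptive" is interpreted here as choosing the frontend with the fewest open requests, "stateful" as pinning a session to one frontend, and failure detection is reduced to `mark_down` being called by some external health check. The class and method names are hypothetical.

```python
import random


class LoadBalancer:
    def __init__(self, frontends, rng=None):
        self.open_requests = {name: 0 for name in frontends}
        self.healthy = set(frontends)
        self.sessions = {}  # session id -> pinned frontend (stateful mode)
        self.rng = rng or random.Random()

    def mark_down(self, name):
        """Record that a health check found this frontend unresponsive."""
        self.healthy.discard(name)

    def _candidates(self):
        if not self.healthy:
            raise RuntimeError("no healthy frontend available")
        return sorted(self.healthy)

    def pick_random(self):
        return self.rng.choice(self._candidates())

    def pick_adaptive(self):
        # Fewest open requests wins; ties go to the first name in sorted order.
        return min(self._candidates(), key=lambda n: self.open_requests[n])

    def pick_stateful(self, session_id):
        # Keep a session on the same frontend until that frontend fails.
        pinned = self.sessions.get(session_id)
        if pinned not in self.healthy:
            pinned = self.pick_adaptive()
            self.sessions[session_id] = pinned
        return pinned

    def start(self, name):
        self.open_requests[name] += 1

    def finish(self, name):
        self.open_requests[name] -= 1
```

Random selection needs no bookkeeping at all, while the adaptive and stateful modes require the balancer to track load and session assignments, which is where the extra complexity comes from.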
Session management stores state information in clients and a backend server, not in the frontend servers. Client state can easily be distributed across multiple state servers. Typically session information is not highly confidential and may be stored inside the DMZ.
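A minimal sketch of this arrangement follows: the client keeps only a session ID (as it would in a cookie), the state lives on backend state servers, and the frontend handler keeps nothing between requests. `StateServer`, `state_server_for`, `handle_request`, and the shopping-cart state are hypothetical, and in-memory dictionaries stand in for real servers.

```python
import hashlib
import secrets


class StateServer:
    """Stand-in for a backend server that stores session state."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, state):
        self._data[session_id] = dict(state)


STATE_SERVERS = [StateServer(), StateServer(), StateServer()]


def state_server_for(session_id):
    """Spread sessions across state servers by hashing the session ID."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return STATE_SERVERS[int.from_bytes(digest[:4], "big") % len(STATE_SERVERS)]


def handle_request(cookie, item):
    """Stateless frontend handler: add an item to the session's cart.

    Returns the session ID to send back to the client and the updated cart.
    """
    session_id = cookie or secrets.token_hex(16)  # new client: issue an ID
    server = state_server_for(session_id)
    cart = server.get(session_id).get("cart", []) + [item]
    server.put(session_id, {"cart": cart})
    return session_id, cart
```

Because the handler looks up the state server from the session ID alone, any frontend can serve any request, and a frontend failure loses no session data.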
SSL sessions are segregated from regular HTTP sessions. SSL servers
have hardware for encryption.
Allocating data to partitions is difficult. The objective is to avoid hot spots. We need tools to split and merge partitions. A large multiprocessor system can replace multiple partitions, but is usually more expensive.
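One common way to allocate data, shown here only as an illustration and not necessarily the scheme the architecture uses, is to hash each key to a partition and then check the observed load for hot spots. The function names and the 1.5x tolerance are assumptions.

```python
import hashlib
from collections import Counter


def partition_for(key, num_partitions):
    """Map a key (e.g. a user ID) to a partition number with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_per_partition(keys, num_partitions):
    """Count how many of the given accesses fall on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def hot_spots(loads, tolerance=1.5):
    """Return partitions whose load exceeds `tolerance` times the mean load."""
    mean = sum(loads) / len(loads)
    return [p for p, load in enumerate(loads) if load > tolerance * mean]
```

Hashing spreads keys evenly, but not traffic: a few very active keys can still overload one partition, which is why `hot_spots` works on measured loads, and why tools to split and merge partitions are still needed.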
The most complex sites use other applications, encapsulated as objects,
e.g. Enterprise JavaBeans. Other applications include legacy databases,
existing enterprise software (e.g. for manufacturing planning), and external
ad servers.