CSE130 LECTURE NOTES


November 19, 1999
 
 

AN EXCELLENT PAPER TO READ

The paper distributed today is Growing a Language by Guy Steele.

From www.uvc.com/videos/oo98Steele.video.html:

"Over the last quarter-century Guy Steele has been convinced that trying to design a complete and perfect programming language is the worst thing you can do. A programming language (including its associated libraries) must grow over time as its user community and its development community grow. This is a different situation from 25 years ago, when all such communities were relatively small. The difference is a problem of scale. As a result, programming language design now and in the future is necessarily as much a matter of social engineering as technical engineering and must rely more on a set of general principles than on a set of specific technical decisions."

"Guy L. Steele Jr. is a Distinguished Engineer at Sun Microsystems, Inc., where he is responsible for research in language design and implementation strategies, and architectural and software support. He received his AB in Applied Mathematics from Harvard College, and his S.M. and Ph.D. in computer science and artificial intelligence from MIT. He has held positions at Carnegie Mellon University; at Tartan Labs in Pittsburgh, PA; and at Thinking Machines Corporation. He joined Sun Microsystems in 1994. He is author or co-author of five books. The ACM awarded him the 1988 Grace Murray Hopper Award and named him an ACM Fellow in 1994. He led the team that received a 1990 Gordon Bell Prize honorable mention for achieving the fastest speed to that date for a product application: 14.182 Gigaflops. He was also awarded the 1996 ACM SIGPLAN Programming Languages Achievement Award."
 
 

THE DIFFICULTY OF REQUIREMENTS ANALYSIS

I sense that there is some frustration about the vagueness of the description of the third project.  Renata is doing an excellent job of trying to pin down every detail of the specification of the universe ADT.  But in addition there are deeper issues to understand and reflect upon here.

(1) Requirements analysis is intrinsically difficult.  For complicated tasks, there are always many, many details to be specified.  It is difficult to write down a specification that is unambiguous, comprehensive, and an order of magnitude smaller than an actual implementation.

Interviewing clients to determine the detailed specification for a programming task is always a frustrating effort for both the interviewers and the interviewees.

(2) You should think about what your real job is, i.e. what would really make the client happy.  If the client hasn't given details in the initial task description, perhaps s/he does not want to.  Your job is not to force the client to do something s/he does not want to.  Instead, try to understand the wider context, and what the customer really wants.  Here, there are at least three different perspectives:

(3) In any large project, you have to make assumptions in order to make progress.  You should try to ensure that assumptions are easy to change.  This means that assumptions should be documented and encapsulated.
 
 

STRUCTURAL VERSUS NAME TYPE EQUIVALENCE

Every programming language has to choose between satisfying case 1 or case 2 behavior.

Satisfying the first case is called name equivalence for types.  Two variables are defined to have equivalent types if and only if they have literally the same type, i.e. the names of their types are the same.

Example: Consider

type alpha = array [1..10] of real;

var a,b: alpha;
    c,d: array [1..10] of real;

Here a and b have equivalent types, and so do c and d, but alpha is not equivalent to the type of c and d.

Satisfying the second case is called structural equivalence for types.  Two variables are defined to have equivalent types if their types are constructed in similar ways.

The definition of structural equivalence needs to be made more concrete.  It is actually a recursive definition:

When Pascal was first designed by Prof. Niklaus Wirth, he failed to say what type equivalence policy the language should follow.  Early implementations of the language were therefore incompatible.
 
 

FIRST SPECIAL CASE OF TYPE EQUIVALENCE: COERCION

Every programming language also has some ad hoc type equivalence rules that cover special cases.

The most common rule is that values of type integer can be converted to values of type real when necessary. This is called coercion.

This rule can be ambiguous: consider <missing example>

Note that coercion is different from overloading, where the same name is used for different operations. For example the minus sign is usually overloaded to mean four different operations: integer and real negation (a unary operation) and subtraction (a binary operation).

A programming language can have overloading without having coercion!
 
 

SECOND SPECIAL CASE OF TYPE EQUIVALENCE: SUBTYPES

In many languages, the sets of values of some types are subsets of the sets of values of other types. These types can be called subtypes. Typically every operation that is legal for the bigger type is legal for the subtype.  (This property of these operations is called inclusion polymorphism--see later.)

For example in Pascal, consider

type smallint = -256 .. 255;
var x: smallint;
Here smallint is a subtype of integer, so despite name equivalence, all integer operations can be used on x.
 
 

TYPE EQUIVALENCE AND OBJECT-ORIENTED PROGRAMMING

An abstract data types (ADT) should clearly have name equivalence, because its implementation (i.e. the structure of its concrete type) is invisible to its users.

Similarly, an object-oriented class should have name equivalence.  But by default a class method should also be available for objects of a child class.  This is a form of overloading.

In many OO PLs the programmer can choose the visibility of methods, i.e. choose which overloading is allowed.  This really allows the programmer to choose the exact notion of type equivalence to use for each class, which makes some language mechanisms (subtypes for example) redundant.  A subtype is really just a derived class.
 
 

AN EXAMPLE OF A TASK FOR A SCRIPTING LANGUAGE SUCH AS awk

How should we shuffle the lines of a large file into random order?

Answer: place a random number at the start of each line, sort the file according to these numbers, and finally remove the random numbers.

The time complexity of the entire shuffling process is dominated by the time complexity of sorting, which is O(n log n) while adding and removing the random numbers is only O(n).

The shuffling process should work for files that do not fit into main memory.  Therefore traditional space complexity is less important than I/O complexity.

No programming language can make a slow algorithm fast.  Therefore, no PL can compensate for a bad choice of algorithm.  If part of an process is intrinsically slow, then a slow implementation of another part of the process is harmless.

Whatever the PL one uses, one should try to reuse existing code.  An important criterion for choosing a PL is whether it facilitates code reuse.
 
 



Copyright (c) by Charles Elkan, 1999.