CSE130 LECTURE NOTES

October 13, 1999
 
 

ADMINISTRATION

Mark Smith has an update about the first project.
 
 

FUNCTIONS SHOULD BE FIRST-CLASS VALUES

Definition: A first-class value is one that can has all the standard operations that should be available for all values of all types.  A value is first-class if it can be: In typical PLs, being "first-class" is not all-or-nothing.  Some values are more first-class and some values are less first-class.

The property that the fewest values usually possess is the ability to be returned as the result of a function.  For example, in Pascal functions cannot return records.  In other words, the result type of a function cannot be a Cartesian product type.

Functions really are values similar, and they should be first-class in a powerful PL.  However in standard PLs such as C functions are not first-class.  In higher-level, more modern PLs such as ML, functions are first-class.

Definition: A routine is called higher-order if it takes another function as a parameter, and/or returns a function as its result.
 
 

TAGGED UNIONS

We'll return to higher-order functions and ML.  For now, let's finish our investigation of operators for creating composite types.

Remember, a type is a set of values. In mathematics, one of the most basic ways of making a new set is to take the union of two old sets.

In programming, different types have different legal operations. When we make a union, we need to still be able to identify which operations are legal on an element of the union.

Each member of the union type must be "tagged" to indicate what type it came from.  In Pascal:

type
   country = (canada, usa);
   zipcode =
       record
           case where: country of
              canada: (czip: string);
              usa : (azip: number)
       end;
Now we can use a switch command to do different operations on Canadian and American zipcodes.  Note that the switch command is semantically different from the case notation for a union type.  It's unfortunate that the same keyword case is used for both concepts.
 
 

THE DIFFERENCE BETWEEN A TAGGED UNION AND A CARTESIAN PRODUCT

Unions and products are very different ideas. However in practice, most unions are unions of products.

Many programming languages like Pascal have one notation that combines unions and products. C has separate notations using the keywords union and struct.

Tagged unions are also called disjoint unions. The idea is that no value of type zipcode can have a czip component and also an azip component. It's one or the other.

In Pascal and C unions are not required to be tagged. This is a notorious source of runtime errors.
 
 

RECURSIVE TYPES

A recursive type is a higher-level, mathematical version of a data structure using pointers.

These are a category of types that standard languages don't permit, but which are very useful. With 1980s compiler algorithms, they can be implemented efficiently.

Using pointers, the programmer must allocate and recapture memory, which is low-level.

For local variables (which are kept on the stack) and for intermediate storage used in arithmetic subexpressions, memory is managed automatically.
 
 

LISTS ARE RECURSIVE

Think about the type of lists. There are two fundamental differences between lists and tuples: Informally, a list of integers consists of an integer, possibly followed by another list of integers.

Rather more precisely, we can say that a list is either the empty list, or a pair consisting of a "first" element which is an integer and a "second" element which is itself a list of integers.

Formally let's write        integerlist =  nil + (integer * integerlist)

Here + means tagged union and * means tuple as before.

Note that this equation is circular: the type integerlist is defined in terms of itself.

However the definition is not meaningless because it has a base case: the empty list of length zero, which doesn't contain any integer, and which we just write nil.

Definition: A recursive definition is a circular definition that is meaningful because it has a non-circular base case.

Here are some examples of values of the type integerlist:

nil
<13, nil>
<13, <14, nil>>
<13, <4, <8, <16, nil>>>>
The last three examples are all special cases of the fact that <13,x> is an integerlist for any x such that x is an integerlist.
 
 

RECURSIVE TYPES IN GENERAL

Generally, a recursive type is one where the type is defined in terms of itself, with one or more non-circular base cases.  For example:

        myytree = real + string + (binarytree * integer * binarytree)

Any value of this type can be drawn as a binary tree where each leaf is an integer or a string and each internal node is a record containing two children and a real number.  For example:

   4
  / \
2.0  5
    / \
  1.5 "three"
Notice that the recursive definition of mytree has two base cases.
 
 

RECURSIVE TYPES IN TRADITIONAL LANGUAGES

In C and Pascal and other similar languages, the programmer must implement a recursive type using pointers.  For example lists of integers can be defined in Pascal as follows:
type
    listchoice = (nil, cons);

    intlistptr = ^ intlistrecord;

    intlistrecord =
        record
            case flag: listchoice of
                nil: ();
                cons: (first: integer; rest: intlistptr)
        end;

This is basically how lists are implemented in ML, Lisp, and Prolog also. The difference is that the compiler generates code like the above.  This is one justification for calling those languages "higher-level".

The advantages of using explicit pointers are:

The advantages of using recursive types are: Java hides pointers from the programmer, but not completely, like ML does.  It's not clear why Java doesn't go all the way.
 
 

INTRODUCTION TO ML

Functional programming involves giving declarations and evaluating expressions interactively.
pindar% sml
Standard ML of New Jersey, Version 75, November 11, 1991

- val two = 2;
val two = 2 : int

- fun double x = 2 * x;
val double = fn : int -> int

- double two;
val it = 4 : int

The ML type system is quite elegant. There are base types int, real, string, and some more

There are operators that create new types from old types:

Remember that a tuple is like a record, but fields are identified by position, not by name.

ML infers types automatically:

- fun d2 x = 2.0*x;
val d2 = fn : real -> real
Deduction: 2.0 is a real, so * must be the function of type real * real -> real, so x must be a real, so d2 must be of type real -> real.
 
 



Copyright (c) by Charles Elkan, 1999.