CSE130 LECTURE NOTES

April 15, 2002
 
 

ADMINISTRATION

The second project is now available.  The due date is Wednesday May 1 changed to May 8.  This project will be easy if you understand the concepts in today's lecture well, so ask questions about these on Discus, not just about the assignment.
 
 

TAGGED UNIONS

We'll return to higher-order functions and ML.  For now, let's finish our investigation of operators for creating composite types.

Remember, a type is a set of values. In mathematics, one of the most basic ways of making a new set is to take the union of two old sets.

In programming, different types have different legal operations. When we make a union, we need to still be able to identify which operations are legal on an element of the union.

Each member of the union type must be "tagged" to indicate what type it came from.  In Pascal:

type
   country = (canada, usa);
   zipcode =
       record
           case where: country of
              canada: (czip: string);
              usa : (azip: number)
       end;
Now we can use a switch command to do different operations on Canadian and American zipcodes.  Note that the switch command is semantically different from the case notation for a union type.  It's unfortunate that the same keyword case is used for both concepts.
 
 

THE DIFFERENCE BETWEEN A TAGGED UNION AND A CARTESIAN PRODUCT

Unions and products are very different ideas. However in practice, most unions are unions of products.

Many programming languages like Pascal have one notation that combines unions and products. C has separate notations using the keywords union and struct.

Tagged unions are also called disjoint unions. The idea is that no value of type zipcode can have a czip component and also an azip component. It's one or the other.

In Pascal and C unions are not required to be tagged. This is a notorious source of runtime errors.
 
 

NOTATION IN THE "ML" LANGUAGE FOR UNIONS AND PRODUCTS

ML has an important difference from traditional languages regarding union types: A union must be tagged by an enumeration type, which is given in the union type definition.

For example, here is a type that is the tagged union of int and real:

datatype mynumber = exact of int | approx of real;
Here is a variable named num whose type is mynumber:
val num = approx 2.5;
Here is a variable named trunc that is initialized to a value determined by a case operation that looks at the tag of num:
val trunc = case num of exact i => i | approx x => floor(x+0.5)
The righthand side of the = above is an expression, not a command. However it is similar to a switch command in C.  Each line in the ML case expression has a pattern such as exact i on the lefthand side of the => symbol.
 
 

RECURSIVE TYPES

A recursive type is a higher-level, mathematical version of a data structure using pointers.  These are a category of types that standard languages don't permit, but which are very useful. With 1980s compiler algorithms, they can be implemented efficiently.

Using pointers, the programmer must allocate and recapture memory, which is low-level.

In all standard languages, for local variables (which are kept on the stack) and for intermediate storage used in arithmetic subexpressions, memory is managed automatically.

Generally, a recursive type is one where the type is defined in terms of itself, with one or more non-circular base cases.  For example:

        mytree = int + (binarytree * int * binarytree)

Any value of this type can be drawn as a binary tree where each leaf is an integer and each internal node is a record containing two children and aan integer.  For example:

   4
  / \
  2  5
    / \
   5   3
Notice that the definition of mytree is circular, but meaningful, because it has at least one base case: a leaf node which contains an integer but does not contain a tree recursively.

Definition: A recursive definition is a circular definition that is meaningful because it has a non-circular base case.
 
 

RECURSIVE TYPES IN ML

In ML recursion is allowed with the datatype notation for tagged unions. So we can write

        datatype tree = leaf of int | node of tree * int * tree;

Consider this tree with two internal nodes and three leaves:

  4
 / \
2   5
   / \
   1 3
It would be written node (node (leaf 1, 2, leaf 3), 4, leaf 5) in ML.  Remember that in ML parentheses are used to indicate tuples, not to indicate function arguments.
 
 

RECURSIVE TYPES IN TRADITIONAL LANGUAGES

In C and Pascal and other similar languages, the programmer must implement a recursive type using pointers.  For example lists of integers can be defined in Pascal as follows:
type
    listchoice = (nil, cons);

    intlistptr = ^ intlistrecord;

    intlistrecord =
        record
            case flag: listchoice of
                nil: ();
                cons: (first: integer; rest: intlistptr)
        end;

This is basically how lists are implemented in ML, Lisp, and Prolog also. The difference is that the compiler generates code like the above.  This is one justification for calling those languages "higher-level".

The advantages of using explicit pointers are:

The advantages of using recursive types are: Java hides pointers from the programmer, but not completely, like ML does.  It's not clear why Java doesn't go all the way.
 
 



Copyright (c) by Charles Elkan, 2002.