Remember, a type is a set of values. In mathematics, one of the most basic ways of making a new set is to take the union of two old sets.
In programming, different types have different legal operations. When we make a union, we need to still be able to identify which operations are legal on an element of the union.
Each member of the union type must be "tagged" to indicate what type it came from. In Pascal:
Now we can use a switch command to do different operations on Canadian and American zipcodes. Note that the switch command is semantically different from the case notation for a union type. It's unfortunate that the same keyword case is used for both concepts.type
country = (canada, usa);
zipcode =
record
case where: country of
canada: (czip: string);
usa : (azip: number)
end;
Many programming languages like Pascal have one notation that combines unions and products. C has separate notations using the keywords union and struct.
Tagged unions are also called disjoint unions. The idea is that no value of type zipcode can have a czip component and also an azip component. It's one or the other.
In Pascal and C unions are not required to be tagged. This is a notorious
source of runtime errors.
For example, here is a type that is the tagged union of int and real:
Here is a variable named num whose type is mynumber:datatype mynumber = exact of int | approx of real;
Here is a variable named trunc that is initialized to a value determined by a case operation that looks at the tag of num:val num = approx 2.5;
The righthand side of the = above is an expression, not a command. However it is similar to a switch command in C. Each line in the ML case expression has a pattern such as exact i on the lefthand side of the => symbol.val trunc = case num of exact i => i | approx x => floor(x+0.5)
Using pointers, the programmer must allocate and recapture memory, which is low-level.
In all standard languages, for local variables (which are kept on the stack) and for intermediate storage used in arithmetic subexpressions, memory is managed automatically.
Generally, a recursive type is one where the type is defined in terms of itself, with one or more non-circular base cases. For example:
mytree = int + (binarytree * int * binarytree)
Any value of this type can be drawn as a binary tree where each leaf is an integer and each internal node is a record containing two children and aan integer. For example:
Notice that the definition of mytree is circular, but meaningful, because it has at least one base case: a leaf node which contains an integer but does not contain a tree recursively.4
/ \
2 5
/ \
5 3
Definition: A recursive definition is a circular definition that
is meaningful because it has a non-circular base case.
datatype tree = leaf of int | node of tree * int * tree;
Consider this tree with two internal nodes and three leaves:
It would be written node (node (leaf 1, 2, leaf 3), 4, leaf 5) in ML. Remember that in ML parentheses are used to indicate tuples, not to indicate function arguments.4
/ \
2 5
/ \
1 3
This is basically how lists are implemented in ML, Lisp, and Prolog also. The difference is that the compiler generates code like the above. This is one justification for calling those languages "higher-level".type
listchoice = (nil, cons);intlistptr = ^ intlistrecord;
intlistrecord =
record
case flag: listchoice of
nil: ();
cons: (first: integer; rest: intlistptr)
end;
The advantages of using explicit pointers are: