A value is a piece of data that is involved in computing. A type is a collection of values on which the same operations make sense.
For example:
Example of different operations on integers and reals:
A built-in type (or operation) is one that is supplied by the language. The opposite of built-in is programmer-defined. Typically, integers, reals, and characters are built-in, and maybe booleans and strings. For types that are not built-in, a PL often has a standard way to make them programmer-defined.
Different languages often use different names for the same primitive type. This is just a "syntactic" difference.
Sometimes there are minor semantic differences, e.g. maybe -7 mod 3 = 2 but -7 rem 3 = -1.
An implementational issue is a semantic question that is left unspecified by the definition of a PL. Implementation issues are choices that different compilers or interpreters can make differently. The most troublesome PL differences are often implementational. For example, are floating point numbers 32 bit or 64 bit?
Implementational differences can be very troublesome because they are not well-documented (like syntactic differences) and they do not have work-arounds (like documented semantic differences).
Ideally, nothing in the specification of a language should be implementation-dependent.
For example in Pascal you can write type direction = {n, e, s, w}
Note that n, e, s, and w are not characters. They are identifiers (i.e. names) which introduce brand-new values that previously did not exist in the language.
Remember, a type is a set of values on which the same operations make sense. Which operations are provided for values of type direction? Pascal provides the built-in functions succ, pred, and ord. For example:
succ(s) = w
pred(s) = e
ord(s) = 2
Using enumeration types makes programs clearer, and helps to catch errors at compile-time. For example writing s*w by mistake gives a compiler error message.
In earlier versions of C, enumeration types are not available. Instead, it's good style to write
#define n 0
#define e 1
#define s 2
#define w 3
However this doesn't allow catching errors at compile-time. Conclusion: it's good for a language to be designed so that the compiler can enforce type correctness rules.
Enumeration types don't substitute for the programmer's good sense.
He or she must still choose to only use the values of the type in sensible
ways, for example to enumerate directions in a sensible order such as clockwise.
With an "abstract data type" (we'll study these later), the programmer
defining the type can specify what the type-specific sensible operations
are, and can disallow all other operations.
There are big differences between programming languages in how a user can define his own types of compound values. Many differences are only syntactic (Pascal records are called structures in C) but still, we need some concepts that let us understand the commonalities.type point = record x, y: real end;
Our aim is to classify the wide variety of compound types. We will do
this by looking at the mathematical ways of constructing sets of compound
values. In ordinary mathematics we know about intersections, unions,
and Cartesian products. For programming we need some slightly more
complex concepts.
Mathematically, the Cartesian product of sets A and B is the set of all possible ordered pairs <x,y> where the first component x of a pair is from A and the second component y is from B. Formally, the Cartesian product is written A x B and defined to be the set { <x,y> : x in A, y in B }.
If A has n elements, i.e. its cardinality is n, and the cardinality of B is n, then the cardinality of A x B is mn. Formally, |A x B| = |A| |B|.
Records in Pascal or structures in C are Cartesian product types. In records and structures components are given names. The benefit of these names is mostly documentation, since the components could be referred to by their numerical position: first, second, third, etc.
In ML product types are defined without giving names to components, in a more mathematical way. For example:
Note that a tuple is the same as a row in a relational database, where components are often called "fields" or "attributes".type person = string * string * int * real;
What is the set of values of the type "field"? Given a value of type direction, the array can return a value of type real. The array is a mapping from type "direction" to type "real".type direction = (n, e, s, w);
field = array [direction] of real;
Mathematically, a mapping from set A to set B is a set of ordered pairs where
(1) the first item in each pair is from A and the second item
is from B
(2) for each item x in A, there is a unique y
in B such that the pair <x,y> is in the mapping.
The mathematical notation for an ordered pair is <x,y> using "angle brackets" or (x,y) using parentheses. The mathematical notation for the set of all mappings from one set to another is ->. For example, cosine is a member of R -> R where R is the set of real numbers.
The cardinality of A -> B is |B| to the power |A|. Intuitively, to specify a single mapping we must make |B| choices, |A| times.
Mappings are often called functions. The set on the left of the arrow
(the "from" set) is called the domain and the set on the right is
called the range. Note that the domain of a mapping need not
be a primitive type. For arrays the domain type is called the index
type and is often a tuple.
Be sure to understand the difference between a single mapping, such
as cosine, and the set of all mappings of the same type,
such as R -> R.
Consider the mapping type string -> integer. A variable of this type might be convenient for storing student scores on a quiz, for example. No standard language allows this type. However some languages that are biased towards ease-of-use rather than efficiency do, for example the "awk" language.
Data structures question: how is a variable of type string -> integer
implemented? Answer: maybe a labeled binary tree sorted in "in-order",
or by hashing.