CSE130 LECTURE NOTES

April 3, 2002
 
 

ADMINISTRATION

We'll be using web-based software at gradesource.com to keep track of scores.  Email Greg Hamerly if you haven't received a "secret number" to access your scores.

In section on Friday, Greg will talk about the syntax of C and Pascal in some simple programs.  He'll go over the tools for compiling under Linux and Solaris (ieng9), and discuss using function pointers in general terms. This section will be useful for doing the first assignment.
 
 

VALUES AND TYPES

These are two of the most basic concepts to understand in understanding programming languages. They are the source of some of the most important similarities and differences between languages.

A value is a piece of data that is involved in computing.  A type is a collection of values on which the same operations make sense.

For example:

What we call a type depends on the level of abstraction that we adopt.  At a high level, e.g. in mathematics, "number" is a type.  Common PLs are lower-level and distinguish integers from reals, because computer hardware provides very different representations and operations for integers and reals. In other words, this low-level feature is propagated to high-level programming languages.

Example of different operations on integers and reals:

At a very low level, all values are strings of bits, i.e. the types available are nibble, bytes, words, longwords, etc.  PLs provide abstractions on top of this level.
 
 

PRIMITIVE VALUES AND THEIR TYPES

A  value is one that (conceptually) is not made up from other values. The opposite is a composite value.  For example, characters are primitive values, whereas strings are composite in most languages.  But in Cobol, strings (of fixed length) are considered primitive, and in Snobol, strings of arbitrary length.

A built-in type (or operation) is one that is supplied by the language.  The opposite of built-in is programmer-defined.  Typically, integers, reals, and characters are built-in, and maybe booleans and strings.  For types that are not built-in, a PL often has a standard way to make them programmer-defined.

Different languages often use different names for the same primitive type. This is just a "syntactic" difference.

Syntactic differences can show up in the names of built-in operations on primitive types, e.g. mod versus rem.

Sometimes there are minor semantic differences, e.g. maybe -7 mod 3 = 2 but -7 rem 3 = -1.

An implementational issue is a semantic question that is left unspecified by the definition of a PL.  Implementation issues are choices that different compilers or interpreters can make differently.  The most troublesome PL differences are often implementational.  For example, are floating point numbers 32 bit or 64 bit?

Implementational differences can be very troublesome because they are not well-documented (like syntactic differences) and they do not have work-arounds (like documented semantic differences).

Ideally, nothing in the specification of a language should be implementation-dependent.
 
 

ENUMERATION TYPES

Are there any primitive types that are not built-in? Yes, user-defined primitive types. These are called enumeration types.

For example in Pascal you can write type direction = {n, e, s, w}

Note that n, e, s, and w are not characters. They are identifiers (i.e. names) which introduce brand-new values that previously did not exist in the language.

Remember, a type is a set of values on which the same operations make sense. Which operations are provided for values of type direction?  Pascal provides the built-in functions succ, pred, and ord. For example:

        succ(s) = w
        pred(s) = e
         ord(s) = 2

Using enumeration types makes programs clearer, and helps to catch errors at compile-time.  For example writing s*w by mistake gives a compiler error message.

In earlier versions of C, enumeration types are not available. Instead, it's good style to write

        #define n 0
        #define e 1
        #define s 2
        #define w 3

However this doesn't allow catching errors at compile-time. Conclusion: it's good for a language to be designed so that the compiler can enforce type correctness rules.

Enumeration types don't substitute for the programmer's good sense.  He or she must still choose to only use the values of the type in sensible ways, for example to enumerate directions in a sensible order such as clockwise.  With an "abstract data type" (we'll study these later), the programmer defining the type can specify what the type-specific sensible operations are, and can disallow all other operations.
 
 

COMPOSITE TYPES

A composite (also called compound) value is one that is made up from other values.  For example a record in Pascal is a composite value:
type point = record x, y: real end;
There are big differences between programming languages in how a user can define his own types of compound values. Many differences are only syntactic (Pascal records are called structures in C) but still, we need some concepts that let us understand the commonalities.

Our aim is to classify the wide variety of compound types. We will do this by looking at the mathematical ways of constructing sets of compound values.  In ordinary mathematics we know about intersections, unions, and Cartesian products.  For programming we need some slightly more complex concepts.
 
 

CARTESIAN PRODUCTS

A product type is one whose set of values is a mathematical Cartesian product, i.e. a set of tuples.  A tuple is a vector of components, where the order of the components is significant, and the components may have the same or different types.

Mathematically, the Cartesian product of sets A and B is the set of all possible ordered pairs <x,y> where the first component x of a pair is from A and the second component y is from B.  Formally, the Cartesian product is written A x B and defined to be the set  { <x,y> : x in A, y in B }.

If A has n elements, i.e. its cardinality is n, and the cardinality of B is n, then the cardinality of Ax B is mn.  Formally, |A x B| = |A| |B|.

Records in Pascal or structures in C are Cartesian product types.  In records and structures components are given names.  The benefit of these names is mostly documentation, since the components could be referred to by their numerical position: first, second, third, etc.

In ML product types are defined without giving names to components, in a more mathematical way.  For example:

type person = string * string * int * real;
Note that a tuple is the same as a row in a relational database, where components are often called "fields" or "attributes".
 
 



Copyright (c) by Charles Elkan, 2002.