CSE130 LECTURE NOTES

October 20, 1999
 
 

ADMINISTRATION

The first project is due Friday between 5pm and 6pm; see the course description for exact submission instructions.

Remember to fill-out and attach the team self-evaluation questionnaire.

We will distribute the description of the second project Wednesday this week.
 
 

SYNTAX IN GENERAL

Syntax as a field of study in (programming) linguistics is about the appearance and structure of expressions.  (We'll use the word expression to include any self-contained fragment of a program, for example a subroutine definition.)

A grammar specifies (1) conceptual structure and (2) lexical detail.  We are interested in understanding and representing (1), which is called "abstract" syntax because it leaves out the irrelevant details (2).

There can be many paths to meaning, and there can be many opinions on how expressions are structured, even if there is agreement on which expressions are legal.

There is less disagreement for programming languages than for natural languages, because PLs are designed, and they have unique meanings.

The structure of an expression is (almost always) a tree. The key structural relationship is what is a subexpression, i.e. child, of what.  Usually the number of children is fixed and their order is significant, but sometimes subexpressions may be grouped in sets.

Parsing follows a grammar to check and discard the lexical detail of an expression, while discovering its abstract syntactic structure.
 
 

AN ML datatype IS AN ABSTRACT SYNTAX NOTATION

There are many ways to notate abstract syntax. The most important are tree diagrams, and fully parenthesized expressions.

Consider for example typical syntax for simple arithmetic expressions, including variables.  We can represent the set of all these expressions as a recursive type, and therefore as an ML datatype:

datatype expression =
    Constant of int |
    Plus of expression * expression |
    Minus of expression * expression |
    Times of expression * expression;
Remember that you can declare a pointer-based type in C with exactly the same conceptual organization.

With the data structure defined above, it is easy to write a recursive function that evaluates expressions:

fun eval (Constant n) = n
  | eval (Plus (exp1,exp2) )  = (eval exp1 env) + (eval exp2 env)
  | eval (Minus (exp1,exp2) ) = (eval exp1 env) - (eval exp2 env)
  | eval (Times (exp1,exp2) ) = (eval exp1 env) * (eval exp2 env)
 
Notice how this function follows the structure of the expression datatype exactly.
 
 

USES OF ABSTRACT SYNTAX

(1) One use is to have a notation with details removed for use in compilation, interpretation, etc.

Remember the dragon book on compiler design. The dragon is "complexity of compiler design". The knight who vanquishes the dragon carries the emblem "syntax directed translation".  This means first recovering abstract syntax, then doing code generation from the abstract syntax tree.

(2) Another use is as a common "interlingua" for multiple concrete syntaxes. Note that the concrete syntax of some programming languages is not strings.

Consider a web page creation program. The same functionality (i.e. semantics) can be controlled by many interfaces:

  • a compiled language (e.g. TeX)
  • an interpreted language (e.g. HTML)
  • an interactive command line (e.g. vi)
  • a graphical user interfaces (GUI) with
  • menu choices
  • direct manipulation
  • a subroutine library, i.e. an API (application programming interface)
  • Microsoft Bob
  • (3) Abstract syntax makes it possible to systematically consider all possible expressions in a given sublanguage, and therefore to prove that a property is true of all these expressions.

    For example one can use the recursive definition above to prove a property of all simple arithmetic expressions (i.e. of expression) by induction.
     
     

    EXPRESSIONS, DEFINITIONS, AND COMMANDS

    Traditional programming languages have three big categories of program fragments: declarations, expressions, and commands.

    For example x[2*j+1] := sqrt(u+v) is a command with two expressions embedded in it: 2*j+1 and sqrt(u+v)

    Definitions:

    Expressions involve operators, constants (which are also called literals), and variables.

    Variables are created and initialized by declarations.  More formally, the effect of a declaration is to produce one or more bindings between an identifier and a value.
     
     



    Copyright (c) by Charles Elkan, 1999.