CSE 150 LECTURE NOTES

Thursday January 22, 2004
 

ANNOUNCEMENTS 

The first assignment is due today.  Here is the second assignment.

The calendar of all assignment deadlines is on the CSE 150 web site now.  The midterm will be in class on Tuesday February 17.  The final will be on Thursday March 18.

 

THE DAVIS-PUTNAM-LOVELAND-LOGEMANN PROCEDURE

DPLL is an algorithm for propositional satisfiability.  This is the fastest known algorithm for satisfiability testing that is not just sound, but also complete, unlike all randomized local search methods.  A version was first published in 1960 by Martin Davis and Hilary Putnam and the algorithm is often called the Davis-Putnam method.  However the method in its present form is really due to Davis, G. Logemann, and Donald Loveland in 1962.

Note that the description of the DPLL algorithm here is rather different from the description in the book.  The presentation here, and the heuristic below for variable selection, are more modern than thn what  is in the book.

DPLL is depth-first search with backtracking and unit propagation. 

The algoritm starts with a set of clauses, gradually instantiates variables, and gradually simplifies the set of clauses simultaneously.  The stopping cases are:

Every time we instantiate a variable, we can simplify the set of clauses.  For each clause, if one of its variables is instantiated so tha the clause is true, then the clause can be deleted from the set of clauses.  If one its variables is instantiated to the other value, then the clause can be simplified to be a shorter clause with just the remaining literals.

Example: If we instantiate y = F, then the clause set {x or not y or z, u or v or y, u or not x or not z} simplifies to {u or v, u or not x or not z}.  


UNIT PROPAGATION

Definitions: A literalis a variable or a negated variable.  A unit clause is one that consists of a single literal.

When we have a unit clause, then the correct value of the variable in it is obvious.  We can apply this value to all other clauses containing the same variable.  Specifically:

Performing unit propagation with x, i.e. with x = T, causes further unit propagation for all binary clauses of the pattern (not x) or y.

Note that unit propagation always terminates with clauses of size 2 or larger.  If we ever get a clause of size 0, then the set of clauses is unsatisfiable.  If we get a clause of size 1, then unit propagation continues.

With good data structures, we can implement unit propagation to take time linear in the size of the input set of clauses.
 
 

THE DPLL ALGORITHM

The input to the algorithm is a set of clauses.

1.  If the clause set is empty, return "success"
2.  If the clause set contains an empty clause, then return "failure"
3.  If the clause set contains a unit clause (i.e. a literal)

4.  Otherwise: choose a variable u heuristically

Above, DPLL(clauses[u:=T]) means call DPLL recursively with variable u instantiated to have value true.  Note that this is a recursive implementation of depth-first search, with unit propagation as forward-checking.  


A HEURISTIC FOR VARIABLE SELECTION IN DPLL

Because each variable has a domain of size exactly two, we cannot directly apply the "most constrained" heuristic for ordering variables.

A good approach is to choose a variable that causes the greatest amount of simplification, because this will lead the algorithm to failure or success as quickly as possible. 

We measure simplification by counting how many binary clauses remain after unit propagation.  Remaining binary clauses are good because then future applications of unit propagation will cause a cascade of simplifications.

Let pc(x) be the number of binary clauses remaining after doing unit propagation with x=T.  Similarly, let nc(x) be the number of binary clauses remaining after doing unit propagation with x=F.  We want to choose a variable x such that pc(x) and nc(x) are large.

Most likely the variable we choose will lead to failure, so we will need to try both values of it.  Therefore it is beneficial to choose a variable that causes a lot of simplification for both assignments to it.  One widely used heuristic is to maximize

score(x) = 1000*pc(x)*nc(x) + pc(x) + nc(x)
Question:  What is the effect of the very large constant 1000?  Answer:  Compare pc(x) = 12 and nc(x) = 10 with pc(y) = nc(y) = 11.  If we just added pc and nc, then x and y would look equally good.  But really y is better, because we need to continue search for both values of the variable we choose.  Multiplying pc and nc gives 120 for x and 121 for y, so score(y) is higher.

Given which variable should be instantiated next, no useful heuristic is known for which value of that variable should be investigated first. 

For difficult 3SAT problems with clause to variable ratio around 4.26, the running time of a good implementation of the DPLL method is empirically O(2v / 19.5) where v is the number of variables.  This is exponential but with a slowly growing exponent.  Problems with 400 or more variables are not solvable in reasonable time.


LOCAL SEARCH

Now we start investigating a major new topic, algorithms that do not directly attempt to find globally optimal solutions to search problems.

A*, backtracking search for CSPs, and all the other search methods we have seen so far are systematic and global.  With A* for example, at each step, we expand the node that appears best among all previously discovered nodes.

But in general, global algorithms have infeasible space complexity for large problems.  One response is to consider iterative deepening variants, and indeed memory-limited A* methods exist--see the book.

Another response is to consider search methods that ignore global information.  The general framework is:

choose a random initial state
while not termination-condition do
    expand current state to find its children
    evaluate the h value of each child state
    let new current state be child with lowest h value  
Here h is the heuristic value of a state, similar to h as used for A*.  The goal is to find a state with h = 0.

When discussing local search, children found by expansion are typically called neighbors.  The set of all neighbors is called the moveset.  Ties are typically broken randomly.

The termination condition can be any of the following: a goal state is found (h = 0), no state in the moveset has a strictly better h value, or a time limit has been reached.
 
 



Copyright (c) by Charles Elkan, 2004.