CSE 150 LECTURE NOTES
Thursday January 22, 2004
ANNOUNCEMENTS
The first assignment is due today. Here is the second assignment.
The calendar of all assignment deadlines is on the CSE
150 web site now. The midterm will be in class on Tuesday
February 17. The final will be on Thursday March 18.
THE DAVIS-PUTNAM-LOVELAND-LOGEMANN PROCEDURE
DPLL is an algorithm for propositional satisfiability. This is
the fastest known algorithm for satisfiability testing that is not just
sound, but also complete, unlike all randomized local search
methods. A version was first published in 1960 by Martin Davis and Hilary Putnam
and the algorithm is often called the Davis-Putnam method. However
the method in its present form is really due to Davis, G. Logemann, and Donald Loveland in 1962.
Note that the description of the DPLL algorithm here is rather
different from the description in the book. The presentation here,
and the heuristic below for variable selection, are more modern than thn
what is in the book.
DPLL is depth-first search with backtracking and unit
propagation.
The algoritm starts with a set of clauses, gradually instantiates
variables, and gradually simplifies the set of clauses simultaneously.
The stopping cases are:
- An empty set of clauses is
satisfiable--in this case, the entire algorithm terminates with
"success."
- A set containing an empty clause is not satisfiable--in
this case, the algorithm backtracks and tries a different value for an
instantiated variable.
Every time we instantiate a variable, we can simplify the set of
clauses. For each clause, if one of its variables is instantiated
so tha the clause is true, then the clause can be deleted from the set
of clauses. If one its variables is instantiated to the other
value, then the clause can be simplified to be a shorter clause with
just the remaining literals.
Example: If we instantiate y
= F, then the clause set {x
or not y or z, u or v or y, u or not x or not z} simplifies to {u or v, u or not x or not z}.
UNIT PROPAGATION
Definitions: A literalis
a variable or a negated variable. A unit clause is
one that consists of a single literal.
When we have a unit clause, then the correct value of the variable
in it is obvious. We can apply this value to all other clauses
containing the same variable. Specifically:
- If a clause consists of a variable alone, say x, then any
other clause containing not x can be simplified to consist
of just its other disjuncts.
- If a clause consists of a negated variable alone, say not x,
then any other clause containing x can be simplified to consist
of just its other disjuncts.
Performing unit propagation with x, i.e. with x = T,
causes further unit propagation for all binary clauses of the pattern (not
x) or y.
Note that unit propagation always terminates with clauses of size 2 or
larger. If we ever get a clause of size 0, then the set of clauses
is unsatisfiable. If we get a clause of size 1, then unit
propagation continues.
With good data structures, we can implement unit propagation to take
time linear in the size of the input set of clauses.
THE DPLL ALGORITHM
The input to the algorithm is a set of clauses.
1. If the clause set is empty,
return "success"
2. If the clause set contains an empty clause, then return
"failure"
3. If the clause set contains a unit clause (i.e. a literal)
4. Otherwise: choose a variable u
heuristically
4.a. If DPLL(clauses[u=T])
succeeds, return "success"
4.b. Otherwise: return the result of DPLL(clauses[u=F])
Above, DPLL(clauses[u:=T])
means call DPLL recursively with variable u instantiated to have value
true. Note that this is a recursive implementation of depth-first
search, with unit propagation as forward-checking.
A HEURISTIC FOR VARIABLE SELECTION IN DPLL
Because each variable has a domain of size exactly two, we cannot
directly apply the "most constrained" heuristic for ordering variables.
A good approach is to choose a variable that causes the greatest
amount of simplification, because this will lead the algorithm to
failure or success as quickly as possible.
We measure simplification by counting how many binary clauses remain
after unit propagation. Remaining binary clauses are good because
then future applications of unit propagation will cause a cascade of
simplifications.
Let pc(x) be the number of binary clauses remaining after
doing unit propagation with x=T. Similarly, let nc(x) be the
number of binary clauses remaining after doing unit propagation with
x=F. We want to choose a variable x such that pc(x) and nc(x) are
large.
Most likely the variable we choose will lead to failure, so we will
need to try both values of it. Therefore it is beneficial to
choose a variable that causes a lot of simplification for both assignments
to it. One widely used heuristic is to maximize
score(x) = 1000*pc(x)*nc(x) + pc(x) + nc(x)
Question: What is the effect of the very large constant 1000?
Answer: Compare pc(x) = 12 and nc(x) = 10 with pc(y) = nc(y)
= 11. If we just added pc and nc, then x and y would look equally
good. But really y is better, because we need to continue search
for both values of the variable we choose. Multiplying pc and nc
gives 120 for x and 121 for y, so score(y) is higher.
Given which variable should be instantiated next, no useful heuristic
is known for which value of that variable should be investigated
first.
For difficult 3SAT problems with clause to variable ratio around 4.26,
the running time of a good implementation of the DPLL method is
empirically O(2v / 19.5) where v is the number of variables.
This is exponential but with a slowly growing exponent.
Problems with 400 or more variables are not solvable in reasonable
time.
LOCAL SEARCH
Now we start investigating a major new topic, algorithms that do not
directly attempt to find globally optimal solutions to search problems.
A*, backtracking search for CSPs, and all the other search methods
we have seen so far are systematic and global. With A* for
example, at each step, we expand the node that appears best among all
previously discovered nodes.
But in general, global algorithms have infeasible space complexity
for large problems. One response is to consider iterative
deepening variants, and indeed memory-limited A* methods exist--see the
book.
Another response is to consider search methods that ignore global
information. The general framework is:
choose a random initial state
while not termination-condition do
expand current state to find its children
evaluate the h value of each child state
let new current state be child with lowest h value
Here h is the heuristic
value of a state, similar to h as
used for A*. The goal is to find a state with h = 0.
When discussing local search, children found by expansion are typically
called neighbors. The set of all neighbors is called the moveset.
Ties are typically broken randomly.
The termination condition can be any of the following: a goal state
is found (h = 0), no state in the moveset has a strictly better h
value, or a time limit has been reached.
Copyright (c) by Charles Elkan, 2004.