CSE 150 LECTURE NOTES
Tuesday January 13, 2004
ANNOUNCEMENTS
Our discussion board is up and running at http://webboard.ucsd.edu/.
Your login is your UCSD email username, and your password is your
PID number.
A* HEURISTIC SEARCH
The A* algorithm is
fringe := make-queue(initial-node)
loop
if empty?(fringe) then return FAIL
else
n := remove-front(fringe)
if isa-goal?(state(n)) then return n
else
fringe :=
sorted-insert(fringe,eval(expand(n)))
end loop
This variety of best-first search is called A* for historical reasons.
The idea is that if we want to find the optimal path to the goal,
i.e. the shortest or cheapest path, then the next node to explore should
be the one that looks like it will yield the overall cheapest
path.
The best estimate of the total cost of the path through a candidate
node n is f(n) = g(n) + h(n) where h is the same estimator function of
remaining distance as above and g(n) = cost of path from the
initial state to state(n).
The heuristic scoring function h is defined as follows:
h(n) = estimated cost of the cheapest path from the
state represented by the node n to a goal state
The names f, g, h, and n are standard, so you should remember to use
this notation.
Remember we assume that (1) the h function is admissible,
and that (2) the f = g + h function is monotonic. In
words, h must be conservative, i.e. it must be an under-estimator.
THE IMPORTANT THEOREM ABOUT A*
Theorem:
(a) Optimality: the first path to a goal found by A* is an
optimal path to a goal.
(b) Completeness: if a path to a goal exists, then A* will find it.
(c) Efficiency: no algorithm that is optimal and complete and has
only g and h available as knowledge will always expand fewer nodes
than A*.
Proof of optimality: See previous lecture.
Proof of completeness: Suppose there are a finite number of nodes
with f value less than the optimal cost c*. Then A* will
eventually expand all these nodes with lower f value and arrive at a
solution. Can there be infinitely many such nodes? No, if
(a) all nodes have finite branching factor and (b) the cost of each edge
is greater than some epsilon > 0.
Proof of efficiency (idea only!): Consider an algorithm that uses
the same information, i.e. the g and h functions, but does not
visit all nodes with sub-optimal f value. Then we can construct an
adversary who chooses the true costs so that one of the unvisited nodes
is actually the optimal solution.
REMARKS ON A*
An "online" exploration algorithm is one that guides searching and
acting simultaneously. A* and related search algorithms are
"offline" since the agent uses search to "imagine" an optimal path, and
only afterwards executes the optimal path.
A* is not fully offline: the agent cannot explicitly "see" in
a global way the entire state space. Rather, the state space
becomes revealed to the agent one state at a time as the search
algorithm proceeds. The initial state is known explicitly. A
complete map is available implicitly; running the search algorithm makes
this map explicit.
A* is a global search algorithm: the next node expanded is the
one that looks best among all seen so far. We'll discuss
local search methods later.
A* expands all nodes with f(n) = g(n) + h(n) < c* and some nodes
with f(n) = c*, where c* is the minimum cost of any goal node. For
a given search problem, g(n) and c* are fixed. Everything else
being fixed, if h(n) is bigger then A* explores fewer nodes. In
the limit where h(n) is perfect, A* explores only nodes along optimal
path(s). (If there is more than one such path, we would like to
explore only one such path, but A* would need a tie-breaker for choosing
which node to remove from the fringe in order to guarantee this.)
The proof of optimality depends on the heuristic function h being
admissible, but not on monotonicity. The function f(n) is monotonic
if f never decreases along any path from the root. Combined with
admissibility, this assumption says that f = g+h does not become less
accurate as an estimate of the total cost as you move towards a goal.
Let's draw the state space that A* explores, and let's draw contours
of equal value of f(n). If the function f is monotonic, then the
contours are properly nested.
What is called optimality here is an "offline" property, while
efficiency is an "online" property. (Note on terminology:
Optimality is sometimes called admissibility, in which case efficiency
is sometimes called optimality.)
Like all variants of the general search algorithm, A* may explore
multiple nodes that correspond to the same real-world state. Some
variants of the general search algorithm use a data structure called the closed
list to remember already-visited nodes. Each node discovered
through expansion is then compared against all nodes on the closed list
to detect whether the new node corresponds to an already-visited
state. Using this terminology, the fringe is called the open
list.
FINDING HEURISTIC FUNCTIONS
It is common in AI that a heuristic idea is good but proving that it is
good is difficult, precisely because heuristics are not guaranteed.
Here is an important general principle for studying a
heuristic: look for assumptions under which the heuristic is
guaranteed to be correct.
The practical usefulness of A* depends entirely on having a good
heuristic function. Consider the 8 puzzle and two heuristic
functions:
- h1 = number of misplaced tiles
- h2 = sum over all tiles of distance of each tile from goal
position
Both h1 and h2 always underestimate the number of tiles that must be
moved to reach a solution, so they are both admissible. (Each move
affects only one tile, so summing over each tile considered separately
is legitimate for h2.)
For all states of the 8 puzzle, h1(n) <= h2(n). Hence h2 is
a better estimator than h1. See Table 4.8 for statistics on the
effectiveness of these heuristics.
If one cannot guarantee that a heuristic function h is always an
under-estimator, then A* is not guaranteed to find a shortest
path. However it can be proved that if h is "close to" an
under-estimator, then A* finds a path that is "close to" being shortest.
NOTES ON NOT EXPANDING THE SAME STATE REPEATEDLY
This issue is often called "eliminating duplicates." We did not
discuss it in class.
Definition: A duplicate is a
node that contains the same state as some previously generated node.
It is not enough to look for duplicates in the "fringe". We also
must check for duplicates among nodes already expanded, hence removed
from the fringe; otherwise we miss some duplicates. Suppose state
s1 leads to s2 leads to s1. The second time we create a node with
state s1, the first node with state s1 will no longer be in the
fringe. So if we check for duplicates inside the fringe only, we
won't realize that the new node with s1 is a duplicate.
The nodes that have already been expanded (hence removed from the
fringe) are often called the "closed" list. The fringe is then
called the "open" list; see Section 3.5 of the book. Using the
"open" and "closed" lists gives a variant of the "general search
algorithm" called "graph-search." The last paragraph before
"Memory-bounded heuristic search" on page 101 indicates that A* means
"A* with graph-search" for the authors. Hence, Figure 4.8 does
use duplicate elimination, it seems.
When we eliminate duplicates, the question arises of what g and h
values to use for the node we keep, because the duplicates may have
different g and/or h values. This issue is discussed on page 99
to motivate the concept of monotonicity.
Let m and n be duplicate nodes, i.e. both containing the same state
s. We might as well take the best path from the initial state to
s, so we should take whatever g-value is smaller. Assume that
h(n) is a deterministic function of states. (This is the usual
case.) Then h(n) will be the same for all nodes representing the
same state; these nodes can differ in g(n) only.
With monotonicity, a newly generated node must have f-value >= its
parent's f-value. Nodes are taken from the fringe in f-value
order, so the generated node must have f-value >= f-value of every
node in the "closed" list. So it must have g-value >= g-value
of each of its
duplicates in the closed list. Hence the best g-value for any
state is the g-value obtained the first time a node for the state is
created.
Therefore, if an old node n with state s is in the closed list, and a
new node with state s is created, the new node can be discarded.
If the old node is in the open list (i.e. the fringe) then either the
old or the new node may have the best g-value.
The book says that A* runs out of memory before it runs out of time on
page 101. Suppose the average branching factor is 1+x where x
>= 1 always. Then for every 1000 nodes removed from the
fringe, 1000+1000*x are added. This means that memory use
increases by 1000*x, so the size of the fringe increases linearly with
time, with no maximum. Important note: the size of the closed
list is linearly proportional to the size of the open list, so the
closed list does not increase big-O memory usage.
The book does not suggest any tie-breaking policy for nodes that have
same f-value. The policy "when f-values are equal, expand the
node with larger g-value first" is sensible. Larger g-value means
smaller h-value, so this can be implemented easily when g-values are
integers under 100 (e.g. for the 8-puzzle) as f(n) = g(n) +
1.01*h(n). Of course we still need a different tie-breaker for
when g and h are both equal.
Copyright (c) by Charles Elkan, 2004.