CSE 150 LECTURE NOTES

Thursday January 8, 2004
 

ANNOUNCEMENTS 

We're handing out the first assignment today. It's due in two weeks.  You may work by yourself, or with one partner.

Our discussion board is up and running at http://webboard.ucsd.edu/.  Your login is your UCSD email username, and your password is your PID number.


ANALYZING VARIANTS OF THE GENERAL SEARCH ALGORITHM

We can evaluate alternative algorithms according to: For simplicity, we will often consider a search tree where expanding a node always gives exactly b children, where the "branching factor" b >= 2.

Assuming this fixed branching factor,  the number of nodes at depth n or less is N = 1 + b + b^2 + ... + b^n.

Somewhat surprisingly, N = (b^(n+1) - 1)/(b-1) which is O(b^n).  Intuitively, almost all the nodes are at the deepest level, and the number of shallower nodes is negligible.

For a problem with branching factor b where the first solution is at depth k, the time complexity of BFS is O(b^k), and its space complexity is also O(b^k).

The big problem with breadth-first search is that it uses about as much space as it uses time.  Any real computer will run out of space before it runs out of time, if the amount of space used is linearly proportional to the amount of time used.  See the table on page ? for a numerical example.
 
 

ITERATIVE DEEPENING

The major advantage of DFS is that it only uses O(bm) space if the branching factor is b and the maximum depth is m.  Explanation: there are m nodes on the longest path, and for each of these b-1 siblings must be stored.

If a solution exists at depth M or less, then M-limited DFS will find it, and this algorithm is very space-efficient, since it uses only O(b*M) space.

If a solution exists but only at depth greater than M, then depth-limited DFS is not complete.

Iterative deepening is a very simple, very good, but counter-intuitive idea that was not discovered until the mid 1970s.  Then it was invented by many people simultaneously.

The idea is to depth-limited DFS repeatedly, with an increasing depth limit, until a solution is found.

Intuitively, this is a dubious idea because each repetition of depth-limited DFS will repeat uselessly all the work done by previous repetitions.  But, this useless repetition is not significant because a branching factor b >= 2 implies that

 number of nodes at depth k > number of nodes at depth k-1 or less
Therefore, iterative deepening simulates BFS with linear space complexity.

For a problem with branching factor b where the first solution is at depth k, the time complexity of iterative deepening is O(b^k), and its space complexity is O(b*k).
 
 

HEURISTIC "BEST-FIRST" SEARCH

"Knowledge is power."  How can we use knowledge to improve the general search algorithm?

One common type of knowledge is a scoring function that estimates how good a node would be as the next node to expand.

Note the word "estimates".  If the scoring function was perfect, then we wouldn't need any search at all.  The word "heuristic" means trial-and-error, exploratory, unguaranteed, rule of thumb.

The heuristic best-first search algorithm is

 input:(1) a function "eval" that scores nodes according to how
         promising they are for further search (lower is better)
      (2) a function "sorted-insert" that places new nodes into
                   the queue according to their scores

 fringe := make-queue(initial-node)
 loop
   if empty?(fringe) then return FAIL
   else
     n := remove-front(fringe)
     if isa-goal?(state(n)) then return n
     else
        fringe := sorted-insert(fringe,eval(expand(n)))
 end loop

This type of queue is called a "priority queue" by algorithm designers.  The idea is that the next node to be taken from the queue and expanded will be the one with the highest priority, i.e. the lowest score.
 
We will represent a node as a tuple with the following components: The depth of a node is the number of nodes on the path to this node from the root.  The depth of the root is 1 by this definition.

Note: In the book the authors also talk about the cost of a node, which is the cost of the path from the root to this node.  This is normally the sum of the costs of the operators used along the path to the node.  We assume that all operators have non-negative costs.


GREEDY HEURISTIC SEARCH

Let the heuristic scoring function h be defined as follows:
h(n) = estimated cost of the cheapest path from the state represented by the node n to a goal state
Greedy search is best-first search with h as its "eval" function.

For each particular problem, the designer must choose an appropriate h function.  For example, for road finding let h(n) be the straight line "as the crow flies" distance from the current state (i.e. geographical position) to the goal position.

Greedy search is called greedy because it always tries to make the biggest jump possible towards the goal, without "thinking" further ahead about what will happen from that point.  See Figure ? in AIMA for an example of how greedy search does not find the optimal path.

In the worst-case, the time and space complexity of greedy search are the same as for breadth-first search.  In the "average" case having a good heuristic function h should make greedy search much better.

Proving that greedy search is better is difficult mathematically.  It is common in AI that a heuristic idea is good but proving that it is good is difficult, precisely because heuristics are not guaranteed.

Here is an important general principle for studying a heuristic:  look for assumptions under which the heuristic is guaranteed to be correct.

Note that this is a global greedy search algorithm: the next node expanded is the one that looks best among all seen so far.  We'll discuss local search methods later.
 
 

MINIMIZING TOTAL PATH COST: A* SEARCH

This variety of best-first search is called A* for historical reasons.

The idea is that if we want to find the optimal path to the goal, i.e. the shortest or cheapest path, then the next node to explore should be the one that looks like it will yield the overall cheapest path.

The best estimate of the total cost of the path through a candidate node n is f(n) = g(n) + h(n) where h is the same estimator function of remaining distance as above and  g(n) = cost of path from the initial state to state(n).

The names f, g, h, and n are standard, so you should remember to use this notation.   Study the example in Figure ? carefully.
 
 

ANALYZING THE BEHAVIOR OF A*

For search algorithms, the word optimal has a special, restricted meaning: an algorithm is called optimal if it always finds a lowest-cost solution.  Note that no solution may exist, or multiple equally lowest-cost solutions may exist.

We will do an analysis here that shows that A* is optimal.  This analysis is different from a conventional algorithm analysis because

The assumptions are that  (1) the h function is admissible, and that (2) the f = g + h function is monotonic.

Intuitively, admissible means acceptable or reasonable.  Formally, h(n) is admissible if it is always true that

h(n) = estimated cost of cheapest path from state(n) to goal
        < actual cost of cheapest path from state(n) to goal
In words, h must be conservative, i.e. it must be an under-estimator.

 

THE IMPORTANT THEOREM ABOUT A*

Theorem:
(a) Optimality: the first path to a goal found by A* is an optimal path to a goal.
(b) Completeness: if a path to a goal exists, then A* will find it.
(c) Efficiency: no algorithm that is optimal and complete and  has only g and h available as knowledge will always expand  fewer nodes than A*.
Proof of optimality:  By definition, the next node expanded by A* is always one of the remaining unexpanded nodes that has the lowest f = g+h value (maybe the equal-lowest f value).

When A* expands a goal node, it first checks and discovers that it is indeed a goal node.  Now the h value of a goal node is zero, so its f value is its g value, which is exactly the cost of the path to this node.  Call this cost c*.

So at the instant that A* finds a goal node, every other node in the fringe, say n, must have f(n) >= c*.  For each such node n, f(n) is an under-estimate of the true cost c' of reaching the cheapest goal node reachable through n.  Therefore every other goal node has cost c' >= f(n) >= c*.  This proves that A* is optimal.

 



Copyright (c) by Charles Elkan, 2004.