CompSci 006 - Summer 2005

Lecture 23 - 8/4/05


Linear Search vs. Binary Search

Last time we talked about two different ways to search: linear search and binary search. Here are two applets that demonstrate those two techniques.

Linear Search

Binary Search

Analyzing Binary Search

Our analysis for Binary Search starts out much the same as our analysis for Linear Search. Instead of iterations through a for-loop, this time we've got recursive calls to the binarySearch method.
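   // Recursively searches input[startIndex..endIndex] (inclusive) for goal.
   // The array must already be sorted in increasing order.
   public static boolean binarySearch(int[] input, int startIndex, int endIndex, int goal){
   
      // Trace output so we can watch the search range shrink
      System.out.println("Searching in the range: "+startIndex+" "+endIndex);
   
      // Empty range: goal is not in the array
      if(startIndex > endIndex)
         return false;
      
      // Same midpoint as (endIndex + startIndex)/2, but written so the sum
      // cannot overflow an int when the indices are very large
      int middle = startIndex + (endIndex - startIndex)/2;
      System.out.println(input[middle]);
      
      if(input[middle] == goal)
         return true;                                            // found it
      else if(input[middle] < goal)
         return binarySearch(input, middle+1, endIndex, goal);   // goal can only be in the right half
      else
         return binarySearch(input, startIndex, middle-1, goal); // goal can only be in the left half
   }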
Except for the actual recursive call, everything that's done in this method can be bounded by a constant amount of time. (We can just assume that whatever time the recursive call takes is "assigned" to that call of the binarySearch method.) Thus, we get a constant bound of c3 for time taken on every call to the binary search method. This handles the constant factor, but how do I figure out how many times the binarySearch method is called in the worst possible case? It seems to divide the array in half each time it is called. Is there some way we can represent how many "halvings" it will take to eliminate the entire array?

In order to figure out how many "halvings" we need, it can help to look at the opposite operation: doubling. Doubling just seems more natural to a lot of people. If we figure out how many times we have to multiply by 2 to reach our array size, that will also be how many times we have to halve it to eliminate the entire array.

2^0  = 1
2^1  = 2
2^2  = 4
2^3  = 8
2^4  = 16
2^5  = 32
2^6  = 64
2^7  = 128
2^8  = 256
2^9  = 512
2^10 = 1024
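Reading down the table above amounts to a small loop: start at 2^0 = 1 and keep doubling until the value reaches or passes the number of entries. The sketch below only illustrates that idea; the class and method names (Doublings, doublingsToReach) are made up for this example.

```java
// Counts how many times we must multiply by 2, starting from 1 (= 2^0),
// before the result is at least n. Assumes 1 <= n <= 2^62 so the long
// never overflows.
class Doublings {
    static int doublingsToReach(long n) {
        long value = 1;   // 2^0
        int count = 0;    // doublings performed so far
        while (value < n) {
            value *= 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(doublingsToReach(36));    // prints 6, since 2^5 = 32 < 36 <= 64 = 2^6
        System.out.println(doublingsToReach(1024));  // prints 10, since 2^10 = 1024
    }
}
```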


We can see that to get to our number of entries, 36, it takes 6 "doublings" to surpass that number (2^5 = 32 is still too small, but 2^6 = 64 is enough). So, we can expect about 6 "halving" operations in the worst case to get from 36 possible array entries down to just 1. Then, when we're left with just one possibility, we can check it and finish the algorithm.
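One way to check the "about 6" estimate is to count calls directly. The sketch below reuses the binarySearch logic from above, drops the print statements, and adds a hypothetical call counter (the names BinarySearchCallCounter, calls, and callsForMissingLargeGoal are invented for this illustration). It searches an array holding 0 through n-1 for the value n. That value is never present and always sends the search into the right half, which is never smaller than the left half, so this should be one worst case.

```java
// Sketch: the lecture's recursive binary search with tracing removed and a
// call counter added, used to count calls in a worst-case (unsuccessful) search.
class BinarySearchCallCounter {
    // Hypothetical helper field: number of calls made by the most recent search
    static int calls = 0;

    static boolean binarySearch(int[] input, int startIndex, int endIndex, int goal) {
        calls++;                                  // count this call
        if (startIndex > endIndex)
            return false;                         // empty range: goal is absent
        int middle = startIndex + (endIndex - startIndex) / 2;
        if (input[middle] == goal)
            return true;
        else if (input[middle] < goal)
            return binarySearch(input, middle + 1, endIndex, goal);
        else
            return binarySearch(input, startIndex, middle - 1, goal);
    }

    // Builds the sorted array 0, 1, ..., n-1, searches it for n (never present),
    // and returns how many calls the search made. Assumes n >= 1.
    static int callsForMissingLargeGoal(int n) {
        int[] input = new int[n];
        for (int i = 0; i < n; i++)
            input[i] = i;
        calls = 0;
        binarySearch(input, 0, n - 1, n);
        return calls;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 8, 36, 1024})
            System.out.println("n = " + n + ": " + callsForMissingLargeGoal(n) + " calls");
    }
}
```

For 36 entries this reports 7 calls: six calls that each examine one element (the range shrinks 36, 18, 9, 4, 2, 1), plus a final call that finds an empty range and returns false. That agrees with the estimate above, give or take a constant.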

The Logarithmic Function

The table above showed what powers of 2 we needed to get to various values. How do we calculate what power of 2 is needed to reach a value we might have (like 36)? Luckily, the logarithmic function does just what we need. If we take log_2(36), we get a floating-point value between 5 and 6 (about 5.17). Since we can't make a fractional call to our method, we just round up to 6 to figure out how many calls to our binarySearch method will occur in the worst case. This means that the running time of our binary search method can now be bounded by c3 log_2(n). Again, n is the number of entries we're searching through and c3 is our constant time bound on each call to the binarySearch method.
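Java's Math class has no base-2 logarithm method, so a common approach is the change-of-base rule log_2(x) = ln(x) / ln(2), followed by Math.ceil to round up. A minimal sketch (the class name LogBaseTwo and method name log2 are invented for this example):

```java
// Base-2 logarithm via the change-of-base rule, since java.lang.Math only
// provides natural (Math.log) and base-10 (Math.log10) logarithms.
class LogBaseTwo {
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        int n = 36;
        double exact = log2(n);                  // roughly 5.17, between 5 and 6
        int rounded = (int) Math.ceil(exact);    // round up to a whole number of calls
        System.out.println("log2(" + n + ") = " + exact + ", rounded up: " + rounded);
    }
}
```

Because this goes through floating-point arithmetic, an exact power of 2 may come out a tiny bit above or below the true integer, so rounding up can occasionally be off by one there; an integer-only calculation avoids that.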

Computers work in binary, and the trick of "halving" used by binary search shows up in many other algorithms, so the log_2 function is quite common in computer science. Thus, the usual notation in computer science is lg to represent log_2. Using this, we can rewrite our running time for the binary search method as c3 lg(n). The next section compares the running times of the two search algorithms for increasing values of n.

Comparing Linear Search vs. Binary Search

Now that we've analyzed the running time for binary search, we can compare it to linear search. To fully account for everything, I should write the running time of binary search as c3 lg(n) + c4. This extra constant c4 just accounts for whatever constant amount of time was spent before the first call to the binarySearch method. Similarly, the running time for linear search gets an extra constant added on: c1n + c2.

To show off the difference that the lg function makes, the table below compares the values of n and lg(n) for various values of n. I also give the full functions for linear search and binary search at the top, but the constants are left off for the table entries to save space.

Values of n              Linear Search            Binary Search
n                        c1n + c2                 c3 lg(n) + c4
100                      100                      7
1,000                    1,000                    10
1,000,000                1,000,000                20
1,000,000,000            1,000,000,000            30
1,000,000,000,000        1,000,000,000,000        40
1 x 10^15                1 x 10^15                50
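The lg(n) column can be reproduced without any floating-point arithmetic. For n >= 1, the rounded-up value of lg(n) is the smallest k with n <= 2^k, which is exactly the number of bits in n - 1; Java's Long.numberOfLeadingZeros gives that bit count indirectly. The sketch below assumes that approach (LgTable and ceilLg are names made up for this example).

```java
// Reproduces the rounded-up lg(n) column of the table using integer operations.
class LgTable {
    // Rounded-up lg(n) for any positive long n.
    // The bit length of n - 1 is the smallest k with n - 1 < 2^k, i.e. n <= 2^k.
    static int ceilLg(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    public static void main(String[] args) {
        long[] sizes = {100L, 1_000L, 1_000_000L, 1_000_000_000L,
                        1_000_000_000_000L, 1_000_000_000_000_000L};
        System.out.printf("%25s %8s%n", "n", "lg(n)");
        for (long n : sizes)
            System.out.printf("%,25d %8d%n", n, ceilLg(n));
    }
}
```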


These values for lg(n) are rounded up to the nearest integer, but this table is enough to show the immense difference that the lg(n) function makes. I left off the constant factors in that table, but would they matter? The c2 and c4 terms were just added on and don't change with n. However large those values might be, they're still constants. That means there's some point where n becomes so large that those constants shrink to tiny, tiny low-order digits.

What about the c1 and c3 terms? They are multiplied against the functions of n, but they're still just constants. The table suggests why multiplying lg(n) by a constant factor isn't going to change the outcome: n keeps growing so much faster than lg(n) that even a large c3 can't keep c3 lg(n) ahead of c1n forever. At some point n becomes large enough that binary search's function gives much smaller values than linear search's function. This happens because the function lg(n) grows much more slowly than the function n. That growth rate was really the only important part of each algorithm's function when we let n grow large. Computer science recognizes this and even has special notation for comparing such functions.
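To see that the constants only delay the outcome, here is a sketch with made-up constants that deliberately favor linear search: c1 = 1 and c2 = 0 for linear search, but c3 = 100 and c4 = 1000 for binary search. These values are purely hypothetical, not measurements of the code above. The sketch scans upward for the first n at which linear search's bound exceeds binary search's bound.

```java
// Finds where c1*n + c2 overtakes c3*lg(n) + c4 for some hypothetical constants.
class Crossover {
    // Hypothetical constants chosen to give linear search a head start
    static final double C1 = 1, C2 = 0, C3 = 100, C4 = 1000;

    // Linear search bound: c1*n + c2
    static double linearTime(long n) {
        return C1 * n + C2;
    }

    // Binary search bound: c3*lg(n) + c4, using the real-valued lg(n)
    static double binaryTime(long n) {
        return C3 * (Math.log(n) / Math.log(2)) + C4;
    }

    // Smallest n >= 1 at which linear search's bound is larger than binary search's
    static long crossover() {
        long n = 1;
        while (linearTime(n) <= binaryTime(n))
            n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println("Linear search's bound first exceeds binary search's at n = " + crossover());
        long big = 1_000_000L;
        System.out.println("n = " + big + ": linear " + linearTime(big) + ", binary " + binaryTime(big));
    }
}
```

With these constants, linear search's bound stays smaller only up to a couple of thousand entries; after that, binary search's bound is smaller and the gap keeps widening, even though c3 is 100 times c1.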

O-Notation and Θ-Notation

For the two search algorithms I analyzed above I was focusing on the worst possible cases. In keeping with this focus, my functions were designed to give an upper bound on how long each algorithm could take on a given input size of n. For the linear search algorithm I bounded its running time with the function c1n + c2. Recall that the constant factors didn't make much difference when we allowed the value of n to grow. So, in computer science we would say that the running time of linear search is O(n). This notation is pronounced "big-Oh of n" or "Oh of n". It strips away the constant factors that didn't make much of a difference and focuses on just the important terms.

Similarly, for the binary search algorithm we would give its running time as O(lg n). This notation is commonly used for stating the running times of algorithms. What does the notation mean exactly? It describes an asymptotic upper bound on the magnitude of a function in terms of a simpler function. In this case we were describing the complicated functions, with their constant factors, using simpler functions that dropped those constant factors. But what's an asymptotic upper bound? It's a bound on how the function behaves as its variable (in our case, n) grows towards infinity. The picture below gives such an example.




In this picture we've got a function g(n) that is asymptotically bounded by f(n). For some constant factor c1 multiplied against f(n), after a certain point x1 the g(n) function will always be equal to or below the c1f(n) function. Thus, we could say that g(n) is O(f(n)).
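The picture's idea can be written as a formula. The version below is one common way to state it; the multiplier is called c here (instead of c1) so it isn't confused with linear search's constant c1, and the example assumes c1 > 0 and c2 >= 0 for linear search.

```latex
% g(n) is O(f(n)) when some constant multiple of f(n) stays at or above g(n)
% from some point x_1 onward:
g(n) \in O(f(n)) \iff \exists\, c > 0,\ \exists\, x_1 \ \text{such that}\ g(n) \le c \cdot f(n) \ \text{for all}\ n \ge x_1

% Example: linear search's bound, assuming c_1 > 0 and c_2 \ge 0.
% For n \ge 1 we have c_2 \le c_2 n, so
c_1 n + c_2 \;\le\; c_1 n + c_2 n \;=\; (c_1 + c_2)\, n \qquad \text{for all } n \ge 1
% Taking c = c_1 + c_2 and x_1 = 1 shows that c_1 n + c_2 is O(n).
```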

Very similar to O-notation is Θ-notation. The picture below shows an example, again with g(n) being bounded by f(n).




In this case we've got two different constant factors: c1 and c2. The first one allows f(n) to serve as an upper bound for g(n) after the point x1. The second constant c2 allows f(n) to serve as a lower bound for g(n) after the point x2. In this particular example f(n) grows so much like g(n) grows that f(n) can bound g(n) both above and below. This means that, asymptotically, f(n) gives a pretty good approximation for g(n). Therefore, we can say that g(n) is Θ(f(n)).
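Written as a formula, with c1 and x1 for the upper bound and c2 and x2 for the lower bound as in the picture, one common statement of the definition is below, followed by a small worked example. The function 3 lg(n) + 5 and the constants 8 and 1 are made-up numbers chosen only to illustrate the definition.

```latex
% g(n) is \Theta(f(n)) when f(n) bounds g(n) both above (c_1, after x_1)
% and below (c_2, after x_2):
g(n) \in \Theta(f(n)) \iff \exists\, c_1, c_2 > 0,\ \exists\, x_1, x_2 \ \text{such that}\
  g(n) \le c_1 f(n)\ \text{for all}\ n \ge x_1 \ \text{and}\ c_2 f(n) \le g(n)\ \text{for all}\ n \ge x_2

% Example with made-up numbers: g(n) = 3\lg n + 5 is \Theta(\lg n).
% Upper bound, c_1 = 8 and x_1 = 2: when n \ge 2, \lg n \ge 1, so 5 \le 5\lg n and
3\lg n + 5 \;\le\; 3\lg n + 5\lg n \;=\; 8\lg n
% Lower bound, c_2 = 1 and x_2 = 1: when n \ge 1, \lg n \ge 0, so
1 \cdot \lg n \;\le\; 3\lg n + 5
```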