Sunday, 18 November 2012

Algorithm: Binary search

Binary search is a decrease and conquer search algorithm than can be used on a sorted array. It operates by determining whether the search value is less than or greater than the middle value and recursively calling itself on the lower or upper half of the list respectively until either the value is found or not found.

The binary search algorithm is very similar to the binary search tree's search operation though not identical. Binary search's average and worst case time complexity is O(log n), while binary search tree does have an average case of O(log n), it has a worst case of O(n). Namely when the tree's height equals the number of items in the tree (incredibly unlikely in any real scenario).

The real power in binary search shows itself when we use it to search a huge list of items, much like any logarithmic algorithm. Exponential functions work by looking at the whole input data when considering each item. Logarithms (inverse-exponential functions) work by repeatedly halving the input data. Consider a list that contains 1 million items, if this list happens to be sorted we can use binary search to search for an item in no more than 20 steps.

Values compared Possible candidates
0 1000000
1 500000
2 250000
3 125000
4 62500
5 31250
6 15625
7 7812
8 3906
9 1953
Values compared Possible candidates
10 976
11 488
12 244
13 122
14 61
15 30
16 15
17 7
18 3
19 1

While maintaining a sorted list does take more effort, it may be worth it when we consider the incredible speed of the search. It all depends on what operation demands faster speed in your application.

Alternatively, we can get the same search speed with a smaller insert cost by using a balanced binary search tree, though there is quite a bit more to the implementation.

Complexity

Time, worst case
O(log n)
Time, best case
O(1)
Time, average case
O(log n)
Space, worst case
O(1)

Decrease and conquer

These types of algorithms have often been labelled as divide and conquer algorithms like merge sort. The name decrease and conquer has been proposed in order to differentiate any recursive algorithm from algorithms that halve a problem into two sub-problems which allowing for better parallelism.

Code

public class BinarySearch {
    public static boolean search(int[] sortedArray, int value) {
        return search(sortedArray, value, 0, sortedArray.length - 1);
    }
    
    private static boolean search(int[] sortedArray, 
                                  int value, 
                                  int min, 
                                  int max) {
        if (max < min)
            return false;
        else {
            int mid = (min + max) / 2;

            if (sortedArray[mid] > value)
                return search(sortedArray, value, min, mid-1);
            else if (sortedArray[mid] < value)
                return search(sortedArray, value, mid+1, max);
            else
                return true;
        }
    }
}