Randomized Algorithms
Randomized algorithms for on-line problems
=============================================================================
On-line problems: get a sequence of requests and need to
process
online.
Current decisions can affect future costs.
Example of rent-or-buy problem: from time to time you get
invited to
soem formal occasion and need to either rent or buy a
tuxedo. Say
rental costs $60 and purchase cost is $600. If you knew that you
would need a tux less than 10 times (before the styles
change or you
gain weight, etc) then it makes sense to rent. If you knew that you
would need it more than 10 times, then it makes sense to
buy right
away. But
what if you don't know?
Competitive ratio of algorithm: worst-case ratio of alg's
cost to the
optimal cost in hindsight. A deterministic algorithm with best
competitive ratio for the rent-or-buy problem is (assume
buy cost is
an integer multiple of rent cost):
"rent so long as total spent is less than purchase cost, then buy".
(e.g., for tuxedo problem, it is "rent 9 times then buy")
Competitive ratio < 2:
If
you end up needing tux < 10 times, then you are optimal. If you
end
up needing it more than that, then opt is $600 and your cost is
$540+$600.
Formally, we'll allow an additive constant in our defn of
competitive
ratio. So, deterministic alg A has comp ratio r if there
exists a
constant b such that for all request sequences sigma,
Cost_A(sigma)
< r*(opt cost on sigma) + b.
Here's a nice framework for many online problems:
The Metrical Task System problem:
You are
controlling a machine that can be in one of n states. There
is some distance metric d among the states of the
machine. You given,
online, a sequence of tasks. A task is a vector of costs, specifying
how much it costs to process the task in each state of
the machine.
Given some task T, the algorithm may choose to process it
in its
current state i, paying T_i, or it may choose to move to
some other
state j and process it there, paying a total of T_j +
d_{ij}.
Some results:
* For any n-state metric space, for any deterministic
algorithm, there exist
task sequences causing that algorithm to have a
competitive ratio at
least n.
Proof: let
the minimum inter-state distance in the metric space be
1. Each task will be a vector of the form
(0000010000), where the "1"
is in the
algorithm's current state. So, the
algorithm pays at least 1
per task
for a total cost of t in t tasks.
But, there must be some
state that
has at most t/n 1's in it. So, the
optimal cost is at most
t/n plus
the diameter of the metric space.
As t goes to infinity,
the ratio
goes to n.
Now, let's look at randomized algorithms. To do this, we need to
define the notion of competitive ratio for a randomized
algorithm.
Say that alg A has comp ratio r if there exists b such
that for all
request sequences sigma,
E[Cost_A(sigma)]
< r*(opt cost on sigma) + b.
This is often called the "oblivious adversary"
model since it
corresponds to a setting where the request sequence is
chosen FIRST,
and then the algorithm makes its random choices.
* For the uniform metric space, (d_ij = 1 for all i != j)
you can
achieve a ratio of O(log n) using the randomized
"Marking" algorithm:
The
algorithm runs in phases. In each
phase,
- begin at a random state with all states "unmarked"
- given some task T, mark all states whose total cost in this
phase
is at least 1.
- if our state is marked, we move to a random unmarked state; if
there
are none, then we clear all the marks, move to a random state,
and
end the phase.
If all task vectors are {0,1}^n valued, then the
competitive ratio of
this algorithm is H_n. More generally, the competitive ratio is at most 2H_n.
(H_n = 1 + 1/2 + 1/3 + ... + 1/n).
Proof: Let's for simplicity just look at task vectors of
the form
(00010000). It's easy to
generalize from here. Initially
our prob
distribution is (1/n,1/n,...,1/n).
Given a task such as (000000001),
our prob
dist changes to (1/(n-1),...,1/(n-1), 0).
More generally,
when the
ith state is marked, our probability of moving is 1/(n-i+1)
and
afterwards we have probability 1/(n-i) in each of the unmarked
states. So, our total
expected movement cost is at most 1/n + 1/(n-1)
+ ... + 1/2
+ 1 = H_n.
* For the uniform metric space, every algorithm has
competitive ratio
at least Omega(log n).
Proof: suppose each task is a random one of the form
(000010000000).
Then, the expected cost of any online algorithm per task
is at least
1/n. But,
the off-line algorithm, on each move, will go to the state
whose next "expensive" task is farthest in the
future. I.e., we are
throwing balls randomly into boxes, and the offline
algorithm moves
only when every box finally has a ball. This is the "coupon collector's
problem", and we've seen that the expected time is
Theta(n log n).
So, the offline expected cost per task is 1/(n log n).
In the paging problem, we have n pages and a cache of
size k. Given a
page request, we incur a fault if the page is not in the
cache, and we
then have to pay 1 unit to bring the page into the cache
and throw out
one of the current cache occupants. The uniform MTS
problem can model
this problem when n=k+1.
* More generally, the Marking algorithm achieves a ratio
2H_k for the
paging problem, no matter what n is. (The marking algorithm becomes:
mark each page in the cache as "recently used"
or "not recently used".
When a page is requested, mark the page as recently
used. When we
incur a page fault, throw out a random "not recently
used" page from
the cache.
If all pages are marked as recently used, then end the
phase and clear the marks.)
Proof: see the book...