Randomized Algorithms
* randomized rou{t,nd}ing (Chapters 4.3, 5.1, 5.2, 5.6)
* "the probabilistic method"
=============================================================================
Randomized routing/rounding
---------------------------
Given an undirected graph and a set of pairs {(s_i, t_i)}, we want to
route each pair along a path so as to minimize congestion (the maximum
number of paths using any single edge). This is NP-hard. Can we
approximate it?
Idea (Raghavan & Thompson):
1. Solve fractionally. Think of it as a multi-commodity flow problem.
   (View it as a decision problem: is there a solution with congestion
   at most C? Set all edge capacities to C, and then do binary search
   on C.) This can be solved with linear programming.
2. For each pair (s_i, t_i) we now have a flow. Do "path stripping":
   write this flow as a union of paths, with path j carrying some
   fraction f_j of the flow, where sum_j f_j = 1.
3. Choose the routing for (s_i, t_i) by randomly selecting a path
   according to the distribution f_j.
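Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Raghavan and Thompson's own code: one commodity's flow is given as a dict from directed edge (u, v) to flow amount, the flow is acyclic and sends one unit from s to t, and the names `strip_paths` and `choose_path` are our own.

```python
import random

def strip_paths(flow, s, t, eps=1e-12):
    """Path stripping: decompose a unit s-t flow into (path, f_j) pairs.

    flow maps a directed edge (u, v) to the amount of flow on it.
    Assumption (hypothetical input format): the flow is acyclic and is
    conserved at every node other than s and t.
    """
    residual = dict(flow)  # copy, so the caller's flow is left untouched
    paths = []
    while True:
        # Walk from s along edges that still carry positive flow.
        path, u = [s], s
        while u != t:
            nxt = next((v for (a, v), f in residual.items()
                        if a == u and f > eps), None)
            if nxt is None:
                break
            path.append(nxt)
            u = nxt
        if u != t:  # could not reach t: no usable flow is left
            return paths
        edges = list(zip(path, path[1:]))
        f_j = min(residual[e] for e in edges)  # bottleneck of this path
        for e in edges:
            residual[e] -= f_j  # the bottleneck edge drops to exactly zero
        paths.append((path, f_j))


def choose_path(paths, rng=random):
    """Step 3: pick path j with probability f_j (the f_j sum to 1)."""
    candidates, weights = zip(*paths)
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical example: half the flow goes via a, half via b.
flow = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "b"): 0.5, ("b", "t"): 0.5}
paths = strip_paths(flow, "s", "t")
print(paths)  # [(['s', 'a', 't'], 0.5), (['s', 'b', 't'], 0.5)]
print(choose_path(paths, random.Random(0)))
```

Each round of stripping zeroes out at least one edge, so the loop runs at most once per edge.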
Analysis: fix some edge. Let p_i be the flow of commodity i on this
edge. Then p_i is also the probability that step 3 picks a path through
this edge for (s_i, t_i), since the stripped paths through the edge
carry total weight p_i (we can assume the flow has no cycles; cancelling
them only lowers congestion). So, for a given edge, we can think of a
box and a bunch of balls, one per (s_i, t_i), each thrown into this box
with probability p_i, INDEPENDENTLY. The expected number of balls is
sum_i p_i <= C.
Chernoff gives us (for 0 < epsilon <= 1):
    Pr[total > (1+epsilon)C] < e^{-epsilon^2*C/3}.
So, if C >> log(n), then with high probability (union bound over all
edges) the maximum load is only 1+epsilon times larger than C.
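For a single edge, the balls-in-a-box model and the Chernoff bound can be compared numerically. A minimal sketch, assuming (hypothetically) 200 commodities that each use the edge with probability p_i = 0.25, so the expected load is C = 50; the helper names are our own.

```python
import math
import random

def overflow_probability(p, threshold, trials=10000, seed=1):
    """Monte Carlo estimate of Pr[sum of independent Bernoulli(p_i) > threshold]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        balls = sum(1 for p_i in p if rng.random() < p_i)  # one ball per commodity
        hits += balls > threshold
    return hits / trials

def chernoff_bound(mu, eps):
    """e^{-eps^2 mu / 3}: bound on Pr[X > (1+eps) mu], valid for 0 < eps <= 1."""
    return math.exp(-eps * eps * mu / 3)

p = [0.25] * 200  # hypothetical: 200 commodities, each uses the edge w.p. 0.25
C = sum(p)        # expected number of balls in the box
eps = 0.5
estimate = overflow_probability(p, (1 + eps) * C)
print(estimate, chernoff_bound(C, eps))
```

The simulated tail is typically far below the bound; Chernoff trades tightness for a clean closed form.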
What if C=1, or C is constant? In this case, a bound we can apply is
    Pr[sum > k*u] < (e^(k-1)/k^k)^u, where u is the expectation.
(This bound holds for all k > 1, where "sum" is a sum of independent
{0,1} random variables.) So, set k to be O(log(n)/loglog(n)) to get a
failure probability of 1/n^c. This is much like our discussion of the
maximum for the occupancy problem.
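To see how large k must be for constant u, one can search numerically for the smallest k with (e^(k-1)/k^k)^u <= 1/n^c, working with logarithms to avoid overflow. A sketch with hypothetical parameters u = 1 and c = 2; since the condition amounts to k ln k being roughly c ln n, the resulting k should track log(n)/loglog(n) up to a constant factor.

```python
import math

def log_tail_bound(k, u):
    """ln of the bound (e^(k-1) / k^k)^u on Pr[sum > k*u], for k > 1."""
    return u * ((k - 1) - k * math.log(k))

def smallest_k(n, u=1.0, c=2.0, step=0.01):
    """Smallest k on a grid of width `step` with bound <= 1/n^c.

    log_tail_bound is decreasing in k for k > 1 (its derivative is -u ln k),
    so scanning upward terminates.
    """
    target = -c * math.log(n)
    k = 1.0 + step
    while log_tail_bound(k, u) > target:
        k += step
    return k

# Hypothetical parameters u = 1, c = 2: compare k with ln n / ln ln n.
for n in (10**3, 10**6, 10**9, 10**12):
    k = smallest_k(n)
    ratio = k / (math.log(n) / math.log(math.log(n)))
    print(n, round(k, 2), round(ratio, 2))
```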
------------------------------------------------------------------------
Note: last time we talked about how, with only pairwise independence, we
can only guarantee that the maximum is at most sqrt(n), and wondered how
the hashing methods we discussed would behave. In fact, this is the
topic of Gabor Tardos's theory seminar this Friday.
-----------------------------------------------------------------------
"The Probabilistic Method": we want to show that something exists. We
do this by setting up some probability distribution and showing that
what we want happens with probability > 0.
Spencer's Graduation/Tenure Game
--------------------------------
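Begin with some set of people at various positions; let a_k be the
number of people who are k steps away from graduating. There are 2
players, and in each round:
  Paul (partitioner) splits the people into two groups.
  Carole (chooser) selects one group to be removed; the other group
  advances one step.
Paul wins if somebody graduates; Carole wins if nobody graduates.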
Theorem 1: If sum_k a_k / 2^k < 1, then Carole has a
winning strategy.
Theorem 2: If sum_k a_k / 2^k >= 1, then Paul has a
winning strategy.
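Proof of theorem 1. Suppose Carole plays randomly. What is the expected
number of people that reach the goal? Fix a person who is initially at
level k (k steps away). In each round Carole advances that person's
group with probability 1/2, whatever Paul does, so the probability that
this person reaches the goal is at most 1/2^k. By linearity of
expectation, the expected number of graduates is at most
sum_k a_k / 2^k < 1. Therefore, no matter what Paul's strategy is, a
random Carole has a non-zero chance of winning. So Paul cannot have a
winning strategy, and since this is a finite game with no draws, Carole
does.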
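What is Carole's winning strategy? Maintain the condition of Theorem 1:
the sums of Paul's two groups add up to less than 1, so one group has
sum less than 1/2. Advance that group (remove the other). Advancing
doubles every term, so the new sum is less than 1 and we are still in a
winning position. Since a graduated person alone contributes 2^0 = 1,
nobody ever graduates.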
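Carole's strategy is easy to simulate. A minimal sketch, assuming each person is represented by the number of steps k they still need (k = 0 means graduated) and using exact fractions for the sum of 2^{-k}; the random Paul is a hypothetical opponent for testing, and all function names are our own.

```python
from fractions import Fraction
import random

def potential(group):
    """Sum of 2^{-k} over the group, computed exactly."""
    return sum((Fraction(1, 2 ** k) for k in group), Fraction(0))

def carole(group_a, group_b):
    """Advance the group with the smaller sum (the other one is removed)."""
    return group_a if potential(group_a) <= potential(group_b) else group_b

def random_paul(rng):
    """A hypothetical Paul who splits the people uniformly at random."""
    def split(people):
        a, b = [], []
        for k in people:
            (a if rng.random() < 0.5 else b).append(k)
        return a, b
    return split

def play(people, paul, chooser):
    """Return True if somebody graduates (Paul wins), False otherwise."""
    while people:
        if 0 in people:
            return True
        a, b = paul(people)
        people = [k - 1 for k in chooser(a, b)]  # advanced group moves one step
    return False

start = [2, 2, 3]  # sum 1/4 + 1/4 + 1/8 = 5/8 < 1
rng = random.Random(0)
print(any(play(start, random_paul(rng), carole) for _ in range(1000)))  # False
```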
This idea is sometimes called the "method of conditional expectations".
General idea: say we've shown that making random choices results in a
good expected value. Now consider our first choice, calculate the
conditional expectation given each option for it, and pick one where
the conditional expectation is still good (which we know must exist,
for appropriate definitions of "good"). Then repeat for the next
choice.
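What about Paul? His strategy is to split so that both groups have sum
>= 1/2; by the same reasoning, the advancing group then has sum >= 1,
so the condition of Theorem 2 is maintained. Since the advancing group
is never empty and everyone in it moves one step closer, eventually
somebody graduates. He can always split this way: if nobody has
graduated yet, every term is a power of 2 that is at most 1/2, so going
through the people from largest term to smallest and putting them in
the first group until its sum reaches 1/2 hits 1/2 exactly, leaving
>= 1/2 for the second group.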
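Paul's greedy split can be sketched the same way: people are represented by their remaining number of steps k, sums are exact fractions, and the names are our own. To check that the split beats every Carole, the hypothetical helper `paul_wins` explores both of Carole's choices in every round.

```python
from fractions import Fraction

def paul_split(people):
    """Greedy split: largest terms 2^{-k} (smallest k) first, until group A reaches 1/2."""
    a, b, sum_a = [], [], Fraction(0)
    for k in sorted(people):
        if sum_a < Fraction(1, 2):
            a.append(k)
            sum_a += Fraction(1, 2 ** k)
        else:
            b.append(k)
    return a, b

def paul_wins(people):
    """Hypothetical test harness: True if the greedy split wins against every Carole."""
    if 0 in people:
        return True   # somebody graduated
    if not people:
        return False  # everybody was removed
    a, b = paul_split(people)
    return all(paul_wins([k - 1 for k in group]) for group in (a, b))

print(paul_split([1, 2, 3, 3]))  # ([1], [2, 3, 3])
print(paul_wins([1, 2, 3, 3]))   # sum = 1   -> True
print(paul_wins([2, 2, 3]))      # sum = 5/8 -> False
```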
MAX SAT
---------
CNF formula: an AND of clauses. "Exactly k" CNF: each clause has size
exactly k. (k-CNF: each clause has size at most k.) Say we're given one
and we want to satisfy as many clauses as we can.
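Claim: For any "exactly k"-CNF, there exists an assignment that
satisfies at least a 1 - 1/2^k fraction of the clauses.
Proof: consider a random assignment. A clause on k distinct variables
is unsatisfied only if all k of its literals are false, which happens
with probability 1/2^k. So the expected number of clauses satisfied is
E = (1-1/2^k)m, where m = number of clauses, and some assignment must
achieve at least E.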
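The expectation calculation can be checked by brute force on a small instance. A sketch, assuming clauses are lists of nonzero integers in the common DIMACS style (i means x_i, -i means not(x_i)) and that no clause repeats a variable; the example formula and helper names are hypothetical.

```python
from itertools import product
import random

def satisfied(clause, assignment):
    """assignment maps variable i -> bool; literal i means x_i, -i means not(x_i)."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def expected_random(clauses):
    """E[# satisfied] under a uniform random assignment: a size-k clause fails w.p. 2^-k."""
    return sum(1 - 0.5 ** len(c) for c in clauses)

def all_counts(clauses, n):
    """# satisfied clauses for each of the 2^n assignments to x_1..x_n."""
    counts = []
    for bits in product([False, True], repeat=n):
        assignment = dict(enumerate(bits, start=1))
        counts.append(sum(satisfied(c, assignment) for c in clauses))
    return counts

def random_assignment(n, rng):
    """Set each of x_1..x_n to True with probability 1/2."""
    return {i: rng.random() < 0.5 for i in range(1, n + 1)}

# A hypothetical exactly-3 CNF with m = 4 clauses on 4 variables.
clauses = [[1, 2, 3], [-1, 2, 4], [1, -3, -4], [-2, -3, 4]]
counts = all_counts(clauses, 4)
print(expected_random(clauses))                 # 3.5 = (1 - 1/8) * 4
print(sum(counts) / len(counts))                # 3.5, the exact average
print(max(counts) >= expected_random(clauses))  # True: some assignment reaches E
a = random_assignment(4, random.Random(0))
print(sum(satisfied(c, a) for c in clauses))
```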
How about a deterministic algorithm? Idea: given any partial assignment
P, we can calculate the expected number of clauses satisfied given that
the variables not specified in P are set randomly. (Just look at the
clauses one by one.) So, use the conditional expectation method:
--> Calculate E_0 = expected number satisfied given x_1=0 and the rest
random, and E_1 = expected number satisfied given x_1=1 and the rest
random. E = (E_0 + E_1)/2, so max(E_0, E_1) >= E. So, fix x_1 to
whichever value gives the larger one. Continue with x_2, etc.
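The conditional-expectation algorithm as a sketch, assuming the DIMACS-style clause representation (i for x_i, -i for not(x_i), no variable repeated within a clause); the function names are our own. `conditional_expectation` looks at the clauses one by one, as described above.

```python
def conditional_expectation(clauses, partial):
    """E[# satisfied] when variables in `partial` are fixed and the rest are uniform."""
    total = 0.0
    for clause in clauses:
        free = 0          # literals whose variable is still random
        already = False   # some fixed literal is already true
        for lit in clause:
            var = abs(lit)
            if var not in partial:
                free += 1
            elif partial[var] == (lit > 0):
                already = True
                break
        # Not yet satisfied: fails only if all `free` literals come out false.
        total += 1.0 if already else 1.0 - 0.5 ** free
    return total

def derandomize(clauses, n):
    """Fix x_1, ..., x_n in turn, never letting the conditional expectation drop."""
    partial = {}
    for var in range(1, n + 1):
        e0 = conditional_expectation(clauses, {**partial, var: False})
        e1 = conditional_expectation(clauses, {**partial, var: True})
        partial[var] = e1 >= e0  # max(E_0, E_1) >= (E_0 + E_1) / 2
    return partial

# A hypothetical exactly-3 CNF on x_1..x_4.
clauses = [[1, 2, 3], [-1, 2, 4], [1, -3, -4], [-2, -3, 4]]
assignment = derandomize(clauses, 4)
print(conditional_expectation(clauses, {}))          # 3.5
print(conditional_expectation(clauses, assignment))  # 4.0 (all four clauses)
```

Once every variable is fixed, the conditional expectation is just the number of satisfied clauses, so the result is at least the initial E.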
What about MAX-SAT in general? If there are m_k clauses of size k, a
random assignment satisfies sum_k m_k(1-1/2^k) of them in expectation.
In the worst case (all clauses of size 1), this is 1/2 of the clauses.
We can't hope to guarantee a better fraction of all clauses, e.g., if
the formula is just "x_1 and not(x_1)". How about an approximation
algorithm that satisfies nearly as many clauses as the best possible
solution? (Note: MAX 2-SAT is NP-hard.) Here's a way to satisfy at
least 3/4 of the maximum possible. Note: the random assignment above
already achieves this so long as there are no singleton clauses.
Now we'll look at a randomized rounding procedure that
does well so long as all clauses are small. Then we'll combine them.
--> Solve the fractional version of the problem. Instead of requiring
variable x_i to be in {0,1}, we allow x_i to be in [0,1], and define
not(x_i) to be 1-x_i. Allow clauses to be "partially satisfied": for
the clause (x_1 or x_2 or not(x_3)), let its "satisfiedness" be
    min(1, x_1 + x_2 + (1-x_3)).
I.e., if the sum is less than 1, it represents how satisfied the clause
is, and if it is at least 1, we say the clause is satisfied. Then find
the solution that maximizes total satisfaction.
Set this up as an LP: x_i in [0,1]. Variable z_j for clause j:
    z_j <= 1,
    z_j <= sum of literals in clause j.
Then, maximize sum_j z_j.
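Written out in full, the LP reads as follows, where P_j and N_j (our own notation) are the sets of variables appearing un-negated and negated in clause j:

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{j=1}^{m} z_j \\
\text{subject to}\quad & z_j \le \sum_{i \in P_j} x_i + \sum_{i \in N_j} (1 - x_i), && j = 1, \dots, m, \\
                       & 0 \le z_j \le 1,                                          && j = 1, \dots, m, \\
                       & 0 \le x_i \le 1,                                          && i = 1, \dots, n.
\end{aligned}
```

Any 0/1 assignment gives a feasible point with z_j = 1 for the satisfied clauses and z_j = 0 for the rest, so the LP optimum sum_j z_j is at least the maximum number of simultaneously satisfiable clauses.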
--> Now, let's do randomized rounding: set variable i to 1 with
probability x_i, independently.
Claim: if clause j has k literals, then
    Pr(clause j is satisfied) >= z_j * (1 - (1-1/k)^k).
[Note: for k=1 this is z_j, for k=2 it is 3z_j/4, for k=3 it is about
0.704 z_j.]
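Proof idea: say z_j = 1 and all literals in the clause are at 1/k.
Then the probability that the clause is not satisfied is exactly
(1-1/k)^k. Now say z_j may not be 1, but all literals are equal at
z_j/k. Then the probability it is satisfied is 1 - (1-z_j/k)^k. This
is >= what we want since, as a function of z_j, it equals
z_j*(1-(1-1/k)^k) at z_j=0 and z_j=1 and is concave in between.
Finally, we have to show that for a given sum, the probability of being
satisfied is smallest when the literal values are all equal. This
follows from the AM-GM inequality: if the literal values y_1..y_k sum
to at least z_j, then
    prod_i (1-y_i) <= (1 - (sum_i y_i)/k)^k <= (1 - z_j/k)^k.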
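Because the rounding is independent per variable, Pr(clause satisfied) = 1 - prod(1 - literal values) can be computed exactly, so the claim can be spot-checked numerically. A sketch, assuming clauses are lists of nonzero integers (i for x_i, -i for not(x_i), no variable repeated in a clause) and a fractional solution x given as a dict; the helper names and random test points are hypothetical.

```python
import math
import random

def literal_value(lit, x):
    """Fractional value of a literal: x_i for x_i, 1 - x_i for not(x_i)."""
    return x[lit] if lit > 0 else 1 - x[-lit]

def z_value(clause, x):
    """The clause's satisfiedness min(1, sum of its literal values)."""
    return min(1.0, sum(literal_value(lit, x) for lit in clause))

def prob_satisfied(clause, x):
    """Exact Pr[clause satisfied] when each x_i is rounded to 1 independently w.p. x_i."""
    return 1 - math.prod(1 - literal_value(lit, x) for lit in clause)

def round_solution(x, rng):
    """Randomized rounding: variable i becomes True with probability x[i]."""
    return {i: rng.random() < x_i for i, x_i in x.items()}

# Spot-check Pr >= z_j * (1 - (1 - 1/k)^k) on hypothetical random fractional points.
rng = random.Random(42)
violations = 0
for _ in range(2000):
    x = {i: rng.random() for i in range(1, 7)}
    k = rng.randint(1, 4)
    clause = [v if rng.random() < 0.5 else -v for v in rng.sample(range(1, 7), k)]
    bound = z_value(clause, x) * (1 - (1 - 1 / k) ** k)
    if prob_satisfied(clause, x) < bound - 1e-12:  # tolerance for rounding error
        violations += 1
print(violations)  # 0
```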
--> So, this strategy does well when clauses are small, and the previous
one did well when clauses are big. E.g., Pr(clause j is satisfied) as a
function of k is:
         | strategy 1 (random) | strategy 2 (LP rounding)
   ------|---------------------|--------------------------
   k=1   | 1/2                 | z_j
   k=2   | 3/4                 | 3/4 * z_j
   k=3   | 7/8                 | 0.704 * z_j
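--> So, let's just flip a coin and with probability 1/2 use strategy 1,
with probability 1/2 use strategy 2. Since z_j <= 1, strategy 1 also
satisfies clause j with probability at least (1-1/2^k) z_j, so the
combined probability is at least z_j times the average of the two
coefficients in the table above. We just want this to be >= 3/4 * z_j,
which, in fact, it is: the average is exactly 3/4 for k=1 and k=2, and
for k >= 3 it is at least (7/8 + 1 - 1/e)/2 > 3/4, since
(1-1/k)^k < 1/e. Summing over clauses, the expected number satisfied is
>= 3/4 * sum_j z_j, which is >= 3/4 of the optimum because the LP is a
relaxation.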
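The per-clause calculation behind the 3/4 guarantee is easy to tabulate. A sketch: `strategy1(k)` and `strategy2(k)` (our names) return the coefficients of z_j from the table, using z_j <= 1 for strategy 1.

```python
import math

def strategy1(k):
    """Random assignment: Pr = 1 - 2^-k, which is >= (1 - 2^-k) * z_j since z_j <= 1."""
    return 1 - 0.5 ** k

def strategy2(k):
    """LP rounding: Pr >= (1 - (1 - 1/k)^k) * z_j."""
    return 1 - (1 - 1 / k) ** k

def combined(k):
    """Coefficient of z_j when a fair coin picks the strategy."""
    return (strategy1(k) + strategy2(k)) / 2

for k in range(1, 7):
    print(k, round(strategy1(k), 3), round(strategy2(k), 3), round(combined(k), 3))
# For k >= 3: strategy1 >= 7/8 and strategy2 > 1 - 1/e, so the average exceeds 3/4.
print(all(combined(k) >= 0.75 for k in range(1, 1000)))  # True
print((7 / 8 + 1 - 1 / math.e) / 2)
```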
Notes:
  Current best approximations: 0.931 for MAX 2-SAT [Feige-Goemans],
  0.801 for MAX 3-SAT [Trevisan, Sorkin, Sudan, Williamson], 0.758 for
  MAX-SAT [Goemans-Williamson].
  Current best hardness results: 36/37 = 0.973 for MAX 3-SAT,
  73/74 = 0.986 for MAX 2-SAT.