Randomized Algorithms

 

* randomized rou{t,nd}ing      Chap 4.3, 5.1, 5.2, 5.6

* "the probabilistic method"

 

 

=============================================================================

 

Randomized routing/rounding

---------------------------

Given an undirected graph and a set of pairs {(s_i, t_i)} we want to

route these pairs to minimize congestion. NP-hard.  Approximate?

 

Idea: (Raghavan & Thompson)

 

1. Solve fractionally.  Think of as multi-commodity flow.  (View as

a decision problem: is there a solution with congestion at most C ==>

set capacities on edges to C, and then do binary search on C).  Can

solve with linear programming.  __

                              /|\ |\

                              \|_\|/

 

2. For each pair (s_i, t_i) we have a flow:  Now do "path stripping":

Can write this flow as a union of paths, with path j carring some

fraction f_j of flow.  sum_j f_j = 1.

 

3. Choose routing for (s_i, t_i) path by randomly selecting according

to the distribution f_j.

 

 

Analysis: fix some edge.  Let p_i be the flow of commodity i on this

edge.  This also means that p_i is the probability that we picked this

edge for routing (s_i, t_i) in step 3.  So, for a given edge, can

think as a box with of a bunch of balls, one per (s_i, t_i), each

thrown into this box with prob p_i, INDEPENDENTLY.  Expected number of

balls is C.  Chernoff gives us:

 

     Pr[total > (1+epsilon)C] < e^{-epsilon^2*C/3}

 

So, if C >> log(n), then w.h.p., maximum is only 1+epsilon times

larger than the expectation.

 

What if C=1, or C is constant?  In this case, a bound we can apply is

 

     Pr[sum > k*u] < (e^(k-1)/k^k)^u, where u is expectation.

 

(This bound holds for all k>1, and where "sum" is a sum of independent

{0,1} random variables.)

 

So, set k to be O(log(n)/loglog(n)), and then get 1/n^c.  Somewhat

like our discussion of the maximum for the occupancy problem.

------------------------------------------------------------------------

Note: last time talked about how with pairwise independence, can only

assume max is at most sqrt(n), and wondered how the hashing methods we

discussed would behave.  In fact, this is the topic of Gabor Tardos's

theory seminar this Friday.

-----------------------------------------------------------------------

 

"The Probabilistic Method": We want to show that something exists. Do

this by setting up some probability distribution and showing that what

we want happens with probability > 0.

 

Spencer's Graduation/Tenure Game

--------------------------------

Begin with some set of people at various positions.

 

2 players: Paul (partitioner) splits people into two groups

        Carole (chooser) selects one group to be removed, other advances.

 

     Paul wins if somebody graduates, Carole wins if nobody graduates. 

 

Theorem 1: If sum_k a_k / 2^k < 1, then Carole has a winning strategy.

Theorem 2: If sum_k a_k / 2^k >= 1, then Paul has a winning strategy.

 

Proof of theorem 1.  Suppose Carole plays randomly.  What is the

expected number of people that reach the goal?  For instance, for

fixed person who is initially at level k, what is the probability that

this person reaches the goal?  Therefore, no matter what Paul's

strategy is, a random Carole has a non-zero chance of winning.

Therefore, Paul cannot have a winning strategy and so Carole does.

 

 

What is Carole's winning strategy?  Just use Theorem 1: pick subset

where sum is less than 1/2, so when they advance, the sum is less than

1, and we still have a winning strategy.

 

Idea sometimes called the "method of conditional expectations".

General idea: say we've shown that making random choices results in a

good expected value.  Now we consider our first choice, and calculate

conditional expectations given this first choice, and we pick one

where conditional expectation is still good (which we know must occur,

for appropriate definitions of "good").

 

 

What about Paul?  His strategy is to split so that both groups have sum

>= 1/2, by same reasoning.  Can always do this.  (Go through elements

from largest to smallest.)

 

 

 

MAX SAT

---------

 

CNF formula: an AND of clauses.  "exactly k" CNF: each clause of size

exactly k. (k-CNF has each of size at most k.) Say we're given one and

we want to satisfy as many clauses as we can.

 

Claim: For any "exactly k"-CNF, there exists a solution that satisfies

at least 1 - 1/2^k of the clauses.

 

Proof: consider a random assignment.  Expected # clauses satisfied

E=(1-1/2^k)m.  (m=number of clauses).

 

How about a deterministic algorithm? Idea: given any partial assignment P,

we can calculate the expected number of clauses satisfied given that

those variables not specified in P are set randomly. (Just look

at the clauses one by one.)  So, use conditional expectation method.

 

--> Calculate E_0 = expected number satisfied given x_1=0 and rest

are random.  E_1 = expected number satsified given x_1=1 and rest are

random.  E = (E_0 + E_1)/2. So, fix x_1 depending on whichever is larger.

Continue with x_2, etc.

 

 

What about MAX-SAT in general?  If there are m_k clauses of size k, we

satisfy sum_k m_k(1-1/2^k) of them.

 

In the worst case, this is 1/2.  Can't hope for better, e.g., if the

formula is just "x_1 and not(x_1)".  How about an approximation

algorithm that satisfies nearly as many clauses as the best possible

solution? (note: MAX 2-SAT is NP-hard). Here's a way to satisfy at

least 3/4 of the maximum possible.  Note: above will work so long as

no singletons.  Now we'll look at a randomized rounding procedure that

does well so long as all clauses are small.  Then we'll combine them.

 

--> solve fractional version of problem.  Instead of requiring

 variable x_i to be in {0,1}, we allow x_i to be in [0,1].  define

 not(x_i) to be 1-x_i.  Allow clauses to be "partially satsified":

 for clause (x_1 or x_2 or not(x_3)), let "satisfiedness" be:

 min(1, x_1 + x_2 + (1-x_3)).

     I.e., if sum is less than 1, then it represents how satisfied

     the clause is, and if greater than 1, then we say clause is

     satisfied.

 Then find solution that maximizes total satisfaction.

 

 Set up as LP: x_i in [0,1].  Variable z_j for clause j:

     z_j <= 1.  z_j <= sum-of-literals-in-clause-j.

 Then, maximize sum z_j.

 

 

--> Now, let's do randomized rounding: set variable i to 1 with prob x_i.

 

 Claim: if clause j has k literals, then

     prob(clause j is satisfied) >= z_j * (1 - (1-1/k)^k)

 

 [Note: for k=1, this is z_j, for k=2, this is 3z/4, for k=3, this is 0.704z]

 

 Proof: idea: say z_j=1 and all variables in it are at 1/k.  Then,

 prob not satisfied is exactly (1-1/k)^k.  Say z_j may not be 1, but

 all variables are equal at z_j/k.  Then prob satisfied = 1 - (1-z_j/k)^k.

 This is >= what we want since equal at z_j=0 or 1, and concave.

 Then, have to show that for a given sum, the prob is maximized when

 they're all equal which is intuitively easy to see (but I don't have a

 nongrungy argument right now...)

 

--> So, this strategy does well when clauses are small, previous did

 well when clauses are big.   E.g., prob(clause j is satisfied) as a

 function of k is:

 

               strategy 1  |   strategy 2

               ------------|-------------

          k=1     1/2        |      z_j

                k=2        3/4      |    3/4 * z_j

                k=3     7/8      |    0.704 z_j

 

--> so, let's just flip a coin and with prob 1/2 use strategy 1, with

  prob 1/2 use strategy 2.  Goodness is average of two values from above

  table.  Just want this to be >= 3/4 * z_j, which, in fact, it

  is. (just have to do the calculation in general).

 

Notes:

 

Current best approxs: 0.931 for MAX 2-SAT [Feige-Goemans], 0.801 for

MAX 3-SAT [Trevisan, Sorkin, Sudan, Williamson], 0.758 for MAX-SAT

[Goemans-Williamson]

 

Current best hardness results: 36/37 = 0.973 for MAX-3SAT,

73/74=0.986 for MAX 2-SAT.