Sampling:
- Given a complex state space
- Want to sample from it
- Use some Markov chain
- Run it for a long time
- End up ``near'' the stationary distribution
- Reduces sampling to local moves (easier)
- No need for a global description of the state space
- Allows sampling from an exponential state space
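Formalize: what is ``near'' and what is a ``long time''?
- Stationary distribution $\pi$

- Arbitrary distribution $q$
- Relative pointwise distance (r.p.d.): $\max_j |q_j - \pi_j|/\pi_j$
- Intuitively, measures how close $q$ is to $\pi$.
- Formally, suppose r.p.d. $\le \delta$.
- Then $q_j \ge (1 - \delta)\pi_j$ for every $j$
- So we can express distribution $q$ as ``with probability $1 - \delta$, sample from $\pi$; else, do something weird'': $q = (1 - \delta)\pi + \delta r$, where $r = (q - (1 - \delta)\pi)/\delta$ is itself a probability distribution.
- So if $\delta$ is small, it is ``as if'' we sample from $\pi$ each time.
- If $\delta$ is polynomially small, we can take polynomially many samples without a goof (union bound)
- Gives an ``almost stationary'' sample from the Markov chain
- Mixing time: the time needed to reduce the r.p.d. to some $\epsilon$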
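A minimal sketch of these definitions, assuming a small chain given as a dense transition matrix (a lazy walk on an 8-cycle, chosen only for illustration). `rpd` computes the relative pointwise distance, `mixing_time` iterates $p \mapsto pP$ from a point mass until the r.p.d. is at most $\epsilon$, and `split_off_stationary` recovers the ``do something weird'' distribution $r$ in $q = (1 - \delta)\pi + \delta r$. All function names are hypothetical.

```python
def rpd(q, pi):
    """Relative pointwise distance: max over states j of |q_j - pi_j| / pi_j."""
    return max(abs(qj - pj) / pj for qj, pj in zip(q, pi))


def step(p, P):
    """One step of the chain: the row vector p times the transition matrix P."""
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def mixing_time(P, pi, start, eps, max_steps=10_000):
    """Smallest t with rpd(p0 P^t, pi) <= eps, where p0 is the point mass on
    `start`; returns None if not reached within max_steps."""
    p = [0.0] * len(pi)
    p[start] = 1.0
    for t in range(max_steps + 1):
        if rpd(p, pi) <= eps:
            return t
        p = step(p, P)
    return None


def split_off_stationary(q, pi, delta):
    """Assuming rpd(q, pi) <= delta < 1, return r with q = (1 - delta) pi + delta r."""
    r = [(qj - (1 - delta) * pj) / delta for qj, pj in zip(q, pi)]
    assert min(r) >= -1e-12, "r.p.d. of q exceeds delta"
    return r


def lazy_cycle(n):
    """Lazy walk on an n-cycle: stay w.p. 1/2, else step to a random neighbor."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 0.5
        P[i][(i + 1) % n] += 0.25
        P[i][(i - 1) % n] += 0.25
    return P


P = lazy_cycle(8)
pi = [1 / 8] * 8                     # uniform, since P is symmetric
t_mix = mixing_time(P, pi, start=0, eps=0.01)
```

Sampling from $q$ as a mixture then means: with probability $1 - \delta$ draw from $\pi$, otherwise draw from the returned $r$.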

Outline:
- Describe the problem. Membership oracle.
- #P-hard to compute the volume of an intersection of half spaces in $n$ dimensions
- In low dimensions, integrate.
- Even for convex bodies, a deterministic algorithm can't do better than an $(n/\log n)^n$ approximation ratio
- What about an FPRAS?
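Estimating $\pi$:
- Pick a random point in the unit square
- Check if it is in the circle
- The fraction of hits estimates the ratio of the circle's area to the square's area ($\pi/4$ here)
- Extends to an arbitrary shape with a ``membership oracle''
- Problem: rare events.
- The circle has a good, easy outer box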
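A sketch of this estimator with a generic membership oracle (function names are hypothetical; Python's `random` module supplies the uniform samples). The unit-square experiment uses the quarter of the unit disk lying in $[0, 1]^2$, whose area is $\pi/4$.

```python
import math
import random


def estimate_volume(member, lo, hi, samples, rng):
    """Estimate the volume of a shape inside the box lo <= x <= hi using only a
    membership oracle: sample uniform points in the box, scale the hit fraction."""
    box = math.prod(h - l for l, h in zip(lo, hi))
    hits = 0
    for _ in range(samples):
        x = [rng.uniform(l, h) for l, h in zip(lo, hi)]
        if member(x):
            hits += 1
    return box * hits / samples


def in_quarter_disk(x):
    # Membership oracle for the part of the unit disk inside the unit square
    return x[0] ** 2 + x[1] ** 2 <= 1.0


area = estimate_volume(in_quarter_disk, [0.0, 0.0], [1.0, 1.0], 200_000, random.Random(0))
pi_estimate = 4 * area       # the quarter disk has area pi/4
```

With a thin $1 \times 0.001$ rectangle in the same unit square, only about one sample in a thousand would be a hit, which is the rare-event problem.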
Problem: rare events:
- In 2D, long skinny shapes
- In high dimension, even a round shape has an exponentially larger bounding box
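To quantify ``exponentially larger'', the standard formula $\mathrm{vol}(B^d) = \pi^{d/2}/\Gamma(d/2 + 1)$ for the unit ball gives the hit rate when sampling from its bounding cube $[-1, 1]^d$. A small helper (hypothetical name):

```python
import math


def ball_to_cube_ratio(d):
    """Fraction of the cube [-1, 1]^d occupied by the unit ball in R^d."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return ball / 2 ** d


for d in (2, 5, 10, 20):
    print(d, ball_to_cube_ratio(d))
```

In 20 dimensions the ratio is about $2.5 \times 10^{-8}$, so sampling from the box needs around $4 \times 10^7$ samples per hit.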
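Solution: ``creep up'' on the volume
- Modify $P$ so that it contains the unit ball $B_1$ and is contained in a larger ball $B_2$ of radius $r$, with $r$ polynomial
- Choose $\rho$ near $1 - 1/d$
- Consider the sequence of bodies $\rho^i r P \cap B_2$ for $i = 0, 1, 2, \ldots$
- Note that for large $i$ we get $P$ (exactly $P$ once $\rho^i r = 1$, since $P \subseteq B_2$)
- But for $i = 0$, the body $rP$ contains $B_2$ (because $P \supseteq B_1$)
- So the body is just $B_2$, whose volume is known
- So we just need the ratios of consecutive volumes; each is at least $\rho^d \approx 1/e$ (shrinking a body by $\rho$ scales its volume by $\rho^d$), so there are no rare events
- At each step, we need a random sample from $\rho^i r P \cap B_2$
- Sampling method: a random walk forbidden to leave the body
- Eigenvalues show rapid mixing
- The eigenvalue gap is large because the body is convex: no bottlenecks
- (Details omitted)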
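A two-dimensional sketch of the creeping-up estimator. Assumptions, all for illustration: the body is the square $P = [-1.5, 1.5]^2$ (area 9), which contains $B_1$ and lies inside $B_2$ of radius $r = 3$; $\rho$ is rounded from $1 - 1/d$ so that $\rho^k r = 1$ exactly; and uniform samples from each body come from rejection sampling in the box $[-r, r]^2$ instead of the random walk, which is only reasonable because $d = 2$. Names are hypothetical.

```python
import math
import random


def in_P(x, y):
    # Hypothetical body P: the square [-1.5, 1.5]^2 (area 9); B1 inside P inside B2
    return abs(x) <= 1.5 and abs(y) <= 1.5


def in_K(x, y, scale, r):
    # Membership oracle for K = (scale * P) ∩ B2, with B2 the disk of radius r
    return x * x + y * y <= r * r and in_P(x / scale, y / scale)


def sample_K(scale, r, rng):
    # Stand-in for the random walk: rejection sampling from the box [-r, r]^2.
    # Cheap in 2 dimensions; in high dimensions this is the rare-event problem.
    while True:
        x, y = rng.uniform(-r, r), rng.uniform(-r, r)
        if in_K(x, y, scale, r):
            return x, y


def creep_up_area(r=3.0, samples=20_000, seed=0):
    rng = random.Random(seed)
    d = 2                                               # dimension of this toy example
    k = math.ceil(math.log(r) / -math.log(1 - 1 / d))  # number of phases
    rho = r ** (-1 / k)                                 # near 1 - 1/d, with rho**k * r == 1
    estimate = math.pi * r * r                          # i = 0: r*P contains B2, so K_0 = B2
    for i in range(k):
        outer, inner = rho ** i * r, rho ** (i + 1) * r
        # K_{i+1} lies inside K_i: estimate vol(K_{i+1}) / vol(K_i) by sampling K_i
        hits = sum(in_K(*sample_K(outer, r, rng), inner, r) for _ in range(samples))
        estimate *= hits / samples
    return estimate                                     # K_k = P ∩ B2 = P
```

Here each phase ratio is roughly 0.88 and 0.36, both above $\rho^d = 1/3$, so a modest number of samples per phase suffices.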
Another example and application: $(n, d, c)$-expanders.
- Bipartite
- $n$ vertices, regular degree $d$
- $|\Gamma(S)| \ge (1 + c(1 - 2|S|/n))|S|$, where $\Gamma(S)$ is the set of neighbors of $S$
- Roughly a factor $1 + c$ more neighbors, at least until $|S|$ is near $n/2$.
- Add self loops (with probability 1/2) to deal with periodicity.
- What is the stationary distribution? Uniform.
- Intuition on convergence: because neighborhoods grow, the position becomes unpredictable very fast.
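- Theorem: $\lambda_2 \le 1 - \epsilon$, where $\lambda_2$ is the second-largest eigenvalue of the walk's transition matrix and the gap $\epsilon > 0$ depends only on $c$ and $d$.
- Converse theorem: if $\lambda_2 \le 1 - \epsilon$, we get an expander with $c \ge 4(\epsilon - \epsilon^2)$.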
Gabber-Galil expanders:
- Do expanders exist? Yes! Proof: probabilistic method.
- But in this case, we can do better deterministically.
- Gabber-Galil expanders.
- Let $n = 2m^2$. Vertices are $(x, y)$ where $x, y \in \mathbb{Z}_m$ (one set per side)
- 5 neighbors: $(x, y), (x, x + y), (x, x + y + 1), (x + y, y), (x + y + 1, y)$ (addition mod $m$)
- Or 7 neighbors of similar form.
- Theorem: this $d = 5$ graph has $c = (2 - \sqrt{3})/4$; the degree-7 graph has twice the expansion.
- In other words, $c$ and $d$ are constants.
- Meaning $\lambda_2 \le 1 - \epsilon$ for some constant $\epsilon$
- So random walks on this expander mix very fast: the r.p.d. after $t$ steps is at most $n(1 - \epsilon)^t$, so for polynomially small r.p.d., $O(\log n)$ steps of the random walk suffice.
- Note also that $n$ can be huge, since we only need to store one vertex ($O(\log n)$ bits).
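A sketch of the degree-5 Gabber-Galil neighbor maps, assuming each side's vertices are the pairs in $\mathbb{Z}_m^2$ and using exactly the five maps listed above; `gg_neighbors`, `gg_preimages`, and `lazy_step` are hypothetical names. Each map is a bijection of $\mathbb{Z}_m^2$, so right-side vertices also have degree 5, and the walk's whole state is one vertex plus a side bit.

```python
import random


def gg_neighbors(x, y, m):
    """Right-side neighbors of left vertex (x, y) in the degree-5 Gabber-Galil graph."""
    return [(x, y), (x, (x + y) % m), (x, (x + y + 1) % m),
            ((x + y) % m, y), ((x + y + 1) % m, y)]


def gg_preimages(u, v, m):
    """Left-side neighbors of right vertex (u, v): entry i inverts map i above."""
    return [(u, v), (u, (v - u) % m), (u, (v - u - 1) % m),
            ((u - v) % m, v), ((u - v - 1) % m, v)]


def lazy_step(side, x, y, m, rng):
    """One step of the lazy walk; the state (side, x, y) takes O(log n) bits.
    With probability 1/2 stay put (self loop); otherwise cross a random edge."""
    if rng.random() < 0.5:
        return side, x, y
    nbrs = gg_neighbors(x, y, m) if side == 0 else gg_preimages(x, y, m)
    nx, ny = rng.choice(nbrs)
    return 1 - side, nx, ny


rng = random.Random(1)
state = (0, 0, 0)                    # side 0 = left, vertex (0, 0)
for _ in range(20):
    state = lazy_step(*state, 10**9, rng)   # m = 10^9: far too big to store the graph
```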
Application: conserving randomness.
- Consider a BPP algorithm (gives the right answer with probability 99/100; the constant is irrelevant) using $n$ random bits.
- $t$ independent trials with majority rule reduce the failure probability to $2^{-\Omega(t)}$ (Chernoff), but need $tn$ bits
- In the case of RP, we used 2-point sampling to get error $O(1/t)$ with $2n$ bits and $t$ trials.
- Use a random walk instead.
- Vertices are the $N = 2^n$ ($n$-bit) random strings for the algorithm.
- Edges as in a degree-7 expander
- Only 1/100 of the vertices are bad.
- What is the probability that the majority of the time is spent there?
- In the limit, we spend 1/100 of the time there
- How fast do we converge to the limit? How long must we run?
- Power the Markov chain so that $\lambda_2 \le 1/10$ (a constant number $\beta$ of steps)
- Use the random seeds encountered every $\beta$ steps.
- Number of bits needed:
  - $O(n)$ for a stationary starting point
  - $3\beta$ more per trial (3 bits choose one of the 7 neighbors at each step)
- Theorem: after $7k$ samples, the probability that the majority is wrong is at most $1/2^k$. So error $1/2^n$ with $O(n)$ bits!
- Let $B$ be the powered transition matrix
- Let $p^{(i)}$ be the distribution of sample $i$, namely $p^{(0)}B^i$
- Let $W$ be the indicator matrix for good witnesses, namely the diagonal matrix with a 1 at diagonal entry $i$ if $i$ is a witness. The complementary set has matrix $\overline{W} = I - W$.
- $\|p^{(i)}W\|_1$ is the probability that sample $i$ is in the witness set; similarly, $\|p^{(i)}\overline{W}\|_1$ for nonwitnesses.
- Consider a sequence of $7k$ results ``witness or not''
- Represent it as matrices $S = (S_1, \ldots, S_{7k}) \in \{W, \overline{W}\}^{7k}$
- Claim: $\Pr[S] = \|p^{(0)}(BS_1)(BS_2)\cdots(BS_{7k})\|_1$ (this sums the probabilities of all paths through the correct sequence of witness/nonwitness)
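- Defer the proof of: $\|pBW\|_2 \le \|p\|_2$ and $\|pB\overline{W}\|_2 \le \|p\|_2/5$
- Deduce: if more than $7k/2$ of the samples are bad (nonwitnesses), then
  $$\Big\|p^{(0)}\prod_i BS_i\Big\|_1 \le \sqrt{N}\,\Big\|p^{(0)}\prod_i BS_i\Big\|_2 \le \sqrt{N}\,(1/5)^{7k/2}\,\|p^{(0)}\|_2 = (1/5)^{7k/2},$$
  by Cauchy-Schwarz, the deferred bounds, and $\|p^{(0)}\|_2 = 1/\sqrt{N}$ for the uniform starting distribution.
- At the same time, there are only $2^{7k}$ bad sequences, so the error probability is at most $2^{7k}5^{-7k/2} = (2^7/5^{7/2})^k \le 2^{-k}$, since $2^7/5^{7/2} \approx 0.46$.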
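- Proof of lemma:
  - Write $p = \sum_i c_i e_i$ in the orthonormal eigenvector basis of $B$
  - Obviously $\|pBW\|_2 \le \|pB\|_2 \le \|p\|_2$, since $W$ just zeros some coordinates out and every eigenvalue of $B$ has absolute value at most 1.
  - Write $p = c_1e_1 + y$ as before, where $y \cdot e_1 = 0$.
  - Argue that $\|c_1e_1B\overline{W}\|_2 \le \|c_1e_1\|_2/10$ and $\|yB\overline{W}\|_2 \le \|y\|_2/10$; since $\|c_1e_1\|_2$ and $\|y\|_2$ are both at most $\|p\|_2$, the triangle inequality gives $\|pB\overline{W}\|_2 \le \|p\|_2/5$. Done.
  - First, $c_1e_1$:
    - Recall $e_1B = e_1$
    - $e_1$ is the uniform vector, all coordinates $1/\sqrt{N}$
    - $\overline{W}$ has only 1/100 of its coordinates nonzero, so
    - $\|e_1\overline{W}\|_2 = \sqrt{(N/100)(1/N)} = 1/10$
  - Now $y$: just note $\|yB\|_2 \le \|y\|_2/10$ since $y$ is orthogonal to $e_1$ and $\lambda_2 \le 1/10$. Then $\overline{W}$ zeros out some coordinates, which can only shrink the norm.
  - Summary: the $c_1e_1$ part is unlikely to be in the bad set; the $y$ part is unlikely to be relevant.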
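The claim $\Pr[S] = \|p^{(0)}(BS_1)\cdots(BS_{7k})\|_1$ suggests an exact computation on toy chains: push the row vector through $B$, split it with $W$ and $\overline{W}$, and track how many bad samples have occurred. The sketch below (hypothetical function name, small dense matrices, pure Python) sums $\Pr[S]$ over all sequences with a bad majority. It does not prove the lemma; it only lets one evaluate the bound on small examples. As sanity checks, $B = J/N$ (every step jumps to a uniform vertex) must reproduce the binomial tail of independent trials, and $B = I$ (no mixing at all) must give exactly the bad fraction.

```python
from math import comb


def prob_majority_bad(p0, B, bad, t):
    """Exact Pr[more than t/2 of samples 1..t are bad], where sample i is drawn
    from p0 B^i along one walk. Sums Pr[S] = |p0 (B S_1) ... (B S_t)|_1 over all
    witness/nonwitness sequences S, grouped by their number of bad samples."""
    N = len(p0)
    # by_bad[b][v] = Pr[walk is at v at the current sample, b bad samples so far]
    by_bad = {0: list(p0)}
    for _ in range(t):
        nxt = {}
        for b, vec in by_bad.items():
            moved = [sum(vec[u] * B[u][v] for u in range(N)) for v in range(N)]  # vec B
            for nb, is_bad in ((b, False), (b + 1, True)):
                # right-multiply by W (keep good coordinates) or by I - W (keep bad ones)
                part = [moved[v] if bad[v] == is_bad else 0.0 for v in range(N)]
                acc = nxt.setdefault(nb, [0.0] * N)
                for v in range(N):
                    acc[v] += part[v]
        by_bad = nxt
    return sum(sum(vec) for b, vec in by_bad.items() if 2 * b > t)


N, t = 10, 5
bad = [v < 2 for v in range(N)]                        # bad fraction 0.2 (toy value)
p0 = [1 / N] * N                                       # stationary (uniform) start
J = [[1 / N] * N for _ in range(N)]                    # perfectly mixing chain
I = [[float(u == v) for v in range(N)] for u in range(N)]  # chain that never moves
```

Plugging in the powered expander matrix for $B$ would give the exact error of the walk-based amplification on a small instance.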
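Coupling method:
- Run two copies of the Markov chain, $X_t$ and $Y_t$
- Each, considered in isolation, is a copy of the MC (that is, both have the MC's distribution)
- But they are not independent: they make dependent choices at each step
- In fact, after a while they are almost certainly the same
- Start $Y_t$ in the stationary distribution, $X_t$ anywhere
- Coupling argument:
  $$\Pr[X_t = j] = \Pr[X_t = j \mid X_t = Y_t]\Pr[X_t = Y_t] + \Pr[X_t = j \mid X_t \ne Y_t]\Pr[X_t \ne Y_t]$$
  $$\pi_j = \Pr[Y_t = j] = \Pr[Y_t = j \mid X_t = Y_t]\Pr[X_t = Y_t] + \Pr[Y_t = j \mid X_t \ne Y_t]\Pr[X_t \ne Y_t]$$
  The first terms are equal (given $X_t = Y_t$, the events coincide), so subtracting,
  $$|\Pr[X_t = j] - \pi_j| = |\Pr[X_t = j \mid X_t \ne Y_t] - \Pr[Y_t = j \mid X_t \ne Y_t]|\,\Pr[X_t \ne Y_t] \le \Pr[X_t \ne Y_t]$$
- So we just need to make $\Pr[X_t \ne Y_t]$ small enough: it bounds $|\Pr[X_t = j] - \pi_j|$ for every $j$, so the r.p.d. is at most $\Pr[X_t \ne Y_t]/\min_j \pi_j$.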
$n$-bit hypercube walk: at each step, set a random bit to a random value
- At step $t$, pick a random bit $b$ and a random value $v$
- Both chains set bit $b$ to value $v$
- After $O(n\log n)$ steps, probably every bit has been picked at least once (coupon collector), so all bits match.
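The coupling bound can be checked on a small hypercube. The sketch below (hypothetical names) simulates the coupled walks from $0\cdots0$ and $1\cdots1$, and separately computes the exact distribution of one walk started at $0\cdots0$, so that its pointwise error $\max_j |\Pr[X_t = j] - \pi_j|$ can be compared with the coupling bound $\Pr[X_t \ne Y_t] \le \Pr[\text{some bit never picked}] \le n(1 - 1/n)^t$.

```python
import random


def coupling_time(n, rng):
    """Coupled walks from 00...0 and 11...1: each step picks a random bit b and
    value v and sets bit b to v in BOTH chains. Returns the number of steps until
    the chains agree, i.e. until every bit has been picked at least once."""
    x, y = [0] * n, [1] * n
    t = 0
    while x != y:
        b, v = rng.randrange(n), rng.randrange(2)
        x[b] = v
        y[b] = v
        t += 1
    return t


def exact_distribution(n, t):
    """Exact distribution after t steps from state 0 (states are n-bit masks).
    Each step: w.p. 1/2 the chosen value equals the current bit (no move);
    otherwise the chosen bit, uniform over the n bits, is flipped."""
    p = [0.0] * (1 << n)
    p[0] = 1.0
    for _ in range(t):
        q = [0.0] * (1 << n)
        for s, mass in enumerate(p):
            q[s] += mass / 2
            for b in range(n):
                q[s ^ (1 << b)] += mass / (2 * n)
        p = q
    return p


n = 6
for t in (5, 10, 20, 40):
    dev = max(abs(pj - 2 ** -n) for pj in exact_distribution(n, t))
    bound = n * (1 - 1 / n) ** t      # union bound on Pr[some bit never picked]
    print(t, dev, bound)
```

Multiplying the deviation by $2^n$ gives the r.p.d.; the simulated coupling times average about $nH_n \approx n\ln n$, the coupon-collector time.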
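Counting $k$-colorings when $k > 2\Delta + 1$ ($\Delta$ = maximum degree)
- Reduce counting to (approximate) uniform generation:
  - Compute the ratio of colorings of $G$ to colorings of $G - e$
  - Recurse on counting the $G - e$ colorings
  - Base case: $k^n$ colorings of the empty graph
- Bounding the ratio:
  - Note that $G - e$ colorings outnumber $G$ colorings
  - By how much? Let $L$ be the colorings in the difference ($u$ and $v$ the same color, where $e = (u, v)$)
  - To make an $L$ coloring a $G$ coloring, change $u$ to one of at least $k - \Delta \ge \Delta + 1$ legal colors
  - Each $G$ coloring arises in at most one way from this
  - So each $L$ coloring has at least $\Delta + 1$ $G$ colorings unique to it
  - So $|L|$ is at most a $1/(\Delta + 1)$ fraction of the number of $G$ colorings, and the ratio of $G$ colorings to $G - e$ colorings is at least $(\Delta + 1)/(\Delta + 2)$
  - So we can estimate the ratio with few samples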
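- The chain:
  - Pick a random vertex and a random color; try to recolor the vertex with that color
  - Has self loops, so aperiodic
  - The chain is symmetric (each legal single-vertex recoloring has probability $1/(nk)$ in both directions), hence time-reversible with respect to the uniform distribution, which is therefore stationary.
- Coupling:
  - Choose a random vertex $v$ (the same for both)
  - Based on $X_t$ and $Y_t$, choose a bijection $g$ of the colors
  - Choose a random color $c$
  - Apply $c$ to $v$ in $X_t$ (if possible), and $g(c)$ to $v$ in $Y_t$ (if possible).
  - What bijection?
    - Let $A$ be the vertices that agree in color, $D$ those that disagree.
    - If $v \in D$, let $g$ be the identity
    - If $v \in A$, let $N$ be the neighbors of $v$
    - Let $C_X$ be the colors that $N$ has in $X$ but not in $Y$ ($X$ can't use them at $v$)
    - Let $C_Y$ be defined similarly; wlog $|C_Y| \ge |C_X|$
    - $g$ should swap each color of $C_X$ with some color of $C_Y$ and leave the other colors fixed. Result: if $X$ doesn't change, $Y$ doesn't either.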
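- Convergence:
  - Let $d'(v)$ be the number of neighbors of $v$ in the opposite set, so $\sum_{v \in A} d'(v) = \sum_{v \in D} d'(v) = m'$ (the number of edges between $A$ and $D$)
  - Let $\delta = |D|$
  - Note that at each step, $\delta$ changes by $0$ or $\pm 1$
  - When does it increase?
    - $v$ must be in $A$, but move to $D$
    - This happens if only one MC accepts the new color
    - If $c$ is in neither $C_X$ nor $C_Y$, then $g(c) = c$ and both chains make the same move
    - If $c \in C_X$, then $g(c) \in C_Y$, so neither moves
    - So we must have $c \in C_Y$
    - But $|C_Y| \le d'(v)$, so the probability this happens is at most $\frac{1}{n}\sum_{v \in A}\frac{d'(v)}{k} = \frac{m'}{kn}$
  - When does it decrease?
    - We must have $v \in D$, and both chains must accept $c$
    - It suffices that we pick a color not used in either chain's neighborhood of $v$
    - The total neighborhood size is at most $2\Delta$, but that counts the $d'(v)$ neighbors in $A$ twice (they have the same color in both chains)
    - So the probability is at least $\frac{1}{n}\sum_{v \in D}\frac{k - 2\Delta + d'(v)}{k} = \frac{(k - 2\Delta)\delta}{kn} + \frac{m'}{kn}$
  - Deduce that the expected change in $\delta$ is at most the difference of the above, namely $\frac{m'}{kn} - \frac{(k - 2\Delta)\delta + m'}{kn} = -a\delta$, where $a = \frac{k - 2\Delta}{kn}$.
  - So after $t$ steps, $E[\delta_t] \le (1 - a)^t\delta_0 \le (1 - a)^t n$.
  - Thus, the probability that $\delta_t > 0$ is at most $(1 - a)^t n$ (Markov's inequality).
  - But now note $a \ge 2/(kn)$ (since $k - 2\Delta \ge 2$), which exceeds $1/n^2$ when $k < 2n$; so $O(n^2\log n)$ steps reduce the chance of not having coupled to one over a polynomial.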
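Note: the coupling depends on the state, but who cares
- From a worm's-eye view, each chain is the random walk: $v$ is uniform, and since $g$ is a bijection, $g(c)$ is a uniform color even though $g$ depends on the state
- So all the arguments hold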
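A sketch of the recoloring chain together with the coupling above, assuming the graph is given as adjacency lists (helper names `can_use`, `coupled_step`, and `run_until_coupled` are hypothetical). The bijection $g$ pairs colors of $C_X$ with colors of $C_Y$ in sorted order; any pairing fits the description, so this choice is arbitrary, and when $|C_X| > |C_Y|$ the unpaired colors simply stay fixed (the symmetric case of the ``wlog''). The example runs on a 6-cycle ($\Delta = 2$) with $k = 6 > 2\Delta + 1$ colors, from two colorings that disagree everywhere.

```python
import random


def can_use(coloring, graph, v, c):
    """True if no neighbor of v has color c, so v may be recolored to c."""
    return all(coloring[u] != c for u in graph[v])


def coupled_step(X, Y, graph, k, rng):
    """One coupled step: the same random vertex v in both chains; X tries a
    uniform color c, Y tries g(c) for the state-dependent bijection g."""
    v = rng.randrange(len(graph))
    g = list(range(k))                     # identity bijection (the case v in D)
    if X[v] == Y[v]:                       # v in A: swap C_X colors with C_Y colors
        in_x = {X[u] for u in graph[v]}
        in_y = {Y[u] for u in graph[v]}
        for a, b in zip(sorted(in_x - in_y), sorted(in_y - in_x)):
            g[a], g[b] = b, a              # unpaired leftover colors stay fixed
    c = rng.randrange(k)
    if can_use(X, graph, v, c):
        X[v] = c
    if can_use(Y, graph, v, g[c]):
        Y[v] = g[c]


def run_until_coupled(graph, k, X, Y, rng, max_steps=100_000):
    """Run the coupled chains until they agree; returns (steps, X, Y)."""
    X, Y = list(X), list(Y)
    for t in range(max_steps):
        if X == Y:
            return t, X, Y
        coupled_step(X, Y, graph, k, rng)
    return None, X, Y


# 6-cycle: maximum degree 2, so k = 6 > 2*2 + 1 colors
graph = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
steps, X, Y = run_until_coupled(graph, 6, [0, 1, 0, 1, 0, 1], [2, 3, 2, 3, 2, 3],
                                random.Random(0))
```

Each chain only ever recolors a vertex to a color its neighbors lack, so both stay proper colorings throughout.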
Counting vs. generating:
- We showed that by generating, we can count
- By counting, we can generate: