Sampling:
- Given a complex state space
- Want to sample from it
- Use some Markov chain
- Run it for a long time
- End up ``near'' the stationary distribution
- Reduces sampling to local moves (easier)
- No need for a global description of the state space
- Allows sampling from an exponentially large state space
Formalize: what is ``near'' and ``long time''?
- Stationary distribution $\pi$
- Arbitrary distribution $q$
- Relative pointwise distance (r.p.d.): $\max_j |q_j - \pi_j| / \pi_j$
- Intuitively: close.
- Formally, suppose the r.p.d. is $\epsilon$.
- Then $(1 - \epsilon)\pi \le q$ (coordinatewise)
- So can express distribution $q$ as ``with probability $1 - \epsilon$, sample from $\pi$. Else, do something weird.''
- So if $\epsilon$ is small, ``as if'' sampling from $\pi$ each time.
- If $\epsilon$ is polynomially small, can draw polynomially many samples without a goof
- Gives ``almost stationary'' sample from Markov chain
- Mixing time: time to reduce r.p.d. to some $\epsilon$
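To make the r.p.d. and mixing-time definitions concrete, here is a minimal Python sketch. The lazy walk on an 8-cycle, the target $\epsilon = 0.01$, and all function names are illustrative assumptions, not part of the notes.

```python
def step(q, P):
    """One step of the chain: row vector q times transition matrix P."""
    n = len(q)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]

def rpd(q, pi):
    """Relative pointwise distance max_j |q_j - pi_j| / pi_j."""
    return max(abs(qj - pj) / pj for qj, pj in zip(q, pi))

def mixing_time(P, pi, start, eps, max_steps=10_000):
    """Number of steps until the r.p.d. from a point-mass start is at most eps."""
    q = [0.0] * len(pi)
    q[start] = 1.0
    for t in range(max_steps + 1):
        if rpd(q, pi) <= eps:
            return t
        q = step(q, P)
    raise RuntimeError("did not mix within max_steps")

def lazy_cycle(n):
    """Hypothetical example chain: stay w.p. 1/2, else move to a random neighbor."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 0.5
        P[i][(i + 1) % n] += 0.25
        P[i][(i - 1) % n] += 0.25
    return P

n = 8
P = lazy_cycle(n)
pi = [1.0 / n] * n           # symmetric chain, so the stationary distribution is uniform
t_mix = mixing_time(P, pi, start=0, eps=0.01)
```

A point mass starts at r.p.d. $n - 1$; `t_mix` is the first time the walk's distribution is within 1% of uniform at every state.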

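Method 1 for mixing time: Eigenvalues.
- Consider transition matrix $P$.
- Eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$
- Corresponding eigenvectors $e_1, \ldots, e_n$.
- Any vector $q$ can be written as $\sum_i a_i e_i$
- Then $qP = \sum_i a_i \lambda_i e_i$
- and $qP^k = \sum_i a_i \lambda_i^k e_i$
- so sufficient to understand eigenvalues and eigenvectors.
- Is any $|\lambda_i| > 1$?
- If so, $e_i P = \lambda_i e_i$
- let $M$ be the max entry of $e_i$ (in absolute value)
- if $|\lambda_i| > 1$, then some entry of $e_i P$ has absolute value $|\lambda_i| M > M$
- any entry of $e_i P$ is a convex combo of values at most $M$ (exactly so when the columns of $P$ also sum to 1, e.g. symmetric $P$; in general run the same argument on right eigenvectors, which have the same eigenvalues), so max value $M$, contradiction.
- Deduce: all eigenvalues of a stochastic matrix are at most 1 in absolute value.
- How many $\lambda_i = 1$?
- Stationary distribution ($e_1 = \pi$)
- if any others, could add a little bit of it to $e_1$, get a second stationary distribution
- What about $-1$? Only if periodic.
- so all other coordinates of the eigenvalue decomposition decay as $\lambda_i^k$.
- So if can show the other $\lambda_i$ are small, converge to stationary distribution fast.
- In particular, if $\lambda_2 < 1 - 1/\mathrm{poly}$, get polynomial mixing time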
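The decay of the non-stationary components can be checked by hand on a two-state chain. The sketch below is an illustration under assumed parameters `a` and `b` (not from the notes): for $P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}$ the eigenvalues are $1$ and $\lambda_2 = 1 - a - b$, with left eigenvectors $\pi = (b, a)/(a+b)$ and $(1, -1)$.

```python
def two_state_demo(a, b, k):
    """Compare q P^k computed by iteration with the eigen-decomposition formula."""
    P = [[1 - a, a], [b, 1 - b]]
    pi = [b / (a + b), a / (a + b)]      # stationary distribution (eigenvalue 1)
    lam2 = 1 - a - b                     # second eigenvalue, eigenvector (1, -1)
    q = [1.0, 0.0]                       # start in state 0
    for _ in range(k):
        q = [q[0] * P[0][0] + q[1] * P[1][0],
             q[0] * P[0][1] + q[1] * P[1][1]]
    c = a / (a + b)                      # (1, 0) = pi + c * (1, -1)
    predicted = [pi[0] + c * lam2 ** k, pi[1] - c * lam2 ** k]
    return q, predicted

iterated, predicted = two_state_demo(a=0.3, b=0.1, k=10)
```

Both vectors agree up to floating-point error, and the gap to $\pi$ shrinks by exactly a factor $\lambda_2 = 0.6$ per step.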
Expanders:
- Definition: an $(n, d, c)$ expander is a $d$-regular bipartite graph such that $|\Gamma(S)| \ge (1 + c(1 - 2|S|/n))|S|$
- Translation: any small set has a constant factor more neighbors than members
- No bottlenecks in graph
- Lemma: random walk on an $(n, d, c)$ expander with constant $c$ has uniform stationary distribution and second eigenvalue $1 - \Omega(1/d)$
- Lemma: if the second eigenvalue of the graph is $1 - \epsilon/d$ for constant $\epsilon$, then the graph is an expander with constant $c$
- Deduce: mixing time in an expander is $O(\log n)$ to get $\epsilon$ r.p.d. (since $\pi_i = 1/n$)
- How to bound eigenvalues? Messy math.
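The $O(\log n)$ claim follows from a short calculation, sketched here under the assumption that $P$ is symmetric (so the eigenvectors can be taken orthonormal) and that $\lambda = \max_{i \ge 2} |\lambda_i|$ is a constant below 1. The symbol $\lambda$ is introduced only for this sketch.

```latex
% Assumption: P symmetric, q any probability vector, \pi uniform.
\begin{align*}
\|qP^t - \pi\|_2 &\le \lambda^t \,\|q - \pi\|_2 \le \lambda^t, \\
\max_j \frac{|(qP^t)_j - \pi_j|}{\pi_j} &\le n\,\lambda^t
  \qquad (\pi_j = 1/n), \\
n\,\lambda^t \le \epsilon
  &\iff t \ge \frac{\ln(n/\epsilon)}{\ln(1/\lambda)} = O(\log n)
  \quad \text{for } \epsilon = 1/\mathrm{poly}(n).
\end{align*}
```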
Counting perfect matchings
- Choose a random $n$-edge set
- check if it is a matching
- problem: rare event
- to solve, need a sample space where matchings are dense
- Idea: $M_n$ is dense in $M_n \cup M_{n-1}$
- recurse down
Random walk
- based on using uniform generation to do sampling.
- applies to graphs with minimum degree $n/2$
- Let $M_k$ be the $k$-edge matchings, $|M_k| = m_k$
- algorithm estimates all ratios $m_k/m_{k-1}$, multiplies
- claim: ratio $m_{k+1}/m_k$ is polynomially bounded (dense).
- deduce sufficient to generate randomly from $M_k \cup M_{k-1}$, test frequency of $M_k$
- do so by random walk of local moves:
- with probability 1/2, stay still
- else pick a random edge $e = (u, v)$
- if in $M_k$ and $e$ is matched, remove $e$
- if in $M_{k-1}$ and $e$ can be added, add it.
- if in $M_{k-1}$, $u$ is matched to $w$ and $v$ is unmatched, then rotate: match $u$ to $v$ instead of $w$.
- else do nothing
- Note that exactly one applies
- Matrix is symmetric (undirected), so doubly stochastic, so stationary distribution is uniform as desired.
- In text, prove $\lambda_2 = 1 - 1/n^{O(1)}$ on an $n$-vertex graph (by proving expansion property)
- so within $n^{O(1)}$ steps, r.p.d. is polynomially small
- so the deviation from uniform probably doesn't matter.
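The local moves above can be sketched in Python as follows. This is a minimal illustration, not the text's exact formulation: the representation (edges as `(u, v)` tuples, a matching as a `frozenset` of edges), the function name, and the choice to allow the rotation with either endpoint as the matched one are assumptions made here.

```python
import random

def matching_walk_step(edges, matching, k, rng):
    """One step of the lazy walk on M_k and M_{k-1}; returns the new matching."""
    if rng.random() < 0.5:
        return matching                          # stay still with probability 1/2
    e = rng.choice(edges)
    u, v = e
    matched = {}                                 # vertex -> matching edge covering it
    for f in matching:
        matched[f[0]] = f
        matched[f[1]] = f
    if len(matching) == k:
        if e in matching:                        # reduce: M_k -> M_{k-1}
            return matching - {e}
        return matching
    # Here the matching has k - 1 edges.
    if u not in matched and v not in matched:    # augment: M_{k-1} -> M_k
        return matching | {e}
    if (u in matched) != (v in matched):         # rotate: exactly one endpoint matched
        old = matched[u] if u in matched else matched[v]
        return (matching - {old}) | {e}
    return matching                              # otherwise do nothing

# Usage on the complete bipartite graph K_{3,3} (left 0..2, right 3..5), k = 3.
rng = random.Random(0)
edges = [(u, 3 + v) for u in range(3) for v in range(3)]
state = frozenset({(0, 3), (1, 4), (2, 5)})
for _ in range(100):
    state = matching_walk_step(edges, state, 3, rng)
```

Every move is undone by choosing the edge that was removed, with the same probability $1/(2|E|)$, which is the symmetry the uniformity argument relies on.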
Self-reducibility relationship between approximate counting and approximate
uniform generation.
Outline:
- Describe problem. Membership oracle
- #P-hard to compute the volume of an intersection of half spaces in $n$ dimensions
- In low dimensions, integral.
- even for convex bodies, can't do better than $(n/\log n)^n$ ratio
- what about FPRAS?
Estimating $\pi$:
- pick a random point in the unit square
- check if it is in the circle
- gives ratio of circle area to square area
- Extends to arbitrary shape with ``membership oracle''
- Problem: rare events.
- Circle has good easy outer box
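A minimal Python sketch of this estimator, assuming the square $[-1, 1]^2$ around the unit circle (the sample count and function names are illustrative). The `in_circle` function plays the role of the membership oracle.

```python
import random

def in_circle(x, y):
    """Membership oracle for the unit disk."""
    return x * x + y * y <= 1.0

def estimate_pi(samples, rng):
    """Fraction of points of [-1,1]^2 landing in the disk, times the square's area 4."""
    hits = 0
    for _ in range(samples):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if in_circle(x, y):
            hits += 1
    return 4.0 * hits / samples

approx = estimate_pi(200_000, random.Random(1))
```

The hit probability here is $\pi/4$, a constant, which is why few samples suffice; the estimator degrades exactly when that probability becomes tiny.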
Problem: rare events:
- In 2d, long skinny shapes
- In high $d$, even a round shape has an exponentially larger bounding box
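The high-dimensional case can be quantified with the standard formula for the volume of the unit $d$-ball, $\pi^{d/2}/\Gamma(d/2 + 1)$. The short Python sketch below (function name is an assumption) computes the chance that a uniform point of the bounding cube $[-1,1]^d$ lands in the ball.

```python
import math

def ball_to_cube_ratio(d):
    """Volume of the unit d-ball divided by the volume of its bounding cube [-1,1]^d."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return ball / 2 ** d

ratios = {d: ball_to_cube_ratio(d) for d in (2, 5, 10, 20)}
```

In 2 dimensions the ratio is $\pi/4$; by $d = 20$ it is below $10^{-7}$, so naive rejection sampling would almost never hit the body.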
Solution: ``creep up'' on volume
- Assume $P$ contains a small sphere of radius $r_1$
- Consider a sequence of spheres $S_1, S_2, \ldots, S_k$ with radii growing by a factor $1 + 1/d$ (so volume ratio constant)
- Estimate ratio of $S_1 \cap P$ to $S_2 \cap P$, etc.
- multiply estimates; errors multiply: $(1 + \epsilon/n)^n$
- At each step, need to random sample from $S_i \cap P$
- Sample method: random walk forbidden to leave $S_i \cap P$
- eigenvalues show rapid mixing
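The bookkeeping behind the telescoping product can be written out as below. This is a sketch under assumptions not spelled out in the notes: $P$ is convex, the spheres are concentric around a point of $P$, $S_1 \subseteq P \subseteq S_k$, and $m = k - 1$ denotes the number of ratios estimated.

```latex
% Assumptions: P convex, S_1 \subseteq P \subseteq S_k, concentric spheres, m = k - 1.
\begin{align*}
\frac{\mathrm{vol}(S_{i+1})}{\mathrm{vol}(S_i)} &= \left(1 + \tfrac{1}{d}\right)^d \le e, \\
\mathrm{vol}(P) = \mathrm{vol}(S_k \cap P) &= \mathrm{vol}(S_1)\,\prod_{i=1}^{k-1}
  \frac{\mathrm{vol}(S_{i+1} \cap P)}{\mathrm{vol}(S_i \cap P)}, \\
\left(1 + \tfrac{\epsilon}{m}\right)^m &\le e^{\epsilon} \approx 1 + \epsilon .
\end{align*}
```

So estimating each ratio to within a factor $1 \pm \epsilon/m$ keeps the overall error near $1 \pm \epsilon$, and each ratio is a constant, so no single estimate is a rare event.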
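Method 2 for mixing time: Coupling.
- Run two copies of the Markov chain, $X_t$ and $Y_t$
- Each considered in isolation is a copy of the MC (that is, both have the MC distribution)
- but they are not independent: they make dependent choices at each step
- in fact, after a while they are almost certainly the same
- Start $Y_t$ in the stationary distribution, $X_t$ anywhere
- Coupling argument: let $\epsilon = \Pr[X_t \ne Y_t]$. Since $\Pr[Y_t = j] = \pi_j$ and the two chains agree on the event $X_t = Y_t$,
$$|\Pr[X_t = j] - \pi_j| = \bigl|\Pr[X_t = j \mid X_t = Y_t]\Pr[X_t = Y_t] + \Pr[X_t = j \mid X_t \ne Y_t]\Pr[X_t \ne Y_t] - \Pr[Y_t = j]\bigr|$$
$$= \bigl|\Pr[X_t = j \mid X_t \ne Y_t] - \Pr[Y_t = j \mid X_t \ne Y_t]\bigr| \cdot \epsilon \le \epsilon$$
- So just need to make $\epsilon$ small enough (it bounds the pointwise distance; divide by $\pi_j$ to get the r.p.d.).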
$n$-bit hypercube walk: at each step, set a random bit to a random value
- At step $t$, pick a random bit $b$, random value $v$
- both chains set bit $b$ to value $v$
- after $O(n \log n)$ steps, probably all bits matched.
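The hypercube coupling is easy to simulate. The following Python sketch (function name and the all-zeros start for $X$ are assumptions) runs both chains with the shared choice of bit and value and reports when they first coincide.

```python
import random

def coupled_hypercube_time(n, rng):
    """Steps until two coupled n-bit walks coincide; Y starts uniform, X at all zeros."""
    x = [0] * n
    y = [rng.randrange(2) for _ in range(n)]   # stationary (uniform) start
    t = 0
    while x != y:
        b = rng.randrange(n)                   # shared random bit position
        v = rng.randrange(2)                   # shared random value
        x[b] = v                               # both chains apply the same update,
        y[b] = v                               # so bit b agrees from now on
        t += 1
    return t

rng = random.Random(0)
times = [coupled_hypercube_time(16, rng) for _ in range(200)]
average_time = sum(times) / len(times)
```

The chains coincide once every initially disagreeing bit has been picked at least once, a coupon-collector event, which is where the $O(n \log n)$ bound comes from.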
Counting $k$-colorings when $k > 2\Delta + 1$
- The reduction from (approximate) uniform generation:
- compute ratio of colorings of $G$ to colorings of $G - e$
- recurse, counting $G - e$ colorings
- base case: $k^n$ colorings of the empty graph
Bounding the ratio:
- note $G - e$ colorings outnumber $G$ colorings
- By how much? Let $L$ be the colorings in the difference ($u$ and $v$ have the same color, where $e = (u, v)$)
- to make an $L$-coloring a $G$-coloring, change $u$ to one of at least $k - \Delta \ge \Delta + 1$ legal colors
- each $G$-coloring arises in at most one way from this
- so each $L$-coloring has at least $\Delta + 1$ neighbors unique to it
- so $|L|$ is at most a $1/(\Delta + 1)$ fraction of the number of $G$-colorings.
The chain:
- pick random vertex, random color, try to recolor
- has self-loops, so aperiodic
- chain is time-reversible (symmetric), so stationary distribution is uniform.
Coupling:
- choose random vertex $v$ (same for both chains)
- based on $X_t$ and $Y_t$, choose a bijection $g$ of colors
- choose random color $c$
- apply $c$ to $v$ in $X_t$ (if can), $g(c)$ to $v$ in $Y_t$ (if can).
- What bijection?
- Let $A$ be the vertices that agree in color, $D$ those that disagree.
- if $v \in D$, let $g$ be the identity
- if $v \in A$, let $N$ be the neighbors of $v$
- let $C_X$ be the colors that $N$ has in $X$ but not $Y$ ($X$ can't use them at $v$)
- let $C_Y$ be similar, wlog at least as large as $C_X$
- $g$ should swap each color in $C_X$ with some color in $C_Y$, and leave other colors fixed. Result: if $X$ doesn't change, $Y$ doesn't
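Convergence:
- Let $d'(v)$ be the number of neighbors of $v$ in the opposite set, so $\sum_{v \in A} d'(v) = \sum_{v \in D} d'(v) = m'$
- Let $\delta = |D|$
- Note at each step, $\delta$ changes by $0, \pm 1$
- When does it increase?
- $v$ must be in $A$, but move to $D$
- happens if only one MC accepts its new color
- If $c$ is not in $C_X$ or $C_Y$, then $g(c) = c$ and both chains make the same move
- If $c \in C_X$, then $g(c) \in C_Y$ so neither moves
- So must have $c \in C_Y$
- But $|C_Y| \le d'(v)$, so the probability this happens is at most $\sum_{v \in A} \frac{1}{n} \cdot \frac{d'(v)}{k} = \frac{m'}{kn}$.
- When does it decrease?
- must have $v \in D$ and end with both chains agreeing at $v$
- sufficient that pick a color not in either neighborhood of $v$,
- total neighborhood size $2\Delta$, but that counts the $d'(v)$ elements of $A$ twice.
- so Prob. $\ge \sum_{v \in D} \frac{1}{n} \cdot \frac{k - 2\Delta + d'(v)}{k} = \frac{(k - 2\Delta)\delta}{kn} + \frac{m'}{kn}$.
- Deduce that the expected change in $\delta$ is at most the difference of the above, namely $-\frac{(k - 2\Delta)\delta}{kn} = -a\delta$, where $a = \frac{k - 2\Delta}{kn}$.
- So after $t$ steps, $E[\delta_t] \le (1 - a)^t \delta_0 \le (1 - a)^t n$.
- Thus, the probability that $\delta_t > 0$ is at most $(1 - a)^t n$.
- But now note $a \ge 1/n^2$, so $O(n^2 \log n)$ steps reduce this to a one-over-polynomial chance.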
Note: the coupling depends on the state, but who cares
- From a worm's eye view, each chain is a random walk
- so all arguments hold
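For reference, a single (uncoupled) step of the recoloring chain can be sketched as below. The adjacency-list representation, the function name, and the 4-cycle example with $k = 6$ are assumptions for illustration; the coupled version with the bijection $g$ is not implemented here.

```python
import random

def recolor_step(adj, coloring, k, rng):
    """Pick a random vertex and a random color; recolor if no neighbor has that color."""
    v = rng.randrange(len(adj))
    c = rng.randrange(k)
    if all(coloring[w] != c for w in adj[v]):
        coloring[v] = c        # legal move (a self-loop when c is already v's color)
    return coloring            # otherwise the chain stays put

def is_proper(adj, coloring):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[v] != coloring[w] for v in range(len(adj)) for w in adj[v])

# Usage: a 4-cycle has max degree 2, so k = 6 satisfies k > 2 * 2 + 1.
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
coloring = [0, 1, 0, 1]
rng = random.Random(0)
for _ in range(1000):
    coloring = recolor_step(adj, coloring, 6, rng)
```

The step never leaves the set of proper colorings, and a move from one coloring to another has the same probability $1/(kn)$ as its reverse.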
Another example and application: $(n, d, c)$-expanders.
- bipartite
- $n$ vertices, regular degree $d$
- $|\Gamma(S)| \ge (1 + c(1 - 2|S|/n))|S|$
- factor $c$ more neighbors, at least until $|S|$ nears $n/2$.
- Add self-loops (with probability 1/2) to deal with periodicity.
- What is the stationary distribution? Uniform.
- Intuition on convergence: because neighborhoods grow, position becomes unpredictable very fast.
- Theorem: $\lambda_2 \le 1 - \epsilon$ for some $\epsilon > 0$ depending only on $c$ and $d$
- Converse theorem: if $\lambda_2 \le 1 - \epsilon$, get an expander with $c \ge 4(\epsilon - \epsilon^2)$
Gabber-Galil expanders:
- Do expanders exist? Yes! Proof: probabilistic method.
- But in this case, can do better deterministically.
- Gabber-Galil expanders:
- Let $n = 2m^2$. Vertices are $(x, y)$ where $x, y \in Z_m$ (one set per side)
- 5 neighbors: $(x, y)$, $(x, x + y)$, $(x, x + y + 1)$, $(x + y, y)$, $(x + y + 1, y)$ (add mod $m$)
- or 7 neighbors of similar form.
- Theorem: this $d = 5$ graph has $c = (2 - \sqrt{3})/4$; the degree-7 graph has twice the expansion.
- in other words, $c$ and $d$ are constant.
- meaning $\lambda_2 = 1 - \epsilon$ for some constant $\epsilon$
- So random walks on this expander mix very fast: for polynomially small r.p.d., $O(\log n)$ steps of random walk suffice.
- Note also that $n$ can be huge, since only need to store one vertex ($O(\log n)$ bits).
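The point that only one vertex needs to be stored can be seen from how cheaply neighbors are computed. A minimal Python sketch of the degree-5 neighbor rule (the function name is an assumption):

```python
def gabber_galil_neighbors(x, y, m):
    """The 5 neighbors, on the opposite side, of vertex (x, y) with x, y in Z_m."""
    return [
        (x, y),
        (x, (x + y) % m),
        (x, (x + y + 1) % m),
        ((x + y) % m, y),
        ((x + y + 1) % m, y),
    ]

example = gabber_galil_neighbors(2, 4, 5)
```

Each of the five rules is a bijection on $Z_m \times Z_m$, so every vertex on the other side also has degree 5 (counting repeated edges), and a walk step costs a few additions mod $m$ regardless of how large $n = 2m^2$ is.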
Application: conserving randomness.
- Consider a BPP algorithm (gives the right answer with probability 99/100; the constant is irrelevant) using $n$ random bits.
- $t$ independent trials with majority rule reduce the failure probability to $2^{-O(t)}$ (Chernoff), but need $tn$ bits
- in the case of RP, used 2-point sampling to get error $O(1/t)$ with $2n$ bits and $t$ trials.
- Use a walk instead.
- vertices are the $N = 2^n$ ($n$-bit) random strings for the algorithm.
- edges as in the degree-7 expander
- only 1/100 of the vertices are bad.
- what is the probability that a majority of the time is spent there?
- in the limit, spend 1/100 of the time there
- how fast do we converge to the limit? How long must we run?
- Power the Markov chain so $\lambda_2 \le 1/10$ (a constant number of steps, call it $\beta$)
- use the random seeds encountered every $\beta$ steps.
- number of bits needed:
- $O(n)$ for a stationary starting point
- 3 more per walk step, so a constant number per trial
- Theorem: after $7k$ samples, the probability that the majority is wrong is at most $1/2^k$. So error $1/2^n$ with $O(n)$ bits!
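- Let $B$ be the powered transition matrix
- let $p^{(i)}$ be the distribution of sample $i$, namely $p^{(0)}B^i$
- Let $W$ be the indicator matrix for good witnesses, namely 1 at diagonal entry $i$ if $i$ is a witness. The complementary set is $\bar{W} = I - W$.
- $|p^{(i)}W|_1$ is the probability that sample $i$ is in the witness set; similarly for nonwitnesses.
- Consider a sequence of $7k$ results ``witness or not''
- represent as matrices $S = (S_1, \ldots, S_{7k}) \in \{W, \bar{W}\}^{7k}$
- claim $\Pr[S] = |p^{(0)}(BS_1)(BS_2) \cdots (BS_{7k})|_1$.
- defer: $|pBW|_2 \le |p|_2$ and $|pB\bar{W}|_2 \le \frac{1}{5}|p|_2$
- deduce, if more than $7k/2$ of the $S_i$ are $\bar{W}$ (bad samples), using $|v|_1 \le \sqrt{N}\,|v|_2$ and $|p^{(0)}|_2 = 1/\sqrt{N}$ for the uniform start:
$$\Bigl|p^{(0)} \prod_i BS_i\Bigr|_1 \le \sqrt{N}\,\Bigl|p^{(0)} \prod_i BS_i\Bigr|_2 \le \sqrt{N}\,\bigl(\tfrac{1}{5}\bigr)^{7k/2}\,|p^{(0)}|_2 = \bigl(\tfrac{1}{5}\bigr)^{7k/2}$$
- At the same time, only $2^{7k}$ bad sequences, so error prob. $\le 2^{7k} 5^{-7k/2} \le 2^{-k}$
- proof of lemma:
- write $p = \sum_i c_i e_i$
- obviously $|pBW| \le |pB| \le |p|$ since $W$ just zeros some stuff out.
- write $p = x + y$ as before, where $x = c_1 e_1$ is parallel to $\pi$ and $y \cdot \pi = 0$
- argue that $|xB\bar{W}| \le |x|/10$ and $|yB\bar{W}| \le |y|/10$, done (since $|x| + |y| \le \sqrt{2}\,|p|$).
- First $x$:
- recall $xB = x$, and $e_1$ is the (normalized) uniform vector, all coords $1/\sqrt{N}$; $\bar{W}$ has only 1/100 of its coordinates nonzero, so
- $|e_1 \bar{W}| = \sqrt{\frac{N}{100} \cdot \frac{1}{N}} = 1/10$
- Now $y$: just note $|yB| \le |y|/10$ since $\lambda_2 \le 1/10$. Then $\bar{W}$ only zeros entries out.
- summary: $\pi$ part unlikely to be in the bad set, $y$ part unlikely to be relevant.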
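The final arithmetic, and the savings in random bits, can be checked numerically. In this Python sketch the function names and the parameter `steps_per_sample` (the constant number of walk steps between samples) are assumptions for illustration; the 3 bits per step correspond to choosing one of 7 neighbors.

```python
def majority_error_bound(k):
    """Union bound 2^(7k) * 5^(-7k/2) on the probability that the majority is wrong."""
    return 2.0 ** (7 * k) * 5.0 ** (-3.5 * k)

def walk_bits(n, k, steps_per_sample, bits_per_step=3):
    """Random bits for the walk: n for the start vertex, then bits_per_step per step."""
    return n + 7 * k * steps_per_sample * bits_per_step

def independent_bits(n, k):
    """Random bits for 7k fully independent n-bit trials."""
    return 7 * k * n

bound_ok = all(majority_error_bound(k) <= 2.0 ** (-k) for k in range(1, 21))
walk_cost = walk_bits(n=100, k=10, steps_per_sample=4)   # 100 + 7*10*4*3 bits
naive_cost = independent_bits(n=100, k=10)               # 7*10*100 bits
```

The bound works because $2^{7k} 5^{-7k/2} = (4/5)^{7k/2}$ and $(4/5)^{3.5} < 1/2$; the walk's bit count grows as $n + O(k)$ rather than $O(nk)$.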