Markov chains:
- Powerful
tool for sampling from complicated distributions
- rely
only on local moves to explore state space.
- Many
use Markov chains to model events that arise in nature.
- We create
Markov chains to explore and sample from problems.
2SAT:
- Fix
some assignment A
- let f (k) be expected time to get all n
variables to match A if n
currently match.
- Then f (n) = 0, f (0) = 1 + f (1), and f (k) = 1 +
(f (k + 1) + f (k - 1).
¬… Rewrite:
f (0) - f (1) = 1 and f (k) - f (k + 1) = 2 + f (k - 1) - f (k)
¬… So
f (k) - f (k - 1) = 2k + 1
¬… deduce
f (0) = 1 + 3 + ... +
(2n - 1) = n2
¬… so,
find with probability 1/2 in 2n2 time.
¬… With
high probability, find in O(n2log n.
More general formulation: Markov chain
- State
space S
- markov
chain begins in a start state X0,
moves from state to state, so output of chain is a sequence of states X0, X1,...= {Xt}t
= 0
- movement
controlled by matrix of transition probabilities pij =
probability next state will be j given
current is i.
- thus,
pij = 1 for every i
S
- implicit
in definition is memorylessness property:
Pr[Xt + 1 = j | X0
= i0, X1 = i1,..., Xt
= i] = Pr[Xt + 1 = j | Xt
= i] = pij.
- Initial
state X0
can come from any probability distribution, or might be fixes (trivial
prob. dist.)
- Dist
for X0
leads to dist over sequences {Xt}
- Suppose
Xt has
distribution q (vector, qi is prob. of state i).
Then Xt + 1
has dist qP. Why?
- Observe
Pr[Xt + r = j | Xt = i] = Prij
Graph of MC:
- Vertex
for every state
- Edge (i, j) if pij > 0
- Edge
weight pij
- weighted
outdegree 1
- Possible
state sequences are paths through the graph
Stationary distribution:
- a
such that
P = 
- left
eigenvector, eigenvalue 1
- steady
state behavior of chain: if in stationary, stay there.
- note
stationary distribution is a sample from state space, so if can get right
stationary distribution, can sample
- lots
of chains have them.
- to
say which, need definitions.
Things to rule out:
- disconnected
graph
- infinite
directed line
- 2-cycle
Irreducibility
- any
state can each any other state
- i.e.
path between any two states
- i.e.
single strong component in graph
Persistence/Transience:
- rij(t)
is probability first hit state j at t, given start state i.
- fij is
probability eventually reach j
from i, so
rij(t)
- expected
time to reach is hitting time hij =
trij(t)
- If fij < 1
then hij =
since might never reach. Converse not always true.
- If fii < 1,
state is transient. Else persistent. If hii =
, null persistent.
Periodicity:
- Periodicity
of a state is max T such that some state
only has nonzero probability at times a + Ti for integer i
- Chain
periodic if some state has periodicity > 1
- In
graph, all cycles containing state have length multiple of T
- Easy
to eliminate: add self loops
- slows
down chain, otherwise same
Ergodic:
- aperiodic
and non-null persistent
- means
might be in state at any time in (sufficiently far) future
Fundamental Theorem of Markov chains: Any irreducible, finite, aperiodic
Markov chain satisfies:
- All
states ergodic (none transient)
- unique
stationary distribution
, with all
> 0
- fii = 1 and hii = 1/
- number
of times visit i in t
steps approaches t
in limit of t.
Justify all except
uniqueness here.
Finite irreducible aperiodic implies ergodic
- graph
has strong components
- final
strong component has no outgoing edges
- once
leave nonfinal component, cannot return
- If
two vertices in same strong component, have path between them
- so
nonzero probability of reaching in (say) n
steps.
- if
nonfinal, nonzero probability of leaving in n
steps.
- so,
vertices in nonfinal components are transient
- if
final, will eventually reach every vertex
- so,
vertices in final components are persistent
- geometric
distribution on time to reach, so expected time finite. Not
null-persistent
- Thus:
If number of states finite, no null persistent states.
- Markov
chain ireducible if graph is one strong component.
- so
all states persistent.
Intuitions for quantities:
- hii is
expected return time
- So
hit every 1/hii
steps on average
- So hii = 1/
- If
in stationary dist, t
visits follows from linearity of expectation
- general
Markov chains are directed graphs. But undirected have some very nice
properties.
- take
a connected, non-bipartite undirected graph on n
vertices
- states
are vertices.
- move
to uniformly chosen neighber.
- So puv = 1/d (u) for every neighbor v
- stationary
distribution:
= d (v)/2m
- unqiqueness
says this is only one
- deduce
hvv = 2m/d (v)
Definitions:
- Hitting
time huv
is expected time to reach u from v
- commute
time is huv + hvu
- Cu(G) is expected time to visit all vertices of G, starting at u
- cover
time is
Cu(G) (so in
fact is max over any starting distribution).
- let's
analyze max cover time
Examples:
- clique:
commute time n, cover time
(n log n)
- line:
commute time betweek ends is
(n2)
- lollipop:
huv =
(n3)
while hvu =
(n2)
(big difference!)
- also
note: lollipop has edges added to line, but higher cover time: adding
edges can increase cover time even though improves connectivity.
general graphs: adjacent vertices:
- lemma:
for adjcaent (u, v), huv + hvu
2m
- proof:
new markov chain on edge traversed following vertex MC
- transition
matrix is doubly stochastic: column sums are 1 (exactly d (v) edges can transit to edge (v, w), each does so with
probability 1/d (v))
- In
homework, show such matrices have uniform stationary
distribution.
- Deduce
= 1/2m.
Thus hee = 2m.
- So
consider suppose original chain on vertex v.
- suppose
arrived via (u, v)
- expected
to traverse (u, v) again in 2m steps
- at
this point will have commuted u to v and back.
- so
conditioning on arrival method, commute time 2m
(thanks to memorylessness)
General graph cover time:
- theorem:
cover time O(mn)
- proof:
find a spanning tree
- consider
a dfs of tree-crosses each edge once in each direction, gives order v1,..., v2n - 1
- time
for the vertices to be visited in this order is upper bounded by commute
time
- but
vertices adjacent, so commute times O(m)
- total
time O(mn)
- tight
for lollipop, loose for line.
Tighter analysis:
- analogue
with electrical networks
- Assume
unit edge resistance
- Kirchoff's
law: current (rate of transitions) conservation
- Ohm's
law
- Gives
effective resistance Ruv
between two vertices.
- Theorem:
Cuv = 2mRuv
- (tightens
previous theorem, since Ruv
1)
- Proof:
- Suppose
put d (x) amperes into every x,
remove 2m from v
voltage at u
with respect to v
- Ohm:
Current from u to w
is
- 
- Kirchoff:
d (u) =

-
= d (u)
- 

- Also,
huv =
(1/d (u))(1 + hwv)
- same
soln to both linear equations, so
= huv
- By
same arg, hvu
is voltage at v wrt u,
if insert 2m at u
and remove d (x) from every x
- add
linear systems, find huv + hvu is voltage difference when insert 2m at u and
remove at v.
- now
apply ohm.
Testing graph connectivity in logspace.
- Deterministic
algorithm (matrix squaring) gives log2n
space
- Smarter
algorithms gives log4/3n space
- log n open
- Randomized
logspace achieves one-sided error
universal traversal sequences.
- Define
labelled graph
- UTS
covers any labelled graph
- deterministic
construction known for cycle only
- we
showed cover time O(n3)
- so
probability takes more than 2n3 to cover is 1/2
- repeat
k times. Prob fail 1/2k
- How
many graphs? (nd )O(nd )
- So
set k = O(nd log nd )
- probabilistic
method