Outline
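- Set has size $u$, contains $n$ ``special'' elements
- goal: count number of special elements
- sample each element with probability $p = c(\log n)/(\epsilon^2 n)$
- with high probability, sample contains $(1\pm\epsilon)np$ special elements
- if observe $k$ special elements in the sample, deduce $n \in (1\pm\epsilon)k/p$.
- Problem: what is $p$? (choosing it requires knowing $n$)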
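A minimal Python sketch of the estimator outlined above. The names `estimate_special_count` and `is_special` and the concrete numbers are illustrative assumptions, not part of the notes. The sketch also assumes a usable $p$ is already in hand, which is exactly the difficulty the last bullet raises.

```python
import random

def estimate_special_count(universe, is_special, p, rng):
    """Keep each element independently with probability p, count the
    special elements that survive (k), and scale back up: n is about k / p."""
    k = sum(1 for x in universe if rng.random() < p and is_special(x))
    return k / p

# Hypothetical instance: u = 100_000 elements, of which n = 10_000 are special.
rng = random.Random(0)
universe = range(100_000)
is_special = lambda x: x % 10 == 0
estimate = estimate_special_count(universe, is_special, p=0.2, rng=rng)
print(f"estimated n = {estimate:.0f} (true n = 10000)")
```

With $np = 2000$ expected hits, the estimate typically lands within a few percent of the true count.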
Related idea: Monte Carlo simulation
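- Probability space, event $A$
- easy to test for $A$
- goal: estimate $p = \Pr[A]$.
- Perform $n$ trials (sampling with replacement).
- expected outcome $pn$.
- estimator $\frac{1}{n}\sum_i I_i$, where $I_i$ indicates that trial $i$ hit $A$
- prob. estimate falls outside $(1\pm\epsilon)p$ is $< 2\exp(-\epsilon^2 np/3)$ (for $\epsilon < 1$)
- for failure prob. $\delta$, need $n = O\left(\frac{\log(1/\delta)}{\epsilon^2 p}\right)$
- what if $p$ unknown?
- What if $p$ is small?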
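The trial count above can be made concrete with a short Python sketch. The constant 3 and the factor 2 come from the two-sided Chernoff bound $2\exp(-\epsilon^2 np/3)$. The function names, the dice event, and the lower bound `p_lower` on $p$ are assumptions made for illustration.

```python
import math
import random

def trials_needed(p_lower, eps, delta):
    """An n large enough that 2*exp(-eps^2 * n * p_lower / 3) <= delta."""
    return math.ceil(3 * math.log(2 / delta) / (eps ** 2 * p_lower))

def monte_carlo_estimate(trial, n, rng):
    """Run n independent trials; return the fraction that hit the event."""
    return sum(trial(rng) for _ in range(n)) / n

# Hypothetical event A: two fair dice sum to 7, so p = 1/6.
def dice_sum_is_seven(rng):
    return rng.randint(1, 6) + rng.randint(1, 6) == 7

rng = random.Random(1)
n = trials_needed(p_lower=0.1, eps=0.1, delta=0.01)
p_hat = monte_carlo_estimate(dice_sum_is_seven, n, rng)
print(n, p_hat)
```

Note that `trials_needed` requires a lower bound on $p$ up front, and that the count grows like $1/p$; these are the two questions the list ends on.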
Handling unknown p
- Sample until get $\mu = O(\log(1/\delta)/\epsilon^2)$ hits; let $n$ be the number of trials used
- w.h.p., $p \in (1\pm\epsilon)\mu/n$
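The stopping rule can be sketched in Python as follows. The constant 3 inside `mu` is an assumption (the notes only give $O(\log(1/\delta)/\epsilon^2)$), and `rare_event` is a hypothetical trial with $p = 0.02$.

```python
import math
import random

def estimate_unknown_p(trial, mu, rng):
    """Repeat the trial until mu hits are seen; return (mu / n, n)."""
    hits = 0
    n = 0
    while hits < mu:
        n += 1
        if trial(rng):
            hits += 1
    return mu / n, n

eps, delta = 0.1, 0.01
mu = math.ceil(3 * math.log(2 / delta) / eps ** 2)   # assumed constant 3
rng = random.Random(2)
rare_event = lambda r: r.random() < 0.02             # hypothetical trial, p = 0.02
p_hat, n_trials = estimate_unknown_p(rare_event, mu, rng)
print(mu, n_trials, p_hat)
```

The number of trials adapts to $p$ automatically: roughly $\mu/p$ trials are used, with no prior bound on $p$ required.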
Min-cut
- saw RCA, $\tilde{O}(n^2)$ time
- Another candidate: Gabow's algorithm: $\tilde{O}(mc)$ time on $m$-edge graph with min-cut $c$
- nice algorithm, if $m$ and $c$ small. But how could we make that happen?
- Similarly, for those who know about it, augmenting paths gives $O(mv)$ for max flow of value $v$. Good if $m$, $v$ small. How to make that happen?
- Sampling! What's a good sample? (take suggestions, think about them.)
- Define $G(p)$--pick each edge with probability $p$
Intuition:
- $G$ has $m$ edges, min-cut $c$
- $G(p)$ has $pm$ edges, min-cut $pc$
- So improve Gabow runtime by $p^2$ factor!
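A small Python sketch of building $G(p)$ and checking the intuition on one example. The helper names and the choice of the complete graph $K_{200}$ are assumptions for illustration. It only shows that the edge count and one fixed cut land near $pm$ and $pc$; it says nothing yet about all cuts at once.

```python
import random
from itertools import combinations

def sample_graph(edges, p, rng):
    """G(p): keep each edge independently with probability p."""
    return [e for e in edges if rng.random() < p]

def cut_value(edges, side):
    """Number of edges with exactly one endpoint in `side`."""
    return sum(1 for u, v in edges if (u in side) != (v in side))

n_vertices = 200
edges = list(combinations(range(n_vertices), 2))   # K_200: m = 19900
side = {0}                                         # a min cut of K_200, c = 199
p = 0.5
rng = random.Random(3)
sampled = sample_graph(edges, p, rng)
print(len(edges), len(sampled))                    # m versus roughly p*m
print(cut_value(edges, side), cut_value(sampled, side))   # c versus roughly p*c
```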
What goes wrong? (pause for discussion)
- expectation isn't enough
- so what, use Chernoff?
- min-cut has $c$ edges
- expect to sample $\mu = pc$ of them
- Chernoff says prob. off by more than $\epsilon\mu$ is at most $2e^{-\epsilon^2\mu/4}$
- so set $pc = 8\log n$ or so, deduce with high probability, no min-cut deviates.
- (pause for objections)
- yes, a problem: exponentially many cuts.
- so even though Chernoff gives ``exponentially small'' bound, accumulation of union bound means can't bound probability of small deviation over all cuts.
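To see the failure concretely, here is the arithmetic under the choice $pc = 8\ln n$ (natural logarithm and constant $\epsilon$ are assumptions of this sketch). A graph on $n$ vertices has $2^{n-1}-1$ cuts, and the bound for a min-cut is the weakest per-cut bound available:

```latex
\[
\Pr[\text{a fixed min-cut is off by more than } \epsilon\mu]
  \le 2e^{-\epsilon^2\mu/4}
  = 2e^{-2\epsilon^2\ln n}
  = 2n^{-2\epsilon^2},
\]
\[
(2^{n-1}-1)\cdot 2n^{-2\epsilon^2} \gg 1
  \quad\text{for large } n.
\]
```

A polynomially small per-cut bound multiplied by an exponential number of cuts exceeds 1, so the naive union bound says nothing.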
Surprise! It works anyway.
- Theorem: if min cut $c$ and build $G(p)$, then ``min expected cut'' is $\mu = pc$. Probability any cut deviates from its expectation by more than a relative $\epsilon$ is $O(n^2 e^{-\epsilon^2\mu/3})$.
- So, if get $\mu$ around $12(\log n)/\epsilon^2$, all cuts within $(1\pm\epsilon)$ of expectation with high probability.
- Do so by setting $p = 12(\log n)/(\epsilon^2 c)$
- Application: min-cut approximation.
- Theorem says a min-cut will get value at most $(1+\epsilon)\mu$ whp
- Also says that any cut of original value at least $(1+\epsilon)c/(1-\epsilon)$ will get value at least $(1+\epsilon)\mu$
- So, sampled graph has min-cut at most $(1+\epsilon)\mu$, and whatever cut is minimum there has value at most $(1+\epsilon)c/(1-\epsilon) \approx (1+2\epsilon)c$ in original graph.
- How find min-cut in sample? Gabow's algorithm
- in sample, min-cut $O((\log n)/\epsilon^2)$ whp, while number of edges is $O(m(\log n)/(\epsilon^2 c))$
- So, Gabow runtime $\tilde{O}(m/(\epsilon^4 c))$
- constant factor approx in near linear time.
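The approximation scheme can be tried end to end on a toy instance. In this Python sketch an exhaustive search stands in for Gabow's algorithm, and $c$ is computed exactly first in order to set $p$; a real use would not know $c$. The multigraph, the helper names, and $\epsilon = 0.25$ are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations

def cut_value(weights, side):
    """Total multiplicity of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in weights.items() if (u in side) != (v in side))

def brute_force_min_cut(weights, vertices):
    """Exhaustive min cut; a stand-in for Gabow's algorithm on tiny graphs."""
    first, rest = vertices[0], vertices[1:]
    best_side, best_value = None, math.inf
    for r in range(len(rest)):                 # never take all of `rest`
        for others in combinations(rest, r):
            side = {first, *others}
            value = cut_value(weights, side)
            if value < best_value:
                best_side, best_value = side, value
    return best_side, best_value

def sample_multigraph(weights, p, rng):
    """G(p): keep each parallel edge independently with probability p."""
    return {e: sum(rng.random() < p for _ in range(w)) for e, w in weights.items()}

# Hypothetical dense multigraph on 8 vertices, edge multiplicities 1000..1400.
vertices = list(range(8))
weights = {(u, v): 1000 + 100 * ((u + v) % 5) for u, v in combinations(vertices, 2)}

eps = 0.25
_, c = brute_force_min_cut(weights, vertices)  # known only because the graph is tiny
p = 12 * math.log(len(vertices)) / (eps ** 2 * c)
rng = random.Random(4)
sampled = sample_multigraph(weights, p, rng)
approx_side, sampled_value = brute_force_min_cut(sampled, vertices)
approx_value = cut_value(weights, approx_side)  # value of that cut in the original graph
print(c, approx_value, approx_value / c)
```

The ratio printed last is the approximation factor achieved, to be compared with the bound $(1+\epsilon)/(1-\epsilon)$.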
Proof of Theorem
- Suppose min-cut $c$ and build $G(p)$
- For midterm, you had to prove bound on number of $\alpha$-minimum cuts.
- I assume you all did that
- well, maybe not, but proof will be in solutions
- So we take as given: number of cuts of value less than $\alpha c$ is at most $n^{2\alpha}$ (this is true, though probably slightly stronger than what you proved. If use $O(n^{2\alpha})$, get same result but messier.)
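- First consider $n^2$ smallest cuts. All have expectation at least $\mu$, so each deviates with prob. at most $2e^{-\epsilon^2\mu/3} = 2/n^4$ by choice of $\mu = 12(\log n)/\epsilon^2$; union bound over these $n^2$ cuts gives $O(1/n^2)$
- Write all cut values in increasing order $c_1 \le c_2 \le \cdots$ (the larger cuts are $c_k$ with $k > n^2$)
- Then $c_{n^{2\alpha}} \ge \alpha c$
- write $k = n^{2\alpha}$, means $\alpha = \log k/\log n^2$
- What prob $c_k$ deviates? $2e^{-\epsilon^2 p c_k/3} \le 2e^{-\epsilon^2\alpha\mu/3}$
- By choice of $\mu$, this is $2n^{-4\alpha} = 2k^{-2}$
- sum over $k > n^2$, get $O(1/n^2)$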
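The arithmetic behind these steps, written out. This sketch assumes natural logarithms, $\mu = 12(\ln n)/\epsilon^2$, and the Chernoff exponent $\epsilon^2\mu/3$ from the theorem statement, with a factor 2 for the two-sided bound.

```latex
\begin{align*}
\mu = \frac{12\ln n}{\epsilon^2}
  &\implies e^{-\epsilon^2\mu/3} = e^{-4\ln n} = n^{-4}, \\
\Pr[\text{one of the } n^2 \text{ smallest cuts deviates}]
  &\le n^2\cdot 2n^{-4} = 2n^{-2}, \\
k = n^{2\alpha},\quad p c_k \ge \alpha\mu
  &\implies 2e^{-\epsilon^2 p c_k/3} \le 2e^{-4\alpha\ln n} = 2n^{-4\alpha} = 2k^{-2}, \\
\sum_{k > n^2} 2k^{-2}
  &\le 2\int_{n^2}^{\infty} x^{-2}\,dx = 2n^{-2}.
\end{align*}
```

Both contributions are $O(n^{-2})$, matching the $O(n^2 e^{-\epsilon^2\mu/3})$ in the theorem.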
Problem outline
- databases want size
- matrix multiply time
- compute reachability set of each vertex, add
Sampling algorithm
- generate vertex samples until $\mu$ are reachable from $v$
- deduce size of $v$'s reachability set.
- reachability test: $O(m)$.
- number of samples: $n$/size per hit.
- $O(mn)$ per vertex--ouch!
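A Python sketch of the per-vertex version, assuming the stopping rule is a fixed number of hits `mu` and that each reachability test is a fresh breadth-first search. The adjacency-dict representation, the function names, and the path graph are illustrative assumptions.

```python
import random
from collections import deque

def reachable(adj, v, w):
    """O(m) test: breadth-first search from v, stopping once w is found."""
    seen = {v}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if x == w:
            return True
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

def estimate_reach_size(adj, v, mu, rng):
    """Draw uniform random vertices until mu of them are reachable from v,
    then scale the hit rate mu / samples up by n."""
    n = len(adj)
    hits = samples = 0
    while hits < mu:
        samples += 1
        if reachable(adj, v, rng.randrange(n)):
            hits += 1
    return n * mu / samples

# Hypothetical graph: directed path 0 -> 1 -> ... -> 199.
# Vertex 150 reaches 50 vertices, counting itself.
adj = {i: [i + 1] for i in range(199)}
adj[199] = []
rng = random.Random(5)
size_estimate = estimate_reach_size(adj, 150, mu=400, rng=rng)
print(size_estimate)
```

Every sample costs a full search here, which is the waste the next two sections remove.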
Pipeline for all vertices simultaneously
- increase mean to $O(\log n/\epsilon^2)$,
- so $1/n^2$ failure prob. per vertex
- $O(mn)$ for all vertices (still ouch).
Avoid wasting work
- after $O(n\log n)$ samples, every vertex has $\log n$ hits. No more needed.
- Send at most $\log n$ samples over an edge: $\tilde{O}(m)$
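One possible reading of these bullets, sketched in Python: each sampled vertex is pushed backward along edges to every vertex that can reach it, and a vertex stops accepting samples once it has `hits_needed` hits (standing in for $\log n$). The reverse search, the pruning rule, and all names are assumptions of this sketch, not something the notes spell out.

```python
import random
from collections import deque

def estimate_all_reach_sizes(n, edges, hits_needed, rng):
    preds = [[] for _ in range(n)]
    for u, v in edges:
        preds[v].append(u)
    hits = [0] * n
    estimates = [None] * n
    unfinished = n
    draws = 0
    edge_messages = 0          # how many times a sample crosses an edge
    while unfinished:
        draws += 1
        w = rng.randrange(n)
        if hits[w] >= hits_needed:
            continue           # w, and so every vertex reaching w, is already done
        seen = {w}
        queue = deque([w])
        while queue:           # backward search over unfinished vertices only
            x = queue.popleft()
            hits[x] += 1       # x reaches the sample w
            if hits[x] == hits_needed:
                estimates[x] = n * hits_needed / draws
                unfinished -= 1
            for u in preds[x]:
                edge_messages += 1
                if u not in seen and hits[u] < hits_needed:
                    seen.add(u)
                    queue.append(u)
    return estimates, edge_messages

# Hypothetical graph: directed path 0 -> 1 -> ... -> 59; vertex i reaches 60 - i vertices.
path_edges = [(i, i + 1) for i in range(59)]
rng = random.Random(6)
estimates, edge_messages = estimate_all_reach_sizes(60, path_edges, 200, rng)
print(estimates[0], estimates[30], edge_messages)
```

An edge into `x` is scanned only when `x` receives a hit, so each edge carries at most `hits_needed` samples, which gives the near-linear total work.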