We introduce fast algorithms for selecting a random sample of nrecords without replacement from a pool of N records, where
the value of N is unknown beforehand. The main result of the
paper is the design and analysis of Algorithm Z; it does the
sampling in one pass using constant space and in
expected time, which is optimum, up to a constant
factor. Several optimizations are studied that collectively
improve the speed of the naive version of the algorithm by
an order of magnitude. We give an efficient Pascal-like
implementation that incorporates these modifications and
that is suitable for general use. Theoretical and empirical
results indicate that Algorithm Z outperforms current
methods by a significant margin.
For sampling methods where n is known in advance, see a related paper.