next up previous
Next: An Efficient Algorithm for Up: SAMPLING, HISTOGRAMS, AND RANDOM Previous: Faster Methods for Random

   
Random Sampling with a Reservoir

J. S. Vitter. ``Random Sampling with a Reservoir,'' ACM Transactions on Mathematical Software, 11(1), March 1985, 37-57.

We introduce fast algorithms for selecting a random sample of nrecords without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in $ O(n(1 + \log(N/n)))$ expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.

For sampling methods where n is known in advance, see a related paper.

Full text (Adobe pdf format)



Jeff Vitter
2009-11-09