
   
Faster Methods for Random Sampling

J. S. Vitter. "Faster Methods for Random Sampling," Communications of the ACM, 27(7), July 1984, 703-718.

Several new methods are presented for selecting n records at random without replacement from a file containing N records. Each algorithm selects the records for the sample in a sequential manner, that is, in the same order in which the records appear in the file. The algorithms are online in that the records for the sample are selected iteratively with no preprocessing. The algorithms require a constant amount of space and are short and easy to implement. The main result of this paper is the design and analysis of Algorithm D, which does the sampling in O(n) time, on the average; roughly n uniform random variates are generated, and approximately n exponentiation operations (of the form a^b, for real numbers a and b) are performed during the sampling. This solves an open problem in the literature. CPU timings on a large mainframe computer indicate that Algorithm D is significantly faster than the sampling algorithms in use today.
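
To make the setting concrete, the following is a minimal sketch of the classic one-pass selection-sampling baseline (the method Knuth presents as Algorithm S), not Algorithm D itself. It shares the properties described above: records are chosen in file order, online, and in constant space. Unlike Algorithm D, it examines every record and draws one uniform variate per record examined, so it runs in O(N) rather than O(n) time. The function name `sequential_sample` and its parameters are illustrative assumptions, not names taken from the paper.

```python
import random


def sequential_sample(records, N, n, rng=random):
    """Yield n of the N records in file order, without replacement.

    Baseline selection sampling (NOT Vitter's Algorithm D): every record
    is examined and one uniform variate is drawn per record examined.
    `records` is any iterable assumed to contain exactly N items.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    remaining = N  # records not yet examined
    needed = n     # records still to be selected
    for record in records:
        if needed == 0:
            break
        # Select the current record with probability needed / remaining.
        if rng.random() * remaining < needed:
            yield record
            needed -= 1
        remaining -= 1
```

For example, `list(sequential_sample(range(1000), 1000, 5))` returns five distinct integers in increasing order. Algorithm D reaches the same distribution faster by generating the number of records to skip before each selection directly, rather than testing records one at a time.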

For an improved and optimized version of the random sampling method, see a related paper. For reservoir methods, where the file size N is not known in advance, see a related paper.

Full text (Adobe pdf format)


Jeff Vitter
2009-11-09