Faster Methods for Random Sampling
J. S. Vitter.
"Faster Methods for Random Sampling," Communications of the ACM,
27(7), July 1984, 703-718.
Several new methods are presented for selecting n records
at random without replacement from a file containing N records. Each algorithm selects the records for the sample
in a sequential manner, that is, in the same order the records
appear in the file. The algorithms are online in that the
records for the sample are selected iteratively with no
preprocessing. The algorithms require a constant amount of
space and are short and easy to implement. The main result
of this paper is the design and analysis of Algorithm D,
which does the sampling in O(n) time, on the average;
roughly n uniform random variates are generated, and
approximately n exponentiation operations (of the form
a^b, for real numbers a and b) are performed during
the sampling. This solves an open problem in the literature.
CPU timings on a large mainframe computer indicate that
Algorithm D is significantly faster than the sampling
algorithms in use today.
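
To make the setting concrete, here is a minimal Python sketch of sequential sampling without replacement. It is not Algorithm D. It is the classic selection-sampling baseline (often called Algorithm S), which examines every record and draws one uniform variate per record, so it runs in O(N) time rather than O(n). The function name sequential_sample and its parameters are illustrative assumptions, not taken from the paper.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sequential_sample(
    records: Iterable[T],
    N: int,
    n: int,
    rng: Optional[random.Random] = None,
) -> Iterator[T]:
    """Yield n of the N records, chosen uniformly without replacement,
    in the same order they appear in the input.

    Baseline selection sampling (hypothetical helper, not Algorithm D):
    one uniform variate per record examined, O(1) working space.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    rng = rng or random.Random()
    remaining = N  # records not yet examined
    needed = n     # records still to be selected
    for record in records:
        if needed == 0:
            break
        # Select the current record with probability needed / remaining.
        if rng.random() * remaining < needed:
            yield record
            needed -= 1
        remaining -= 1
```

For example, list(sequential_sample(range(1000), N=1000, n=5)) returns five distinct values in increasing order. Algorithm D reaches the same distribution faster by generating the number of records to skip before the next selection, instead of testing each record in turn.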
For an improved and optimized version of the random
sampling method,
see a related paper.
For reservoir methods, where n is not known in advance,
see a related paper.
Full text (Adobe pdf format)
Jeff Vitter
2009-11-09