Full text (gzip-compressed postscript)
A memory-based learning system is an extended memory management system
that decomposes the input space either statically or dynamically into
subregions for the purpose of storing and retrieving functional
information. The main generalization techniques employed by memory-based
learning systems are the nearest-neighbor search, space decomposition
techniques, and clustering. Research on memory-based learning is still
in its early stage. In particular, there are very few rigorous
theoretical results regarding memory requirement, sample size, expected
performance, and computational complexity. In this paper, we propose a
model for memory-based learning and use it to analyze several
methods--
-covering, hashing, clustering, tree-structured
clustering, and receptive-fields--for learning smooth functions. The
sample size and system complexity are derived for each method. Our
model is built upon the generalized PAC learning model of Haussler and
is closely related to the method of vector quantization in data
compression. Our main result is that we can build memory-based learning
systems using new clustering algorithms of Lin and Vitter to PAC-learn
in polynomial time using only polynomial storage in typical situations.
Keywords: Memory-based learning, PAC learning, clustering, approximation, linear programming, relaxation, covering, hashing.