Computational Methods for Spatial Scan Statistics
Jeff M Phillips
Summary
Spatial scan statistics are used to find anomalous regions among spatial data by comparing a background population data set to a reported data set. A region is said to be more anomalous when the contained reported data is more different from the contained population data. This difference is measured using a likelihood ratio test. For instance, the population data could represent the locations and population of households on a map, and the reported data could represent locations of people diagnosed with a particular disease. The region with the largest spatial scan statistic corresponds to the region that most likely describes an outbreak of that disease, and the degree to which it might be an outbreak can be quantified. Although there are multiple statistical techniques to handle these types of problems, spatial scan statistics are superior in that they have empirically been shown to be the most powerful, and they do not fix a particular size and shape of a region, but rather allow for a class of regions which can vary in size.
Our contributions generalize and formalize these techniques.
We show how to extend the basic Poisson and Bernoulli models to all 1-parameter exponential families, notably including Gaussian and gamma models.
We expand the set of families of shapes to those with constant VC-dimension.
We apply these techniques to graph clustering to give statistical models for identifying graph clusters.
We give bounds on computing these measures under a streaming model.
We also consider different approaches to approximating these measures with provable error bounds. These approaches compare favorably under several empirical tests to heuristic techniques.
Software for many of these approaches is available upon request.
Visuals
Example of population data (blue), measured data (red), and anomalous region (black box).
Publications (and other documents)
Spatial Scan Statistics for Graph Clustering. (to appear)
    
Bei Wang, Jeff M. Phillips, Robert Schrieber, Dennis Wilkinson,
Nina Mishra, Robert Tarjan.
8th SIAM Intenational Conference on Data Mining.
April 2008.
Spatial Scan Statistics: Approximations and Performance Study.
    
Deepak Agarwal, Andrew McGregor, Jeff M. Phillips, Suresh Venkatasubramanian, Zhengyuan Zhu.
12th Annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
August 2006.
The Hunting of the Bump: On Maximizing Statistical Discrepancy.
    
Deepak Agarwal, Jeff M. Phillips, Suresh Venkatasubramanian.
17th Annual ACM-SIAM Symposium on Discrete Algorithms.
January 2006.
    
abstract for Fall Workshop
on Computational Geometry. November 2005.
Algorithms for eps-Approximations of Terrains.
    
Jeff M. Phillips.
Manuscript (available upon request).
April 2007.
jeff's home page