Duke Computer Science Colloquium
Statistical Sampling for Big Data Logistic Regression
Monday, November 28, 2016
12:00pm - 1:00pm
D106 LSRC, Duke
Pizza will be served at 11:45.
Many modern big-data machine learning problems encountered in industry involve optimization problems so large that traditional optimization methods have difficulty handling them. In this talk, I will present a novel statistical sampling method for multi-class logistic regression that can be used to select a small number of the most effective data points. We show that, asymptotically, the proposed method achieves a variance no more than s times that of the full-data MLE while using no more than 1/s of the full data in the worst case; moreover, the required sample size can be significantly smaller than 1/s of the full data when the classification accuracy is relatively high. We also demonstrate how to use such sampling methods in real applications.
Joint work with Lei Han and Ting Yang.
Dr. Tong Zhang is affiliated with Rutgers University. He previously worked at the IBM T.J. Watson Research Center in Yorktown Heights, New York; Yahoo Research in New York City; and Baidu Inc. in Beijing. His research interests include machine learning, big data, and their applications.
Hosted by: Cynthia Rudin