Duke Computer Science Colloquium

Statistical Sampling for Big Data Logistic Regression

Speaker: Tong Zhang
Date: Monday, November 28, 2016
Time: 12:00pm - 1:00pm
Location: D106 LSRC, Duke
Pizza will be served at 11:45am.

Abstract

Many modern big-data machine learning problems encountered in industry involve optimization problems so large that traditional optimization methods have difficulty handling them. In this talk, I will present a novel statistical sampling method for multi-class logistic regression that can be used to select a small number of the most effective data points. Asymptotically, we show that the proposed method can achieve variance no more than s times that of the full-data MLE while using no more than 1/s of the full data in the worst case; moreover, the required sample size can be significantly smaller than 1/s of the full data when the classification accuracy is relatively high. We demonstrate how to use such sampling methods in real applications. Joint work with Lei Han and Ting Yang.

Biography

Dr. Tong Zhang is affiliated with Rutgers University. He previously worked at the IBM T.J. Watson Research Center in Yorktown Heights, New York; Yahoo Research in New York City; and Baidu Inc. in Beijing. His research interests include machine learning, big data, and their applications.

Hosted by:
Cynthia Rudin