Fall 2015 - COMPSCI 590.7 - Algorithmic Aspects of Machine Learning

Many machine learning problems (such as sparse coding or topic modeling) are hard in the worst case, but are nevertheless solved in practice by algorithms whose convergence properties are not well understood.
In this course we will see many instances where we can design algorithms for machine learning problems that have rigorous guarantees. The course will cover topics such as nonnegative matrix factorization, topic models, matrix completion and tensor decomposition. Although all of these problems are NP-hard in the worst case, we will see which properties real-life instances tend to satisfy and why these properties allow us to design efficient algorithms with provable guarantees.

Announcements:

Homework 3 is posted. Due 11/17/2015 in class.

Homework 2 is posted. Due 10/27/2015 in class.

Homework 1 is posted. Due 9/24/2015 in class.

Course Information

Instructor
Rong Ge
D226 LSRC
Email: rongge@cs.duke.edu

Lectures: Tuesdays and Thursdays 4:40-5:55 pm in LSRC D106

Textbook: Ankur Moitra's monograph. Lecture notes for new topics will be provided.

Prerequisites: Familiarity with algorithms, probability and linear algebra.

Evaluation:
The course grades will be determined based on:

Course Outline

Date | Contents | References | Future Reading
8/27 | Intro: Machine Learning Problems | Slides |

Nonnegative Matrix Factorization and Topic Models
9/1 | Matrix Factorizations, Nonnegative rank | Section 2.1 in textbook; Short Slides / illustrative example | Section 2.2 in textbook
9/3 | New algorithms for separable NMF | Section 2.3 in textbook | [3]
9/8 | Topic Models | Section 2.4 in textbook | [4]
9/10 | Implementing the Topic Modeling Algorithm | [5]; Slides | See links in the slides
[1] D. Lee and S. Seung. Learning the Parts of Objects by Nonnegative Matrix Factorization, Nature 1999.
[2] S. Vavasis. On the Complexity of Nonnegative Matrix Factorization, SIOPT 2009.
[3] S. Arora, R. Ge, R. Kannan and A. Moitra. Computing a Nonnegative Matrix Factorization -- Provably, STOC 2012.
[4] S. Arora, R. Ge and A. Moitra. Learning Topic Models -- Going Beyond SVD, FOCS 2012.
[5] S. Arora et al. A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.
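
The separability assumption behind [3], [4] and [5] has a short algorithmic illustration. The sketch below is not part of the course materials; the function name, matrix sizes and the use of NumPy are my own choices. It finds anchor columns of a noiseless separable matrix by successive projection, a simple relative of the provable algorithms covered in lecture.

```python
import numpy as np

def successive_projection_anchors(M, r):
    """Pick r 'anchor' columns of M greedily: repeatedly take the column of
    largest norm, then project every column onto its orthogonal complement.
    Under exact separability (every other column is a convex combination of
    the anchors) this recovers the anchors."""
    R = np.array(M, dtype=float)
    anchors = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))  # farthest remaining column
        anchors.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)                     # project out that direction
    return anchors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((30, 4))                       # the 4 anchor columns
    H = rng.dirichlet(np.ones(4), size=50).T      # other columns: convex combinations
    M = np.concatenate([A, A @ H], axis=1)        # a noiseless separable instance
    print("recovered anchor indices:", sorted(successive_projection_anchors(M, 4)))
    # expected output: [0, 1, 2, 3]
```

The algorithms covered in lecture handle noise and choose anchors more carefully; this sketch only shows why separability makes the problem tractable.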

Spectral Clustering
9/15 | Stochastic block model, McSherry's algorithm | [1] |
9/17 | Provable guarantees for Lloyd's algorithm | [2] | [2], especially Section 6; [3], semi-random models
[1] F. McSherry. Spectral Partitioning of Random Graphs, FOCS 2001.
[2] A. Kumar and R. Kannan. Clustering with Spectral Norm and the k-means Algorithm, FOCS 2010.
[3] U. Feige and J. Kilian. Heuristics for Finding Large Independent Sets, with Applications to Coloring Semi-random Graphs, FOCS 1998.
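
A small simulation makes the planted-partition setting of [1] concrete. The sketch below is illustrative only and not from the course; parameter values and helper names are arbitrary. It plants two communities and recovers them from the second eigenvector of the adjacency matrix, the simplest instance of the spectral approach analyzed in [1] and [2].

```python
import numpy as np

def sbm_adjacency(n, p, q, seed=0):
    """Adjacency matrix of a two-community stochastic block model: the first
    n//2 vertices form one community; edges appear with probability p inside
    a community and q across communities."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float), labels

def spectral_bisection(A):
    """Split vertices by the sign pattern of the second-largest eigenvector
    of the adjacency matrix."""
    _, vecs = np.linalg.eigh(A)      # eigenvalues in ascending order
    v = vecs[:, -2]                  # second-largest eigenvector
    return (v > np.median(v)).astype(int)

if __name__ == "__main__":
    A, labels = sbm_adjacency(n=400, p=0.5, q=0.1)
    guess = spectral_bisection(A)
    accuracy = max(np.mean(guess == labels), np.mean(guess != labels))  # labels defined up to swapping
    print(f"fraction of vertices classified correctly: {accuracy:.2f}")
```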

Tensor Decompositions
9/22 | Tensor basics | Section 3.1 in textbook |
9/24 | Tensor Decompositions | Section 3.1 in textbook | Section 3.2 in textbook
9/29 | Power Method | [4] | [4]
10/1 | Applications: topic models, mixture of Gaussians | Section 3.5 in textbook | [3]
10/6 | HMMs, phylogenetic tree reconstruction, stochastic block model | Sections 3.3, 3.4 in textbook | [2]
10/8 | Independent Component Analysis, sample projects | Section 3.6 in textbook | [5]
[1] C. Hillar and L. Lim. Most Tensor Problems are NP-hard, JACM 2013.
[2] E. Mossel and S. Roch. Learning Nonsingular Phylogenies and Hidden Markov Models, STOC 2005.
[3] A. Anandkumar, D. Foster, D. Hsu, S. Kakade and Y. Liu. A Spectral Algorithm for Latent Dirichlet Allocation, NIPS 2012.
[4] A. Anandkumar, R. Ge, D. Hsu, S. Kakade and M. Telgarsky. Tensor Decompositions for Learning Latent Variable Models, JMLR 2014.
[5] N. Goyal, S. Vempala and Y. Xiao. Fourier PCA, STOC 2014.
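
To make the power method lecture (9/29, [4]) concrete, here is a toy sketch, not part of the course materials, with sizes, iteration counts and names chosen only for illustration. It decomposes an exactly orthogonally decomposable symmetric 3-tensor by power iteration with random restarts and deflation; the robust analysis in [4] covers perturbed tensors, which this sketch does not.

```python
import numpy as np

def tensor_power_method(T, k, n_init=10, n_iter=100, seed=0):
    """Recover an orthogonal symmetric decomposition T = sum_i lam_i v_i^{(x3)}
    by repeated power iteration plus deflation."""
    rng = np.random.default_rng(seed)
    T = T.copy()
    lams, vs = [], []
    for _ in range(k):
        best_val, best_u = -np.inf, None
        for _ in range(n_init):                       # random restarts
            u = rng.normal(size=T.shape[0])
            u /= np.linalg.norm(u)
            for _ in range(n_iter):
                u = np.einsum('abc,b,c->a', T, u, u)  # u <- T(I, u, u)
                u /= np.linalg.norm(u)
            val = np.einsum('abc,a,b,c->', T, u, u, u)
            if val > best_val:
                best_val, best_u = val, u
        lams.append(best_val)
        vs.append(best_u)
        T = T - best_val * np.einsum('a,b,c->abc', best_u, best_u, best_u)  # deflate
    return np.array(lams), np.array(vs)

if __name__ == "__main__":
    d, k = 8, 3
    rng = np.random.default_rng(1)
    V, _ = np.linalg.qr(rng.normal(size=(d, k)))      # orthonormal components
    lam = np.array([3.0, 2.0, 1.0])
    T = np.einsum('i,ai,bi,ci->abc', lam, V, V, V)
    est_lam, _ = tensor_power_method(T, k)
    print("recovered eigenvalues:", np.round(est_lam, 3))   # expect roughly 3, 2, 1
```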

(Non-convex) Optimization
10/13 | Fall break | |
10/15 | Basic optimization techniques | [1], Chapter 2 |
10/20 | Strong convexity, approximate gradient descent | [3], Section 2 | [2]
10/22 | Approximate gradient descent for sparse coding | [3], Section 4 | [3]
10/27 | Second-order optimization, cubic regularization | [4] | [4], Section 5
10/29 | Saddle point problem, stochastic gradient | [5] |
[1] Y. Nesterov. Introductory Lectures on Convex Optimization.
[2] A. Rakhlin, O. Shamir and K. Sridharan. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization, ICML 2012.
[3] S. Arora, R. Ge, T. Ma and A. Moitra. Simple, Efficient and Neural Algorithms for Sparse Coding, COLT 2015.
[4] Y. Nesterov and B.T. Polyak. Cubic Regularization of Newton Method and Its Global Performance, Mathematical Programming 2006.
[5] R. Ge, F. Huang, C. Jin and Y. Yuan. Escaping from Saddle Points -- Online Stochastic Gradient for Tensor Decomposition, COLT 2015.
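
The point of [5] (lecture 10/29), that a little gradient noise suffices to escape strict saddle points, already shows up in two dimensions. The toy sketch below is not from the course materials; the objective, step size and noise level are arbitrary illustration choices.

```python
import numpy as np

def grad(w):
    # Gradient of f(w) = 0.25 * (w[0]**2 - 1)**2 + 0.5 * w[1]**2, which has a
    # strict saddle point at the origin and minima at (+/-1, 0).
    return np.array([w[0] ** 3 - w[0], w[1]])

def gradient_descent(w0, eta=0.1, steps=500, noise=0.0, seed=0):
    """Run (optionally perturbed) gradient descent from w0."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= eta * (grad(w) + noise * rng.normal(size=2))
    return w

if __name__ == "__main__":
    w0 = [0.0, 1.0]   # lies on the saddle's stable manifold (w[0] = 0)
    print("plain GD:    ", np.round(gradient_descent(w0, noise=0.0), 2))   # stalls near (0, 0)
    print("perturbed GD:", np.round(gradient_descent(w0, noise=0.01), 2))  # ends near (+/-1, 0)
```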

Matrix Completion
11/3 | Matrix Completion, nuclear norm, Rademacher complexity | [1] |
11/5 | Rademacher complexity (continued), matrix concentration | [1] | [6]
11/10 | Matrix Completion via Convex Optimization | [2]; Section 7 in textbook |
11/12 | Matrix Completion via Convex Optimization (continued) | [2]; Section 7 in textbook | [3]
11/17 | Matrix Completion via Alternating Minimization | [5] |
11/19 | Matrix Completion via Alternating Minimization (continued) | [5] | [4]
[1] N. Srebro and A. Shraibman. Rank, Trace-Norm and Max-Norm, COLT 2005.
[2] E. Candes and B. Recht. Exact Matrix Completion via Convex Optimization, FOCM 2009.
[3] V. Chandrasekaran, P. Parrilo, B. Recht and A. Willsky. The Convex Geometry of Linear Inverse Problems, FOCM 2012.
[4] P. Jain, P. Netrapalli and S. Sanghavi. Low-rank Matrix Completion using Alternating Minimization, STOC 2013.
[5] M. Hardt. Understanding Alternating Minimization for Matrix Completion, FOCS 2014.
[6] J. Tropp. User-Friendly Tail Bounds for Sums of Random Matrices, FOCM 2012.
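
For the alternating minimization lectures (11/17 and 11/19), the sketch below shows the bare algorithm on a synthetic low-rank matrix with randomly observed entries. It is illustrative only and not from the course materials: the matrix sizes, sampling rate and function names are assumptions, and the clipping and sample-splitting steps used in the analyses of [4] and [5] are omitted.

```python
import numpy as np

def altmin_complete(M_obs, mask, r, iters=30, seed=0):
    """Alternating least squares for matrix completion: fit M ~= U @ V.T using
    only the observed entries (mask == True), alternately solving a small least
    squares problem for each row of U and each row of V."""
    rng = np.random.default_rng(seed)
    n1, n2 = M_obs.shape
    U = rng.normal(size=(n1, r))
    V = rng.normal(size=(n2, r))
    for _ in range(iters):
        for i in range(n1):                      # update row i of U with V fixed
            obs = mask[i]
            U[i] = np.linalg.lstsq(V[obs], M_obs[i, obs], rcond=None)[0]
        for j in range(n2):                      # update row j of V with U fixed
            obs = mask[:, j]
            V[j] = np.linalg.lstsq(U[obs], M_obs[obs, j], rcond=None)[0]
    return U, V

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, r = 60, 3
    M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))  # planted rank-3 matrix
    mask = rng.random((n, n)) < 0.4                         # observe about 40% of entries
    U, V = altmin_complete(M * mask, mask, r)
    print("relative error:", np.linalg.norm(M - U @ V.T) / np.linalg.norm(M))
```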

11/22 | Summary: what we learned and what we didn't cover | Slides |