# |
Date |
Topic |
Readings |
Sampling |
|||
1 |
Thu, Jan 10 |
Reservoir Sampling (slides) |
J. Vitter, “Random Sampling with a Reservoir”, ACM Transactions on Mathematical Software, 1985 |
2 |
Tue, Jan 15 |
Sampling from databases (slides) |
N. Dalvi, R. Kumar, A. Machanavajjhala, V. Rastogi,
"Sampling hidden objects using nearest-neighbor oracles",
KDD 2011. Optional: Frank Olken "Random Sampling from Databases", PhD Thesis |
3 |
Thu, Jan 17 |
Monte Carlo Estimation, DNF counting (slides) |
R. Karp, M. Luby, N. Madras, "Monte Carlo Estimation Algorithm for Enumeration Problems", Journal of Algorithms 10(3) 1989 |
4 |
Tue, Jan 22 |
Markov Chain Monte Carlo (slides) | L. Lovasz "Random Walks on Graphs: A Survey", Combinatorics 1993 |
5 |
Thu, Jan 24 |
Coupling and Mixing Times (notes) Assignment 1 is posted |
|
Streaming Algorithms | |||
6 |
Tue, Jan 29 |
Filtering & Estimating the number of distinct objects (slides) |
Chapter 4: Section 4.3, 4.4, Ullman & Rajaraman textbook |
7 |
Thu, Jan 31 |
Frequency Moments (notes) |
Chapter 3: Section 4.5, Ullman & Rajaraman textbook Alon, Mathias, Szegedy, "The space complexity of approximating the frequency moments", STOC 96 |
8 |
Tue, Feb 5 |
Heavy-Hitters & Count min sketch (Andrew McGregor's slides) |
Cormode, Muthukrishnan, "An improved data stream summary: The count min sketch and its applications" |
9 |
Thu, Feb 7 |
Online Aggregation (slides) |
Hellerstein, Haas, Wang, "Online Aggregation", SIGMOD '97 Haas, Hellerstein, "Ripple Joins", SIGMOD '99 Chaudhuri, Motwani, Narasayya, "On Random Sampling over Joins", SIGMOD'99 |
10 |
Tue, Feb 12 | Weighted Majority & online learning (slides) | |
Parallel Architechtures |
|||
11 |
Thu, Feb 14 |
Map Reduce basics (slides) |
Chapter 2: Sections 2.1, 2.2, 2.3, Ullman & Rajaram textbook |
Tue, Feb 19 |
NO CLASS |
||
12 |
Thu, Feb 21 |
Evaluating Joins on Map Reduce (slides) Project Proposal is Due |
Okcan, Reideweld, "Processing Theta Joins using Map Reduce", SIGMOD '11 |
13 |
Tue, Feb 26 |
Graphs on Map Reduce (slides) |
Rastogi, Machanavajjhala, Chitnis, Das Sarma, "Finding Connected Components in Map Reduce in Logarithmic Rounds", ICDE 2013 Kang, Tsourakis, Faloutsos, "Pegasus: a Peta Scale Graph Mining System", ICDM 2009 |
14 |
Thu, Feb 28 |
Graph Processing & Iterations (slides) Assignment 2 is posted |
Malewicz et al, "Pregel: A system for large scale graph processing", SIGMOD 2010 |
15 |
Tue, Mar 5 |
Asynchronous Processing (slides) |
Low, Gonzales, Kyrola, Bickson, Guestrin, Hellerstein, "Distributed Graphlab", VLDB 2012 Wang, Xie, Demers, Gehrke, "Asynchronous Large Scale Graph Processing Made Easy", CIDR 2013 |
16 |
Thu, Mar 7 |
Feed Following (slides) | Silberstein, Terrace, Cooper, Ramakrishnan, "Feeding Frency: Selectively Materializing User's Event Feeds", SIGMOD 2010 Silberstein, Machanavajjhala, Ramakrishnan, "Feed Following", DBSocial 2011 |
Tue, Mar 12 |
NO CLASS Springbreak |
||
Thu, Mar 14 |
NO CLASS Springbreak | ||
Thu, Mar 21 |
Buffer |
||
Clustering & Deduplication | |||
17 |
Tue, Mar 26 |
Clustering (slides) |
Chapter 7: Ullman & Rajaraman textbook McCallum, Nigam, Ungar, "Efficient Clustering of High Dimensional Data Sets", KDD 2000 |
18 |
Thu, Mar 28 |
Deduplication (slides) | Chapter 3: Ullman & Rajaraman textbook |
19 |
Tue, Apr 2 |
Efficient identification of near duplicates & Locality Sensitive Hashing (slides) | Chapter 3: Ullman & Rajaraman textbook |
20 |
Thu, Apr 4 |
Correlation Clustering (slides) | Bansal, Chawla, Blum, "Correlation Clustering", ML 2004 Elsner, Schudy, "Bounding and comparing methods for correlation clustering", ILP-NLP 2009 |
21 |
Tue, Apr 9 |
Deduplication in Linked Data (slides) | Bhattacharya, Getoor, "Collective Entity Resolution in relational data", TKDD 2007 |
22 |
Thu, Apr 11 |
Collective Entity Resolution & Scalability (slides) | Rastogi, Dalvi, Garofalakis, "Large Scale Collective Entity Matching", VLDB 2011 |
23 |
Tue, Apr 16 |
Active Learning | Sarawagi, Bhamidipaty, "Iterative Deduplication using Active Learning", KDD 2002 Balcan, Beygelzimer, Langford, "Agnostic Active Learning", ICML 2006 |
24 |
Mon, Apr 29 |
Project Presentations |