| Lecture | Day | Topic | Paper(s) | Slides | Comments |
|---|---|---|---|---|---|
| 1 | 8/25 | Introduction and Review of SQL | | Lecture-1 | As a guest lecturer |
| Classical Topics (OLAP Cube - Data Warehouse - Data Mining) | | | | | |
| 2 | 8/27 | Data Cube Overview and Implementation (1) | Main reading: Agarwal-Agrawal-Deshpande-Gupta-Naughton-Ramakrishnan-Sarawagi, VLDB 1996, "On the Computation of Multidimensional Aggregates". Optional reading: Gray-Chaudhuri-Bosworth-Layman-Reichart-Venkatrao-Pellow-Pirahesh, ICDE 1996, "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals". | Lecture-2 | As a guest lecturer |
| 3 | 9/1 | Cube Implementation (2) | Main reading: Harinarayan-Rajaraman-Ullman, SIGMOD 1996, "Implementing data cubes efficiently". | Lecture-3 | |
| 4 | 9/3 | Data Warehousing and Iceberg Queries | Main reading: Chaudhuri-Dayal, SIGMOD Record 1997, "An Overview of Data Warehousing and OLAP Technology"; Fang-Shivakumar-Garcia-Molina-Motwani-Ullman, VLDB 1998, "Computing Iceberg Queries Efficiently". | Lecture-4 | |
| 5 | 9/8 | Index Selection for ROLAP and an Array-Based MOLAP Cube Implementation | Main reading: Gupta-Harinarayan-Rajaraman-Ullman, ICDE 1997, "Index Selection for OLAP"; Zhao-Deshpande-Naughton, SIGMOD 1997, "An Array-Based Algorithm for Simultaneous Multidimensional Aggregates". | Lecture-5 | |
| 6 | 9/10 | Mining Association Rules | Main reading: Agrawal-Srikant, VLDB 1994, "Fast Algorithms for Mining Association Rules in Large Databases". | Lecture-6 | Slides of the authors' presentation at VLDB'04 (Test-of-time award) |
| | 9/13 | Project Proposal Due; Review Question-1 | | | Review (0.5 to 1 page) due for "OLAP - Data Cube - Data Warehouse". Either write a short review of one of the papers, or write about a research direction related to your own research interests that would benefit from the Data Cube operator (provided by a DBMS or by a new implementation). Mention the technical challenges and whether any of the approaches covered in class could help. |
| Provenance | | | | | |
| 7 | 9/15 | Provenance Overview and Semirings | Main reading: Green-Karvounarakis-Tannen, PODS 2007, "Provenance Semirings". Optional reading: Buneman-Khanna-Tan, ICDT 2001, "Why and where: A Characterization of Data Provenance"; Cheney-Chiticariu-Tan, Foundations and Trends in Databases 2009, "Provenance in Databases: Why, How, and Where". | EDBT/ICDT 2010 keynote by Dr. Val Tannen | |
| 8 | 9/17 | Why-Not Queries (Query-Based) | Main reading: Chapman-Jagadish, SIGMOD 2009, "Why Not?". Optional reading: Tran-Chan, SIGMOD 2010, "How to Conquer Why-Not Questions". | Lecture-8 | |
| 9 | 9/22 | Explanations for Databases ("detour" lecture for projects) | Main reading: Roy-Suciu, SIGMOD 2014, "A Formal Approach to Finding Explanations for Database Queries". Optional reading: Wu-Madden, PVLDB 2013, "Scorpion: Explaining Away Outliers in Aggregate Queries". | Lecture-9 | |
| 10 | 9/24 | Deletion Propagation + Why-Not Queries (Data-Based) | Main reading: Buneman-Khanna-Tan, PODS 2002, "On Propagation of Deletions and Annotations through Views"; Huang-Chen-Doan-Naughton, PVLDB 2008, "On the Provenance of Non-Answers to Queries over Extracted Data". Optional reading: Kimelfeld-Vondrak-Williams, PODS 2011, "Maximizing Conjunctive Views in Deletion Propagation"; Herschel-Hernandez, PVLDB 2010, "Explaining Missing Answers to SPJUA Queries". | Lecture-10 | |
| | 9/27 | Review Question-2 | | | Review (0.5 to 1 page) due for "Provenance". Either write a short review of one of the papers, or write about a potential research direction, related to your own research interests, that involves tracing provenance (of data or query answers). Also mention the technical challenges and whether any of the approaches covered in class could help. |
| Uncertain Data | | | | | |
| 11 | 9/29 | Probabilistic Databases | Main reading: Suciu-Olteanu-Ré-Koch, 2011, book "Probabilistic Databases" (Chapters 3-5). Optional reading: Dalvi-Suciu, JACM 2012, "The Dichotomy of Probabilistic Inference for Unions of Conjunctive Queries"; Dalvi-Suciu, VLDB 2004, "Efficient Query Evaluation on Probabilistic Databases". | Lecture-11 | |
| 12 | 10/1 | Probabilistic Databases (continued) | Continued | Lecture-12 | |
| 13 | 10/6 | Incomplete Databases | Main reading: Abiteboul-Hull-Vianu, book "Foundations of Databases" (Chapter 19). Optional reading: Imielinski-Lipski, JACM 1984, "Incomplete Information in Relational Databases". | Lecture-13 | |
| 14 | 10/8 | Inconsistent Databases | Reading: Chomicki, ICDT 2007 (invited talk), "Consistent Query Answering: Five Easy Pieces"; Bertossi, SIGMOD Record 2006, "Consistent Query Answers in Databases"; Arenas-Bertossi-Chomicki, PODS 1999, "Consistent Query Answers in Inconsistent Databases". | Lecture-14 | |
| | 10/11 | Review Question-3 | | | Submit the homework on uncertain data. |
| | 10/13 | Fall Break | | | |
| Causality | | | | | |
| 15 | 10/15 | Causality in AI | Optional reading: Halpern-Pearl, UAI 2001, "Causes and Explanations: A Structural-Model Approach - Part I: Causes"; Pearl, UCLA Tech Report 1999, "Probabilities of Causation: Three Counterfactual Interpretations and their Identification"; Meliou-Roy-Suciu, VLDB 2014 Tutorial, "Causality and Explanations in Databases". | Lecture-15 | Midterm Project Progress Report Due |
| 16 | 10/20 | Causality in Databases | Main reading: Meliou-Gatterbauer-Moore-Suciu, PVLDB 2010, "The Complexity of Causality and Responsibility for Query Answers and Non-Answers". Optional reading: Meliou-Gatterbauer-Nath-Suciu, SIGMOD 2011, "Tracing Data Errors with View-Conditioned Causality". | Lecture-16 | |
| 17 | 10/22 | Causality in Statistics | Optional reading: Rubin, Journal of the American Statistical Association 2005, "Causal Inference Using Potential Outcomes: Design, Modeling, Decisions". | Lecture-17 | Student presentations |
| | | No review for this topic | | | Work on your project! |
| Data Analysis with Humans | | | | | |
| 18 | 10/27 | Database Usability | Main reading: Jagadish-Chapman-Elkiss-Jayapandian-Li-Nandi-Yu, SIGMOD 2007, "Making Database Systems Usable". Optional reading: Li-Chan-Maier, VLDB 2015, "Query From Examples: An Iterative, Data-Driven Approach to Query Construction". | Lecture-18 | Student presentations |
| 19 | 10/29 | Crowdsourcing Systems | Main reading: Franklin-Kossmann-Kraska-Ramesh-Xin, SIGMOD 2011, "CrowdDB: Answering Queries with Crowdsourcing"; Marcus-Wu-Karger-Madden-Miller, PVLDB 2011, "Human-powered Sorts and Joins". Optional reading: Parameswaran-Park-Garcia-Molina-Polyzotis-Widom, CIKM 2012, "Deco: Declarative Crowdsourcing". | Slides available online | |
| | 11/1 | Review Question-4 | | | Review (0.5 to 1 page) due for "Data Analysis with Humans". Either write a short review of one of the papers, or write about a potential research direction, related to your own research interests, that involves data analysis and human interaction (either as a user or as the crowd). Also mention the technical challenges and whether any of the approaches covered in class could help. |
| 20 | 11/3 | Crowdsourcing Operators (Max) | Main reading: Guo-Parameswaran-Garcia-Molina, SIGMOD 2012, "So Who Won? Dynamic Max Discovery with the Crowd"; Davidson-Khanna-Milo-Roy, TODS 2014, "Top-k and Clustering with Noisy Comparisons". Optional reading: Feige-Raghavan-Peleg-Upfal, SIAM Journal on Computing 1994, "Computing with Noisy Information". | Lecture-20 | Slides available online |
| Systems for Data Analysis | | | | | |
| 21 | 11/5 | Tools for ML in Databases | Overview (one as main): Hazy, MAD Skills, MLbase | Slides available online | |
| 22 | 11/10 | Tools for Large-scale Analytics | Overview (one as main): Dremel, Shark, Spark | | Student presentations |
| 23 | 11/12 | Tools for Visualization | Overview (one as main): Tableau, GraphLab, Data Wrangler | Slides available online | |
| | 11/15 | Review Question-5 | | | Hands-on experience with one data-analytics system of your choice (listed or not): install the system, choose a dataset, run queries, write about your observations, and attach graphs/tables (< 1 page). You can try to use it for your project. You may want to start early. This is the last review. |
| 24 | 11/17 | Feature Selection in Analytics | Main reading: Zhang-Kumar-Ré, SIGMOD 2014, "Materialization Optimizations for Feature Selection Workloads". | Lecture-24 | |
| 25 | 11/19 | Efficient Feature Selection Algorithms | | | Student presentations |
| Conclusions | | | | | |
| 26 | 11/24 | Project Demonstrations and Presentations | | | Final Project Report Due |