Data-intensive Computing Systems: Course Schedule

The course schedule will be posted here.
| Week | Date | Topic | Lecture slides and reference |
|------|-------|-------|------------------------------|
| 1 | 08-29 | Introduction and overview | Notes 1: pptx, pdf |
| | 08-31 | Introduction to MapReduce and Hadoop | Chapter 2 in Tom White's book |
| 2 | 09-05 | Working with MapReduce | Notes 2: pdf |
| | 09-07 | Working with MapReduce (contd.) | Notes 3: ppt, pdf |
| 3 | 09-12 | How Hadoop Works | Notes 4: ppt, pdf |
| | 09-14 | How Hadoop Works (contd.) | Notes 4: ppt, pdf; Exercise 1 |
| 4 | 09-19 | Overview of query processing | Notes 5: ppt, pdf |
| | 09-21 | Query rewrites, Pipelining (iterators) and Materialization | Notes 5: ppt, pdf |
| 5 | 09-26 | Guest Lecture by Herodotos Herodotou, "Starfish: A Self-Tuning System for Big Data Analytics" | Slides: pptx, pdf |
| | 09-28 | Costing query plans, Introduction to Pig Latin | Notes 6: ppt, pdf; Reading 1 on Pig |
| 6 | 10-03 | Processing Pig Latin queries | Notes 6: ppt, pdf |
| | 10-05 | Processing Pig Latin queries | Notes 6: ppt, pdf; Reading 2 on Pig |
| 7 | 10-10 | No class (Fall Break) | |
| | 10-12 | Processing Pig Latin queries | Reading 2 on Pig |
| 8 | 10-17 | Block-based data storage | Notes 7: ppt, pdf |
| | 10-19 | Index-based access | Notes 8: ppt, pdf |
| 9 | 10-24 | Index-based access (contd.) | Notes 9: ppt, pdf |
| | 10-26 | Midterm | |
| 10 | 10-31 | No class | |
| | 11-02 | Sort processing | Notes 10: ppt, pdf |
| 11 | 11-07 | Introduction to Join processing | Notes 10: ppt, pdf |
| | 11-09 | Sort-merge joins, Block and Index nested-loop joins, Hash joins | Notes 10: ppt, pdf |
| 12 | 11-14 | Cost-based Query Optimization | Notes 11: ppt, pdf |
| | 11-14 | Talk by Jeffrey Krone | Slides: ppt, pdf |
| | 11-15 | Talk by Alan Gates | Slides: pptx, pdf |
| | 11-16 | Failure recovery, Logging | Notes 12: ppt, pdf |
| 13 | 11-21 | Checkpointing, Concurrency control, and Serializability | Notes 13: ppt, pdf; Exercises |
| | 11-23 | Thanksgiving break | |
| 14 | 11-28 | Concurrency control, locking | Notes 14: ppt, pdf |
| | 11-30 | Discussion on readings | |