CompSci 516

Data Intensive Computing Systems

Fall 2016

News



      Day/Time: Wednesdays and Fridays, 4:40 pm - 5:55 pm
      Place: LSRC A247

      Instructor: Sudeepa Roy
  • Email: sudeepa AT cs.duke.edu
  • Office Hour: LSRC D325, Mondays 1:30 pm - 2:30 pm
      TA: Junghoon Kang
  • Email: jungkang AT cs.duke.edu
  • Office Hour: North N303B, Thursdays 3 pm - 4 pm

    Overview

    This is the graduate database course. This course will cover principles and design of database management systems at an advanced level.

    Topics will include:
    SQL/Relational Algebra/Relational Calculus, Database Normalization, DBMS Architecture/Storage, Indexing/Hashing, Query Algorithms and Optimizations, Transactions and Recovery, Parallel DB/Map Reduce/Distributed query processing, NOSQL/Column store, Datalog, Advanced and Research Topics in Databases (TBD).

    Textbooks:
    1. [RG] (Main) Database Management Systems (third edition); Raghu Ramakrishnan and Johannes Gehrke.
    2. [GUW] (Additional) Database Systems: The Complete Book (second edition); Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom

    Prerequisites:
    An introductory database course (CompSci 316 or equivalent) or consent of the instructor. Some background in algorithms, data structures, and discrete math will also be assumed. Undergraduate students with the necessary background and interest are welcome.

    Grading
    • Homework (3): 30%
    • Project: 20%
    • Midterm: 20%
    • Final: 30%

    Homework
    There will be three homework assignments. They must be solved strictly individually by every student (see the honor code below). There are no late days. We will use Sakai for homework submission and Piazza for discussions.

    Project
    There will be a semester-long project on topics chosen by the students, who will work in groups of 1, 2, or 3 (bigger groups will be expected to do more work). Students are encouraged to choose a research project that matches their own research interests and is related to data management / processing / visualization / applications / theory. Some project ideas will be posted later.

    The deliverables of the project will be (1) a project proposal (1-3 pages), (2) a midterm project report (3-5 pages), (3) the final project report (4-8 pages), and (4) a short (~10-minute) class presentation at the end. The same document will be updated throughout the semester as the project progresses. A template for the project report will be posted on Sakai later.

    Exams
    The midterm and final exams are closed-book and closed-notes, and are held in class. No electronic devices are allowed.

    Honor Code:
    Under the Duke Honor Code, students are expected to submit their own work on the homework and exams in this course (note that students may work on the project in groups). Students are allowed (and encouraged) to discuss the course material with other students, but must solve the homework and exam problems on their own. Any assistance received must be clearly indicated in the solutions; failure to do so will be considered a violation of the Honor Code. In any event, students are responsible for understanding, and being able to explain on their own, all solutions they submit. The course staff will aggressively pursue all suspected cases of Honor Code violations, which will be handled through official University channels.

    What is allowed/not allowed

    Schedule

    (subject to change)

      Day Topic Slides      Reading
    1 8/31 (W) Introduction and Data Models Lecture-1 [RG] 1.1, 1.3, 1.4, 1.5
    2 9/2 (F) SQL Lecture-2 [RG] 3, 5 (also see 4.2.4)
    [GUW] 6
    3 9/7 (W) Map-Reduce and Spark
    Guest Lecture by Junghoon Kang
    Lecture-3 Spark_RDD
    Google_File_System
    Google_MapReduce
    4 9/9 (F) SQL and Relational Algebra/Calculus Lecture-4 [RG] 4
    [GUW] 2.4, 5.1, 5.2
    5 9/14 (W) Storage and Indexing Lecture-5 [RG] 9.4-9.7
    [GUW] 13.5-13.8
    6 9/16 (F) Indexing Lecture-6 [RG] 8.1-8.5, 10.1-10.7, 11
    [GUW] 8.3, 14.1-14.4
    7 9/21 (W) External Sorting Lecture-7 [RG] 13
    8 9/23 (F) Query Evaluation, Operator and Join Algorithms Lecture-8 [RG] 14

    Optional reading:
    (1) "Architecture of a Database System"
    by Joseph M. Hellerstein, Michael Stonebraker, and James Hamilton [pdf], Chapters 1.1 and 4.1-4.5

    (2) "Query Evaluation Techniques for Large Databases"
    by Goetz Graefe [pdf]
    9 9/28 (W) Query Optimization Lecture-9 [RG] 15
    14.2-14.7

    Optional reading:
    (1) "Access Path Selection in a Relational Database Management System"
    by Selinger et al. [pdf]

    (2) "An Overview of Query Optimization in Relational Systems"
    by Chaudhuri et al. [pdf]
    10 9/30 (F) Database Normalization Lecture-10 [RG] 19.1-19.5, 19.6.1, 19.8 (overview only)
    [GUW] 3
    11 10/5 (W) Transactions Lecture-11 [RG] 16.1-16.3, 16.4.1, 17.1-17.4
    12 10/7 (F) Transactions: Concurrency Control Lecture-12 [RG] 17.5.1, 17.5.3, 17.6
    [GUW] 18.8, 18.9
    10/12 (W) Midterm (in class)
    13 10/14 (F) Transactions: Recovery Lecture-13 [GUW] 17.2-17.4
    14 10/19 (W) Transactions: Recovery Lecture-14 [GUW] 17.2-17.4
    15 10/21 (F) Transactions: Recovery (ARIES) Lecture-15 [RG] 18.1-18.6

    "Concurrency Control and Recovery" [pdf]
    Michael Franklin, 1997
    2.2, 3.2
    16 10/26 (W) Parallel Databases Lecture-16 [RG] 22.1-22.5
    [GUW] 20.1-20.2
    17 10/28 (F) Distributed Databases Lecture-17 [RG] 22.6-22.14
    [GUW] 20.3, 20.4.1-20.4.2, 20.5-20.6
    18 11/2 (W) NOSQL and Column Stores Lecture-18
    19 11/4 (F) Data Warehousing and Decision Support Lecture-19
    20 11/9 (W) Data Mining Lecture-20
    21 11/11 (F) Datalog Lecture-21
    22 11/16 (W) Acyclic joins, query hypergraphs, and worst-case joins Lecture-22
    and slides from Ashwin on worst-case joins
    23 11/18 (F) Data Integration Lecture-23
    11/23 (W) No class - Thanksgiving Recess
    11/25 (F) No class - Thanksgiving Recess
    24 11/30 (W) Review and wrap up Lecture-24
    25 12/2 (F) Project Presentations
    (last class)
    12/19 (M) Final Exam (9:00 AM - 12:00 NOON), LSRC A247


    Homework

    Homework Topic Posted on Due on
    HW1 SQL and Postgres 08/31 (Wed) 09/16 (Fri), 11:55 pm
    HW2 Spark and AWS 08/31 (Wed) 10/17 (Mon), 11:55 pm
    HW3 NOSQL 10/30 (Sun) 11/16 (Wed), 11:55 pm

    Project Milestones

    Milestone Due on
    Project Proposal (1-3 pages) 09/28 (Wed), 11:55 pm
    Please send an email with your group members and an informal project description by 09/21 (Wed).
    Midterm Report (3-5 pages) 10/28 (Fri), 11:55 pm
    Final Report (4-8 pages) 11/28 (Mon), 11:55 pm