CompSci 516

Database Systems

Fall 2019

News



      Day/Time: Tuesdays and Thursdays, 1:25 pm - 2:40 pm
      PlaceLSRC B101

      Instructor: Sudeepa Roy
  • Email: sudeepa AT cs.duke.edu
  • Office Hour: LSRC D325, Tuesdays 2:45 pm - 3:45 pm
We have three amazing half-TAs to offer you full help!

      (half-)TA:Yuchao Tao
  • Email: yuchao.tao AT duke.edu
  • Office Hour: LSRC D309, Thursdays 4:00 pm - 5:00 pm
      (half-)TA: Yanlin Yu
  • Email: yanlin.yu AT duke.edu
  • Office Hour: LSRC D309, Wednesdays 3:30 pm - 4:30 pm
      (half-)TA: Tianrui Zhang
  • Email: tanrui.zhang AT duke.edu
  • Office Hour: North 306, Mondays 3:30 pm - 4:30 pm


    Overview

    This is the graduate database course. This course will cover principles and design of database management systems at an advanced level.

    Topics will include:
    SQL/Relational Algebra/Relational Calculus, Database Normalization, DBMS Architecture/Storage, Indexing/Hashing, Query Algorithms and Optimizations, Transactions and Recovery, Parallel DB/Map Reduce/Distributed query processing, NOSQL/Column store, Datalog, Advanced and Research Topics in Databases (TBD).

    Textbooks:
    1. [RG] (Main) Database Management Systems (third edition); Raghu Ramakrishnan and Johannes Gehrke.
    2. [GUW] (Additional) Database Systems: The Complete Book (second edition); Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom

    Prerequisites:
    An introductory database course (CompSci 316 or equivalent) or consent of the instructor. Some background in Algorithms, Data Structure, and Discrete Maths will be assumed. Undergraduate students with the necessary background and interests are welcome.

    Grading
    • Three Homework: 30%
    • Project: 10%
    • Midterm: 20%
    • Final: 30%
    • In-class labs: 5%
    • In-class quizzes: 5%

    Homework
    There will be three homework assignments. They have to solved strictly individually by every student (see the honor code below). There will be two "late days" with penalty : first late day will have a 25% penalty for the entire assignment (even if you are late on one problem), and the second late day will have a 50% penalty. You do not get any credit if you are more than 2 days (i.e. 48 hours) late. Note that if you submit 1 min after the deadline, it counts as the first late day (similarly 24 hours 1 min counts as the second late day). After the solution is posted (even if within the 2 late days after the deadline), you do not get any credit. So do not rely on late days and submit your solutions on time!

    Project
    There will be a semester-long project on topics chosen by the students in groups of 3-4. Students are encouraged to choose a project of their own research interests that is related to data management / processing / visualization / applications / theory. Some ideas of the projects will be posted later.



    Exams
    Exams are closed book and closed notes, and in class. No electronic devices are allowed.

    Honor Code:
    Under the Duke Honor Code, the students are expected to submit their own work in this course in the homework and exams (note that the students will work on the project in groups). The students are allowed (and are encouraged) to discuss the course material with other students, but need to solve problems in the homeworks and exams on their own. Any assistance received must be clearly indicated in the solutions -- failure to do so will be considered a violation of the Honor Code. In any event, the students are responsible for understanding and being able to explain on their own all solutions that they submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.

    What is allowed/not allowed

    Schedule

    (subject to change)

    "Notes" will be uploaded before the class and are intentionally left incomplete for interactive lectures. Completed "slides" will be uploaded after the lectures.

      Day Topic Slides Reading
    1 8/27 (T) Introduction and SQL Lec1-notes

    Lecture-1
    SQL: [RG] 3, 5 (also see 4.2.4), [GUW] 6
    2 8/29 (Th) In-class lab (SQL) Lab-instructions

    3 9/3 (T) Data model, more SQL Lec-3-notes

    Lecture-3
    [RG] 1.1, 1.3, 1.4, 1.5
    XML (optional reading): [RG] 27.6, [GUW] 11.1, 11.2
    4 9/5 (Th) SQL, Relational Calculus/Algebra Lec-4-notes

    Lecture-4
    [RG] 4, [GUW] 2.4, 5.1, 5.2
    5 9/10 (T) Relational Algebra, Design Theory and Normalization

    (Guest Lecture by Junyang Gao)
    Lec-5-notes

    Lecture-5
    [RG] 19.1-19.5, 19.6.1, 19.8 (overview only)
    [GUW] 3
    6 9/12 (Th) In-class lab (RA) Lab2-instructions
    7 9/17 (T) RC (revisit) and Functional Dependencies Lec-7-notes

    Lecture-7
    8 9/19 (Th) BCNF and Storage Lec-8-notes

    Lecture-8
    [RG] 8.1-8.5, 9.4-9.7, 10.1-10.7, 11
    [GUW] 8.3, 14.1-14.4
    9 9/24 (T) Storage and Index Lec-9-10-notes

    Lecture-9
    10 9/26 (Th) Tree and Hash Index Lecture-10
    11 10/1 (T) External Sorting and Index Selection Lec-11-notes

    Lecture-11
    [RG] 13
    [GUW] 15.4.1
    12 10/3 (Th) Contd.
    10/8 (T) No class- Fall break
    13 10/10 (Th) Map-Reduce and Spark Lecture-13 Spark_RDD
    Google_File_System
    Google_MapReduce
    14 10/15 (T) Query Evaluation and Join Algorithms Lecture-14 [RG] 14

    Optional reading:
    (1) "Architecture of a Database System"
    by Joseph M. Hellerstein, Michael Stonebraker, and James Hamilton [pdf], Chapters 1.1 and 4.1-4.5

    (2) "Query Evaluation Techniques for Large Databases"
    by Goetz Graefe [pdf]
    15 10/17 (Th) Query optimization Lecture-15 [RG] 15
    [GUW] 14.2-14.7

    Optional reading:
    (1) "Access Path Selection in a Relational Database Management System"
    by Selinger et al. [pdf]

    (2) "An Overview of Query Optimization in Relational Systems"
    by Chaudhuri et al. [pdf]
    16 10/22 (T) Contd.

    HW2-Part1 due
    10/24 (Th) Midterm (in class)
    17 10/29 (T) Transactions Lecture-17 [RG] 16.1-16.3, 16.4.1, 17.1-17.4
    18 10/31 (Th) contd.
    11/4 (M) HW2-Part2 due
    19 11/5 (T) Transactions: Concurrency control

    Midterm project report due
    Lecture-19 [RG] Chapter 17.5.1, 17.5.3, 17.6
    [GUW] Chapter 18.8, 18.9
    20 11/7 (Th) Transactions: Recovery Lecture-20 [GUW] 17.2-17.4
    21 11/12 (T) HW3 as In-class lab (MongoDB-Part1) part1 instructions
    22 11/14 (Th) HW3 as In-class lab (MongoDB-Part2) part2 instructions
    23 11/19 (T) Recovery Contd.
    24 11/21 (Th) Distributed, parallel databases, NOSQL Lecture-24
    25 11/26 (T) Recursive query and Data mining Lecture-25
    12/11 (Wed) Final report and slides due
    12/14 (Sat) Final Exam (2:00 PM - 5:00 PM), LSRC B101


    Homework

    --> --> -->
    Homework Topic
    HW1 SQL and Postgres
    HW2 Spark and AWS
    HW3 NOSQL
    Midterm Report (3-5 pages) Final Report (4-8 pages) -->