u CompSci 516 - Databases Systems - Spring 2022

CompSci 516

Database Systems

Spring 2022

Home
Overview
Time/Day
Staff
Grading
Resources / Toolkits
Course Policy
Help
Schedule

Overview

This is the graduate database course at Duke CS. This course will cover principles and design of database management systems at an advanced level.

Topics will include:

SQL/Relational Algebra/Relational Calculus, Database Normalization, DBMS Architecture/Storage, Indexing/Hashing, Query Algorithms and Optimizations, Transactions and Recovery, Parallel DB/Map Reduce/Distributed query processing, NOSQL/Column store, Datalog, Advanced and Research Topics in Databases (TBD).

Prerequisites:

CompSci 316 or an undergraduate database course is **not** a hard prerequisite as we start from basics in databases and SQL in this course. However, having a background in CS (e.g., algorithms, data structures, sets, programming, basic systems knowledge etc.) and experience in some programming knowledge helps. If you have taken 316 before, there may be some overlap of topics. In 516, there will be a group project and individual assignments in old and new database systems (SQL, Spark, NOSQL) -- quite a bit of these topics you are expected to learn yourself with help from TAs/online tutorials (along with any other topic that is needed as a background knowledge). If you do not have a CS background, you are encouraged to contact the instructor to check if the course is suitable for you.

Time/Day

      Lectures: Tuesdays and Thursdays, 10:15 am - 11:30 am

      Place: French Science 2231


Course Staff

Instructor

      Sudeepa Roy
            Office hour: Mondays 4-5 pm (Zoom link on Ed) and after class TTh (in-class / lecture Zoom).

TAs

  • Joon Young Lee (UTA)                   Office hour: Thursdays 8 - 9 am (Zoom link on Ed).

  • Yuxi Liu (grad TA)                   Office hour: Tuesdays 1-2 pm (Zoom link on Ed).

  • Shweta Patwa (grad TA)          Office hour: Wednesdays 5-6 pm (Zoom link on Ed).

Grading

You have a considerable amount of control over the final grade you would receive in this class if you work hard, and should have a fair idea where you approximately stand at any point of time!
  • Grading is done on an absolute, but adjustable scale. Anyone earning
    • 90% or more of the total number of points guarantees a grade in the A range (A+, A, A-);
    • 80% or more guarantees a grade in the B range (B+, B, B-);
    • 70% or more guarantees a grade in the C range (C+, C, C-);
    • 60% or more guarantees a D (no +/- in this range).

  • At the discretion of the instructor, the grading scale may slide down (i.e., grades go higher), but it will not slide up.

  • Assignment of +/- in the letter grades will be decided by the instructor based on the performance of the entire class. The highest overall score in the class, and only the exceptional performances will receive the A+ grade.

Weights of each component:

See details in Workload below.

Component Weight
Homeworks 30%
           HW1: SQL (Traditional DBMS) 10%
           HW2: Map-Reduce / Spark (Big Data) 10%
           HW3: MongoDB / NOSQL (New DBMS) 10%
Exams 45%
           Midterm 20%
           Final exam 25%
Project 13%
Class participation 12%
           In-class labs and quizzes (equal weight for each, lowest score dropped) 10%
           Communication / Surveys (equal weight for each) 2%


Workload

  • Homeworks (30%): There will be three homework assignments. They have to solved strictly individually by every student, although discussions with acknowledgement are allowed (see the honor code below). For each homework, there will be at least two weeks time in total although some parts may be posted in phases based on our pace in the class (there will be at least one week time for any problem).

    Late Policy: Homework problems that are submitted late will receive an automatic deduction of 5% per hour late (for the entire homework). After the solution is posted, you do not get any credit.

    Late penalty will be waivered only in exceptional cases. For undergraduates, you must submit an Incapacitation Form (STINF), Religious Observance Notification Form, or Notification of Varsity Athletic Participation Form -- these forms will give you 24 hours of extension without late penalty. For graduate students, a valid reason can be emailed to the instructor for a 24-hours extension (another exam or assignment does not count as a valid reason). Any student is unlikely to get a penalty waiver for more than one assignments.

    For any other reasons or extensions beyond 24 hours, undergraduate students might have to get your academic dean email the instructor regarding your circumstances (Dean's Excuse). For graduate students, an email from the authority (like the office of DGS) might be needed. In all cases, you must have an email from the instructor granting the extension - otherwise the standard late penalty would apply. Do not rely on getting an extension and start/finish homeworks as early as possible. Note that you will have at least 1-2 weeks time for each homework, so you should prepare to meet the deadline even if you get busy when the assignment is due.

    Warning: all assignments would take significant amount of time and effort, and significant amount of self-learning is expected for learning the platforms/languages and troubleshooting.

  • Project (13%): There will be a semester-long project on topics chosen by the students in groups of 4 students (any other group size has to be approved by the instructor and will only be allowed only when more groups of size 4 cannot be formed). Students are encouraged to choose a project of their own research interests that is related to data management / processing / visualization / applications / theory, or it can be building a database-backed web application. Some ideas of the projects will be posted later.

  • Midterm (20%) and final (25%): Exams are closed book and closed notes, and in class. No electronic devices or collaborations are allowed. Final is comprehensive (includes everything covered in the lectures) but may focus more on materials not already covered by the midterm. More details would be posted later.

  • Class Participation (12%): Every student is strongly encouraged to ask questions, share their thoughts, and actively participate in the lectures and on the discussion forum.

    In-class labs and quizzes (10%): There will be some in-class labs and some in-class quizzes, sometimes with a short deadline after class (equal weights, and one lowest score will be dropped). These have to be solved individually.

    Communication (2%): We will be regularly contacting you about your progress, feedback, concerns, etc. These points are reserved for your response within the time limit provided in the emails.

Resources / Communication / Toolkits

Book:The textbooks are optional and may be found in the library. Lecture slides and regular class participation should be sufficient for this class.

We will use the following two books in this class:
  1. [RG] (Main) Database Management Systems (third edition); Raghu Ramakrishnan and Johannes Gehrke.
  2. [GUW] Database Systems: The Complete Book, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2nd Edition. Prentice Hall. 2008.

Relevant chapters for reading are posted under Schedule. In a typical semester, textbooks for this course are available for 3-hour checkouts at the Duke Libraries. Search the Libraries' Top Textbooks program here: https://library.duke.edu/course-support/course-reserves/textbooks. Please consult the library for options during the COVID-19 situation.

Gradescope: We will mostly use Gradescope for submission and grading of homeworks and project work.

Communication: You should check your email regularly for important course-related announcements. All important announcements will be sent through Sakai. Old email messages can be found under "Email Archive" on Sakai.

We will use the Sakai course management system for posting sample solutions (under "Resources") and for checking grades (under "Gradebook").

Computing: You will need access to a computer (any major OS will do) on which you are allowed to install new software. We will also use cloud-based virtual machines - see Help for details.

Course Policy and Honor Code

Under the Duke Honor Code, the students are expected to submit their own work in this course in the homework and exams (note that the students will work on the project in groups). The students are allowed (and are encouraged) to discuss the course material with other students, but need to solve problems in the homeworks and exams on their own. Any assistance received must be clearly indicated in the solutions -- failure to do so will be considered a violation of the Honor Code. In any event, the students are responsible for understanding and being able to explain on their own all solutions that they submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels. If you are unsure of a policy, please ask the instructor and do not assume anything.

What is allowed/not allowed



Help

TBA

Schedule

(subject to change)

"Notes" will be uploaded before the class and are intentionally left incomplete for interactive lectures. Completed "slides" will be uploaded after the lectures.

  Day Topic                 Slides            Additional material / reading Release/Due dates
1 1/6 (Th) Introduction and SQL Lecture-1-notes

Lecture-1
(up to slide 46)
Instructions to install Postgres and load MovieLens dataset: slides

2 1/11 (T) Data model Lecture-2-notes

Lecture-2 (up to slide 36)
HW1-Part I (SQL) released
3 1/13 (Th) SQL: intro, semantics, modifications aggregation, subqueries, NULL, outerjoin, constraints, triggers, views Lecture-3-notes

Lecture-3
(up to slide 38)
Check out some optional slides on SQL Programming at the end of the lecture slides
4 1/18 (T) SQL: aggregation, subqueries, NULL, outerjoin, constraints, triggers, views (see Lec 3 notes, from p7, slide 38)

Lecture-4
(up to slide 28)
5 1/20 (Th) Relational Algebra and Calculus (See Lecture-4 slides 27 to 42)

Lecture-5-notes

Lecture-5
(up to slide 24)
6 1/25 (T) Contd. Lecture-6-notes

(finished Lecture-5)
Team members and tentative topic of project due
7 1/27 (Th) Normalization, FDs, BCNF Lecture-7
(up to slide 38)
Quiz1-RA posted
8 2/1 (T) a. Map-Reduce and Spark

b. Storage
Lecture-8a

Lecture-8b
HW1 (SQL) - Part I and II - due
9 2/3 (Th) Tree and Hash Index Lecture-9-11
(up to slide 11)
Project proposal due (MOVED to Monday 2/7)

10 2/8 (T) Contd. B+-Tree Index
(up to slide 46)
Quiz1-RA due
11 2/10 (Th) Contd. Hash Index
(up to slide 74)
HW2 (AWS/Spark) released
2/14 (M) Quiz-2 and 3 due on Gradiance - 12 NOON
12 2/15 (T) Midterm exam
13 2/17 (Th) Transactions Lecture-13
(up to slide 37)
14 2/22 (T) Transactions - concurrency control Lecture-14
(up to slide 13)
15 2/24 (Th) Contd.
16 3/1 (T) Transactions - recovery Lecture-16
17 3/3 (Th) External Sorting and Index Selection Lecture-17 HW2 (AWS/Spark) due at noon
3/8 (T) Spring Break - no class
3/10 (Th) Spring Break - no class
18 3/15 (T) Query Evaluation and Join Algorithms Lecture-18 Midterm project report due at noon
19 3/17 (Th) Contd. HW3 (NOSQL/MongoDB) released (tentative)
20 3/22 (T) Distributed Databases, NOSQL, MongoDB Lecture 20 MongoDB & JSON Basics
21 3/24 (Th) Contd.
(up to slide 35)
Quiz-4-Transactions due on Gradiance
22 3/29 (T) Query Optimization Lecture 22
(up to slide 22)
23 3/31 (Th) Contd.
24 4/5 (T) Parallel Databases Lecture 24 HW3 (NOSQL/MongoDB) due
25 4/7 (Th) Recursive query evaluation and Datalog Lecture 25
26 4/12 (T) Data mining and data cube Lecture 26
4/13 (W) Project final report and video due
(can be submitted by Friday 4/15 noon)
4/27 (Wed) Final Exam, 9am-12 pm EST, in class