- Welcome to CompSci 516: Database Systems in Spring 2022!
The first class is on 01/06/2022 (Thursday). Classes will be virtual until 01/18/2022 (Tuesday) following Duke's guidelines. The Zoom link for the virtual classes will be available on Sakai. All enrolled students are already added to Sakai. If you are not enrolled and would like to attend the first few classes, please send the instructor an email. - Have questions? Please email the instructor (sudeepa at cs.duke.edu) for questions on logistics. All other questions should be discussed on the Ed discussion forum (linked from Sakai).
- A properly-worn mask is required for everyone in the classroom. Please see the Duke COVID response webpage for all relevant information and updates.
- This webpage is under construction and may change before the class starts.
Overview
This is the graduate database course at Duke CS. This course will cover principles and design of database management systems at an advanced level.Topics will include:
SQL/Relational Algebra/Relational Calculus, Database Normalization, DBMS Architecture/Storage, Indexing/Hashing, Query Algorithms and Optimizations, Transactions and Recovery, Parallel DB/Map Reduce/Distributed query processing, NOSQL/Column store, Datalog, Advanced and Research Topics in Databases (TBD).
Prerequisites:
CompSci 316 or an undergraduate database course is **not** a hard prerequisite as we start from basics in databases and SQL in this course. However, having a background in CS (e.g., algorithms, data structures, sets, programming, basic systems knowledge etc.) and experience in some programming knowledge helps. If you have taken 316 before, there may be some overlap of topics. In 516, there will be a group project and individual assignments in old and new database systems (SQL, Spark, NOSQL) -- quite a bit of these topics you are expected to learn yourself with help from TAs/online tutorials (along with any other topic that is needed as a background knowledge). If you do not have a CS background, you are encouraged to contact the instructor to check if the course is suitable for you.
Time/Day
      Lectures: Tuesdays and Thursdays, 10:15 am - 11:30 am      Place: French Science 2231
Course Staff
Instructor
      Sudeepa Roy             Office hour: Mondays 4-5 pm (Zoom link on Ed) and after class TTh (in-class / lecture Zoom).TAs
- Joon Young Lee (UTA)
                 
Office hour: Thursdays 8 - 9 am (Zoom link on Ed).
- Yuxi Liu (grad TA)                   Office hour: Tuesdays 1-2 pm (Zoom link on Ed).
- Shweta Patwa (grad TA)          Office hour: Wednesdays 5-6 pm (Zoom link on Ed).
- Yuxi Liu (grad TA)                   Office hour: Tuesdays 1-2 pm (Zoom link on Ed).
Grading
You have a considerable amount of control over the final grade you would receive in this class if you work hard, and should have a fair idea where you approximately stand at any point of time!- Grading is done on an absolute, but adjustable scale. Anyone earning
- 90%
or more of the total number of points guarantees a grade in the A range (A+, A, A-);
- 80% or more guarantees a grade in the B range (B+, B, B-);
- 70% or more guarantees a grade in the C range (C+, C, C-);
- 60% or more guarantees a D (no +/- in this range).
- 80% or more guarantees a grade in the B range (B+, B, B-);
- At the discretion of the instructor, the grading scale may slide down (i.e., grades go higher), but it will not slide up.
- Assignment of +/- in the letter grades will be decided by the instructor based on the performance of the entire class. The highest overall score in the class, and only the exceptional performances will receive the A+ grade.
- 90%
or more of the total number of points guarantees a grade in the A range (A+, A, A-);
Weights of each component:
See details in Workload below.Component | Weight |
---|---|
Homeworks | 30% |            HW1: SQL (Traditional DBMS) | 10% |
           HW2: Map-Reduce / Spark (Big Data) | 10% |
           HW3: MongoDB / NOSQL (New DBMS) | 10% |
Exams | 45% |
           Midterm | 20% |
           Final exam | 25% |
Project | 13% |
Class participation | 12% |
           In-class labs and quizzes (equal weight for each, lowest score dropped) | 10% |
           Communication / Surveys (equal weight for each) | 2% |
Workload
-
Homeworks (30%):
There will be three homework assignments. They have to solved strictly individually by every student, although discussions with acknowledgement are allowed (see the honor code below). For each homework, there will be at least two weeks time in total although some parts may be posted in phases based on our pace in the class (there will be at least one week time for any problem).
Late Policy: Homework problems that are submitted late will receive an automatic deduction of 5% per hour late (for the entire homework). After the solution is posted, you do not get any credit.
Late penalty will be waivered only in exceptional cases. For undergraduates, you must submit an Incapacitation Form (STINF), Religious Observance Notification Form, or Notification of Varsity Athletic Participation Form -- these forms will give you 24 hours of extension without late penalty. For graduate students, a valid reason can be emailed to the instructor for a 24-hours extension (another exam or assignment does not count as a valid reason). Any student is unlikely to get a penalty waiver for more than one assignments.
For any other reasons or extensions beyond 24 hours, undergraduate students might have to get your academic dean email the instructor regarding your circumstances (Dean's Excuse). For graduate students, an email from the authority (like the office of DGS) might be needed. In all cases, you must have an email from the instructor granting the extension - otherwise the standard late penalty would apply. Do not rely on getting an extension and start/finish homeworks as early as possible. Note that you will have at least 1-2 weeks time for each homework, so you should prepare to meet the deadline even if you get busy when the assignment is due.
Warning: all assignments would take significant amount of time and effort, and significant amount of self-learning is expected for learning the platforms/languages and troubleshooting.
-
Project (13%):
There will be a semester-long project on topics chosen by the students in groups of 4 students (any other group size has to be approved by the instructor and will only be allowed only when more groups of size 4 cannot be formed). Students are encouraged to choose a project of their own research interests that is related to data management / processing / visualization / applications / theory, or it can be building a database-backed web application. Some ideas of the projects will be posted later.
- Midterm (20%) and final (25%): Exams are closed book and closed notes, and in class. No electronic devices or collaborations are allowed. Final is comprehensive (includes everything covered in the lectures) but may focus more on materials not already covered by the midterm. More details would be posted later.
- Class Participation (12%): Every student is strongly encouraged to ask questions, share their thoughts, and actively participate in the lectures and on the discussion forum.
In-class labs and quizzes (10%): There will be some in-class labs and some in-class quizzes, sometimes with a short deadline after class (equal weights, and one lowest score will be dropped). These have to be solved individually.
Communication (2%): We will be regularly contacting you about your progress, feedback, concerns, etc. These points are reserved for your response within the time limit provided in the emails.
- Midterm (20%) and final (25%): Exams are closed book and closed notes, and in class. No electronic devices or collaborations are allowed. Final is comprehensive (includes everything covered in the lectures) but may focus more on materials not already covered by the midterm. More details would be posted later.
Resources / Communication / Toolkits
Book:The textbooks are optional and may be found in the library. Lecture slides and regular class participation should be sufficient for this class.We will use the following two books in this class:
- [RG] (Main) Database Management Systems (third edition); Raghu Ramakrishnan and Johannes Gehrke.
-
[GUW] Database Systems: The Complete Book, by Hector Garcia-Molina, Jeffrey D. Ullman, and
Jennifer Widom. 2nd Edition. Prentice Hall. 2008.
Gradescope: We will mostly use Gradescope for submission and grading of homeworks and project work.
Communication: You should check your email regularly for important course-related announcements. All important announcements will be sent through Sakai. Old email messages can be found under "Email Archive" on Sakai.
We will use the Sakai course management system for posting sample solutions (under "Resources") and for checking grades (under "Gradebook").
Computing: You will need access to a computer (any major OS will do) on which you are allowed to install new software. We will also use cloud-based virtual machines - see Help for details.
Course Policy and Honor Code
Under the Duke Honor Code, the students are expected to submit their own work in this course in the homework and exams (note that the students will work on the project in groups). The students are allowed (and are encouraged) to discuss the course material with other students, but need to solve problems in the homeworks and exams on their own. Any assistance received must be clearly indicated in the solutions -- failure to do so will be considered a violation of the Honor Code. In any event, the students are responsible for understanding and being able to explain on their own all solutions that they submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels. If you are unsure of a policy, please ask the instructor and do not assume anything.Help
TBASchedule
(subject to change)"Notes" will be uploaded before the class and are intentionally left incomplete for interactive lectures. Completed "slides" will be uploaded after the lectures.
Day | Topic |                 Slides            | Additional material / reading | Release/Due dates | |
---|---|---|---|---|---|
1 | 1/6 (Th) | Introduction and SQL |
Lecture-1-notes
Lecture-1 (up to slide 46) |
Instructions to install Postgres and load MovieLens dataset: slides | |
2 | 1/11 (T) | Data model |
Lecture-2-notes
Lecture-2 (up to slide 36) |
HW1-Part I (SQL) released | |
3 | 1/13 (Th) | SQL: intro, semantics, modifications aggregation, subqueries, NULL, outerjoin, constraints, triggers, views | Lecture-3-notes
Lecture-3 (up to slide 38) |
Check out some optional slides on SQL Programming at the end of the lecture slides | |
4 | 1/18 (T) | SQL: aggregation, subqueries, NULL, outerjoin, constraints, triggers, views | (see Lec 3 notes, from p7, slide 38) Lecture-4 (up to slide 28) |
||
5 | 1/20 (Th) | Relational Algebra and Calculus | (See Lecture-4 slides 27 to 42)
Lecture-5-notes Lecture-5 (up to slide 24) |
||
6 | 1/25 (T) | Contd. | Lecture-6-notes
(finished Lecture-5) |
Team members and tentative topic of project due | |
7 | 1/27 (Th) | Normalization, FDs, BCNF |
Lecture-7
(up to slide 38) |
Quiz1-RA posted | |
8 | 2/1 (T) | a. Map-Reduce and Spark
b. Storage |
Lecture-8a
Lecture-8b |
HW1 (SQL) - Part I and II - due | |
9 | 2/3 (Th) | Tree and Hash Index | Lecture-9-11
(up to slide 11) |
Project proposal due (MOVED to Monday 2/7)
|
|
10 | 2/8 (T) | Contd. | B+-Tree Index
(up to slide 46) |
Quiz1-RA due | |
11 | 2/10 (Th) | Contd. | Hash Index (up to slide 74) |
HW2 (AWS/Spark) released | |
2/14 (M) | Quiz-2 and 3 due on Gradiance - 12 NOON | ||||
12 | 2/15 (T) | Midterm exam | |||
13 | 2/17 (Th) | Transactions | Lecture-13
(up to slide 37) |
||
14 | 2/22 (T) | Transactions - concurrency control | Lecture-14 (up to slide 13) |
||
15 | 2/24 (Th) | Contd. | |||
16 | 3/1 (T) | Transactions - recovery | Lecture-16 | ||
17 | 3/3 (Th) | External Sorting and Index Selection | Lecture-17 | HW2 (AWS/Spark) due at noon | |
3/8 (T) | Spring Break - no class | ||||
3/10 (Th) | Spring Break - no class | ||||
18 | 3/15 (T) | Query Evaluation and Join Algorithms | Lecture-18 | Midterm project report due at noon | |
19 | 3/17 (Th) | Contd. | HW3 (NOSQL/MongoDB) released (tentative) | ||
20 | 3/22 (T) | Distributed Databases, NOSQL, MongoDB | Lecture 20 | MongoDB & JSON Basics | |
21 | 3/24 (Th) | Contd.
(up to slide 35) |
Quiz-4-Transactions due on Gradiance | ||
22 | 3/29 (T) | Query Optimization | Lecture 22
(up to slide 22) | ||
23 | 3/31 (Th) | Contd. | |||
24 | 4/5 (T) | Parallel Databases | Lecture 24 | HW3 (NOSQL/MongoDB) due | |
25 | 4/7 (Th) | Recursive query evaluation and Datalog | Lecture 25 | ||
26 | 4/12 (T) | Data mining and data cube | Lecture 26 | ||
4/13 (W) | Project final report and video due (can be submitted by Friday 4/15 noon) |
||||
4/27 (Wed) | Final Exam, 9am-12 pm EST, in class |