u CompSci 316 - Introduction to Databases -Fall 2020

CompSci 316

Introduction to Databases

Fall 2020

Home
Fall'20 (new)
Time/Day
Staff
Grading
Workload
Resources / Toolkits
Course Policy
Help
Schedule
Have questions? Please email both Sudeepa (sudeepa at cs.duke.edu) and Yesenia (yvelasco at cs.duke.edu) for questions on logistics. All other questions should be discussed on Piazza.

Overview

We intend this course to give you a solid background in database systems as well as managing and processing "big data" in general. Topics include data modeling, database design theory, data definition and manipulation languages (SQL and NoSQL), database application programming interfaces, storage and indexing, query processing and optimization, parallel and distributed data processing, transaction processing, as well as a sample of other advanced topics.

Prerequisites:

CompSci 201 or equivalent, or consent of the instructor.

Updates for Fall'20:

  • The first class is on Tuesday 08/18. There are no discussion sessions on Monday 08/17. See Zoom links on sakai.

  • The class will be "online synchronous" due to the COVID-19 situation -- the lectures will be delivered live and recorded for viewing later. Irrespective of their time zone and locations, students should be able to take this class, but students who cannot attend the live lectures for most of the classes due to time zone differences, should take prior permission from the instructor.

  • NEW! There will be online synchronous discussion sessions every Monday to practice problems, collaborate on homework assignments, and collaborate on projects once they start. One discussion section is in the morning and the second one is in the afternoon to cover all timezones. Please select a discussion section that is suitable for your time zone. All students will be expected to attend these online discussion sessions. Check out grading criteria.

  • There will be a semester-long team project with 5 students in each team to build a database-backed website. All team members must belong to the same discussion section to reduce conflicts regarding time-zone differences.

  • Although the class would be online, we expect the class to be highly interactive, both in the lecture and in the discussion sessions. Also note that the class would have a fast and rigorous workload with weekly assignments based on every week's lecture material! Once the projects start, everyone would also has to give weekly project updates on piazza to their project TAs and the groups would be expected to meet every week. In other words, do not rely on being out of touch for 1-2 weeks and catching up later! There will be several TA office hours scattered around the week to help you with the class material and assignments.

  • The textbooks are optional as many students would not have access to the library -- lecture slides and other material provided would be sufficient for this class. We will discuss problems and give additional information on the topics covered in the lecture slides during the lectures and discussion sessions, so attending these regularly is important.

  • The grading criteria will put more emphasis on weekly homeworks and class participation, and less on the exams. The details of the exams will be posted later.

  • As it is difficult to meet and interact with other students online, we plan to form study groups from the same discussion sessions with about 10 students who you would interact frequently for discussing assignments. We might reshuffle the students in the study groups once in a while. There will be TAs assigned to each study group to facilitate discussions and answer questions you have. Please make sure to acknowledge all the help and discussions in your assignments. See Course Policy for details.

  • In other semesters, we interact with students in in-person classes and office hours, and get to know many of them if not all! As it is not possible this semester, we encourage you to turn your video on whenever possible in the online office hours and discussion sessions. During the online lectures, we encourage you to have a profile photo on your Zoom account so that we know who you are! We would try our best to replicate experience in in-person classes in this online class!

  • We hope you would have fun in learning database systems in this class!


Time/Day

      Lectures: Tuesdays and Thursdays, 1:45 pm - 3:00 pm

      Discussion-01: Mondays, 8:30 am to 9:45 am
      Discussion-02: Mondays, 3:30 pm to 4:45 pm
       (Students need to register for one discussion session suitable for their time zone.)

     Place: The class is completely online in Fall'20.
       Please see Sakai website for the Zoom links to the lectures, discussion sessions, office hours, etc. (not available yet)


     Office Hours: See below.


Course Staff

Instructor

      Sudeepa Roy Office hour: Zoom, After class TTh and TBD.

Teaching Associate

      Yesenia Velasco.

Grad TAs

  • Yihao Hu.

  • Jingxian Huang

  • Xiangchen Shen.

UTAs

  • Meredith Brown.

  • David Chen.

  • Kathleen Chen.

  • Kevin Day.

  • Ankit Jajoo.

  • Jane Li.

  • Florence Liu.

  • Rebecca Shu

  • Calleigh Smith.

  • Jeevan Tewari.

  • Runxin (Rebecca) Wang.

Not sure who your TAs are? Check out first few slides of Lecture 1 once it is posted!

Weekly Office Hour Calendar:

Office Hour Calendar

Grading

You have a considerable amount of control over the final grade you would receive in this class if you work hard, and should have a fair idea where you approximately stand at any point of time!
  • Grading is done on an absolute, but adjustable scale. Anyone earning
    • 90% or more of the total number of points available will receive a grade in the A range (A+, A, A-);
    • 80% or more guarantees a grade in the B range (B+, B, B-);
    • 70% or more guarantees a grade in the C range (C+, C, C-);
    • 60% or more guarantees a D (no +/- in this range).

  • At the discretion of the instructor, the grading scale may slide down (i.e., grades go higher), but it will not slide up.

  • Within each range (for As, Bs, Cs), any one getting more than "mean + standard deviation" of the scores in that range would get the + grade, and anyone getting less than "mean - standard deviation" would get the - grade. e.g., if among the scores in the range of [80, 90), the mean is 85.3 and the standard deviation is 1.6, students getting [80, 83.7) would get B-, [83.7, 86.9] would get B, and (86.9, 90) would get B+.

  • Note that for the A range, all grades >= 90 would be considered to decide A-, A, A+, not just [90, 100]. Getting >= 100 is not impossible in this course thanks to the extra credit problems!

Weights of each component:

See details in Workload below.

Component Weight
Homeworks 40%
           Written problem solving /programming 30%
           Gradiance exercises 10%
Exams 25%
           Midterm 12%
           Final exam 13%
Project 20%
Class participation 15%
           Attending discussion sessions (quizzes and collaboration) 7%
           Attending lectures/watching videos 3%
           Quizzes from lectures 3%
           Communication 2%


Workload

  • Homeworks (40%): There will be weekly homeworks (due in 7 days). They will be based on the last 1-2 lectures. They are of two types:

    1. Written problem solving and programming assignments (30%):
      Start early and allocate enough time to solve these problems!

      • Late policy: Homework problems that are submitted late will receive an automatic deduction of 5% per hour late (per problem, not per sub-problem, so even if one sub-problem is late, the problem is late). You will receive no credit after the sample solution becomes available. We will drop the (one) lowest score at the end.

        Exceptions will only be made in the case of documented excuses; follow the standard university procedure for filing them - in other words, you must submit an Incapacitation Form (STINF), Religious Observance Notification Form, or Notification of Varsity Athletic Participation Form -- these forms will give you two days (48 hours) of extensions. For any other reason or extension beyond two days, you must arrange for your academic dean to email the instructor regarding your circumstances (Dean's Excuse). You must have an email from the instructor granting the extension - otherwise the standard late penalty would apply. Do not rely on getting an extension and start/finish homeworks as early as possible!

    2. Gradiance exercises (10%):
      Gradiance is an online service pioneered by one of the authors of the textbook, Prof. Jeffrey Ullman at Stanford. One of the best features of Gradiance is that you are permitted to test yourself on a particular topic as many times as you like. You receive immediate feedback for each attempt, which avoids the shortcoming of the traditional submit-and-then-wait-for-grades assignments where one error in understanding can permeate solutions to multiple problems and does not get rectified until much later. We encourage you to continue testing on each topic until you complete the part of the assignment with a 100% score. The highest score will be recorded. The questions will be the same in every attempt, but the answer choices will be selected at random. We will drop the lowest two scores at the end.

      • Late policy: There are no late days for gradiance assignments (under all circumstances). It will automatically close after the deadline. The website might have occasional downtime for maintenance. Make sure to start early and finish them by the deadline.



    3. Project (20%): The course projects are to be done in groups of five members. All project members must be chosen from the same discussion section.Details would be posted later.

    4. Midterm (12%) and final (13%): Details would be posted later.

    5. Class Participation (15%): For attending lectures and the discussion sessions every week. We will check attendance as recorded by zoom. (No, we cannot detect whether you are actually attending or watching a video, and it is not our goal either! Our goal is to help you keep pace with the course.)

      Attending/watching lectures + answering quizzes after lectures (3% + 3%): For attending live lecture or watching videos within the next day after each class ends (e.g., for a Tuesday's lecture, watch it by Wednesday night 11:59 pm Eastern, so you have about 24 + 9 = 33 hours to watch a video). There are no late days. We will drop the lowest two scores to account for days when you cannot attend a lecture and forgot to watch the video.

      For lectures with quizzes, the quizzes have to answered within the next day after each class ends. There are no late days but we will drop the lowest two scores. They will be discussed in the next lecture.

      Attending and participating in discussion sessions (7%): Each discussion session may have points for attendance, solving quizzes, collaborating with others etc. All discussion sessions have the same weight. We will drop the lowest three scores for discussion sessions to account for the days when you cannot attend it (so you need to attend about 75% of the discussions to get full credit.).

      Communication (2%): We will be regularly contacting you about your progress, feedback, concerns, etc. These points are reserved for your response within the time limit provided in the emails.

Note: Almost all types of assignments have options for late days or removal of some lowest scores. To ensure fairness to everyone, no additional late days would be granted by email requests before the deadline. Please start early!

How much time should I allocate to get work done in CompSci 316?

You should plan for the following every week. You should approximately allocate 10-12 hours every week for the class, although it may vary from student to student.

Activity Hours/Week Days of Weeks
Online live lectures 1.25 x 2 = 2.5 Tues, Thurs
(for time zone differences, lecture videos to be watched within the next day after each class ends)
Discussion session 1.25 Mon
Assignments 3-5 (varies) Tues (typically)
+
Lecture quizzes within the next day
Project 2-4 (heavier work likely later in the course,
if you do an "open project", expect 1.5x or so more work (and more fun!))
Mon (weekly updates)
+
Two milestones and a final project


Resources / Communication / Toolkits

Book: The textbooks are optional as many students would not have access to the library. Lecture slides and regular class participation would be sufficient for this class.

If you would like to consult a textbook, we use the following book:

Database Systems: The Complete Book, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2nd Edition. Prentice Hall. 2008.

See the publisher's book page and the Amazon book page. Relevant chapters for reading are posted under Schedule. In a typical semester, textbooks for this course are available for 3-hour checkouts at the Duke Libraries. Search the Libraries' Top Textbooks program here: https://library.duke.edu/course-support/course-reserves/textbooks. Please consult the library for options during the COVID-19 situation.

Gradescope: We will use Gradescope (for submission and grading of (non-Gradiance) homeworks and project work, as well as grading of exams.

Communication and Piazza: You should check your email regularly for important course-related announcements. All important announcements will be sent through both Piazza and Sakai. Old email messages can be found under "Email Archive" on Sakai.

All questions that may be of general interest to the class should be directed to Piazza. do not use the mailing list. You will get your questions answered faster on Piazza than via personal emails to the course staff, because Piazza is monitored closely by everybody in the class, not just the course staff. You are highly encouraged to answer each others' questions on Piazza and the course staff would endorse/add to those answers.

You would be added to Piazza the day before the class starts.

Sakai: We will use the Sakai course management system for posting sample solutions (under "Resources") and for checking grades (under "Gradebook").

Computing: You will need access to a computer (any major OS will do) on which you are allowed to install new software. We will also use cloud-based virtual machines - see Help for details.

Course Policy

Standards of Conduct: Under the Duke Community Standard, you are expected to submit your own work in this course, including homeworks, projects, and exams.
  • On many occasions when working on homeworks and projects, it is useful to ask others (the instructor, the TA, or other students) for hints or debugging help, or to talk generally about the written problems or programming strategies. Such activity is both acceptable and encouraged, but you must indicate in your submission any assistance you received (including help from the course staff). Any assistance received that is not given proper citation will be considered a violation of the Standard.

  • In any event, you are responsible for writing, understanding, and being able to explain on your own all written and programming solutions that you submit.

  • Copying solutions to any problem in any assignment from other students in the class, even if you have discussed those problems with them, is strictly prohibited.

  • It is strictly not allowed to seek help outside your TAs and classmates for solving the assignments, so you CANNOT search for answers on the Web, ask students from previous semester taking this course or anyone else for help and material, or search for solutions from previous semesters.

  • You can use online tutorial and resources for your project, but the entire code must be written by your team members. Please acknowledge all websites that you have consulted in your project milestone reports.

  • Exam policy: TBA

  • The course staff will pursue aggressively all suspected cases of violations, and they will be handled through official University channels. Any proven violation of course policy would result in a zero in the entire assignment (not just the problem with violated policy) and may result in strict disciplinary actions.

  • If you are unsure of a policy, please ask the instructor and do not assume anything.


Help

The help section will be updated when the class starts.

Schedule

(subject to change)

"Notes" will be uploaded before the class and are intentionally left incomplete for interactive lectures. Completed "slides" will be uploaded after the lectures.

D = Discussion session

A = Assignments

-->
  Day Topic                 Slides            Assignments / Remarks Optional Reading
1 8/18 (T) Introduction Lecture-1-notes

Lecture-1
(up to slide 35)
pgweb instructions (we will use this in class).

2.1, 2.2, 6.1, 6.2
2 8/20 (Th) Relational model,
Basic SQL,
and Relational algebra
Lecture-2-notes

Lecture-2
(up to slide 25)
2.3, 2.4
D1 8/24 (M) Setup and basic SQL Discussion-1
A 8/25 (T) None (but finish your setup for SQL)
3 8/25 (T) contd.
4 8/27 (Th) Database design in E/R model Lecture-3-notes

Lecture-3
4.1-4.4
D2 8/31 (M) RATest and RADB
A 9/1 (T) HW-1 (SQL-1) is due
5 9/1 (T) contd. 4.5, 4.6
6 9/3 (Th) Project mixer Guest lecture by
Danai Adkisson (slides-colab)

and by

Meredith Brown and Florence Liu
D3 9/7 (M) RA Lecture Quiz 0903 is due
A 9/8 (T) HW-2 (RA) + Gradiance-1 is due
7 9/8 (T) E/R to relational translation Lecture-4-notes

Lecture-4

(Up to slide 10)
Names of 5 members for each project team due
8 9/10 (Th) SQL: aggregation, subqueries, NULL, outerjoin, modifications, constraints, triggers, views Lecture-5-notes

Lecture-5

(up to slide 8)
2.3, 6.1.1-6.1.7, 6.2-6.5, 7.1-7.5, 8.1-8.3
D4 9/14 (M) Demo of RA
9 9/15 (T) contd. (up to slide 32) 3.1-3.4, 3.6, 3.7
A 9/16 (W) HW-3 (ERD) + Gradiance-2 is due
10 9/17 (Th) contd. (up to slide 62)
D5 9/21 (M) SQL LecQuiz-0917-SQL is due +

Project update due on piazza threads
A 9/22 (T) HW-4 (SQL-2) is due
11 9/22 (T) Database design theory: FD, BCNF Lecture-6

(up to slide 7)
9.1, 9.3, 9.4, 9.6, 10.2
12 9/24 (Th) contd. Up to slide 21 included in the midterm Project MS1 is due.
D6 9/28 (M) Midterm practice problems
A 9/29 (T) NO HOMEWORK DUE
13 9/29 (T) Midterm exam
14 10/1 (Th) Storage & Index Lecture-7

(up to slide 17 of Lecture-7, Finished Lecture-6)
14.1, 14.2
D7 10/5 (M) BCNF + Project
15 10/6 (T) contd. (up to slide 46 of Lecture-7)
16 10/8 (Th) Query Processing Lecture-8

(up to slide 2)
A 10/9 (F) HW-5 and Gradiance-3 (Constraints and Triggers) is due
D8 10/12 (M) Project
17 10/13 (T) Join Algorithms and external sorting (up to slide 14) 15.1-15.6, 15.8
16.1, 16.7.3-16.7.5
18 10/15 (Th) contd. (up to slide 28)
D9 10/19 (M) Index/Cost/(Un)Clustered Project MS2 is due (along with a basic version of your website).
19 10/20 (T) Contd. (Finished Lecture-8) 16.2-16.6
20 10/22 (Th) XML Lecture-9

(up to slide 35)
11, 12.1, 12.2
A 10/22 (Th) HW-6a (Index + Sorting) is due
D10 10/26 (M) Keys/BCNF/Join Algo
A 10/27 (T) HW-6b (Join Algorithms) is due
21 10/27 (T) XML-relational mapping,
NoSQL: JSON and MongoDB
Lecture-10

(Finished Lecture-9 and Lecture-10 (overview))
22 10/29 (Th) Transaction Lecture-11

(up to slide 17)
A note on SQL Programming (slides) for the class projects
D11 11/2 (M) TBD
23 11/3 (T) Contd. (Up to slide 35)
24 11/5 (Th) Transaction: Recovery Lecture-12

(Up to slide 8,
Finished Lecture-11)
Gradiance4-XML is due.
D12 11/9 (M) TBD
A 11/10 (T) HW-7 (JSON/MongoDB) is due by each project group
25 11/10 (T) Contd.
26 11/12 (Th) Map-Reduce, Parallel DBMS Lecture-13 Gradiance5-Transactions is due.
D13 11/16 (M) TBD
A 11/16 (M, last day of class) Final project report and demo video are due
11/22 (Sun) Online Final Exam, 2-5 pm EST, details TBA