CompSci 316

Introduction to Databases

Spring 2020

Home
News
Info
Grading
Help
Schedule

News



Course Staff

      Day/Time: Tuesdays and Thursdays, 3:05 pm - 4:20 pm
      PlaceLSRC B101

UPDATE: The course will be conducted online after the spring break to the end of the semester. Please see updates below and announcements on sakai.

Some office hours might be updated after spring break, we will announce on sakai.

      Instructor: Sudeepa Roy
  • Email: sudeepa AT cs.duke.edu
  • Office Hours: LSRC D325, Mon and Fri, 2 pm - 3 pm
  • Office Hours: By appointment (see link on sakai announcements), ask questions during or after TTh classes, or submit your questions on piazza
Grad TAs
  • Tiangang Chen. Office hour: LSRC D316, Thu 10 am - 11 am, Fri 10 am - 11 am

    Office hour: Zoom, Thu 9 pm - 10 pm, Fri 10 am - 11 am (EST)


  • Xiangchen Shen. Office hour: LSRC D316, Mon 12 pm - 1 pm, Wed 3:30 pm - 4:30 pm

    Office hour: Zoom, Mon 10 am - 11 am, Wed 9 pm - 10 pm (EST)


  • Yuchao Tao (half-TA). Office hour: LSRC D316, Tue 1:30 pm - 2:30 pm

    Office hour: Zoom, Tue 9 am - 10 am (EST)


UTAs
  • David Chen. Office hour: Soc Psych 128, Sun 7 pm - 9 pm

    Office hour: Zoom, Sun 8 pm - 10 pm (EST)


  • Jane Li. Office hour: Soc Psych 128, Mon 7 pm - 9 pm

    Office hour: Zoom, Mon 8 pm - 10 pm (EST)


  • Runxin (Rebecca) Wang. Office hour: Soc Psych 128, Tue 7 pm - 9 pm

    Office hour: Zoom, Tue 8 pm - 10 pm (EST)


    Office Hour Calendar: Office Hour Calendar

    Not sure how your TAs look like? Check out first few slides of Lecture 1!

    Overview

    We intend this course to give you a solid background in database systems as well as managing and processing "big data" in general. Topics include data modeling, database design theory, data definition and manipulation languages (SQL and NoSQL), database application programming interfaces, storage and indexing, query processing and optimization, parallel and distributed data processing, transaction processing, as well as a sample of other topics such as data mining and web data. Programming projects are required.

    Prerequisites:

    CompSci 201 or equivalent, or consent of the instructor. You will need familiarity (or ability to quickly become familiar) with the Unix command line (such as "Terminal" in Mac OS).

    Grading

    Updates in grading due to unprecedented change of circumstances under COVID-19:

    1. Please see this website for remote learning and other resources in this challenging time: https://keeplearning.duke.edu/.

    2. As you should have heard from Duke, the default grading of this class has changed to S/U. You have to submit a form to the registrar if you decide to opt for the letter grade for the course. We expect everyone who work on all components of the course (note that some lowest scores would be dropped) to the best of their capacity to receive an S grad at the end.

    3. There might be unexpected special circumstances, and we would consider such scenarios after the spring break to decide both the S/U grades and letter grades for such students. If you are concerned about your situation regarding any of the assignments in the class or in general, please send the instructor an email copying your academic dean.

    4. The thresholds for the letter grades remain the same for now, but as earlier, the thresholds may go down based on performance of the entire class at the end.

    5. NEW: We will reweigh the total 40% weight of the exams in the following way: The higher score will have 70% weight (i.e. 28% of the course grade) weight and the lower score will have 30% weight (i.e. 12% of the course grade).

    6. NEW: The lowest homework grade will be dropped.

    7. NEW: There will be two days of extension without penalty for each remaining homework. But the next homework might be posted, so you are encouraged to work on the parts of the homeworks as soon as the corresponding concept is covered in class.

    8. NEW: The 5% + 5% = 10% total weight of in-class labs and quizzes will be updated as follows. We have had 2 quizzes and 2 labs so far. As announced earlier, we will drop the lower quiz and lower lab and include the higher ones toward this 10% total weight (100 points each, scaled up to 100 from 10 for the quiz). The rest will be coming from watching the videos posted for each class (each class will have 100 points). You will submit on gradescope whether you watched all the videos of the class. If there are multiple short videos of the same class, the 100 points will be equally divided among them. We will drop the lowest score (from one quiz, one lab, and all videos) at the end.

    9. The details of the online final exam will be announced later. We will take into account time-zone differences and potential problems with network connectivity while designing the exam format.

    10. There are concerns about collaborations to complete the group projects in the survey. The good thing is that, as you already have finished HW4 as a team, you should have a basic implementation of a webpage already. We will consider this disruption in collaboration and lost time while grading the projects. We suggest each team to find common times to meet weekly using when2meet and work on the project together using ZOOM at least for some time each week.

    11. We plan to use when2meet for homework collaboration where students can post their weekly availability to discuss homework problems. Links will be posted on sakai. Then you should form small groups and discuss homeworks on ZOOM. Note that the same collaboration policy as before still applies, i.e., you can only discuss with course staff and students in the class, you must acknowledge all help in your submission, you need to write your solution and should be able to explain your solution, you cannot search online, from previous semesters' solutions, orf discuss with anyone outside the class. Any violation of course policy will be aggressively pursued and actions will be taken.

    12. Hope you all stay safe and healthy wherever you are!



    Grading is done on an absolute, but adjustable scale. In other words, there is no curve. Anyone earning
    • 90% or more of the total number of points available will receive a grade in the A range (A+, A, A-);
    • 80% or more guarantees a grade in the B range (B+, B, B-);
    • 70% or more guarantees a grade in the C range (C+, C, C-);
    • 60% or more guarantees a grade in the D range (D+, D, D-).
    At the discretion of the instructor, the grading scale may slide down (i.e., grades go higher), but it will not slide up (e.g., based on the class performance of the midterm or the final exams these thresholds can get lower). The topper of the class and only the exceptional performances will receive the A+ grade.
    • Homeworks (30%): There will be short weekly homeworks (due in 7 days), with a mix of written, programming, and gradiance problems. They will be based on the last 1-2 lectures. All homeworks will have the same weight.

      Late policy: There are no late days for gradiance assignments (it will automatically close after the deadline).

      Parts of the other homework problems that are submitted late will receive an automatic deduction of 5% per hour late, and will receive no credit after the sample solution becomes available.

      Exceptions will only be made in the case of documented excuses; follow the standard university procedure for filing them - in other words, you must submit an Incapacitation Form, Religious Observance Notification Form, or Notification of Varsity Athletic Participation Form (as well as send an email to the instructor copying your academic dean requesting the extension); or you must arrange for your academic dean to email the instructor regarding your circumstances. You must have an email from the instructor granting the extension - otherwise the standard late penalty would apply.

    • Project (20%): The course projects are to be done in groups of about five members. Details can be found later in the project description under Schedule.

    • Midterm (20%) and final (20%): Both midterm and final exams are open-book and open-notes. Final is comprehensive but will focus on materials not already covered by the midterm.

    • In-class labs (5%): To practice and learn the material covered in some of the lectures, there will be some in-class labs on SQL, other programming assignments, and sometimes conceptual questions too for 15-45 mins (depending on the task). They will be announced in advance (at least on the previous lecture). All the labs will have equal weight and we will drop the lowest lab grade while calculating your final score. You have to finish the problems in the labs within the next day after the class ends (i.e. by 11:59 pm of the next day). If you correctly finish and upload all the problems before the class ends (by 4:20 pm), you will get 10% bonus points (you can attempt multiple times until they are correct). There will be TAs present in class to help you in the labs. For the labs in class, you can work in groups of 2-3, but need to submit your own solution.

      What if you miss a class with a lab? Well, you can submit by the same day or within the class time for the bonus points even if you miss a class, but you would miss the fun of discussing with others and TAs help. The lowest score will be dropped as well.

      Late policy: After the deadline, like homework assignments, there will be a 5% per hour penalty (or you need to follow the formal procedure for extension requests as mentioned above).

    • In-class pop-up quiz (5%): Again, to practice and learn the material covered in some of the lectures, there will be some in-class pop-up quizzes that will remain open for 5-10 mins. They will be announced in advance too (at least on the previous lecture) but the schedule may change based on the pace of the lecture. All the quizzes will have equal weight and we will drop the lowest quiz grade while calculating your final score. For each quiz, there will be 50% grade just for attempting while it is open, and the other 50% for answering it correctly. For the quizzes in class, you can work in groups of 2-3, but need to submit your own solution.

      What if you miss a class with a quiz? Well, you can still submit the solution while the quiz is open, but you would miss the fun of discussing with others. Further, we would drop 25% (ceiling) of the lowest grades while calculating your final score for quiz, i.e. if we have 4 quizzes 1 dropped, 5-8 quizzes 2 dropped, and so on.

      Late policy: The answers will be discussed in the same class right after the quiz closes, so late submissions are not allowed.

    Information

    Book: We use the following book: Database Systems: The Complete Book, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2nd Edition. Prentice Hall. 2008. See the publisher's book page and the Amazon book page. Relevant chapters for reading are posted under Schedule. Textbooks for this course are available for 3-hour checkouts at the Duke Libraries. Search the Libraries' Top Textbooks program here: https://library.duke.edu/course-support/course-reserves/textbooks.

    Gradescope: We will use Gradescope (follow the link from the navigation bar) for submission and grading of (non-Gradiance) homeworks and project work, as well as grading of exams. Be sure to read Help/Submitting Non-Gradiance Work for instructions.

    Communication: You should check your email regularly for important course-related announcements. All questions that may be of general interest to the class should be directed to Piazza (follow the link from the navigation bar); do not use the mailing list. You will get your questions answered faster on Piazza than via personal emails to the course staff, because Piazza is monitored closely by everybody in the class, not just the course staff.

    Sakai: We will use the Sakai course management system for posting sample solutions (under "Resources") and for checking grades (under "Gradebook").

    Computing: You will need access to a computer (any major OS will do) on which you are allowed to install new software. We will also use cloud-based virtual machines - see Help for details.

    Standards of Conduct: Under the Duke Community Standard, you are expected to submit your own work in this course, including homeworks, projects, and exams. On many occasions when working on homeworks and projects, it is useful to ask others (the instructor, the TA, or other students) for hints or debugging help, or to talk generally about the written problems or programming strategies. Such activity is both acceptable and encouraged, but you must indicate in your submission any assistance you received. Any assistance received that is not given proper citation will be considered a violation of the Standard. In any event, you are responsible for understanding and being able to explain on your own all written and programming solutions that you submit. It is strictly not allowed to seek help outside your TAs and classmates for solving the assignments, so you cannot search for answers on the Web, students from previous semester taking this course, or search for solutions from previous semesters. The course staff will pursue aggressively all suspected cases of violations, and they will be handled through official University channels.

    Help

    Schedule

    (subject to change)

    "Notes" will be uploaded before the class and are intentionally left incomplete for interactive lectures. Completed "slides" will be uploaded after the lectures.

      Day Topic Slides Assignments / Remarks Reading
    1 1/9 (Th) Introduction and SQL Lecture-1-notes

    Lecture-1
    Try the queries from the lecture notes on pgweb!

    How to access online SQL interface (for simple tasks): pgweb (pgweb instruction) (we will use this in class).

    Also check out a similar pgAdmin4 (pgAdmin4 instruction) (you would need pgadmin for assignments)

    2.1, 2.2, 6.1, 6.2
    2 1/14 (T) Relational model and algebra Lecture-2-notes

    Lecture-2
    (up to slide 25)
    HW1 is posted. 2.3, 2.4
    3 1/16 (Th) contd. (from slide 26 to end)
    4 1/21 (T) Database design in E/R model Lecture-3-notes

    Lecture-3
    Lab-1 on RA in class LAB1 (in groups).

    HW1 is due.

    HW2 is posted.
    4.1-4.4
    5 1/23 (Th) contd. Quiz-1 in class (in groups).
    6 1/28 (T) Project mixer Guest lecture by
    Jane Li
    and
    Danai Adkisson (slides)
    Project details is posted on Saturday 1/25.
    7 1/30 (Th) ER to relational translation Lecture-4-notes

    Lecture-4
    HW2 is due.

    Lab-1 is due.

    HW3 is posted (Q1-Q5).
    4.5, 4.6
    8 2/4 (T) SQL: aggregation, subqueries, NULL, outerjoin, modifications, constraints, triggers, views Lecture-5-notes

    Lecture-5

    (up to slide 26)
    2.3, 6.1.1-6.1.7, 6.2-6.5, 7.1-7.5, 8.1-8.3
    9 2/6 (Th) contd. (up to slide 46 of Lecture-5)
    10 2/11 (T) contd. (finished Lecture-5) HW3 - Q1, Q2, Q3 are due.
    11 2/13 (Th) Database design theory - FD, BCNF Lecture-6-notes

    Lecture-6

    (up to slide 13 -- up to here included in midterm)
    3.1-3.2
    2/15 (Sat) HW3 - Q4 and Q5 are due **AT 12 NOON**.
    2/18 (T) Midterm exam
    12 2/20 (Th) BCNF (contd.)

    (end of Lecture 6)
    2/24 (M) Project MS1 is due.
    13 2/25 (T) Storage Lecture-7-notes

    Lecture-7

    (up to slide 20)
    Quiz-2 in class (in groups).
    14 2/27 (Th) Index Lecture-8-notes

    Lecture-8

    (up to slide 15 of Lecture-8,
    finished Lecture-7)
    HW4 is posted (due on 03/04 Wed).
    15 3/3 (T) contd.
    16 3/5 (Th) Query Processing Lecture-9-notes

    Lecture-9

    (up to slide 14)
    Lab-2 in class (index)
    3/10 (T) No class- Spring break
    3/12 (Th) No class- Spring break
    3/17 (T) Class canceled - Extended spring break due to COVID-19
    3/19 (Th) Class canceled - Extended spring break due to COVID-19
    The rest of the class will be conducted online. We would be exploring what would work best for the class taking into account responses to our survey on remote teaching, so our teaching strategy might be updated as we go. The class is meeting on TTh usual time by ZOOM as regular lectures. The class meetings will be recorded for those who cannot join during the class times. The new video-watch assignments require that either you attend the class or watch the videos. The following schedule is tentative. Please check sakai for all announcements.
    17 3/24 (T) Join Algorithms and external sorting Lecture-10 Watch video for 10a and 10b on sakai
    3/25 (W) Lab2 is due.
    HW5 to be posted.
    18 3/26 (Th) contd.
    19 3/31 (T) XML Lecture-11
    20 4/2 (Th) contd. (up to XPATH, slide 27 in Lec-11, last two XPATH slides have examples/functions that can be useful in HW)
    Project MS2 is due.
    21 4/7 (T) contd. (finished Lecture 11) HW5 (Q1ab, Q2, Q3) due.
    22 4/9 (Th) Transaction Lecture-12
    4/13 (M) HW5 - Q4 is due.
    23 4/14 (T) contd.
    +
    A note on SQL injection attack
    SQL-injection-attack

    HW5 - Q5 (Gradiance) is due.
    24 4/17 (Th) Transaction Recovery Lecture-13
    25 4/21 (T) Map-Reduce, Parallel DBMS Lecture-14 HW6 - Q1 (Gradiance) is due.
    4/22 (W) HW6 - Q2 (Gradiance) is due.

    Video-watch assignments due.
    4/24 (F) Final project report, and Project demo link, code, and video by each team due.
    4/27 (Mon) Online Final Exam (9 am to 12 noon), details TBA