CPS 196.03: Information Management and Mining
(Spring 2009, Shivnath Babu)

Course Description

The amount of data generated by businesses, science, the Web, and social networks is growing at a very fast rate. This course covers the algorithms and database techniques required to extract useful information from this flood of data. Data mining, which is the automatic discovery of interesting patterns and relationships in data, is a central focus of the course. Data mining topics include association discovery, clustering, classification, and anomaly detection. Special emphasis will be given to techniques for data warehousing, where extremely large datasets (e.g., many terabytes) are processed. The course also covers Web mining, including analysis of Web pages and links (like Google) and analysis of large social networks (like Facebook). Programming projects are required.

Time and Place

2:50pm-4:05pm on Tuesdays and Thursdays; D106 LSRC


  1. The textbook for this class is Data Mining: Concepts and Techniques, 2nd ed., by Jiawei Han and Micheline Kamber.
  2. Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar will be used as a reference.


Shivnath Babu
Email: shivnath at cs dot my_univ. Replace my_univ with duke.edu.
Office: D338 LSRC, Phone: 919-660-6579
Office hours: After class on Tuesdays and Thursdays, or by appointment, in the instructor's office (D338 LSRC). It is a good idea to let the instructor know ahead of time, either in class or via email, that you will be coming.


Homework Assignments: 15%
Programming Projects: 40%

There will be 3-4 written homework assignments. Late homeworks will not be accepted without a documented excuse from a physician or dean.

There will be programming assignments (done individually) and a longer course project (done either individually or in groups of at most two). Details will be presented in class.

Both midterm and final exams are open-book and open-notes.


  1. What are the prerequisites for the course?
    A good understanding of algorithms, data structures, and programming. CPS 100 or an equivalent course is sufficient. If you are unsure, feel free to contact the instructor.
  2. What is the course syllabus?
    This course is new, so the syllabus will evolve as the class progresses. Here are some related classes at other universities. 50% of the material that we cover will overlap with some of these courses:
  3. Is the course mainly about learning theory in depth (e.g., like CPS 130) or is it more about learning basic concepts and then applying them in projects (e.g., like CPS 116)?
    The latter, similar to CPS 116 taught by Prof. Jun Yang.
  4. How will the class be graded?
    Grades are based on homeworks, exams, and programming projects, weighted roughly as follows: 15% for homeworks, 40% for projects, 20% for the midterm, and 25% for the final. There will be a semester-long course project that involves programming. The project may be split into smaller modules for ease of grading.
  5. What programming languages will I have to know?
    You should know at least one programming language well enough to do the semester-long course project. Any of Java, C++, a scripting language like Perl or Python, or Matlab will be enough.
  6. What is the level of effort required?
    Similar to CPS 116 taught by Prof. Jun Yang.

Honor Code

Under the Duke Honor Code, you are expected to submit your own work in this course, including homeworks, projects, and exams. On many occasions when working on homeworks and projects, it is useful to ask others (the instructor or other students) for hints or debugging help, or to talk generally about the written problems or programming strategies. Such activity is both acceptable and encouraged, but you must indicate in your submission any assistance you received. Any assistance received that is not properly cited will be considered a violation of the Honor Code. In any event, you are responsible for understanding and being able to explain on your own all written and programming solutions that you submit. The course staff will aggressively pursue all suspected cases of Honor Code violations, and they will be handled through official University channels.