Position paper for the "Directions of CS 2" workshop
Mike Clancy, University of California, Berkeley
clancy@cs.berkeley.edu

I won't be able to attend the workshop, but I want to get my two cents in anyway.

I first taught the data structures course in 1979. The data structures we covered then--arrays, linked lists, trees, and hash tables--are those we cover now. Course activities, however, have changed in a variety of ways. Computing resources have improved dramatically; in particular, it takes a much larger problem or data set for a "good" data structure to make a difference, and I have attempted to adjust my assignments appropriately. Activities also include more experimentation, analysis, and use of code we supply. We move through the semester from studying and analyzing the standard data structures--implementations are provided--to using them in large (1500-2000 lines) projects. Where possible, we use case studies to model problem solving and complexity management.

Two recent textbooks aimed at the CS 2 market offer interesting contrasts. Data Abstraction and Problem Solving with C++: Walls and Mirrors, by Frank M. Carrano (Benjamin/Cummings, 1995), is a more traditional treatment. It has chapters titled "Linked Lists", "Stacks", "Queues", "Trees", "Tables and Priority Queues", and "Advanced Implementations of Tables". Its coverage lumps interface, implementation, and application together; there is little if any treatment of applications where more than one data structure might be appropriately chosen. Algorithms, Data Structures, and Problem Solving with C++, by Mark A. Weiss (Addison-Wesley, 1996), has a chapter called "Data Structures" in which interfaces to classes for stacks, queues, linked lists, trees, hash tables, and priority queues are presented in 25 pages. This is followed by five chapters of "Applications" that desribe the design of programs that use the classes. Only later, in several "Implementations" sections, does Weiss deal with the details of how the classes do their things.

My view of the modern CS 2 leans more toward the Weiss model. The important points of a data structures course should not be the details of individual algorithms or representations. Students should also learn to identify efficiency constraints on problem solutions, to understand the tradeoffs between storing and recomputing information and the tradeoffs of using linked vs. contiguous storage to store it, and to identify appropriate data types whose implementations satisfy the problem requirements. Applications where students must wrestle with these issues are a necessary part of such a course.

Our CS 1 courses at Berkeley have for the past several years been organized along the following lines. First, students work in closed lab sections on exercises that cover low-level details of new primitive operations or techniques. Later, they see the primitives assembled or techniques illustrated in the context of a case study. Finally, they apply the techniques in project assignments. These three steps are repeated through the course, and culminate in a project program that pulls everything together. This course organization also seems appropriate for a modern CS 2. Lab activities would involve the use and analysis of container classes and comparison of their behavior with representations covered earlier. Some example exercises would be the following:

Appropriate applications for treatment in case studies might include the design of an inventory manager, a game player, or the boundary tags technique for storage management. Examples of case studies for object-based Pascal programming appear in Designing Pascal Solutions: Case Studies using Data Structures, by Michael J. Clancy and Marcia C. Linn (W.H. Freeman, 1996).

What's the downside of such a course organization? With lab exercises that focus on experimentation and comparison of data structures, plus homework activities that involve choice among a number of implementation alternatives, students don't have a lot of time for broad "coverage" of less central topics. For instance, my course treats techniques for maintaining balance in a binary search tree and for managing a hash table using open addressing only in passing. Basically, depth of coverage wins out over breadth.