Compsci 100, Fall 2011, DNA Splicing

You should snarf assignment dna from http://www.cs.duke.edu/courses/fall11/cps100/snarf or browse the code directory for code files provided. See the dna howto file for help and more instructions.

restriction enzyme
(source: http://www.astbury.leeds.ac.uk/gallery/leedspix.html)

Genesis of Assignments Linked to DNA/Genomics

Rich Pattis created an assignment based on the whole genome shotgun reassembly algorithm as part of the Nifty Assignments Archive, but there's nothing in the archive about the assignment and he hasn't used it in a while. We used that assignment at Duke in Compsci 4G for several years.

This current assignment was developed in 2002, used for two years, then resurrected in the Spring of 2008 and used until 2010. It's back in a very different form though it has evolved over time. These differences leverage discussion of tradeoffs more than the original assignment, and were motivated in part by opportunities (and differences) provided by Java compared to C++. The assignment is meant to illustrate pragmatically the benefits of using a linked list. This version became a nifty assignment in 2009.

Background on DNA, Restriction Enzymes, and PCR

Restriction Enzymes

This background is interesting, but not really needed to do the assignment. There are some good stories here, but if you want to get to the assignment, you can skip this stuff.

In this assignment you'll experiment with different implementations of a simulated restriction enzyme cutting (or cleaving) a DNA molecule. Three scientists shared the Nobel Prize in 1978 for the discover of restriction enzymes. They're also an essential part of the process called PCR polymerase chain reaction which is one of the most significant discoveries/inventions in chemistry and for which Kary Mullis won the Nobel Prize in 1993.

Kary Mullis, the inventor of PCR, is an interesting character. To see more about him see this archived copy of a 1992 interview in Omni Magazine, this 1994 interview as part of virus myth, his personal website which includes information about his autobiography Dancing Naked in the Mind Field, though you can read this free Nobel autobiography as well.

You can see animations and explanations of both restriction enzymes and PCR at DnaTube and Cold Spring Harbor Dolan DNA Learning Center.

The simulation is a simplification of the chemical process, but provides an example of the utility of linked lists in implementing a data structure. The linked list code you'll write and reason about is an example of a chunk list. You can do more work with chunk lists for extra credit.

What You do for This Assignment

In this assignment you're given an interface that represents a strand of DNA. You're also given a simple implementation of the interface that is based on the Java classes String and StringBuilder. You'll test and implement a new class, described below and in the howto, that uses a linked-list rather than strings/stringbuilders.

The interface and your classes will simulate two genomic ideas: reversing a strand of DNA and creating a recombinant strand by inserting new DNA where a restriction enzyme cuts/cleaves the existing DNA. This process is explained conceptually in the howto -- the code models the process. You'll experiment with the existing strand class and then create a new strand class that's much more efficient in simulating cleaving/cutting/joining DNA. You'll run experiments to understand the tradeoffs in the implementation of these classes. In summary, you'll run simulated experiments to reason about the tradeoffs in alternative implementations.

In particular you'll need to do four things. These are described in more detail in the howto -- be sure to look there for details.

  1. Benchmark the code you're given in SimpleStrand.cutAndSplice that correctly finds all occurrences of an enzyme and replaces the enzyme with another strand of DNA. The benchmarking is done by the class DNABenchMark and is explained in more detail in the howto. Your benchmarking report must show that the algorithm/code in SimpleStrand.cutAndSplice is O(N) where N is the size of the recombined strand returned.

  2. Test your benchmarking and O(N) simulation by running out of memory and then re-running the simulation with more memory. See the howto for details.

  3. Design, code, and test LinkStrand that implements the IDnaStrand interface so that your implementation passes the JUnit tests in TestStrand. See the DNA howto for help with JUnit and a detailed description of the linked-list implementation.

  4. You must run virtual experiments to show that your LinkStrand linked-list implementation is O(B) for a strand with B breaks as described below.


Grading

This assignment is weighted by a factor of 2.0. Markov was 1.5, Jotto was 1.0. For this assignment you get analysis points for the writeup, which should be extensive. You'll get engineering points for the implementation of LinkStrand and you'll get algorithmic/correctness points for ensuring things are done correctly.

Submit your project using the assignment name dna.