A Proposal to the National Science Foundation

NSF Program Officer: John Lehmann, 703-306-1910

Technical topic areas: Ultrascale Computing, Parallel Computing

Proposal title: Prototyping Biomolecular Computations:

Optional Large Scale Implementations

Technical point of contact and PI: Prof. J. H. Reif

Department of Computer Science, Duke University, Durham, N.C. 27707

reif@cs.duke.edu, (919) 660-6568 FAX: (919) 660-6519

Administrative point of contact: Patricia Kimbrough

Office of Research Support, Duke University, Durham, N.C. 27707

patricia.kimbrough@duke.edu, (919) 684-3030, FAX: (919) 684-2418

Existing NSF/DARPA Grant Number: CCR-9725021 (DARPA AO: F359)

Contractor's type of Business: Educational

Current Grant Start Date:01 SEPT 1997, END DATE:31 AUG 2000

Current Budget:

Year 1: (currently expending), $916,006, with $635,221 as subcontracts to 8 Universities. Design of experiments for biomolecular computers; identify applications; laboratory testing of supporting biotechnology for key steps; prototype software simulation system.

Year 2 $883,719, with $634,846 as subcontracts to 8 Universities. Small scale experiments for biomolecular computers and applications; refinement of supporting biotechnology; complete testing of software simulation systems and redesign.

Year 3: $948,292 ($1,276,168 with option), with $634,995 as subcontracts to 8 Universities. Completion of laboratory experiments for biomolecular computers and applications; final reports on prototype demonstrations; completion of software simulation systems.

TOTAL CURRENTLY FUNDED BASE COST: $2,748,017

(represents direct cost + 54% indirect cost)

Summary of Proposed Additional Options to Budget

Year 2: Proposed Additional Option: $236,783 additional base cost ($1,120,502 total with current budget), for further experiments for biomolecular computers emphasizing problems arising from increasing scale.

Year 3: Proposed Additional Option: $327,876 additional base cost ($1,276,168 total with current budget) for further experiments of biomolecular computers solving problems of large scale.

TOTAL OF ALL PROPOSED OPTIONS: $564,659

TOTAL WITH ALL PROPOSED OPTIONS: $3,312,676

Section A: PROJECT SUMMARY

The overall goal of the current grant is to develop and demonstrate Biomolecular Computation (BMC) employing bio-technology to do massive parallel computing at the molecular scale (also known as "DNA and RNA based computing"). The key tasks of the project are: experimental demonstrations, construction of new 2D and 3D nano-structures, applications to biology, and software tools. Milestones achieved to date include: Designed and completed small scale experiments for BMC, constructed moderate scale nano-structures consisting of DNA crossover molecules that self assemble into moderate size lattices, identified applications: e.g., "wet" DNA databases, completed laboratory testing of supporting biotechnology for key steps, prototyped software simulation system. The approach of most of our BMC experiments is to encode large amounts of binary data (e.g., possible solutions to search problems) within distinct DNA or RNA strands, and then process this data in massively parallel fashion to execute arithmetic and logical operations on the data. Potentially, this approach provides for the solution of computationally hard problems that are beyond the most optimistic extrapolations of current (silicon-based) systems. But the current support provides for only small scale experiments.

We propose to exercise options in year 2 and 3 (written into the existing grant) which will allow the scale of our on-going prototype implementations to be increased to large scale. The extension of our BMC experiments from small scale to large scale potentially involves expenditure of considerable resources:

We therefor propose to introduce automation into DNA and RNA based computing, to eliminate human error from the liquid handling operations that are essential to any form of DNA or RNA based computing in vitro. Existing protocols for DNA and RNA based computing will be fully implemented as automated operations in molecular computation of solutions to small NP-search problems and beyond. The reduction and eventual elimination of steps that are prone to human error will be a great step forward in the fidelity of DNA computing. The throughput that can be achieved by an automated BIOMEK workstation and operation of DNA synthesis machinery will greatly accelerate the pace of DNA computing, far beyond its present limits. In particular, we request additional funding to be used as follows:

The automation and synthesis machinery will be used to do integrated large scale BMC experiments and demonstrations for:

Section B: Table of Contents

Section A: Project Summary

Section B: Table of Contents

Section C: Project Description

Section D: References Cited

Section E: Biographical Sketches

Section F: Summary Proposal Budget

Section G: Current and Pending Support

Section H: Facilities, Equipment and Other Resources

 

Section C: Project Description

Previous Accomplishments by PI: The Current Project

The overall goal of the current grant is to develop and demonstrate Biomolecular Computation (BMC) employing bio-technology to do massive parallel computing at the molecular scale (also known as "DNA or RNA based computing").

Key Tasks

The key tasks of the project are:

Our Approach

The approach of most of our BMC experiments is to encode large amounts of binary data (e.g., possible solutions to search problems) within distinct DNA or RNA strands, and then process this data in massively parallel fashion to execute arithmetic and logical operations on the data. Potentially, this approach provides for the solution of computationally hard problems that are beyond the most optimistic extrapolations of current (silicon-based) systems. But the current support provides for only small scale experiments.

Milestones Achieved to Date

These include:

For Further Details of Our NSF/DARPA Grant on Prototyping Biomolecular Computations, See web pages:

http://www.cs.duke.edu/~reif/BMCA.html

http://www.darpa.mil/ito/Summaries97/F359_0.html

 

Requested Optional Support for Large Scale Implementations

We propose to exercise options in year 2 and 3 (written into the existing grant) which will allow the prototype implementations to be increased to large scale. The extension of our BMC experiments from small scale to large scale potentially involves expenditure of considerable resources:

We therefor propose to introduce automation into DNA and RNA based computing, to eliminate human error from the liquid handling operations that are essential to any form of DNA or RNA based computing in vitro. Existing protocols for DNA and RNA based computing will be fully implemented as automated operations in molecular computation of solutions to small NP-search problems and beyond. The reduction and eventual elimination of steps that are prone to human error will be a great step forward in the fidelity of DNA computing. The throughput that can be achieved by an automated BIOMEK workstation and operation of DNA synthesis machinery will greatly accelerate the pace of DNA computing, far beyond its present limits. In particular, we request additional funding to be used as follows:

The automation and synthesis machinery will be used to do integrated large scale BMC experiments and demonstrations for:

Organization of the Project Description

This Project Description is now partitioned into a Section 1 describing the using the automation and synthesis equipment, and a further Section 2 describing the research which will use the automation and synthesis equipment to solve large scale BMC experiments.

Section 1: Automation and Synthesis Machinery

Here we describe the proposed automation and synthesis machinery and it’s operation. The automation and synthesis machinery will be used to do large scale BMC experiments for:

Section 1.1: Automation of BMC Computations

Summary: We request additional funding to be used for purchase of an automated BIOMEK workstation to be located at Princeton University) for laboratory automation of large BMC experiments for computationally hard problems, and for additional personnel and material costs in support for these large BMC experiments.

Introducing Automation: We plan to introduce automation into BMC computing, to eliminate human error from the liquid handling operations that are essential to any form of BMC computing in vitro. Existing protocols for DNA and RNA based computing will be fully implemented as automated operations in molecular computation of solutions to small NP-search problems and beyond. The reduction and eventual elimination of steps that are prone to human error will be a great step forward in the fidelity of DNA computing, and the throughput that can be achieved by an automated BIOMEK workstation will greatly accelerate the pace of DNA computing, far beyond its present limits.

Previous Use of Automation: Andrew Ellington and colleagues at the University of Texas at Austin (formerly at Indiana University) have developed a prototype device for the automation of in vitro selection experiments. They have already successfully used this robotic machinery to identify high affinity ligands (aptamers) from large combinatorial pools of nucleic acids (Cox and Ellington, to appear). More importantly, this instrumentation makes use of existing Biomek technology that can be purchased from Beckman, and the programs and protocols are readily implemented into RNA or DNA based computing. The key difference between existing protocols for solving the knight problem, for example, and the automated aptamer selection protocols is that the robotic arm is not capable of performing separation of short DNA oligomers from full-length RNAs by spin chromatographjy. Magnetic bead extraction procedures, which Adleman used in his original experiment (1994) and which others have used in a variety of manual and automated selection systems, are therefore the separation method of choice to use with automation, as magnetic bead separation requires no moving parts (such as centrifugation steps) to separate bound from unbound nucleic acids. Therefore these existing protocols will be fully implemented as automated operations in molecular computation of solutions to small NP-search problems and beyond.

 

Section 1.2: Automation of Synthesis of DNA for BMC Computations

Summary: We request additional funding to be used for the operating cost (salary, benefit, and overhead for a technician) of equipment (a MerMade DNA synthesizer already purchased and located at University of Delaware) for automated synthesis of DNA oligonucleotides (oligos) to be used by our BMC experiments. As a result, the project members will have oligos synthesized at about a quarter of the usual price, and we will be able to significantly scale up our on-going experiments with economically, minimizing by costs.

The DNA synthesizer

The University of Delaware Center for Agricultural Biotechnology (UDCAB) has purchased a MerMade DNA synthesizer developed at the Southwestern Medical Center in Dallas (http://gestec.swmed.edu/mermade.htm). The instrument is a basic liquid handling, with 2-D motion of a set of 12 valves under the control of Macintosh based - Labview networked software with a simple user interface. It uses a standard 96 well format for parallel synthesis of up to 192 primers and XY table addressing of individual wells. The system operates under an Argon atmosphere similar to glove box environment used in microbiological research with anaerobic organisms. It has trietyl monitoring as well as capability for using one modified base in addition to the usual four. Vacuum aspiration from below is used to purge reagents. Processing time is 8 to 22 hours (depending on the number and length of oligos). Synthesis scale is adjustable down to a minimum of 15to 20 nM. The UDCAB also has a Beckman MDQ capillary electrophoresis (CE) system with a diode array detector a detector and a UV detector for quality control (this step would add one more day to the delivery time once the facility is in full operation).

Background: The Stanford facility

The Stanford University Protein and Nucleic Acid (PAN) facility operates a similar instrument. The Stanford facility is exclusively operated for Stanford faculty, but it provides us with a yardstick to plan our proposed operation. At PAN they charge $0.25/base for orders of 96 20-25mer primers and a rate of $0.30/base for smaller orders. PAN does not include a quality control in its fee-for-service operation. We feel this is an unacceptable omission.

Operating costs The MerMade instrument requires that synthesis supports be manually delivered into 96-well plates with filter bottoms. One full time person is required to operate the instrument. This person is required to have a significant level of mechanical aptitude as well as experience in organic chemistry. Assuming 100% capacity and 15 nMole scale synthesis of 20-mers, the annual reagent consumption of this instrument is expected to be in the range of $120,000. Assuming no primer longer than 20 bases is produced, the maximum throughput of the instrument is expected to be in the range of 40,000 oligonucleotides/year. Note, this cost does not include labor, quality control, modified bases or oligonucleotide purification. The cost of labor & overhead distributed over 40,000 primers is estimated to come to $1.50/primer. Quality control (one CE run for each oligonucleotide) is also expected to come to $1/primer. All of these costs would suggest that at 100% efficiency, our costs would be about $0.28/base.

Use of the Synthesis Facility

The University of Delaware will provide a flat rate of $0.25/base to the members of the DARPA/NSF grant for Prototyping Biomolecular Computations, the same rate as the in-house PAN facility at Stanford. This cost takes into consideration that we will not be 100% efficient but it requires that we reach an efficiency of 80% (i.e., minimize our chemical loses to that level). It also accounts for prepaid labor for the next two years.

Some modifications will be needed to the MerMade system in order to accommodate operation of a DNA synthesis core facility operation (e.g., tubing with frits to protect against clogging etc..). These modifications as well as starting reagents would be required to get the facility operational sometime in the fall of 1998.

This will required partial funding of a Oligo synthesis , startup mods and reagents.

Proposed policies

Considering the characteristics of the instrumentation, we propose to follow a set of oligonucleotide production policies in order to guarantee the optimal operation of this resource.

(P1) A minimum of 96 oligonucleotides will be required before a synthesis is initiated (1/2 of the instruments capacity). These may come from a combination of many orders.

(P2) A web-site will be created with 96 slots for the cumulative insertion of the sequences of the primers to be synthesized and their names using cut & paste. Network members will get a userID and password to enter the web-site.

(P3) Orders will be filled as groups of 96 oligos are submitted.

(P4) Primers longer than 25 bases will be reallocated to special syntheses by the technician in order to avoid delays in the production of oligos with normal lengths. There will be a special synthesis web-page for these orders.

(P5) The University of Delaware Center for Agricultural Biotechnology (UDCAB), which owns the facilities to be used, will have priority in filling orders.

(P6) Groups that do not belong to the Consortium for Prototyping Biomolecular Computations will be charged a start-up fee that covers the labor cost ($1.50/oligo), and will have a lower priority.

(P7) Shipping charges will be paid by the clients.

Proposed Delivery of Services.

In turn we (University of Delaware) will:

(D1) Perform basic quality control on each oligo (cost included in the pricing structure). Oligos that conform to the QC test will not be resynthesized at our cost. (D2) Certify the length distribution of the products

(D3) Insure that the initial nucleotide at the 3' end of the oligos corresponds to the requested sequences by using the Universal CPG synthesis support.

(D4) Up to one modified base in addition to the usual A, C, G or T may be requested on orders of 96 primers or more. Orders with modified bases will be considered as special syntheses and the modified base will have an additional cost (to be estimated on a case-by-case basis, to reflect our additional costs).

(D5) Although we will produce amino-blocked primers, we will not be able to perform any post-synthesis modifications of the amino-blocked moiety That is, all requested modifications must be possible as on-instrument protocols. These can be found on the web site http://gestec.swmed.edu/mermade.htm

Supervisory Board . The operation of the facility will be directed by a supervisory board:

Bertrand Lemieux (administrator of the facility).

Junghuei Chen (a trained chemist & molecular biologist on-site).

Harvey Rubin (the closest biomedical scientist co-PI)

Greg Rumsey (Assist. Dean of Agriculture for operations).

Section 2. Proposed Large Scale BMC Experiments

Section 2.1: Towards Large Scale DNA and RNA Based Computing

Landweber, at Princeton University has expanded the field of "DNA Computers" to RNA. We propose to use these and other tools to tackle a wide class of mathematical search problems. The versatility of RNA is that it allows 'bit'-operations to be performed using specific ribonuclease, ribozyme, or deoxyribozyme digestion of target RNA molecules, leading ultimately to the construction of a purely nucleic acid "computer". We have already developed and synthesized a ten-bit combinatorial RNA pool and used it to find solutions to a nine-bit SAT problem. This represents a pool of significant complexity, containing 1,024 (210) different strands; however here we propose to scale up to the construction of larger nucleic acid libraries encoding 25 bits or more, using the existing ten-bit library as a scaffold. We plan to introduce automation into DNA and RNA based computing, to eliminate human error from the liquid handling operations that are essential to any form of DNA or RNA based computing in vitro.

Using RNA in DNA Based Computing

In 1994, Adleman introduced DNA based computing as an approach to solving mathematical problems which uses DNA as a data carrier and techniques of molecular biology to operate on DNA. The goal of much of the on-going work has been to tackle problems with DNA that contain the solution complexity larger than Adleman's initial experiment (Ouyang et al., 1997). Elimination of contaminating DNA presents a challenge. We therefore developed a destructive algorithm (Amos et al., 1996) to hydrolyze strands that do not fit the constraints of a mathematical problem. By introducing complementary DNA oligonucleotides, we can selectively mark RNA strands for specific Ribonuclease (RNase) H digestions, as this enzyme requires a double-stranded RNA-DNA duplex as its substrate. Use of a thermostable RNase H enzyme maximizes the specificity of hybridization between DNA and RNA strands and minimizes incorrect marking of non-specific strands.

RNA libraries of Restriction Enzymes. Recently, Ouyang et al. (1997) implemented a restriction enzyme approach to solve a 6-vertex instance of the maximum clique problem (another NP-complete problem). Unique restriction enzyme digestion sites were embedded in all bits set to one. Sequential digestions of this library removed all DNA strings that did not satisfy the constraints of the problem. The use of restriction enzymes in the computing scheme is logical as they implement a binary-like digestion operation. Each restriction enzyme cleaves only in the presence of its recognition site, and cleaves the double stranded DNA sufficiently to completion. However, information can only be encoded using the set of available restriction enzymes. An RNase H approach, on the other hand, offers a "universal RNA restriction enzyme" which can be generalized to cleave almost any RNA sequence in combination with the correct complementary DNA.

Large RNA based combinatorial libraries. The use of an RNA based combinatorial library could also facilitate construction of an entirely nucleic acid based computer. Many DNA and RNA enzymes (deoxyribozymes or ribozymes, respectively) exist that can specifically cleave RNA, and the recognition sites are generally governed by a set of simple hybridization rules. Thus we can engineer nucleic acid enzymes with novel substrate specificity. In such a DNA or RNA computer, all of the computing steps (both data storage and operations) would be performed by nucleic acid oligonucleotides. For example, Santoro and Joyce created a general purpose RNA cleaving DNA enzyme, whose RNA cleaving site can be changed by modifying the RNA binding "arms" of the DNA (1997). They reported 98-99% cleavage maximums, which should be sufficient to allow molecular computations on nucleic acids; however, use of RNase H proved most efficient and versatile in our case, because it requires the presence of only one protein enzyme in combination with almost any complementary DNA sequence.

Design of a Large Combinatorial RNA Library and Constraints on the Pool

In order to test the utility of the RNase H approach for computing, we first developed and synthesized an appropriate RNA pool. A ten-bit pool size was prepared for initial experiments. Each strand of RNA follows the template for an n-bit library, as given in Lipton (1995).

The prefix and suffix are a standard set of PCR primers in our laboratory, with the prefix in DNA containing the T7 promotor for in vitro synthesis of RNA. Each bit can be set to an 'on' or 'off' sequence. Most importantly, this is an expandable construction. To make a larger pool by extension of the existing pool, we insert a 'stuffer library' (generated in the same fashion, as pieces containing up to 15 or 20 more bits (so 25 or 30 total) between bit 10 and the suffix, following the existing template. Thus the 25-bit library would have the organization: [PREFIX-1-2-3-4-5-...-24-25-SUFFIX]. This preserves the original library intact, and is sufficiently large to solve complicated 20-23 bit SAT problems. It also allows us to ignore any bit positions that don't digest reliably, analogous to the problem of ignoring "bad blocks" on computer disks.

Using computer simulations, we incorporated three major criteria into the design of both ten- and 25-bit combinatorial pools (with suggestions by Rothemund and Adleman):

  1. Bit sequences were selected to have maximum Hamming Distances between library strands and parts of individual strands, as well as similar melting temperatures.
  2. Strands were biased to contain minimal secondary structure by using a three letter alphabet for bits and spacers: A, C, and U. This eliminates both G-C pairs and G:U pairs as well as G stacking.
  3. Strands were controlled to avoid hybridization to themselves or to other library strands, as such interactions would interfere with operations on the RNA strands.

In order to satisfy all of these criteria, the pool sequences were initially generated using random nucleotides to satisfy the second criterion, and then a simple computer program (PERMUTE -available from www.princeton.edu/~lfl/research.html) was designed and implemented to permute the sequence until it fulfilled all of the above criteria.

Streamlined Modular Construction of 10-, 25-, and N-Bit Combinatorial Libraries . We synthesized a DNA library encoding all strands of the ten-bit RNA library, as only two halves. The first half contained bits one through six, and the second half contained the complements of bits six through ten. To insert bits 11 through 25, we will synthesize three additional sub-libraries containing bits 10 - 15, the complements of bits 15 - 20, and bits 20 through 25, each in a single step. Note that this approach scales linearly to synthesize any n-bit library from n/5 or n/6 pieces by generating six or seven bits at a time (with one-position overlap).

In order to construct each modular piece of the combinatorial library without synthesizing each strand individually, we devised a novel mix and split synthesis. Bit j set to 0 was synthesized on one column and bit j set to 1 was synthesized on another column. Both columns were then decrimped and their contents mixed. Then half of this pool was returned to each column and placed back on the DNA synthesizer. The partially synthesized molecules were further extended by synthesis of the spacer and the next bit set to 0 on one column, and the spacer and the next bit set to 1 on the other column, and the process repeated. In this way, linear mixing and splitting steps yield exponential increases in pool complexity. Primer overlap extension of the two halves created the full-length ten-bit library, since the pieces are complementary at bit position six. Direct DNA sequencing confirmed the expected degeneracy at each bit, and in vitro transcription of the DNA library yielded the RNA library. Similar annealing and extension of all five modular pieces will correspondingly generate the 25-bit library from these streamlined components.

Proposed Demonstration of Large Scale BMC Computing:

An Algorithm for Solving SAT on an RNA Computer. Our approach to solving SAT problems uses successive RNase H digestions of "illegal" combinations of bit positions, which differs from Lipton's proposed scheme of DNA hybridization "extractions" (Lipton, 1995). Each constraint on position k is executed as follows: We divide the pool molecules into two equal collections. In one test tube we select those strands that contain a 1 at position k by digesting those strands with position k set to 0 (setting bit k to 1). At the same time, we destroy any 1's at those bit positions which cannot be set to 1 at the same time as position k, by a similar procedure (setting these bit positions to 0). In the other test tube, we perform the reciprocal operation setting bit position k to 0. Afterwards the contents of the tubes are mixed and then split again for the next procedure. At the conclusion of the protocol, remaining full-length strands that have survived the bit-selection are gel-isolated and recovered by reverse transcription and PCR.

"Read out" is simple and requires a single PCR of each bit per clone. We are currently developing a multiplex procedure for readout in only two lanes (one for 0 and one for 1) which efficiently combines several PCR primers (such as all 1-bit complements) in one tube.

We demonstrated that RNase H efficiently digests RNA molecules with unique bit settings and that the short DNA oligonucleotides used to perform bit operations can be successfully removed by spin column chromotography. We assayed the extent of digestion for a zero or a one at each bit position in individual clones (0/1 at each bit). One of the bit interactions turned out to be problematic, and such results explain many of the errors reported in initial experiments (Cukras et al. 1998).

If the digestion steps work to completion, then a single execution of the above algorithm would adequately produce solutions to the SAT problem. However, one of the most important and distinguishing features of nucleic acid based computing is that algorithms that incorporate iterative cycles of selection allow enrichment of the number of solution strands in each round, a form of progressive error correction (Stemmer 1995, Landweber 1998, Boneh et al. 1998). This places a reasonable constraint on the efficiency of strand digestion in order to achieve computation. Thus, as 225 < 108 is still considerably smaller than the number of RNA molecules that in vitro selection protocols typically search (1015), the shift to an in vitro selection protocol on optimized 'bit' interactions will ultimately allow recovery of winning molecules from even more dilute solutions.

2.2 DNA Nano-Structures: Computations via Self Assembly

The Need for a Parallel DNA Synthesizer

There are a number of projects being pursued by the consortium, but the use of DNA modules as the components of cellular automata is the one that has the largest strand requirements. This work will establish an understanding of the computational aspects of molecular self-assembly and of how self- assembly can be controlled by thermodynamic parameters via DNA sequence design.

DNA Self-Assembly

Winfree has proposed that it is possible to do computing by self-assembly [Winfree,1995]. Self-assembly is an important mechanism for molecular computation for two reasons. First, self-assembly is capable of performing Turing- universal computations in a single experimental procedure, potentially eliminating costly sequential processing steps from previous approaches (Adleman, 1994). Second, the ability to control the self-assembly process for synthetic DNA molecules allows us to create novel chemical nanostructures. This research proposal therefore suggests exploration into molecular computation by self-assembly along two lines: first, an experimental demonstration of a 2-symbol cellular automaton, and second, theoretical investigations of self-assembly guided by the experimental findings. Critical to both endeavours will be careful thermodynamic analysis of DNA double-crossover molecule self-assembly.

Summary of Previous Experimental demonstration of programmable self-assembly. Previous research has taken a first step toward programming molecular systems by showing how rational sequence design can be used to program DNA to self-assemble into specified 2D crystalline lattices. [Winfree,95] developed a model of computation by self-assembly, established the validity of the model by simulations of chemical kinetics, and experimentally demonstrated the programmed self-assembly of DNA. The self-assembly of DNA molecules bears the hallmarks of a universal computer. On the one hand, DNA self-assembly is predictable: associations are determined by the logic of Watson-Crick complementarity. On the other hand, DNA self-assembly is potentially very complex: in addition to linear double helices, DNA can form non- canonical structures such as hairpins, 3-way junctions, Holliday (4- way) junctions, and double-crossover molecules. This research related DNA self-assembly to computation by studying it on three levels: models of computation show DNA self-assembly to be Turing-universal, simulations identify conditions for error-free self-assembly, and experiments produced two-dimensional DNA crystals with over one million DNA molecules arranged according to the programmed pattern.

Winfree began experiments in collaboration with Nadrian Seeman in computation by self-assembly of DNA into two dimensional lattices. Winfree designed and performed all the experiments described here, except as noted otherwise. Three goals delineate the experimental progress. The first goal was to demonstrate that a single logical step can occur, i.e. that a correct DNA unit can outcompete a partial match at the growing edge of the lattice. This goal was achieved in a model system (Winfree, Yang, Seeman, 1996). The second goal was to demonstrate that the conceived structural basis for a 2D lattice of DNA will self assemble. A system of two double-crossover (DX) units was designed to self-assemble into a periodic 2D lattice. A hairpin sequence was inserted into one of the units to provide a topographic contrast agent, resulting in 25 nm stripes in the lattice. Winfree have confirmed both lattices by atomic force microscopy (AFM), with supporting evidence from ligation studies. In Seeman's lab, a similar system of two larger DX units has produced lattices with 32 nm stripes; the design was elaborated to a four-unit system producing 64 nm stripes, clearly demonstrating the programmable nature of the DX system. These results were reported in (Winfree, Liu, Wenzler, Seeman, 1998).

DNA Lattices Constructed by Self-Assembly

The feasibility of constructing the 2-dimensional self-assemblies proposed has been demonstrated by Winfree and the Seeman laboratory [Winfree, Wenzler Seeman, 1998]. These are homogeneous lattices, simple periodic arrays that contain topographic markers visible in the AFM Winfree, Seeman and their colleagues have demonstrated that they can direct the production arrays with predictable mesoscopic structural properties by means of sticky-ended self-assembly. The essence of the experiment is illustrated below:

Antiparallel DNA double crossover molecules (DX) have been prepared, and they function as the tiles in the illustration above. DX molecules consist of two double helices, joined at two places; they have two possible topologies, and the one used in the experiment above is called DAE. The strand structure of a DAE molecule is shown below.

It is clear from this diagram that the DAE molecule consists of 5 strands of DNA. The crossover points by two turns, rather than by one, which the figure indicates.

The two helices of the DAE molecule are represented in the figure as two closed elongated objects, joined by two small lines, representing the crossovers. The ends of the elongated objects contain a variety of geometrical shapes. These represent, in geometrical forms, different sticky ends on the DNA. In the top portion of the figure, two species of DAE are shown, A and B. The sticky end on to upper right part of A is complementary to the one on the lower left part of B. Likewise, the sticky end on the lower right part of A is complementary to the upper left part of B. Similar relationships exist between the left ends of A and the right ends of B.

B is shown to be different from A in more than its sticky ends. In fact, we call it B*, because it has a large circle in the middle of it. This circle is a schematic representation of two DNA hairpins, one oriented out of the plane of the helix axes, and one oriented into that plane; this motif is called DAE+2J. Perpendicular hairpins can act as topographic labels in Atomic Force Microscope (AFM). They are a distinct feature that can be detected readily by this technique.

The individual A and B units each constitute tiles of dimension 4 x 16 nm, with a thickness of 2 nm. When the A and B* units are annealed, they form 2-D sheets. The extra hairpins are not resolved in the 4 nm direction, so they appear as stripes. The separation of stripes in the long direction is about 33 nm, approximately the length predicted, because the only the B* units contain the DAE+2J motif.

Similar to many microscopic techniques, AFM is potentially susceptible to artifacts. The 33 nm separations in the image of the AB* array is what we expect, but this result has been challenged by preparing a second array, this time containing 4 units. These are shown in the figure as A, B, C, and D*. There are 4 interfaces of complementary sticky ends, rather than 2 as in the AB* array. As suggested by the name, only D* contains the DAE+2J motif. The separation of the stripes seen in the AFM is about 65 nm, again as predicted.

Thus, it is possible to produce predictable and observable structural changes on the mesoscopic scale by altering the DNA base sequences at the sticky ends. There is no apparent limit to the number of nanofabricated periodic 2-D patterns that one could produce by this methodology. Furthermore, Seeman's laboratory has demonstrated recently that patterns can be modified; the topographic features can be restricted to remove a stripe, and they can be added to sticky ends by DNA ligase [F. Liu & N.C. Seeman, paper in preparation].

The New Triple Crossover Molecule (TX)

[LaBean , et al,98] describes the construction at Duke University of a DNA triple crossover molecule (TX) complex which extends the set of experimentally tested building blocks of DNA self-assembly. It consists of four oligonucleotides hybridized to form three double-stranded DNA helices lying in a plane and linked by strand exchange at four, immobile crossover points. Nucleotide sequence design for the synthetic strands was assisted by application of a genetic algorithm which minimized possible alternative base-pairing structures. Synthetic oligonucleotides were purified, stoichiometric mixtures were annealed by slow cooling, and the resulting DNA structures were analyzed by non-denaturing gel electrophoesis and heat-induced unfolding. Ferguson analysis and hydroxyl radical footprinting provide strong evidence for assembly of TX complex to the desired structure. Future applications of TX units are enumerated, for example construction of larger structures from multiple TX units, and for DNA-based computation. Since the TX molecules accommodate reporter strands, it makes the molecule valuable for use for output of data in DNA-based computing, as follows: (i) first a structure consisting of multiple self-assembled TX molecules is constructed, (ii) the reporter strands of contiguous TX molecules are ligated, (iii) then the structure is melted, and (iv) the ligated reporter strands are isolated. These reporter strands provide confirmation of the assembly and allow for determination of the outputs of the computation.

Proposed Research Agenda in DNA Self Assembly

Proposed Large Scale Experimental demonstration of programmable self-assembly. We propose also to do large scale experimental tests of the molecular computation by self-assembly of DNA units. Previous theoretical work has established that self- assembly is capable of universal computation, and experimental studies have shown that simple DNA systems can be designed to self-assemble into the desired 2D lattices. We will extend this work using a theory/experiment closed-loop approach in which realistic simulations are designed and laboratory experiments are performed to determine thermodynamic parameters of the system. This approach will allow us to:

Experiments Planned to Study Error-Control

The nanofabrication system described above provides a means to test Winfree's self-assembly proposals in a less stringent system. Computationally useful self-assembly will be prototyped in a periodic system. A periodic system is not as sensitive as a computation to the intermediate accumulation of unwanted intermediate products, because self-assembly can occur in the forward or backward direction. Ironically, the system proposed will have large, but not enormous strand requirements. However, one must first make control arrays (to know whether the experimental systems are working properly) and they will have large strand requirements.

Small Control Arrays: In order to make a 5 x 5 tiled pattern, one must make 25 different DAE molecules with 5 strands apiece; it goes up from there. That is 125 strands. In fact, something more like a 10 x 10 array is needed for a good test. In general, for an array n x m, 5 nm strands are needed. This is almost a prohibitive number of strands without the high capacity parallel synthesizer.

Experimental System: The key assumption in tile self-assembly is that all sticky ends have impact on the assembly result in each case. In the control system above, all sticky ends are unique. Shown below is a 5 x 5 periodic array repeated twice in each direction.

Individual DX tiles are represented as rectangles with two different colors of squares on their ends, representing a sticky end (on the left side of the tile) and its complement of the same color on the right side. Thus, the tile at the far left has a red sticky end on the upper left and its complementary red sticky end on the lower right; similarly, it has a cyan sticky end on its lower left and upper right. The red sticky ends continue on a rightward and downward course throughout the drawing until the end is reached at the bottom middle. These are all the same sticky ends. The cyan sticky ends continue on a rightward and upward course to the top middle. Each tile is unique. However, the two strands needed to make the red sticky ends are the same in all the tiles containing them. Likewise the cyan tiles. The diagram contains 10 colors, 4 parallel to red (magenta, purple, orange and yellow) and 4 parallel to cyan (light green, middle green, blue and dark green). This array could be made with 50 strands instead of 100. In general, using this approach an array can be made with 5 (n + m) strands.

This approach is basically a test of Winfree's self-assembly paradigm in a periodic system. By using this system, it will be possible to calibrate the self-assembly protocols necessary, and to resolve issues like Reif's [Reif, 1997] suggestion that cycle numbers or frames are necessary in order to achieve self-assembly [Winfree, Wenzler Seeman, 1998]. This is a practical intermediate stage on the way to computation by self-assembly: When assembly is correct, the designated pattern will appear in a periodic array. We will know what the pattern must look like from the control arrays that we confident can be assembled; this confidence is based on the success of the homogeneous lattices already produced. When periodic mesoscopic patterns can be produced routinely, we will know that computing by self-assembly is robust.

Integer Addition by Self-Assembly.

We plan to do massively parallel arithmetic by the method of self-assembly of DNA tiles. We developed in 1998 a binary addition assembly design using the TX tile described above. The size of the numbers to be added will be in the range of 12 to 16. The inputs will be constructed by random assembly, and there is expected to be in the range of 10^15 inputs (including repeated inputs). We are now developing an experimental test of massively parallel DNA assemblies for integer addition with randomly constructed inputs (Extension to parallel integer multiplication is also planned, to give solution of integer factorization problems and to be used for decryption of the RSA crypto-system.)

Experimental questions in 2D Assemblies. Our next research goal will be to experimentally demonstrate the algorithmic self-assembly of a DNA Sierpinski triangle. This will demonstrate that the correct logical sequence of self-assembly can be achieved, as necessary for arbitrary cellular automaton computations. Because the Sierpinski triangle is not a periodic arrangement of matter, its formation involves new principles beyond those required for the self-assembly of periodic matter. Critically, in addition to cooperative binding of DX units, it must be possible to seed the crystallization process by a special DX seed unit. This control is to be achieved by designing the sticky- end sequences -- and thus the binding energies -- such that at the critical temperature, well-formed crystals are stable but other aggregates are not stable and will dissolve.

Winfree has already designed an initial system of eight DX units to implement the Sierpinski triangle calculation. This system consists of over 28 oligonucleotides. Winfree will now investigate the self-assembly process experimentally, and improve the simulation and sequence design software to incorporate the new experimental data. It will be necessary to experimentally characterize the thermodynamics of the Sierpinski system, including the melting temperatures and formation kinetics of each double-crossover unit and of each desired and undesired self-assembly interaction. This will be done by taking UV melting curves and by temperature gradient gel electrophoresis (TGGE). The optical measurements will be used to study the melting of individual DX units, and will provide independent calibration of the TGGE results. The TGGE will be used to measure the strength of the associations between DX units; dissociation temperatures can be determined by this method even where the UV absorbance signal would be very weak. Once the thermodynamic parameters have been measured in the current system of DX units, they will be incorporated into the simulation program to allow quantitative comparison to experiments. Interactions between DX units will also be characterized qualitatively by AFM; pairs of DX designed to have no interactions should produce no observable aggregates; the DX designed to form the boundary of the triangle should produce linear aggregates; and so forth. The 2D DNA lattice encoding the Sierpinski calculation -- all 8 units -- will be visualized by AFM, using the same contrast mechanism employed for the periodic lattice above. This work will establish an understanding of the computational aspects of molecular self-assembly and of how self- assembly can be controlled by thermodynamic parameters via DNA sequence design.

 

Section D: References Cited

1998 Publications of John Reif

1. John H. Reif, Local Parallel Biomolecular Computation, Third Annual DIMACS Workshop on DNA Based Computers, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. H. Rubin, (1998).

2. John H. Reif and Ashish Gehani, Microflow Bio-Molecular Computation, 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania , June, 1998. Invited to special issue of BIOSYSTEMS, 1999.

3. John H. Reif, Alternative Computational Models: A Comparison of Biomolecular and Quantum Computation,18th Invited paper, International Conference on Foundations of Software Technology and Theoretical Computer Science (FST&TCS98), (December, 1998).

4. John H. Reif, Parallel Molecular Computation: Models and Simulations. Algorithmica, special issue on Computational Biology, 1998.

5. John H. Reif, Paradigms for Biomolecular Computation, First International Conference on Unconventional Models of Computation, Auckland, New Zealand, January 1998. Published in Unconventional Models of Computation, edited by C.S. Calude, J. Casti, and M.J. Dinneen, Springer Publishers, January 1998, pp 72-93.

6. John H. Reif and Zheng Sun, Nano-Robotics Motion Planning and Its Applications in Nanotechnology and Biomolecular Computing, NSF Design and Manufacturing Grantees Conference, Jan 5-8, 1999.

7. John H. Reif and Hongyan Wang, The Complexity of the Two Dimensional Curvature-Constrained Shortest-Path Problem of the Curvature-Constrained Shortest Path problem, Third International Workshop on Algorithmic Foundations of Robotics, , Workshop on the Algorithmic Foundations of Robotics (WAFR98), Houston, Texas, June, 1998.

8. John H. Reif and Zheng Sun, The Computational Power of Frictional Mechanical Systems, Third International Workshop on Algorithmic Foundations of Robotics, , Workshop on the Algorithmic Foundations of Robotics (WAFR98), Houston, Texas, June, 1998.

9. John H. Reif and Hongyan Wang, Social Potential Fields: A Distributed Behavioral Control for Autonomous Robots, To appear in Robotics and Autonomous Systems, (1998).

10. John H. Reif and Sandeep Sen, Parallel Computational Geometry: An approach using randomization, A Chapter in Handbook in Computational Geometry, Edited by Jorge Urrutia and Jörg-Rudiger Sack, Elsevier Science Publishing, Amsterdam, the Netherlands. 1998.

11. John H. Reif, Robust, Adaptive and Dynamic Robotic Motion Planning, 1998 NSF Design and Manufactoring Grantees Conference, Monterrey, Mexico Jan 1998.

12. John H. Reif and James Storer, Optimal Lossless Compression of a Class of Dynamic Sources, Data Compression Conference (DCC'98) , Snowbird, UT, March, 1998.

13. John H. Reif . Synthesizing Efficient Out-of-Core Programs for Block Recursive Algorithms using Block-Cyclic Data Distributions, IEEE Transactions on Parallel and Distributed Systems, to appear 1998.

14. John H. Reif, Approximate Complex Polynomial Evaluation in Near Constant Work Per Point. Accepted to Siam Journal of Computing (SICOMP), 1998.

15. John H. Reif, Efficient Approximate Solution of Sparse Linear Systems, Computers and Mathematics with Applications, Vol. 36, No. 9, pp 37-58, 1998.

16. John H. Reif and S. Azhar, Efficient Algorithms for Learning in Group Theory, To appear Computers & Mathematics with Applications, 1998.

17. John H. Reif and H. Gazit , A Randomized Parallel Algorithm for Planar Graph Isomorphism, Journal of Algorithms, Vol. 28, pp 290-314, 1998.

 

Also to appear: Thomas H. LaBean, Hao Yan, John H. Reif and Nadrian Seeman, Construction and Analysis of a DNA Triple Crossover Molecule, submitted for publication, (November. 1998).

 

Other Recent Papers Supported by the Current Grant

Landweber and E. B. Baum, eds. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 44 pages 153-163, American Mathematical Society.

Boneh, D., Dunworth, C., Lipton, R.J., and Sgall, J. (1998). Making DNA computers Error Resistant. In DNA Based Computers II, L. F. Landweber and E. B. Baum, eds. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 44 pages 165-172, American Mathematical Society.

Cukras, A., Faulhammer, D., Lipton, R., and Landweber, L.F. (1998). Chess games: a model for RNA-based computation. DIMACS: 4th Annual Workshop on DNA Based Computers.

Junghuei Chen, David Wood , "A New DNA Separation Technique with Low Error Rate", DIMACS Workshop on DNA Based Computers, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. H. Rubin, (1998).

Anthony Cukras, Dirk Faulhammer, Richard Lipton, Laura Landweber , "Chess games: A model for RNA-based computation" , 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania , June, 1998. Invited to special issue of BIOSYSTEMS, 1999.

Anthony G. Frutos, Andrew J. Thiel, Anne E. Condon, Lloyd M. Smith, Robert M. Corn, "DNA Computing at Surfaces: 4 Base Mismatch Word Design", DIMACS Workshop on DNA Based Computers, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. H. Rubin, (1998).

Landweber, L. F. (1998) RNA Based Computing. In DNA Based Computers II, L. F. Landweber and E. B. Baum, eds. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 44 pages 181-189, American Mathematical Society.

Landweber, L.F. and Kreitman, M. (1993) Producing Single-Stranded DNA in Polymerase Chain Reaction for Direct Genomic Sequencing. Methods Enzymol. 218: 17-26

Laura F. Landweber, Richard Lipton (Invited paper) , "DNA 2 DNA Computations: A Potential `Killer App'?", DIMACS Workshop on DNA Based Computers, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. H. Rubin, (1998).

Landweber, L.F., Simon, P.J., and T.A. Wagner. (1998) Ribozyme Design and Early Evolution. BioScience 48: 94-103. Lipton, R.J. (1995) DNA Solution of Hard Computational Problems. Science. 268: 542- 545.

Laura Landweber, Lila Kari , "The evolution of cellular computing: Nature's solution to a computational problem", 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania , June, 1998. Invited to special issue of BIOSYSTEMS, 1999.

Elizabeth Laun, Kalluru J. Reddy , "Wet Splicing Systems", DIMACS Workshop on DNA Based Computers, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. H. Rubin, (1998).

Thomas H. Leete, Joshua Klein, Jerome S. Salem, Harvey Rubin , Bit Operations Using a DNA Template", DIMACS Workshop on DNA Based Computers, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. H. Rubin, (1998).

Qinghua Liu, Andrew J. Thiel, Anthony G. Frutos, Robert M. Corn, Lloyd M. Smith, "Surface-Based DNA Computation: Hybridization and Destruction", DIMACS Workshop on DNA Based Computers, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. H. Rubin, (1998).

Qinghua Liu, Anthony Frutos, Liman Wang, Andrew Thiel, Susan Gillmor, Todd Strother, Anne Condon, Robert Corn, Max Lagally, Lloyd Smith "Progress towards demonstration of a surface based DNA computation: A one word approach to solve a model satisfiability ", 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania , June, 1998. Invited to special issue of BIOSYSTEMS, 1999.

Martin Orlian, Frank Guarnieri, Carter Bancroft , "Parallel Primer Extension Horizontal Chain Reactions as a Paradigm of Parallel DNA-Based Computation", DIMACS Workshop on DNA Based Computers, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. H. Rubin, (1998).

Harvey Rubin, Joshua Klein, Thomas Leete , "A biomolecular implementation of logically reversible computation with minimal energy dissipation", 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania , June, 1998. Invited to special issue of BIOSYSTEMS, 1999.

Erik Winfree, Furong Liu, Lisa A. Wenzler, Nadrian C. Seeman (1998) Design and Self-Assembly of Two Dimensional DNA Crystals. Nature 394: 539--544, 1998.

David Harlan Wood , "Applying error correcting codes to DNA computing", 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania , June, 1998. Invited to special issue of BIOSYSTEMS, 1999.

 

References for Proposed Work

Adleman, L.M. (1994) Molecular Computation of Solutions to Combinatorial Problems. Science. 266: 1021-1023.

Amos, M., Gibbons, A., and Hodgson, D. (1996). Error-Resistant Implementation of DNA Computations. In DNA Based Computers II, L. F.

Landweber and E. B. Baum, eds. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 44 pages 153-163, American Mathematical Society.

Boneh, D., Dunworth, C., Lipton, R.J., and Sgall, J. (1998). Making DNA computers Error Resistant. In DNA Based Computers II, L. F. Landweber and E. B. Baum, eds. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 44 pages 165-172, American Mathematical Society.

Cukras, A., Faulhammer, D., Lipton, R., and Landweber, L.F. (1998). Chess games: a model for RNA-based computation. DIMACS: 4th Annual Workshop on DNA Based Computers.

Landweber, L. F. (1998) RNA Based Computing. In DNA Based Computers II, L. F. Landweber and E. B. Baum, eds. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 44 pages 181-189, American Mathematical Society.

Landweber, L.F. and Kreitman, M. (1993) Producing Single-Stranded DNA in Polymerase Chain Reaction for Direct Genomic Sequencing. Methods Enzymol. 218: 17-26

Landweber, L.F., Simon, P.J., and T.A. Wagner. (1998) Ribozyme Design and Early Evolution. BioScience 48: 94-103. Lipton, R.J. (1995) DNA Solution of Hard Computational Problems. Science. 268: 542- 545.

Ouyang, Q., Kaplan, P.D., Shumao, L., and Libchaber, A. (1997) DNA Solution of the Maximal Clique Problem. Science. 278: 446-449.

Porter D., Curthoy N.P. (1997) Use of Thermostable and E. Coli RNase H in RNA Mapping Studies. Anal Biochem. 247:279-286. Rao, V.B. (1994) Strategies for Direct Sequencing of PCR-amplified DNA. PCR Methods Appl. 4: S15-S23.

Sambrook, J., Maniatis, T. and Fritsch, E. (1989) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press: Cold Spring Harbor.

Santoro, S.W. and Joyce, G.F. (1997) A General Purpose RNA-cleaving DNA Enzyme. Proc. Natl. Acad. Sci. U.S.A. 94: 4262-4266. Stemmer, W.P.C. 1995. The Evolution of Molecular Computation. Science 270: 1510.

Trower, M.K. (1996) A Rapid PCR-based Colony Screening Protocol For Cloned Inserts. Methods Mol. Biol. 58:329-333.

Erik Winfree (1995) On the Computational Power of DNA Annealing and Ligation. In DNA Based Computers: Proceedings of a DIMACS Workshop, April 4, 1995, Princeton University (Number 27 in DIMACS). Richard J.

Lipton and Eric B. Baum, editors. American Mathematical Society, 1996.

Erik Winfree, Xiaoping Yang, Nadrian C. Seeman (1996) Universal Computation via Self-assembly of DNA: Some Theory and Experiments. To be published in the proceedings for DNA Based Computers II, E. B. Baum, ed. DIMACS Workshop, American Mathematical Society.

Erik Winfree (1998). Simulations of Computing by Self-Assembly. In Proceedings of the Fourth Annual Meeting on DNA Based Computers, held at the University of Pennsylvania, June 16-19, 1998. In press.

Erik Winfree, Furong Liu, Lisa A. Wenzler, Nadrian C. Seeman (1998) Design and Self-Assembly of Two Dimensional DNA Crystals. Nature 394: 539--544, 1998.

 

 

 

Section E: Biographical Sketches: John H. Reif

Dept. of Computer Science, Duke University, Box 90129, Durham, N.C. 27708-0129

919-660-6568

Education

Ph.D. Applied Mathematics (1977) Harvard University
M.S. Applied Mathematics (1975) Harvard University
B.S. Applied Mathematics &CS(1973 magna cum laude)Tufts University

Academic Experience

5/86 - Professor, Department of Computer Science, Duke University, Durham, N.C.

1/94 - 4/94 Visiting Professor, CMU

1/83 - 5/86 Associate Professor, Harvard University, Cambridge, Mass.

8/84 - 1/85 Visiting Scientist, M.I.T. Laboratory for Computer Science, Cambridge, Mass.

8/79 - 1/83 Assistant Professor, Harvard University, Cambridge, Mass.

8/78 - 5/79 Assistant Professor, University of Rochester, Rochester, New York.

8/77 - 8/78 Research Associate, University of Rochester, Rochester, New York.

Honors: Fellow of Institute of Combinatorics, 1991, Fellow of IEEE, 1993, Fellow of ACM, 1997.

Research Areas: Biomolecular computing, robotics, data compression, theoretical computer science: efficient algorithms, randomized algorithms, parallel computation, and optical computing.

Research in Computer Science:

Although primarily a theoretical computer scientist, Prof. Reif also has made a number of contributions to areas of computer science including parallel architectures, robotics, data compression and optical computing. Reif organized and led a number of practical systems projects, including the BLITZEN parallel architecture and the RSIC parallel data compression hardware. Reif's research combines theory and practice. He has worked for many years on the development and analysis of parallel algorithms for various fundamental problems including data compression (as well as solution of large sparse systems, sorting, graph problems, etc.). He has invented and written a number of papers on parallel algorithms (particularly randomized parallel algorithms) and is the editor of the text [R93].

Research by Reif in Biomolecular Computation(BMC):

In [R95], Reif investigated the potential of BMC to do massively parallel computations and to simulate conventional parallel computational models. Reif recently [R97] also some BMC algorithms using local assembly, including logarithmic depth BMC algorithms for prefix computation, computer arithmetic. He also investigating the use of MEMS array systems to do certain BMC tasks such as interprocess communication, and is developing some Biomolecular models that take into account both solution concentration and the physical geometry of the biotechnology apparatus. Reif is planning some experiments of BMC algorithms for parallel computer arithmetic, in collaberation with Dr. Thom LaBean.

Reif Publications:

DNA Based Computers, University of Pennsylvania, June 23-26, 1997. Published in DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. H. Rubin, (1998).

John H. Reif and Ashish Gehani, Microflow Bio-Molecular Computation, 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania , June, 1998. Invited to special issue of BIOSYSTEMS, 1999.

John H. Reif, Alternative Computational Models: A Comparison of Biomolecular and Quantum Computation,18th Invited paper, International Conference on Foundations of Software Technology and Theoretical Computer Science (FST\&TCS98), (December, 1998).

John H. Reif, Parallel Molecular Computation: Models and Simulations. Proceedings: 7th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA'95) Santa Barbara, CA, July 1995, pp. 213-223. Published in Algorithmica, special issue on Computational Biology, 1998.

John H. Reif, Paradigms for Biomolecular Computation, First International Conference on Unconventional Models of Computation, Auckland, New Zealand, January 1998. Published in Unconventional Models of Computation, edited by C.S. Calude, J. Casti, and M.J. Dinneen, Springer Publishers, January 1998, pp 72-93.

John H. Reif, editor, Synthesis of Parallel Algorithms. 1011 pages.Published by Morgan Kaufmann, Spring, 1993.

Additional Personel at Duke: Dr. Thom LaBean, Visiting Assistant Professor, Computer Science Dept, Duke University. Gave a graduate course at Duke in fall, 1998 on Biomolecular Computation. Previously postdoctoral fellow in the Department of Biochemistry laboratory of Jane Richardson; he is well experienced in recombinant DNA techniques. LaBean is supported by Reif's NSF grant "Prototyping Biomolecular Computations".

Students at Duke:Current Graduate Students (Ph.D. candidates)

Satish Govindarajan, Nabil Mustafa, Zhung(Robert) Sun

Local Collaborations at Duke: Reif consults with and collaborates with local experts in recombinant DNA technology and relevant biotechnology located in the biochemistry, microbiology and pharmacology departments at Duke.

 

Section F: Summary Proposal Budget

 

 

Section G: Current and Pending Support

John Reif: Principal Investigator or Co-Investigator

SEGR: Design of a Biomolecular Distributed Operating System, NSF Grant CCR-981000, Aug 1998-1999, $50,000.

Prototyping Biomolecular Computations , jointly funded by Defense Advanced Research Projects Agency and National Science Foundation, NSF CCR-9725021, (Aug 21-Oct 29, 1997 first increment $285,542 ) full amount July 1997-Sept 2000, $2,748,017.

Robust, Adaptive and Dynamic Robotic Motion Planning, NSF Grant NSF-IRI-9619647, 06/97-06/99, $295,000.

CURIOUS: (C)enter for (U)ndergraduate Education and (R)esearch:

(I)ntegration Thr(OU)gh Vi(S)ualization, (Co-principle investigator). NSF CDA-96-34475 09/96 - 08/99 $ 405,200

Acquisition of a Workstation Cluster Testbed for Next-Generation

Collaborative Computing (Co-principle investigator). National Science Foundation Grant contract CDA-95-12356, 09/95 - 08/98, $489,600

Multidisciplinary Research for Demining, (Co-principle investigator of subcontract with E. Gelenbe, N Schmajuk and J. Staddon,) . Army Research Office(ARO) contract DAAH-04-96-1-0448, 11/96 - 10/99, total contract: $3,000,600, Subcontract $431,000./year.

 

 

 

 

Section H: Facilities, Equipment and Other Resources

Facilities of Primary Contractor: Duke Duke Computing Facilities. The Department of Computer Science at Duke maintains state-of-the-art computing facilities to satisfy a variety of research and educational needs. General computing systems include workstations and file servers. The department has over 100 Sun workstations each with at least 32 megabytes of memory and it maintains over 10 file servers. The collective disk space on the servers is about 70 gigabytes. The collective disk space on the SPARC workstations is about 80 gigabytes. Communications facilities include ethernet connections, asynchronous communications for all terminals, and a campus-wide fiber-optic network providing high-speed data paths to computing facilities at several other sites in the state of North Carolina. The newest equipment purchased to support research in parallel computation is an SG POWER CHALLENGE L Deskside Server with 4- 75MHz R8000 processors and 2 Gigabytes RAM. The {\it Visualization Laboratory} provides the equipment needed to generate animation on video tapes. An SGI Power Challenge L 4-processor computer with 2 gb RAM, purchased in 1995--1996 through an NSF CISE infrastructure grant, provides both a powerful computing environment and a systems research platform. The {\it Workstation Cluster Laboratory} consists of six Digital Alpha 600 5/266 high end workstations, eight Digital Alpha 500 4/266 high end workstations, eight Sun Ultra 140 workstations, and a BayNetworks Lattis Switch 100 Megabit Ethernet switch.

The North Carolina Information Highway (NCIH) is the first fully operational statewide superhighway in the country. It provides high-speed fiber access to over 100 sites across the state. Additional computing power is provided to the Department through MCNC in nearby Research Triangle Park in Durham, which houses Cray supercomputers, including a Cray T3E. A high-speed connection between MCNC and the Department on the NCIH provides supercomputer access.

Duke Molecular Biology Facilities. The Duke Medical School Departments of Biochemistry and Microbiology laboratories are both well equipped for recombinant DNA experiments. The {\it Department of Biochemistry} primarily occupies the first two floors, as well as part of the fourth, of the Nanaline Duke Building (located on Research Drive), which is part of the Duke Medical Center. More than 33,000 net square feet of research space is available for the activities of the faculty, students and fellows. The Departments of Cell Biology, Genetics, Microbiology and Immunology, Neuroscience and Pharmacology are located in a cluster of adjacent buildings, facilitating interactions among members of these departments. In addition, the {\it Departments of Botany, Chemistry and Zoology} are located within five minutes walk from the Medical School, which allows an easy interchange with these departments. There are currently 32 members of the Biochemistry Graduate Faculty together with 69 graduate students and 35 postdoctoral fellows in the Department. Their research interests cover a broad and exciting area of disciplines including high resolution structural analysis, signal transduction, gene function and regulation, enzyme mechanisms, and cell surface and structure. The faculty and students have the access to state-of-the-art instrumentation, including X-ray crystallography, multidimensional nuclear magnetic resonance, computer graphics, electron microscopy, and support facilities like peptide and oligonucleotide synthesizers, large scale fermentation, optical microscopy and imaging systems. We have a support letter from Jack Keene, Chairman of the Department of Microbiology encouraging this research and allowing use of their facilities.

Facilities of Subcontractors:

Princeton Facilities Princeton Computer Science Facilities. The project will have two kinds of resources available to it. Standard resources that include many high performance servers. Also we will have available the SHRIMP projects resources. SHRIMP is an NSF/Arpa funded project that is developing high performance multi-computers. We expect that some of the more computational extensive parts of this project will be mapped onto SHRIMP. The departmental facility includes over 80 high end workstations. There is also a graphics laboratory with multiple graphics workstations, including a 6 Indigos/Indys. The department also has access to the facilities of the University's Computing for Information and Technology center, which includes an Auspex file server and its ${\approx}$30 Sparc SLC clients and 30 68040-based NeXTStations each with 20MB.

Princeton Computer Molecular Biology Facilities. A 2000 sq ft biochemistry laboratory and 250 sq ft office are in Moffett Labs. It is well equipped for the proposed research, with incubators, water baths, high-speed and preparative ultracentrifuges, bench top centrifuges, microfuges, fume hood, pH meters, agarose gel electrophoresis baths and power supplies, high-voltage power supplies and apparatuses for DNA sequencing, CHEF electrophoresis, a PCR machine, a UV illuminator box, lyophilizer, balances, microwave oven, colony counters, a tissue culture hood and CO2 incubator, dissecting and fluorescent microscopes, and a recording spectrophotometer. Our shared equipment facility in the same building contains preparative ultracentrifuges, scintillation counters, a gamma counter, 4 walk-in cold rooms, 4 darkrooms (one equipped with an automatic X-ray film developer), 2 high-performance HPLC units, DNA databases and a Wisconsin software package (``Sequence analysis software package of the Genetics Computer Group," Univ. of Wisconsin Biotechnology Center, Madison, Wisconsin), several Mac's, and an SGI and SPARC IPX workstation. Of particular importance to this proposal are the following: an Applied Biosystems 373 DNA Sequencer. These instruments are staffed by a professional group.

Wisconsin Facilities Equipment in Smith's laboratory, in which the primary DNA chemistry effort will occur, includes several commercial and custom robotic platforms, fluorescence scanners and automated gel loaders, several automated DNA synthesizers, a Molecular Dynamics FluorImager, and several thermal cyclers.

Surface attachment chemistry and characterization will be done in Corn's laboratory. Available characterization techniques include Polarization-Modulation Fourier Transform IR (PM-FTIR) Spectroscopy, optical second harmonic generation (SHG), surface enhanced Raman scattering (SERS), and surface plasmon resonance spectroscopy (SPR).

Planar surface development will occur in the Lagally labs. Equipment includes scanned probes, including STM, AFM, and NSOM. CVD, MBE, and sputter deposition are available for creating morphologically and chemically diverse surfaces. Several optical probes are available for chemical studies of surfaces. Conventional and nanolithographic tools (the latter through patterning with NSOM or STM) are available for creating arrays.

USC Facilities USC has facilities in the Laboratory for Molecular Science under the direction of Prof. Adleman, and in a second laboratory under the direction of Professor Goodman. Currently we use approximately 1000 square feet of laboratory space. We have PCR facilities, a dedicated radiation area, fume and laminar air flow hoods.

NYU Facilities The laboratory at New York University (NYU) consists of about 2000 square feet of laboratory and office space containing a fully equipped biochemical laboratory, including an Applied Biosystems 380B automatic DNA synthesizer (3-column) and the analytical apparatus to analyze the DNA constructions proposed.

Mt. Sinai Facilities The Bancroft laboratory contains all equipment required for molecular biological research, including ultracentrifuge, centrifuge and rotors, scintillation counter, recording spectrophotometer, thermal cycler for performing PCR, and equipment for gel electrophoresis, and access to numerous Core facilities. The Physiology/Biophysics Department also has a very strong Theoretical Biology Group, possessing extensive expertise and sophisticated hardware and software, thus representing a significant potential resource for this project.

U. Penn. And Univ. Delaware Laboratory Facilities: The group has two complete laboratories: the 1500 square foot Rubin Lab, and the 1200 square foot Chen Lab. Both labs have equipment for molecular cloning and expression, PCR technology, spectrophotometry, and nucleic acid analyses, including several gel electrophoresis systems, together with computerized gel documentation and analysis. Each Lab has Power Mac computers. Several electron microscopes are readily available including a state of the art Philip scope at U. Delaware. Also, a MerMade DNA synthesizer is now located there. The Rubin Lab includes three thermocyclers, and two Perkin-Elmer HPLC systems. The Chen Lab has a Biocad FPLC system, and a Beckman HPLC. Readily available to both Labs: peptide synthesis, automated DNA sequencing, ultracentrifuges, and fermentation facilities.

Binghamton Facilities Reddy's laboratory has the following features and equipment: 1000 sq. ft. of space, Coy Temperature Cycler, UV cross linker, Gas Chromatograph, BioRad Gene Pulser, Pulsed field gel equipment, two microfuges, and an IBI base runner for DNA sequencing. The biological sciences department provides many other facilities for general use.