The instructions below were written for the CPS 160 course (Introduction to Functional Genomics) taught in Spring 2006. Some of the links may not be up to date. Please consult Blackboard for updated instructions.
Bioperl is a set of Perl modules for manipulating biological data. Some of the things you can do with Bioperl: read sequence data from files in different standard formats (FASTA, GenBank, SwissProt, etc.), manipulate sequences, run BLAST queries, parse BLAST report files, and run multiple sequence alignments with tools such as CLUSTALW.
Once you have Perl on your computer, downloading and installing Bioperl is not always an easy task, but you can find detailed instructions here:
On Windows it is usually easy to install Bioperl by following the steps below, taken from "Quick instructions for the impatient, lucky, or experienced user" :)
1. Open a command prompt (Start->Run and type cmd).
2. Run the PPM shell (C:\>ppm).
3. Add the package repositories:
   ppm> rep add Bioperl http://bioperl.org/DIST
   ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
   ppm> rep add Bribes http://www.Bribes.org/perl/ppm
4. Search for Bioperl:
   ppm> search Bioperl
5. Install Bioperl:
   ppm> install <number>
   where <number> is the number of the Bioperl-1.4 package in the numbered list obtained in step 4.
A Duke course in Bioperl (taught by Jason Stajich).
The help page for Bio::Tools::Run::RemoteBlast.
The help page for Bio::Tools::Genscan.
The help page for Bio::Tools::Run::Alignment::Clustalw.
Given a graph with vertices and edges, we often wish to find a path from some start vertex to some end vertex in that graph. There are numerous ways to explore the vertices of a graph, but two popular methods are depth first search (DFS) and breadth first search (BFS). On closer examination, these two algorithms differ only in the data structure used to decide which vertex to visit next.
The pseudo-code for a general graph searching algorithm is given below:
GRAPH_SEARCH (Graph G, Vertex s)
    Set S = {s};   # set of explored vertices
    while (there is an unused edge (u,v) connected to any node u in S)
        follow edge (u,v) from u to v
        set edge (u,v) as used
        add v to S
    end while
end GRAPH_SEARCH
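The following is a deliberately literal Perl sketch of GRAPH_SEARCH that makes the pseudo-code concrete. It assumes the graph is stored as a hash of arrays mapping each vertex to the vertices its outgoing edges point to. The example graph and the name graph_search are illustrative and do not come from the course code. Every time it follows an edge, the loop rescans S from scratch for an unused edge, so it is slow. Its only purpose is to show that the loop ends with S holding every vertex reachable from s.

```perl
use strict;
use warnings;

# Hypothetical example graph (directed): each vertex maps to the list of
# vertices its outgoing edges point to.
my %graph = (
    A => ['B', 'C'],
    B => ['D'],
    C => ['D'],
    D => [],
    E => ['A'],    # E is not reachable from A
);

# Literal translation of GRAPH_SEARCH: repeatedly follow some unused
# edge (u,v) whose start u is already in S, until no such edge is left.
sub graph_search {
    my ($g, $s) = @_;
    my %in_S = ($s => 1);    # set of explored vertices
    my %used;                # edges already followed, keyed "u,v"
    my $followed_edge = 1;
    while ($followed_edge) {
        $followed_edge = 0;
        EDGE: for my $u (keys %in_S) {
            for my $v (@{ $g->{$u} || [] }) {
                next if $used{"$u,$v"};
                $used{"$u,$v"} = 1;    # set edge (u,v) as used
                $in_S{$v} = 1;         # add v to S
                $followed_edge = 1;
                last EDGE;             # rescan S from scratch
            }
        }
    }
    return sort keys %in_S;
}

print join(' ', graph_search(\%graph, 'A')), "\n";    # prints "A B C D"
```

This sketch leaves unspecified which unused edge is followed first, because that depends on Perl's hash ordering. BFS and DFS make that choice with a queue and a stack, respectively.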
Breadth first search uses a queue as the data structure for determining which vertex to visit next. Every time a new vertex is visited, all of its neighbors are added to the end of the queue. Note that in a directed graph, the neighbors of a vertex u are only those vertices v for which there is an edge pointing from u to v. To decide which vertex to visit next, we remove the vertex at the front of the queue and repeat the process. Thus, the vertices are visited in a first in, first out (FIFO) manner. As a result, all of the vertices at distance i from the start vertex (measured in number of edges) are visited before any vertex at distance i+1.
The pseudo-code for BFS is given below:
BFS (Graph G, Vertex s)
    Set S = {s};   # set of explored vertices
    Queue Q = all neighbors of s;
    while (Q is not empty)
        dequeue vertex v from the front of Q   # shift in Perl
        if (v is not in S)
            add v to S
            enqueue neighbors of v onto the end of Q   # push in Perl
        end if
    end while
end BFS
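Here is a minimal Perl sketch of the BFS pseudo-code that uses shift and push on an ordinary array as the queue. It assumes an undirected graph stored as a hash of arrays, with each edge listed under both of its endpoints. The example graph and the names bfs, @order and %dist are illustrative. As a small addition to the pseudo-code, each queue entry also carries its distance from s, which makes the "distance i before distance i+1" property visible.

```perl
use strict;
use warnings;

# Hypothetical undirected graph: each vertex maps to its neighbors,
# and every edge is listed under both of its endpoints.
my %graph = (
    1 => [2, 3],
    2 => [1, 4, 5],
    3 => [1, 6],
    4 => [2],
    5 => [2, 6],
    6 => [3, 5],
);

# BFS following the pseudo-code: an ordinary Perl array used as a queue.
# Each queue entry is [vertex, distance from s]; the distance is an
# addition to the pseudo-code.
sub bfs {
    my ($g, $s) = @_;
    my %dist  = ($s => 0);    # keys double as the explored set S
    my @order = ($s);         # vertices in the order visited
    my @Q = map { [$_, 1] } @{ $g->{$s} || [] };    # all neighbors of s
    while (@Q) {
        my ($v, $d) = @{ shift @Q };    # dequeue from the front
        next if exists $dist{$v};       # v is already in S
        $dist{$v} = $d;                 # add v to S
        push @order, $v;
        # enqueue the neighbors of v at the end, one edge farther away
        push @Q, map { [$_, $d + 1] } @{ $g->{$v} || [] };
    }
    return (\@order, \%dist);
}

my ($order, $dist) = bfs(\%graph, 1);
print "visit order: @$order\n";
printf "distance to %s: %d\n", $_, $dist->{$_} for sort { $a <=> $b } keys %$dist;
```

Starting from vertex 1, the visit order is 1 2 3 4 5 6. Vertices 2 and 3 (distance 1) are visited before 4, 5 and 6 (distance 2).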
Depth first search uses a stack as the data structure for determining which vertex to visit next. Every time a new vertex is visited, all of its neighbors are added to the top of the stack. Note that in a directed graph, the neighbors of a vertex u are only those vertices v for which there is an edge pointing from u to v. To decide which vertex to visit next, we remove the vertex at the top of the stack and repeat the process. Thus, the vertices are visited in a last in, first out (LIFO) manner. As a result, the search follows one path from the start vertex until the path can be extended no further. It then backs up to the most recently visited vertex that still has unexplored neighbors and follows a new path from there, and so on.
The pseudo-code for DFS is given below:
DFS (Graph G, Vertex s)
    Set S = {s};   # set of explored vertices
    Stack T = all neighbors of s;
    while (T is not empty)
        pop vertex v from the end of T   # pop in Perl
        if (v is not in S)
            add v to S
            push neighbors of v onto the end of T   # push in Perl
        end if
    end while
end DFS
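Here is a minimal Perl sketch of the DFS pseudo-code that uses pop and push on an ordinary array as the stack. It assumes an undirected graph stored as a hash of arrays, with each edge listed under both of its endpoints. The example graph and the names dfs and @visited are illustrative.

```perl
use strict;
use warnings;

# Hypothetical undirected graph: each vertex maps to its neighbors,
# and every edge is listed under both of its endpoints.
my %graph = (
    1 => [2, 3],
    2 => [1, 4, 5],
    3 => [1, 6],
    4 => [2],
    5 => [2, 6],
    6 => [3, 5],
);

# DFS following the pseudo-code: an ordinary Perl array used as a stack.
sub dfs {
    my ($g, $s) = @_;
    my %in_S  = ($s => 1);              # set of explored vertices
    my @order = ($s);                   # vertices in the order visited
    my @T     = @{ $g->{$s} || [] };    # stack: all neighbors of s
    while (@T) {
        my $v = pop @T;                 # take v from the top (end) of T
        next if $in_S{$v};              # v is already in S
        $in_S{$v} = 1;                  # add v to S
        push @order, $v;
        push @T, @{ $g->{$v} || [] };   # push neighbors of v onto T
    }
    return @order;
}

my @visited = dfs(\%graph, 1);
print "visit order: @visited\n";
```

Starting from vertex 1, this visits 1 3 6 5 2 4. It follows the single path 1-3-6-5-2-4 instead of working outward level by level. The last neighbor pushed is the first one popped, so neighbors are explored in the reverse of the order in which they are listed.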
Sources:
Josh Robinson's notes for CPS160 (Spring 2005)
Chapter 22 of Introduction to Algorithms, second edition, by Cormen, Leiserson, Rivest, and Stein (CLRS).
In this class we will be using a common IDE (Integrated Development Environment) called Eclipse. IDEs are nice because they often make programming easier with features such as syntax highlighting and debugging capabilities. This is especially useful for Perl, which can sometimes be syntactically fickle.
Eclipse is written in Java, so in addition to Eclipse, you will also need to download the JRE (Java Runtime Environment) from Sun. Because Java runs on almost any OS, so does Eclipse, from Windows to Mac and UNIX. Note that Mac OS X comes with the JRE preinstalled, so you will most likely not need to download anything if you are using a Mac. Also, if you have programmed in Java before, you almost certainly have the JRE installed already. The JRE 5.0 (also called 1.5) can be found here. Alternatively, you can download the JDK (Java Development Kit). If you plan on coding anything in Java later, or on taking another Computer Science class at Duke, I would recommend the JDK. In addition to the JRE, it includes the Java compiler and other development tools you will need for any Java coding projects you take on.
Once you have downloaded and installed either the JRE or the JDK, you need to download Eclipse itself. The Eclipse file you download is not a self-installing archive. Instead, you will need to extract the zip file's contents to a folder on your hard drive (you can just use C:\ or C:\Program Files).
Download Eclipse 3.1.1 for your operating system:
For convenience, you will most likely want a shortcut to Eclipse on your desktop. In Windows, right-click on the desktop and select New > Shortcut. Then type C:\eclipse\eclipse.exe or C:\Program Files\eclipse\eclipse.exe, depending on where you extracted the files.
Now that you have Eclipse installed, you will need to install Perl itself. Once again, Perl comes preinstalled on UNIX systems, including Mac OS X. On Windows, one of the most popular Perl distributions is ActivePerl. At the time of writing, the current version is 5.8.7.
Download and install ActivePerl:
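Once ActivePerl is installed, you can confirm that Windows finds it by saving a small script like the hypothetical one below and running it from a command prompt with perl check_perl.pl. The file name check_perl.pl is arbitrary. The script prints the interpreter's version and location using Perl's built-in special variables $^V, $^X and $].

```perl
use strict;
use warnings;

# $^V is the running perl's version and $^X is the path of the perl
# executable; both are built-in special variables (see perlvar).
printf "perl v%vd found at %s\n", $^V, $^X;

# $] holds the version as a number (5.008007 for 5.8.7), which is handy
# for a simple "is this new enough?" comparison.
if ($] < 5.008) {
    print "This perl is older than 5.8; consider upgrading.\n";
}
else {
    print "Perl looks ready to use.\n";
}
```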
Next, you need to install the Ambient plug-in. The Ambient plug-in provides a simple interface for you to receive and submit code. Run Eclipse and use its update manager to download and install the Ambient plug-in by following the directions online.
You also need to install and set up the EPIC plug-in. This will allow you to create and run Perl programs from within Eclipse. These steps are similar to those found at the link above. Follow the instructions below:
Finally, you need to set up Eclipse so that you can run your Perl programs with a simple click. This will allow you to bring up the shell or command prompt in the console.
Next, try making your first Perl program in Eclipse:
Now that you have spent so much time getting Eclipse working, it is time to try your first program.
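If you want to try something before Snarfing the course code, create a new Perl file in your project and run a short program such as the hypothetical one below. This is not the first.pl provided for the course. The output should appear in Eclipse's console.

```perl
use strict;
use warnings;

# Hypothetical first program: print a greeting, then add up the
# numbers 1 to 10 to show that variables and loops work.
print "Hello from Perl!\n";

my $sum = 0;
foreach my $n (1 .. 10) {
    $sum += $n;
}
print "The sum of 1 to 10 is $sum\n";    # prints "The sum of 1 to 10 is 55"
```

The use strict and use warnings lines are worth adding to every program. They make Perl report typos in variable names and other common mistakes instead of silently continuing.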
The Ambient plug-in lets you browse code online with a tool called Snarf. For each homework assignment, we will provide some framework code and possibly some data files, and Snarf will let you import these files into your local copy of Eclipse. To start your first program, follow the directions below.
Open Eclipse after setting up everything as described above.
Now we will try running the simple Perl program that you downloaded.
You can repeat step 2 every time you edit the program. For each assignment, we will provide a codebase for you to work from, which you can always import by Snarfing.
Now you will try to modify the program and then submit the code from within Eclipse.
Save the file (Ctrl+S).
Submit the project.
Select Ambient > Submit a Project for Grading... This will bring up a submit window. The top box is where you choose the class and assignment you are submitting to: click on cps160 and select the test folder. The bottom box shows what you are submitting. You will need to know the location of your project. Click the Browse... button and select your project directory. Hit OK, and the bottom box should then display all of the items in that directory. Check only the Perl files (first.pl) and then hit Submit. You will be asked to enter your Acpub ID and password. Now you have submitted your project! You can submit as many times as you like, because everything is stored on the server. So if you realize at the last minute that you did something incorrectly, you can resubmit and we will see your updated submission.