The instructions below were written for the CPS 160 course (Introduction to Functional Genomics) taught in Spring 2006. Some of the links may not be up to date. Please consult Blackboard for updated instructions.

Installing and using Bioperl

Bioperl is a set of Perl modules for manipulating biological data. With Bioperl you can, among other things, read sequence data from files in different standard formats (FASTA, GenBank, SwissProt, etc.), manipulate sequences, run BLAST queries, parse BLAST report files, and perform multiple sequence alignment using tools like CLUSTALW.

Once you have Perl on your computer, downloading and installing Bioperl is not always an easy task, but detailed instructions are available in the Bioperl installation documentation.

On Windows it is usually easy to install Bioperl by following the steps below, taken from "Quick instructions for the impatient, lucky, or experienced user" :)
  1. Open a command prompt (Start->Run and type cmd)

  2. Run the PPM shell (C:\>ppm)

  3. Add these new PPM repositories with the following commands:
    • ppm> rep add Bioperl http://bioperl.org/DIST
    • ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
    • ppm> rep add Bribes http://www.Bribes.org/perl/ppm
  4. Search for Bioperl:

    • ppm> search Bioperl

    This returns a numbered list of packages with "Bioperl" in their name, along with their version numbers. You should see "Bioperl-1.4" (the latest stable release at the time of writing).

  5. Install Bioperl:

    • ppm> install <number>

    where <number> corresponds to the number of the Bioperl-1.4 package in the numbered list obtained in step 4.

If you are indeed a lucky user, you should have Bioperl installed by now!
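
To check whether the installation worked, one option is a short Perl script that tries to load a Bioperl module. The sketch below is only an illustration: the helper name module_available is hypothetical, and Bio::SeqIO is used here as a representative Bioperl module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
# (module_available is a hypothetical helper, not part of Bioperl.)
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? 1 : 0;
}

if (module_available('Bio::SeqIO')) {
    print "Bioperl appears to be installed.\n";
} else {
    print "Bioperl was not found; re-check the installation steps.\n";
}
```

Save the script under any name (for example check_bioperl.pl) and run it with perl from the command prompt.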




Graph search algorithms

Given a graph with vertices and edges, we often wish to find a path from some start vertex to some end vertex within that graph. There are numerous ways to explore the vertices in a graph, but two popular methods are depth first search (DFS) and breadth first search (BFS). On closer examination, these two algorithms differ only in the data structure used to determine which vertex to visit next.

The pseudo-code for a general graph searching algorithm is given below:

GRAPH_SEARCH (Graph G, Vertex s)
  Set S = {s};  # set of explored vertices
  while (there is an unused edge (u,v) connected to any node u in S)
    follow edge (u,v) from u to v
    set edge (u,v) as used
    add v to S
  end while
end

Breadth first search (BFS)

Breadth first search uses a queue as the data structure for determining which vertex to visit next. Every time a new vertex is visited, all of its neighbors are added to the end of the queue. Note that in a directed graph, the neighbors of a vertex are only those vertices reached by an edge pointing from the current vertex to the neighbor. To decide which vertex to visit next, a vertex is removed from the front of the queue and the process is repeated. Thus, the vertices are visited in a first in, first out (FIFO) manner. The result is that all of the vertices at distance i from the start vertex are visited before any vertex at distance i+1.

The pseudo-code for BFS is given below:

BFS (Graph G, Vertex s)
  Set S = {s};  # set of explored vertices
  Queue Q = all neighbors of s;
  while (Q is not empty)
    dequeue vertex v from the front of Q  # shift in Perl
    if(v is not in S)
      add v to S
      enqueue neighbors of v onto the end of Q  # push in Perl
    end if
  end while
end BFS
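
The BFS pseudo-code can be turned into Perl fairly directly. The following is a minimal sketch, assuming the graph is stored as a hash of array references (an adjacency list). The example graph and the sub name bfs are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Directed example graph (made up): each vertex maps to its list of neighbors.
my %graph = (
    A => ['B', 'C'],
    B => ['D'],
    C => ['D', 'E'],
    D => ['F'],
    E => ['F'],
    F => [],
);

# Return the vertices reachable from $start in the order BFS explores them.
sub bfs {
    my ($g, $start) = @_;
    my %explored = ($start => 1);                 # the set S
    my @order    = ($start);
    my @queue    = @{ $g->{$start} || [] };       # Q = all neighbors of s
    while (@queue) {
        my $v = shift @queue;                     # dequeue from the front
        next if $explored{$v};                    # skip vertices already in S
        $explored{$v} = 1;
        push @order, $v;
        push @queue, @{ $g->{$v} || [] };         # enqueue at the end
    }
    return @order;
}

print join(' ', bfs(\%graph, 'A')), "\n";
```

For this example graph the script prints A B C D E F: the start vertex, then the vertices at distance 1 (B, C), then distance 2 (D, E), then distance 3 (F).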

Depth first search (DFS)

Depth first search uses a stack as the data structure for determining which vertex to visit next. Every time a new vertex is visited, all of its neighbors are added to the top of the stack. Note that in a directed graph, the neighbors of a vertex are only those vertices reached by an edge pointing from the current vertex to the neighbor. To decide which vertex to visit next, a vertex is removed from the top of the stack and the process is repeated. Thus, the vertices are visited in a last in, first out (LIFO) manner. The result is that vertices are visited down one path from the start vertex until the path can be extended no further, then down another path until it can be extended no further, and so on.

The pseudo-code for DFS is given below:

DFS (Graph G, Vertex s)
  Set S = {s};  # set of explored vertices
  Stack T = all neighbors of s;
  while (T is not empty)
    pop vertex v from the end of T  # pop in Perl
    if (v is not in S)
      add v to S
      push neighbors of v onto the end of T  # push in Perl
    end if
  end while
end DFS
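
The DFS pseudo-code translates to Perl in the same way as BFS, with the array used as a stack instead of a queue. This is a minimal sketch under the same assumptions: the graph is a hash of array references, and the example graph and the sub name dfs are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Directed example graph (made up): each vertex maps to its list of neighbors.
my %graph = (
    A => ['B', 'C'],
    B => ['D'],
    C => ['D', 'E'],
    D => ['F'],
    E => ['F'],
    F => [],
);

# Return the vertices reachable from $start in the order DFS explores them.
sub dfs {
    my ($g, $start) = @_;
    my %explored = ($start => 1);                 # the set S
    my @order    = ($start);
    my @stack    = @{ $g->{$start} || [] };       # T = all neighbors of s
    while (@stack) {
        my $v = pop @stack;                       # pop from the end (top)
        next if $explored{$v};                    # skip vertices already in S
        $explored{$v} = 1;
        push @order, $v;
        push @stack, @{ $g->{$v} || [] };         # push onto the end (top)
    }
    return @order;
}

print join(' ', dfs(\%graph, 'A')), "\n";
```

For this example graph the script prints A C E F D B: because the most recently pushed neighbor is popped first, the search follows A, C, E, F to a dead end before backing up to D and then B. The only change from the BFS sketch is pop in place of shift.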

Sources:

Josh Robinson's notes for CPS160 (Spring 2005)

Chapter 22 of Introduction to Algorithms (second edition) by Cormen, Leiserson, Rivest and Stein (CLRS).



Directions for Perl programming setup in CPS 160

In this class we will be using a common IDE (Integrated Development Environment) called Eclipse. IDEs are nice because they often make programming easier with features such as syntax highlighting and debugging capabilities. This is especially useful for Perl, which can sometimes be syntactically fickle.

Eclipse is written in Java, so in addition to Eclipse you will need the JRE (Java Runtime Environment) from Sun. Because it runs on the JRE, Eclipse can be used on almost any OS, including Windows, Mac OS, and UNIX. Mac OS X comes with the JRE preinstalled, so you will most likely not need to download anything if you are using a Mac. Likewise, if you have programmed in Java before, you almost certainly have the JRE installed already. The JRE 5.0 (also called 1.5) is available from Sun's Java download page. Alternatively, you can download the JDK (Java Development Kit), which I recommend if you plan on coding anything in Java later or taking another Computer Science class at Duke: in addition to the JRE, it includes libraries that may be necessary for any Java coding projects you take on.

  1. Download the JRE or the JDK 5.0 for your operating system (if it is not already installed). Click the "Download JDK 5.0 Update 6" link and accept the license agreement, then select the appropriate file for your OS.
    • Windows (59.86 MB). Self-installing application.
    • Macintosh (already installed with Mac OS X). However, if you want, you can update to version 5.0.
    • Linux (46.71 MB). Self-installing application.

    Once you have downloaded and installed either the JRE or the JDK, you need to download Eclipse itself. The Eclipse download is not a self-installing application. Instead, you will need to extract the zip file's contents to a folder on your hard drive (for example, C:\ or C:\Program Files).

  2. Download Eclipse 3.1.1 for your operating system:

    • Windows (105.9 MB). Unzip into your Program Files folder.
    • Macintosh (101.2 MB). Unstuff into your Applications folder.
    • Linux (101.7 MB). Unzip into /usr/local directory.

  3. For convenience, you will most likely want a shortcut to Eclipse on your desktop. In Windows, this is fairly simple to do: right-click on the desktop and select New > Shortcut. Then type C:\eclipse\eclipse.exe or C:\Program Files\eclipse\eclipse.exe, depending on where you extracted the files.

    Now that you have Eclipse installed, you will need to install Perl itself. Once again, Perl comes preinstalled on most UNIX systems, including Mac OS X. One of the most popular Perl distributions is ActivePerl. The current version is 5.8.7.

  4. Download and install ActivePerl:

    • Windows (12.7 MB). Self-installing application.
    • Macintosh (already installed with Mac OS).
    • Linux (15.4 MB). Unzip into /usr/local directory.

  5. Next, you need to install the Ambient plug-in. The Ambient plug-in provides a simple interface for you to receive and submit code. Run Eclipse and use its update manager to download and install the Ambient plugin by following the directions online.

  6. You also need to install and set up the EPIC plug-in. This will allow you to create and run Perl programs from within Eclipse. These steps are similar to those found at the link above. Follow the instructions below:

    • Open Eclipse and access the Help menu.
    • Select Software Updates > Find and Install.
    • Select Search for new features to install and click Next.
    • Type http://e-p-i-c.sf.net/updates/testing as a New Remote Site.
    • Select the update site you just created and click Finish.
    • Select the EPIC feature and click Next.
    • Follow the next few easy steps to finish the installation.
    • It is recommended to restart the Eclipse workbench in order for the changes to take effect.

  7. Finally, you need to set up Eclipse so that you can run your Perl programs with a simple click. This will allow you to bring up the shell or command prompt in the console.

    • Select Run > External Tools > External Tools...
    • In the window that pops up, select Program and click New.
    • You can leave the name the same, but in the Location field, type C:\WINDOWS\system32\cmd.exe (this will be C:\WINNT\system32\cmd.exe instead if you are using Windows 2000 or upgraded from Windows 2000, and /bin/tcsh for Macs or Linux).
    • In the Working Directory field, type ${project_loc}.
    • Leave the Arguments field blank if using Windows, but place a -i in the field if you are using a Macintosh or Linux.
    • Click the Common tab, and ensure that the checkbox under "Display in favorites menu" is checked and that the "Launch in background" checkbox is also checked.
    • Click Apply, then Close.

Next, try making your first Perl program in Eclipse.

Your first Perl program in Eclipse

Now that you have spent so much time getting Eclipse working, it is time to try your first program.

The Ambient plug-in allows one to browse code online by using a tool called Snarf. Basically, we will provide you with some code as a framework and possibly some data files for each homework assignment and Snarf will allow you to import these files into your local copy of Eclipse. To start your first program, follow the directions below.

  1. Open Eclipse after setting up everything as described above.

    • Select Ambient > Download (Snarf) a Project....
    • This should open a new tab at the bottom called Snarfer Site Browser. If it does not:
      • Select Window > Show View > Other...
      • Click Ambient, then select Snarfer Site Browser and click OK.
    • Right-click in the Snarfer Site Browser window, and select New Site.
    • In the window type http://www.cs.duke.edu/courses/cps160/spring06/snarf/.
    • Click through the list until you find first perl program (1.0), and double click on it.
    • Click the Install Project... button, and in the window that pops up check the Use Default box, and click Next then Finish.
    • You can then double-click first perl program in the Navigator window near the upper left-hand corner of the program, and then double-click first.pl.
    • If you do not see a Navigator window, it could be that you are in the wrong perspective. If this is the case:
      • Select Window > Open Perspective > Other...
      • Select Perl then hit OK.

  2. Now we will try running the simple Perl program that you downloaded.

    • Click the button in the bar at the top that looks like a green circle with a white arrowhead and a red suitcase in front of it.
    • This should create a Console tab at the very bottom. If it does not:
      • Select Window > Show View > Other...
      • Click Basic, then select Console and click OK.
    • Click the Console tab.
    • At the command prompt, type perl first.pl.
    • You should see the program run, printing out the array and its minimum value.
    • Type exit to stop the console. Otherwise, it will continue running the command prompt, using up your machine's resources.

    You can just repeat step 2. every time you edit the program. For each assignment, we will provide a codebase for you to work from, which you will always be able to import by Snarfing.

  3. You should also notice that PerlDoc is available from Eclipse. Just double-click on "print" and then choose Help > Perldoc (or use the shortcut Shift+Ctrl+H).
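
For orientation, a program that prints an array and its minimum value might look roughly like the sketch below. This is not the actual first.pl distributed through Snarf: the data values, the sub name min_value, and the exact output wording are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up data; the real first.pl may use different values.
my @values = (12, 7, 42, 3, 19);

# Return the smallest number in a non-empty list.
sub min_value {
    my @list = @_;
    my $min  = $list[0];
    for my $x (@list) {
        $min = $x if $x < $min;    # keep the smallest value seen so far
    }
    return $min;
}

my $minval = min_value(@values);
print "The array is: @values\n";
print "The minimum value is: $minval.\n";
```

Running this sketch with perl prints the array (12 7 42 3 19) followed by the minimum value, 3.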

Submitting a sample project

Now you will try to modify the program and then submit the code from within Eclipse.

  1. Modify the file first.pl to print out the minimum and the maximum value:
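    • Add another line (this assumes $maxval has already been computed; if first.pl does not compute it, add that code first):
      print "The maximum value is: $maxval.\n";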
    • Save the file (Ctrl+S).

    • Run first.pl again, to see if it prints out the maximum.
  2. Submit the project.

    • Select Ambient > Submit a Project for Grading.... This will bring up a submit window.
    • The top box is for the class and assignment you wish to submit. Click on cps160 and select the test folder.
    • The bottom box is for the files you wish to submit. You will need to know the location of your project. Click the Browse... button and select your project directory.
    • Click OK, and the bottom box should then display all of the items in that directory.
    • Check only the Perl files (first.pl) and then click Submit.
    • When prompted, enter your Acpub ID and password.

    Now you have submitted your project!

    You can submit as many times as you like, as everything is stored on the server. Thus, if you realize at the last minute that you did something incorrectly, you can resubmit and we will see your updated submission.