COMPSCI 561/CBB 561 - Computational Sequence Biology

Spring 2020

Wed & Fri 3:05pm - 4:20pm in LSRC D106

Overview

Course Description:

Algorithmic and computational issues in analysis of biological sequences: DNA, RNA, and protein. Emphasizes probabilistic approaches and machine learning methods, e.g. Hidden Markov models. Explores applications in analysis of high-throughput sequencing data, protein and DNA homology detection, gene finding, motif discovery, comparative genomics and phylogenetics, genome segmentation, DNA/RNA/protein structure prediction, with a strong focus on algorithmic aspects. Prerequisites: basic knowledge of algorithmic design (COMPSCI 330 or equivalent), probability and statistics (STA 611 or equivalent), molecular biology (BIO 201L or equivalent), basic computer programming skills (preferred programming languages: Python, Java, C/C++, Perl, R, or Matlab).

Course materials, homeworks and quizzes are available through Sakai.

Instructor:
Raluca Gordan
Office hours: Fri 4:30-5:30pm
Office: LSRC D211
Email: raluca.gordan at duke dot edu

TA:
Yuning Zhang
Office hours: Wed 4:30-5:30pm
Office hours location: TBD
Email: yuning.zhang at duke dot edu

Grading:
Course grade is based on homeworks (70%), pre-class quizzes (15%), and class participation (15%). Homeworks and quizzes will be distributed through Sakai.
You will have 2 weeks to complete each homework. Late homeworks will not be accepted, with one exception: each student may submit one homework late during the course, by at most 1 week.
Pre-class quizzes will be due 1 hour before class. The quizzes will test either your background on a subject (to make sure you will be able to follow and participate in the lecture) or your understanding of a subject or paper presented in a previous lecture. You can take each quiz twice; only the higher of the two grades will be counted.

Collaboration policy:
All homeworks and pre-class quizzes should be completed individually, unless otherwise stated. However, if you have worked for a while on a particular problem and have encountered a mental wall, and if you have banged your head against the wall for a while, you should consult others to make progress—that is better than giving up entirely. Your first course of action is to speak to the instructor or TA. If for any reason you consult your peers, it should remain understood that such an interaction must be one of consultation and not collaboration: hints rather than answers; after consultation, it is expected that you should still have some thinking to do (otherwise this course will not be very useful for you!). In addition, if you happen to consult with another student, both of you must cite this.

Readings/textbook:
We will have readings for the course (which will be available on Sakai), but there is no formal textbook. Useful resources include:

•    Durbin, Eddy, Krogh, Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
•    Cristianini and Hahn, Introduction to Computational Genomics: A Case Studies Approach
•    Jones and Pevzner, An Introduction to Bioinformatics Algorithms
•    Majoros, Methods for Computational Gene Prediction
•    Alberts, Johnson, Lewis, Raff, Roberts, Walter, Molecular Biology of the Cell
•    Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms


Syllabus


This syllabus is tentative and may change (slightly) during the semester. Please check Sakai for the latest version.

Lecture  Date    Topic

1        10-Jan  Introduction; DNA sequencing

2        15-Jan  Global sequence alignment; Needleman-Wunsch
3        17-Jan  Local sequence alignment; Smith-Waterman

4        22-Jan  Heuristic search; FASTA; BLAST
5        24-Jan  String matching; suffix arrays

6        29-Jan  Short read alignment; BWA; Bowtie
7        31-Jan  Probabilistic models for biological sequences

8        5-Feb   HMM parsing; Viterbi
9        7-Feb   HMM training; Baum-Welch

10       12-Feb  HMM applications
11       14-Feb  Profile HMMs; PSIBLAST

12       19-Feb  Phylogenetic trees: UPGMA; NJ
13       21-Feb  Motif finding: EM and Gibbs sampling

14       26-Feb  Guest lecture on cryo-EM algorithms, by Prof. Alberto Bartesaghi
15       28-Feb  Motif finding: Bayesian networks

16       4-Mar   Unsupervised learning
17       6-Mar   Clustering; non-negative matrix factorization

-        11-Mar  SPRING BREAK
-        13-Mar  SPRING BREAK

18       18-Mar  Supervised learning; classification and regression
19       20-Mar  SVM; string kernels

20       25-Mar  Naive Bayes; logistic regression
21       27-Mar  DNA structure

22       1-Apr   Student presentation (e.g. Deep learning, applied to regulatory genomics)
23       3-Apr   Student presentation (e.g. Unsupervised learning/deconvolution)

24       8-Apr   Student presentation (e.g. Hardware acceleration of sequence alignment)
25       10-Apr  Student presentation (e.g. Algorithms to infer the 3D DNA architecture)

26       15-Apr  Student presentation (e.g. Graph algorithms, applied to signaling networks)