APT, Promotion and TATA boxes I

Problem Statement

Transcription of a protein-coding region is often signaled by a TATA box, the nucleotide sequence TATAA located just before the actual coding region. Locating a TATA box is what the cell's transcription machinery "does" when it finds a gene to transcribe, and it is what you will do in this problem.

In this problem you'll analyze only part of a DNA sequence when looking for TATA boxes. In real genomic data, a TATA box that indicates a possible protein-coding sequence must be followed by a start codon ATG located approximately 150 base pairs from the TATA box. In this program you'll use the range 20-30 instead of 150.

Any DNA sequence with a TATA box located 20-30 base pairs before a start codon is a possible protein source. In this problem, the distance between the last A in TATAA and the first A in ATG must be 20-30 nucleotides. The first TATA box in the sequence may not qualify, so you may need to examine more than one TATA box; see the examples for details. Any TATA box with a start codon 20-30 nucleotides away is a predictor for gene/protein finding.

Given a DNA sequence string, return the index/location of the start of the first TATA box that predicts a protein coding region (start codon 20-30 nucleotides from the end of the box). Return -1 if there is no protein-predicting TATA box.
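
The search described above can be sketched in Python as follows. This is only an illustration, not the required solution: the function name find_tata and the parameters lo and hi are hypothetical, and the sketch assumes that "distance" means the difference between the index of the first A in ATG and the index of the last A in TATAA. Check that assumption against the examples before relying on it.

```python
def find_tata(dna, lo=20, hi=30):
    """Return the index of the first TATAA that has an ATG lo..hi away, else -1.

    Assumption: distance = (index of the A in ATG) - (index of the last A in TATAA).
    """
    box = dna.find("TATAA")
    while box != -1:
        last_a = box + 4  # index of the final A in this TATAA
        # Try every start-codon position that lies within the allowed distance.
        for start in range(last_a + lo, last_a + hi + 1):
            if dna.startswith("ATG", start):
                return box
        # This box does not qualify; look for the next one (overlaps allowed).
        box = dna.find("TATAA", box + 1)
    return -1
```

Under that assumption, a call such as find_tata("TATAA" + "C" * 19 + "ATG") finds the box at index 0, because the start codon begins 20 positions after the last A of the box. If the examples show that distance instead counts the nucleotides strictly between the two A's, the range would need to shift by one.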

Definition