APT, Promotion and TATA boxes II

Problem Statement

Transcription of protein-coding regions is often signaled by a TATA box, the nucleotide sequence TATAAA located just before the actual coding region. Locating a TATA box is what RNA polymerase "does" when it looks for genes to transcribe, and it is what you will do in this problem.

In this problem you'll analyze only part of a DNA sequence when looking for TATA boxes. In real genomic data, a TATA box that indicates a possible protein-coding sequence must be followed by a start codon, ATG, located approximately 150 base pairs from the TATA box. In this problem, rather than 150, you'll use the range 20-30.

Any DNA sequence with a TATA box located 20-30 base pairs before a start codon is a possible protein source. In this problem, the distance between the last A in TATAA and the first A in ATG must be 20-30 nucleotides. A TATA box is a predictor only for the closest start codon. This means that in the sequence TATAAGGGGGGTATAAGGATG the first box doesn't predict the start codon at the end of the sequence; the second TATA box is the predictor. That box is only two nucleotides away from the start codon, which is too close to fall in the 20-30 range.
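The distance rule can be traced on the example sequence with a short Python sketch. The function name predictor_gap and its parameters are hypothetical, not part of the problem's definition. The sketch assumes the five-letter motif TATAA used in the example, measures distance as the number of nucleotides strictly between the box and ATG, and looks only at the first start codon in the sequence.

```python
def predictor_gap(seq, box="TATAA", start="ATG"):
    """Gap between the first start codon and its nearest preceding box.

    Returns None when there is no start codon or no box before it.
    Assumption: the gap counts the nucleotides strictly between the
    last letter of the box and the first letter of the start codon.
    """
    atg = seq.find(start)
    if atg == -1:
        return None
    # rfind picks the box closest to the start codon, i.e. the predictor.
    box_at = seq.rfind(box, 0, atg)
    if box_at == -1:
        return None
    return atg - (box_at + len(box))


print(predictor_gap("TATAAGGGGGGTATAAGGATG"))  # prints 2
```

The second box (the one nearest ATG) is selected, and the gap of 2 is outside 20-30, which matches the explanation above.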

Given an array of DNA sequences, return the number of sequences that could be protein sources according to the rules here.
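One possible way to approach the counting task is sketched below in Python. The names is_possible_source and count_sources are hypothetical, since the required signature is given in the Definition section. The sketch assumes the motif TATAA, an inclusive 20-30 range, and one reading of the "closest" rule: a box and a start codon pair up only when no other box and no other start codon lies between them.

```python
def is_possible_source(seq, box="TATAA", start="ATG", lo=20, hi=30):
    """True if some box predicts a start codon lo..hi nucleotides away."""
    atg = seq.find(start)
    while atg != -1:
        # Nearest box that ends at or before this start codon.
        box_at = seq.rfind(box, 0, atg)
        if box_at != -1:
            box_end = box_at + len(box)
            # The box must also have this codon as its closest start codon.
            closest = seq.find(start, box_end) == atg
            if closest and lo <= atg - box_end <= hi:
                return True
        atg = seq.find(start, atg + 1)
    return False


def count_sources(sequences):
    """Count the sequences that contain a qualifying box/start-codon pair."""
    return sum(1 for seq in sequences if is_possible_source(seq))
```

For instance, count_sources(["TATAA" + "G" * 25 + "ATG", "TATAAGGGGGGTATAAGGATG"]) counts only the first sequence, because the predictor box in the second one is just two nucleotides from its start codon.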

Definition