In this problem you'll analyze only part of a DNA sequence, looking for TATA boxes. In real genomic data, a TATA box that indicates a possible protein-coding sequence must be followed by a start codon, ATG, located approximately 150 base pairs from the TATA box. In this problem you'll use the range 20-30 rather than 150.
A TATA box here is the sequence TATAA. Any TATA box located 20-30 base pairs before a start codon marks a possible protein source, so it is a predictor for gene/protein finding. The distance is measured between the last A in TATAA and the first A in ATG, that is, 20-30 nucleotides must lie between them. You may need to examine more than one TATA box; see the examples for details.
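The distance rule comes down to a little index arithmetic. The sketch below is one reading of it, not a prescribed solution: it assumes the distance is the number of nucleotides strictly between the end of TATAA and the start of ATG (the reading the examples support) and that the range 20-30 is inclusive at both ends. The class name `TataDistance` and the helpers `gap` and `inRange` are hypothetical names introduced for illustration.

```java
class TataDistance {
    // Length of the TATA box motif "TATAA".
    static final int BOX_LENGTH = 5;

    // Number of nucleotides strictly between a TATA box that starts at
    // tataIndex and a start codon that starts at atgIndex.
    // Assumes the codon begins at or after the end of the box.
    static int gap(int tataIndex, int atgIndex) {
        return atgIndex - (tataIndex + BOX_LENGTH);
    }

    // Assumption: both ends of the 20-30 range are allowed.
    static boolean inRange(int gap) {
        return gap >= 20 && gap <= 30;
    }
}
```

For a box at index 0 and a start codon at index 30, `gap(0, 30)` is 25, which is in range; a codon at index 24 gives a gap of 19, which is not.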
Given a DNA sequence string, return the index/location of the start of the first TATA box that predicts a protein coding region (start codon 20-30 nucleotides from the end of the box). Return -1 if there is no protein-predicting TATA box.
Parameter: String
Returns: int
Method signature: public int tata(String dna) (be sure your method is public)
"TATAAGGGGGGGGGGGGGGGGGGGGGGGGGATGCC" returns 0. The TATA box at the beginning of the string is followed by a start codon after 25 nucleotides.
"TATAAGGGGGGGGGGGGGGGGGGGATGCC" returns -1. The start codon is 19 nucleotides from the end of the TATA box; this isn't between 20 and 30, so return -1.
"TATAAGGGGGGGGGGGGGGGGGGGGGGGGGATCC" returns -1. There is no start codon.
"ATGATGATGTATAAGGGGGGGGGGGGGGGGGGGGGGGGGATGCC" returns 9. This is like the first example, with some extra nucleotides before the TATA box.
"TATAAGGGGGGGGGGATGGGGTATAACCCCCCCCCCCCCCCCCCCCCCATGCCC" returns 21. The first TATA box is 10 nucleotides away from a start codon, which is too close, but the second TATA box, starting at index 21, is 22 nucleotides away from a start codon, so return 21.
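One possible approach is sketched below, under stated assumptions rather than as a reference solution. It assumes the distance counts the nucleotides strictly between the end of TATAA and the start of ATG, that 20 and 30 are both allowed, and that the input contains only uppercase letters. The class name `TataFinder` is hypothetical; only the method signature comes from the problem statement.

```java
class TataFinder {
    private static final String BOX = "TATAA";
    private static final String START = "ATG";
    private static final int MIN_GAP = 20; // assumed inclusive
    private static final int MAX_GAP = 30; // assumed inclusive

    public int tata(String dna) {
        // Visit each TATA box from left to right.
        int box = dna.indexOf(BOX);
        while (box != -1) {
            int boxEnd = box + BOX.length();
            // First start codon that begins at least MIN_GAP nucleotides
            // past the end of the box. If this one is too far away,
            // every later codon is too far away as well.
            int atg = dna.indexOf(START, boxEnd + MIN_GAP);
            if (atg != -1 && atg - boxEnd <= MAX_GAP) {
                return box;
            }
            // Move on to the next TATA box, if there is one.
            box = dna.indexOf(BOX, box + 1);
        }
        return -1;
    }
}
```

Searching for ATG from `boxEnd + MIN_GAP` onward skips any start codon that is too close, so a box is accepted exactly when the first remaining codon starts no more than 30 nucleotides past the end of the box.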