In this problem you'll analyze only part of a DNA sequence, looking for TATA boxes. In real genomic data, a TATA box that indicates a possible protein-coding sequence must be followed by a start codon, ATG, located approximately 150 base pairs from the TATA box. In this problem you'll use the range 20-30 rather than 150.
Any DNA sequence with a TATA box (the sequence TATAA) located 20-30 base pairs before a start codon is a possible protein source. In this problem, the distance between the last A in TATAA and the first A in ATG must be 20-30 nucleotides. A TATA box is a predictor for the closest start codon. This means that in the sequence TATAAGGGGGGTATAAGGATG the first box doesn't predict the start codon at the end of the sequence; the second TATA box is the predictor (and it is only two nucleotides away).
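The distance rule above can be sketched in Java as follows. This is only an illustration, not a reference solution: the class name TataGapSketch and the method gapBeforeFirstStart are hypothetical, the sketch looks only at the first ATG in the sequence, and it reads "distance" as the number of nucleotides strictly between the last A of TATAA and the first A of ATG, which is how the example above counts it.

```java
class TataGapSketch {
    // Hypothetical helper: returns the number of nucleotides between the
    // first start codon (ATG) and the closest TATA box (TATAA) before it,
    // or -1 if there is no start codon or no box in front of it.
    static int gapBeforeFirstStart(String dna) {
        int start = dna.indexOf("ATG");
        if (start < 5) {
            return -1; // no ATG, or no room for a 5-letter box before it
        }
        // lastIndexOf searches backwards, so it finds the box that ends
        // closest to the start codon without overlapping it.
        int box = dna.lastIndexOf("TATAA", start - 5);
        if (box < 0) {
            return -1;
        }
        return start - (box + 5);
    }
}
```

For TATAAGGGGGGTATAAGGATG this returns 2, because the second box, not the first, is the one measured.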
Given an array of DNA sequences, return the number of sequences that could be protein sources according to the rules here.
Parameters: String[]
Returns: int
Method signature: public int proteinCount(String[] list) (be sure your method is public)
list will have a length of less than 100.
{
"TATAAGGGGGGGGGGGGGGGGGGGGGGGGGATGCC"
"TATAAGGGGGGGGGGGGGGGGGGGATGCC"
"TATAAGGGGGGGGGGGGGGGGGGGGGGGGGATCC"
"ATGATGATGTATAAGGGGGGGGGGGGGGGGGGGGGGGGGATGCC"
}
Returns: 2
The first string has a TATA box followed by a start codon 25 base pairs later. The second string has a start codon only 19 base pairs after the TATA box, which is too close. The third string has no start codon. The fourth string is similar to the first, but has some nucleotides before the TATA box. So the first and fourth strings are protein coders; return 2.
{
"TATAAGGGGGGGTATAAGGGGGGGGGGATGCC"
}
Returns: 0
The only string has a TATA box at index 0 and one at index 12. Although the first box and the start codon are 22 nucleotides apart, the second TATA box (the one closest to the start codon) is only 10 nucleotides away, so this DNA strand does not meet the criteria.
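One possible way to put the rules and examples together is sketched below. It is a minimal sketch under stated assumptions, not an official solution: the class name TataBoxSketch and the helper isProteinSource are hypothetical, the 20-30 range is assumed to be inclusive, and a start codon is assumed to begin only after the box has ended. A box counts only if the first ATG after it has no other TATAA in between.

```java
class TataBoxSketch {
    private static final String BOX = "TATAA";
    private static final String START = "ATG";
    // Assumption: both ends of the 20-30 range are allowed.
    private static final int MIN_GAP = 20;
    private static final int MAX_GAP = 30;

    public int proteinCount(String[] list) {
        int count = 0;
        for (String dna : list) {
            if (isProteinSource(dna)) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: true if some TATA box is the closest box to the
    // closest start codon after it, with 20-30 nucleotides in between.
    static boolean isProteinSource(String dna) {
        for (int box = dna.indexOf(BOX); box >= 0; box = dna.indexOf(BOX, box + 1)) {
            int boxEnd = box + BOX.length();
            int start = dna.indexOf(START, boxEnd);
            if (start < 0) {
                return false; // no start codon after this box or any later one
            }
            // A later box that ends before the start codon is the real predictor.
            int nextBox = dna.indexOf(BOX, boxEnd);
            boolean closerBoxExists = nextBox >= 0 && nextBox + BOX.length() <= start;
            int gap = start - boxEnd;
            if (!closerBoxExists && gap >= MIN_GAP && gap <= MAX_GAP) {
                return true;
            }
        }
        return false;
    }
}
```

Scanning box by box keeps both halves of the rule visible: indexOf from the end of the box picks the closest start codon, and the nextBox check rejects a box that has been superseded by a nearer one, as in the second example.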