The word "string" means different things in different fields. In programming, a string is a sequence of characters: letters, digits, or any other symbol you can type.
For instance, what a pianist thinks of when hearing the word "strings" is not what a genome scientist thinks of.
For the latter, genomic strings come to mind.
The genome of an organism stores all the genetic information necessary to maintain that organism. This information is stored as a long string over the four-letter alphabet “A”, “T”, “C” and “G”.
These four characters correspond to the four DNA bases:
- Adenine
- Thymine
- Cytosine
- Guanine
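Because a genome is just a long string over these four letters, it can be held in an ordinary Java String. The minimal sketch below checks that a string uses only the four bases. The class name DnaAlphabet and the method isValidDna are hypothetical, and the sketch assumes upper-case letters.

```java
public class DnaAlphabet {
    // The four DNA bases: Adenine, Thymine, Cytosine, Guanine (upper case assumed)
    private static final String BASES = "ATCG";

    // Returns true if every character of dna is one of A, T, C or G
    public static boolean isValidDna(String dna) {
        for (int i = 0; i < dna.length(); i++) {
            if (BASES.indexOf(dna.charAt(i)) == -1) {
                return false; // found a character that is not a DNA base
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidDna("ATGCCGTAA")); // true
        System.out.println(isValidDna("ATGXCGTAA")); // false
    }
}
```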
The genome is far too large to analyze by hand, so finding information in it, such as the location of a gene, requires computational approaches. The genome is also complex and holds many kinds of information. In this tutorial we demonstrate one such computational approach with a simplified gene-finding algorithm written in Java.
Finding a real gene requires more than looking for the codons that mark its start and end; regulatory elements also play a role. The simplified algorithm in this tutorial, however, looks only at the start and stop codons.
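First, we declare and initialize the following variables as fields of the class that holds our methods, so that findSimpleGene can update them:
int startIndex = 0;
int stopCodonIndex = 0;
String theSubString = "";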
The codon “ATG” marks the start of a gene and is called the start codon, while “TAA” marks the end of a gene and is called a stop codon. There are other stop codons, but for now we limit ourselves to “TAA”. Everything between and including these two codons makes up one gene.
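The standard genetic code has two other stop codons, “TAG” and “TGA”. As a hedged sketch of how the algorithm could later be extended beyond “TAA”, the check below recognizes all three. The class name StopCodons and the method isStopCodon are hypothetical and are not used by the code in this tutorial.

```java
public class StopCodons {
    // The three stop codons of the standard genetic code;
    // the tutorial's algorithm itself uses only "TAA"
    private static final String[] STOP_CODONS = {"TAA", "TAG", "TGA"};

    // Returns true if codon is one of the three standard stop codons
    public static boolean isStopCodon(String codon) {
        for (String stop : STOP_CODONS) {
            if (stop.equals(codon)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isStopCodon("TAA")); // true
        System.out.println(isStopCodon("TGA")); // true
        System.out.println(isStopCodon("ATG")); // false (this is the start codon)
    }
}
```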
The code below is a simple algorithm for finding a gene in a string that represents DNA.
It also uses the fact that the length of a real gene is a multiple of 3, because a gene is made up of codons (groups of three bases).
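The multiple-of-3 property can be illustrated on its own. The sketch below checks a sequence's length and splits it into codons, reading from the first base; the class Codons and its helpers isMultipleOfThree and toCodons are hypothetical names. The example gene is the one contained in the sample "CCGATGCGTCCGTAAACCTG" used later in this tutorial.

```java
import java.util.ArrayList;
import java.util.List;

public class Codons {
    // True if the sequence divides evenly into three-base codons
    public static boolean isMultipleOfThree(String gene) {
        return gene.length() % 3 == 0;
    }

    // Splits a sequence into consecutive codons, starting at index 0.
    // An incomplete trailing group of 1 or 2 bases is ignored.
    public static List<String> toCodons(String gene) {
        List<String> codons = new ArrayList<>();
        for (int i = 0; i + 3 <= gene.length(); i += 3) {
            codons.add(gene.substring(i, i + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        String gene = "ATGCGTCCGTAA";
        System.out.println(isMultipleOfThree(gene)); // true
        System.out.println(toCodons(gene));          // [ATG, CGT, CCG, TAA]
    }
}
```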
public String findSimpleGene(String dna) {
    // Reset the result so a value from a previous call cannot leak into this one
    theSubString = "";
    // Find the index of the first start codon "ATG"
    startIndex = dna.indexOf("ATG");
    if (startIndex == -1) {
        return ""; // no start codon, so no gene
    }
    // Find the first stop codon "TAA" after the start codon
    stopCodonIndex = dna.indexOf("TAA", startIndex + 3);
    if (stopCodonIndex == -1) {
        return ""; // no stop codon, so no gene
    }
    // A real gene is made of codons, so its length must be a multiple of 3
    if ((stopCodonIndex - startIndex) % 3 == 0) {
        // Keep both the start codon and the stop codon in the gene
        theSubString = dna.substring(startIndex, stopCodonIndex + 3);
        System.out.println("Gene found: " + theSubString);
    }
    return theSubString;
}
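The algorithm above only considers the first “TAA” after the start codon; if that one is out of frame, no gene is reported, even when a later “TAA” would be in frame. A hedged sketch of one common refinement, not part of the original algorithm, keeps searching for the next “TAA” until it finds one a multiple of 3 bases after the start. The class InFrameGeneFinder and the method findGeneInFrame are hypothetical names.

```java
public class InFrameGeneFinder {
    // Returns the gene that starts at the first "ATG" and ends at the first
    // "TAA" lying a multiple of 3 bases after it, or "" if there is none.
    public static String findGeneInFrame(String dna) {
        int startIndex = dna.indexOf("ATG");
        if (startIndex == -1) {
            return ""; // no start codon
        }
        // Begin searching just after the start codon
        int stopIndex = dna.indexOf("TAA", startIndex + 3);
        while (stopIndex != -1) {
            // In frame: the distance from the start is a whole number of codons
            if ((stopIndex - startIndex) % 3 == 0) {
                return dna.substring(startIndex, stopIndex + 3);
            }
            // Out of frame: look for the next "TAA"
            stopIndex = dna.indexOf("TAA", stopIndex + 1);
        }
        return ""; // no in-frame stop codon
    }

    public static void main(String[] args) {
        // The first "TAA" (index 6) is out of frame; the one at index 11 is in frame
        System.out.println(findGeneInFrame("CCATGCTAACCTAAGG")); // ATGCTAACCTAA
    }
}
```

For the tutorial's samples, this sketch gives the same answers as findSimpleGene; it differs only when an out-of-frame “TAA” comes before an in-frame one.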
public void testSimpleGene() {
    // Sample strings; only one of them contains a valid gene
    String[] dnaSamples = {
        "GCTTAACCGTGACCTTAACGGA", // no "ATG"
        "ATGCCGGTATTCGAGCGGTG",   // "ATG" but no "TAA"
        "GCCCGTCCGGCGCTCGCCGT",   // neither "ATG" nor "TAA"
        "CCGATGCGTCCGTAAACCTG",   // gene whose length is a multiple of 3
        "TTTATGACGCGTAAAGGCT"     // "ATG" and "TAA", but length not a multiple of 3
    };
    for (String dna : dnaSamples) {
        System.out.println("DNA sample: " + dna);
        String gene = findSimpleGene(dna);
        if (gene.isEmpty()) {
            System.out.println("No gene found");
        }
    }
}
Finally, call the test method on an instance of the class that holds these methods:
testSimpleGene();