Java Programming: Working with Strings representing DNA

The word “string” means different things in different fields. In programming, a string is a sequence of letters or of any other characters you can type.

For instance, what a pianist thinks of when hearing the word “strings” is not what a genome scientist thinks of. For the latter, genomic strings come to mind.
The genome of an organism stores all the genetic information needed to maintain that organism. This information is stored as a long string over the four-letter alphabet “A”, “T”, “C” and “G”.
These four characters correspond to the four DNA bases:

  1. Adenine
  2. Thymine
  3. Cytosine
  4. Guanine
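A DNA strand is just a Java String over these four letters, so ordinary string methods are enough to work with it. The following is a minimal sketch that checks whether a string uses only A, T, C and G, and counts each base in one of the test samples used later in this tutorial. The class DnaAlphabet and its methods are hypothetical names, and the sketch assumes upper-case input.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: validates a DNA string and counts its bases.
class DnaAlphabet {

    // Returns true if every character is one of A, T, C or G (upper case only).
    static boolean isValidDna(String dna) {
        for (int i = 0; i < dna.length(); i++) {
            if ("ATCG".indexOf(dna.charAt(i)) == -1) {
                return false;
            }
        }
        return true;
    }

    // Counts how often each character occurs; TreeMap keeps the keys in alphabetical order.
    static Map<Character, Integer> countBases(String dna) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char base : dna.toCharArray()) {
            counts.merge(base, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String dna = "CCGATGCGTCCGTAAACCTG";
        System.out.println(isValidDna(dna));     // true
        System.out.println(isValidDna("ATGX"));  // false
        System.out.println(countBases(dna));     // {A=4, C=7, G=5, T=4}
    }
}
```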

A genome is far too large to analyze by hand, so finding information in it, such as the location of a gene, requires computational approaches. A genome is also complex and holds several kinds of information; we will demonstrate one computational task, finding a gene, with a simplified Java algorithm.
Finding a real gene requires more than looking for the codons that mark its start and end: regulatory elements matter too. This tutorial keeps to the simplified case and uses computational techniques to locate the start and stop codons.

We first declare and initialize the notebook-level variables that the gene-finding method will use:

In [1]:
int startIndex = 0;         // index of the start codon ATG
int stopCodonIndex = 0;     // index of the stop codon TAA
String theSubString = "";   // the gene found, or "" if none

The codon “ATG” marks the start of a gene and is called the start codon, while “TAA” marks the end of a gene and is called a stop codon. There are other stop codons (“TAG” and “TGA”), but for now we limit ourselves to TAA. Everything between and including these two codons makes up one gene.
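To make the rule concrete, here is a minimal sketch that applies it to the fourth test sample used later, relying only on Java's String.indexOf and String.substring. The class CodonPositions and the method geneFromTaaRule are hypothetical names, and the sketch deliberately skips the multiple-of-3 length check.

```java
// Hypothetical sketch of the start/stop codon rule (TAA only, no length check).
class CodonPositions {

    // Returns the text from the first ATG through the first TAA after it,
    // or "" if either codon is missing.
    static String geneFromTaaRule(String dna) {
        int start = dna.indexOf("ATG");
        if (start == -1) {
            return "";
        }
        // Start searching just past the start codon.
        int stop = dna.indexOf("TAA", start + 3);
        if (stop == -1) {
            return "";
        }
        // substring's end index is exclusive, so stop + 3 keeps the whole stop codon.
        return dna.substring(start, stop + 3);
    }

    public static void main(String[] args) {
        String dna = "CCGATGCGTCCGTAAACCTG";
        System.out.println(dna.indexOf("ATG"));     // 3
        System.out.println(dna.indexOf("TAA", 6));  // 12
        System.out.println(geneFromTaaRule(dna));   // ATGCGTCCGTAA
    }
}
```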

The method findSimpleGene in cell In [2] implements a simple algorithm for finding a gene in a string that represents DNA.
It also uses the fact that the length of a real gene is a multiple of 3, because a gene is made up of three-letter codons.
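The multiple-of-3 check works because a gene is read three letters at a time, so splitting it into codons must leave nothing over. This sketch uses a hypothetical helper, toCodons, to split a gene into codons. It then shows that the candidate in the last test sample, with ATG at index 3 and TAA at index 11, cannot be a whole number of codons.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustrating the codon structure of a gene.
class CodonSplitter {

    // Splits a gene into consecutive three-letter codons.
    // Throws IllegalArgumentException if the length is not a multiple of 3.
    static List<String> toCodons(String gene) {
        if (gene.length() % 3 != 0) {
            throw new IllegalArgumentException("length " + gene.length() + " is not a multiple of 3");
        }
        List<String> codons = new ArrayList<>();
        for (int i = 0; i < gene.length(); i += 3) {
            codons.add(gene.substring(i, i + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        System.out.println(toCodons("ATGCGTCCGTAA"));  // [ATG, CGT, CCG, TAA]
        // ATG at index 3, TAA at index 11: the candidate runs from 3 to 14 (exclusive),
        // which is 11 letters and therefore not a whole number of codons.
        String candidate = "TTTATGACGCGTAAAGGCT".substring(3, 14);
        System.out.println(candidate.length() % 3 == 0);  // false
    }
}
```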

In [2]:
public String findSimpleGene(String dna) {
    // reset the shared variable from In [1] so a result from an earlier call cannot leak into this one
    theSubString = "";

    // find the index position of the start codon
    startIndex = dna.indexOf("ATG");
    if (startIndex == -1) {
        return "";    // no start codon, so no gene
    }

    // find the index position of the first stop codon after the start codon
    stopCodonIndex = dna.indexOf("TAA", startIndex + 3);
    if (stopCodonIndex == -1) {
        return "";    // no stop codon, so no gene
    }

    // a real gene's length is a multiple of 3; the gene includes both codons,
    // so it runs from startIndex up to (but not including) stopCodonIndex + 3
    if ((stopCodonIndex - startIndex) % 3 == 0) {
        theSubString = dna.substring(startIndex, stopCodonIndex + 3);
        System.out.println("The gene found is: " + theSubString);
    }
    return theSubString;
}
In [3]:
public void testSimpleGene() {
    String[] dnaSamples = {
        "GCTTAACCGTGACCTTAACGGA",   // TAA but no ATG
        "ATGCCGGTATTCGAGCGGTG",     // ATG but no TAA
        "GCCCGTCCGGCGCTCGCCGT",     // neither codon
        "CCGATGCGTCCGTAAACCTG",     // ATG ... TAA, a multiple of 3 apart
        "TTTATGACGCGTAAAGGCT"       // ATG ... TAA, not a multiple of 3 apart
    };
    for (String st : dnaSamples) {
        System.out.println("DNA sample: " + st);
        findSimpleGene(st);         // prints the gene when one is found
    }
}
In [4]:
testSimpleGene()
DNA sample: GCTTAACCGTGACCTTAACGGA
DNA sample: ATGCCGGTATTCGAGCGGTG
DNA sample: GCCCGTCCGGCGCTCGCCGT
DNA sample: CCGATGCGTCCGTAAACCTG
The gene found is: ATGCGTCCGTAA
DNA sample: TTTATGACGCGTAAAGGCT
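This tutorial only looks for TAA. A possible extension is sketched here: it also accepts the other two stop codons, TAG and TGA. When a stop codon is not in frame (not a multiple of 3 away from ATG), it keeps searching instead of giving up, and it takes the nearest in-frame stop. The class MultiStopGeneFinder and its methods are hypothetical. Like the tutorial, it ignores regulatory elements and only examines the first ATG.

```java
// Hypothetical extension: all three stop codons, in-frame stops only.
class MultiStopGeneFinder {

    // Returns the index of the first stopCodon after the start codon whose distance
    // from startIndex is a multiple of 3, or -1 if there is none.
    static int findInFrameStop(String dna, int startIndex, String stopCodon) {
        int current = dna.indexOf(stopCodon, startIndex + 3);
        while (current != -1) {
            if ((current - startIndex) % 3 == 0) {
                return current;
            }
            current = dna.indexOf(stopCodon, current + 1);
        }
        return -1;
    }

    // Returns the gene from the first ATG to the nearest in-frame TAA, TAG or TGA, or "".
    static String findGene(String dna) {
        int startIndex = dna.indexOf("ATG");
        if (startIndex == -1) {
            return "";
        }
        int best = -1;
        for (String stop : new String[] {"TAA", "TAG", "TGA"}) {
            int index = findInFrameStop(dna, startIndex, stop);
            if (index != -1 && (best == -1 || index < best)) {
                best = index;
            }
        }
        return best == -1 ? "" : dna.substring(startIndex, best + 3);
    }

    public static void main(String[] args) {
        System.out.println(findGene("CCGATGCGTCCGTAAACCTG"));  // ATGCGTCCGTAA
        System.out.println(findGene("ATGCCCTAGTAA"));          // ATGCCCTAG
    }
}
```

Taking the nearest in-frame stop across all three codons matters. On "ATGCCCTAGTAA" the sketch stops at TAG and returns "ATGCCCTAG" rather than running on to the TAA.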