Sequencing DNA samples from patients lets computers extract important information encoded in the DNA sequence. In this scenario, you are an intern in a laboratory and have been assigned the following task. You will be given a long DNA sequence, represented as a string of letters. Your task is to split it into 3-letter codons and convert each codon into its protein residue. The output is the list of codons and the corresponding residues.
INPUT: DNA sequence
OUTPUT: Generated 3-letter codons and corresponding protein residues
A sample input DNA sequence looks like this:
TGGTACTCTTTCTTCACAAGGGCGCCGTGTGTG
The 3-letter codons are simply consecutive, non-overlapping groups of three letters, read from the start of the sequence:
TGG TAC TCT TTC TTC ACA AGG GCG CCG TGT GTG
The equivalent mapping is: WYSFFTRAPCV
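Note: The number of bases in the sequence might not be a multiple of 3. If that happens, you must map the final 1- or 2-base fragment to an asterisk (*) instead. For example, without its final G the sample sequence would end in the fragment GT, and the mapping would be WYSFFTRAPC*.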
s = "TGGTACTCTTTCTTCACAAGGGCGCCGTGTGTG"
string = [s[i:i+3] for i in range(0, len(s), 3)]
# Codon-to-residue table (stop codons TAA, TAG and TGA are not included).
table = {
    "TTT": "F", "TTC": "F", "TTA": "L", "TTG": "L",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S",
    "TAT": "Y", "TAC": "Y",
    "TGT": "C", "TGC": "C", "TGG": "W",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",
    "CAT": "H", "CAC": "H", "CAA": "Q", "CAG": "Q",
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R",
    "ATT": "I", "ATC": "I", "ATA": "I", "ATG": "M",
    "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
    "AAT": "N", "AAC": "N", "AAA": "K", "AAG": "K",
    "AGT": "S", "AGC": "S", "AGA": "R", "AGG": "R",
    "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
}
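# Print the codons, then build the residue string one codon at a time.
print(" ".join(string))
res = ""
for codon in string:
    if len(codon) < 3:
        # A trailing 1- or 2-base fragment maps to an asterisk.
        res += "*"
    elif codon in table:
        # Full codons missing from the table are skipped.
        res += table[codon]
print(res)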
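The same steps can also be wrapped in a reusable function. The sketch below is one possible variant, not the only way to solve the task: the names BASES, AMINO_ACIDS, CODON_TABLE, split_codons and translate are chosen here for illustration, and the table is generated from the standard genetic code written in T, C, A, G order instead of being typed out by hand. As in the table above, stop codons are left out, so the asterisk only ever marks an incomplete final codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons enumerated in
# T, C, A, G order at each position. "*" marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the codon table, dropping stop codons (an assumption made here so
# that "*" in the output always means "incomplete codon").
CODON_TABLE = {
    "".join(bases): residue
    for bases, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
    if residue != "*"
}


def split_codons(dna):
    """Split a DNA string into consecutive 3-letter chunks (the last may be shorter)."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Map each codon to its residue; a trailing 1- or 2-base fragment becomes '*'."""
    residues = []
    for codon in split_codons(dna.upper()):
        if len(codon) < 3:
            residues.append("*")
        elif codon in CODON_TABLE:
            residues.append(CODON_TABLE[codon])
        # Full codons not in the table (stop codons, unknown letters) are skipped.
    return "".join(residues)


print(translate("TGGTACTCTTTCTTCACAAGGGCGCCGTGTGT"))  # WYSFFTRAPC*
```

Keeping the splitting and the mapping in separate functions makes it easy to print the codons on their own, for example with print(" ".join(split_codons(s))).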