(a) Show how Hidden Markov Models (HMMs) are used to build profiles, using the
following alignment:
LEVK
LDIR
LEIK
LDVE
(b) How do HMMs help to deal with gaps in protein families? Explain.
(a) Find the sum-of-pairs score for the following alignment. Use the following scoring function for this program: 4 points for a match, -1 point for a mismatch, -2 for s(-,base) or s(base,-), and 0 for s(-,-).
A-G
AC-
TCG
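The scoring rule stated above can be sketched in Python as follows. This is a minimal illustration rather than a model answer: the names `pair_score` and `sum_of_pairs` are hypothetical, and the sketch assumes that every row of the alignment has the same length and that gaps are written as `-`.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol, as used in the alignment above


def pair_score(a, b, match=4, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols under the scheme stated in the question."""
    if a == GAP and b == GAP:
        return 0            # s(-,-)
    if a == GAP or b == GAP:
        return gap          # s(-,base) or s(base,-)
    return match if a == b else mismatch


def sum_of_pairs(alignment):
    """Add pair_score over every pair of rows within every column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):            # walk the alignment column by column
        for a, b in combinations(column, 2):  # each unordered pair of rows, counted once
            total += pair_score(a, b)
    return total


# Usage with the alignment from the question
score = sum_of_pairs(["A-G", "AC-", "TCG"])
print(score)
```

Scoring column by column keeps the per-column contributions easy to check by hand against the program's output.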
(b) What is the Jukes-Cantor distance model and why is it more appropriate than a simple model that merely counts the number of mismatches?
(a) Explain the role of guide trees in progressive multiple sequence alignment algorithms.
What do the leaf and internal nodes of a guide tree represent?
(b) Determine the sum-of-pairs score for the following multiple sequence alignment of DNA sequences, using a scoring matrix in which a match scores +4, a mismatch scores -1, a (base,-) pair scores -2, and a (-,-) pair scores 0.
GCAA
GT-A
C--A
What are some issues associated with adapting multiple sequence alignment programs to large genomic sequences?
(a) What is a sequence pattern? Explain the use of patterns for functional annotation.
(b) What are true positives, true negatives, false positives, and false negatives in the context of pattern searches in protein sequences? How are sequence patterns obtained?
(a) What are low-complexity regions, how are they handled in database searching, and why?
(b) What is the importance of the E-value in database searching?
(a) Distinguish between the programs BLASTP, BLASTN, BLASTX, TBLASTN, and TBLASTX of the BLAST package.
(b) Briefly discuss the programs PHI-BLAST and PSI-BLAST. How is PSI-BLAST used in multiple sequence alignment?
(a) What is RMSD and what is it used for?
(b) Proteins are not rigid, but flexible. How could the RMSD definition be modified to cope with flexibility? Give the advantages and disadvantages of the basic RMSD definition and of your modification.
(a) Explain the energy landscape model for protein folding with appropriate illustrations.
(b) What is the principle of minimal frustration?
(a) Hidden Markov models (HMMs) are used to identify genes in genome sequencing projects. Describe how you would build a hidden Markov model to identify genes in a genome sequence.
(b) Give one other application of hidden Markov models.