(a) To calculate a multiple sequence alignment for N sequences, how many pairwise alignments have to be calculated?
(b) Align the following sequences using “star alignment”, showing all intermediate steps:
S1= ATTCGGATT
S2= ATCCGGATT
S3= ATGGAATTTT
S4= ATGTTGTT
S5= AGTCAGG
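As a starting point for the star alignment above, the following is a minimal sketch of the first step only: scoring every pair of sequences and picking the sequence with the highest total score as the centre. The scoring scheme (match +1, mismatch -1, gap -1) and the helper name `nw_score` are assumptions made for illustration; the question does not specify a scoring scheme, and a different one may select a different centre.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not given in the question.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

seqs = {
    "S1": "ATTCGGATT",
    "S2": "ATCCGGATT",
    "S3": "ATGGAATTTT",
    "S4": "ATGTTGTT",
    "S5": "AGTCAGG",
}

# Every unordered pair is aligned once: N(N-1)/2 pairs for N sequences.
pairs = list(combinations(seqs, 2))

# Sum each sequence's scores against all others; the maximum is the centre.
totals = {name: 0 for name in seqs}
for x, y in pairs:
    s = nw_score(seqs[x], seqs[y])
    totals[x] += s
    totals[y] += s

center = max(totals, key=totals.get)
print(len(pairs), totals, center)
```

The remaining steps (aligning each sequence to the centre and merging with the "once a gap, always a gap" rule) are not covered by this sketch.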
(a) You have a protein of unknown function from a bacterium. You have made a knockout mutant, but the bacteria die immediately without the corresponding gene. You have sequenced the protein. What steps would you take to guess the function of the protein? What kind of information would you look for?
(a) What is the difference between spotted and oligonucleotide microarrays?
(b) What is a probe? How are probes for microarrays designed?
(c) What is a probeset? What is probeset summarization and why do we need it?
(d) If a gene is shown to be induced four-fold in a microarray experiment, what would be the log2-transformed expression ratio?
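For part (d), the log2 transformation can be checked with a short sketch. The function name `log2_ratio` is hypothetical; the point is that induction and repression by the same factor become symmetric around zero.

```python
import math

def log2_ratio(fold_change):
    """Return the log2-transformed expression ratio for a given fold change."""
    return math.log2(fold_change)

print(log2_ratio(4.0))   # four-fold induction
print(log2_ratio(0.25))  # four-fold repression
```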
(a) Why do you have to normalize microarray data to compare two conditions? Explain two normalization techniques that can be used here.
(b) Describe and discuss specific problems likely to appear on a microarray. Describe and discuss what measures can be taken to reduce or eliminate such effects from a data analysis point of view.
(a) What is the output obtained from an RNA-seq experiment? Why do you have to remove rRNA and tRNA before performing RNA-seq?
(b) Why is mapping of RNA-seq reads more difficult than mapping re-sequencing reads or ChIP-seq reads? Explain.
(c) What is a Phred quality score? Explain its use in an RNA-seq experiment.
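For part (c), a minimal sketch of the Phred relationship Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The decoding function assumes the Phred+33 ASCII encoding used in Sanger-style FASTQ files; older Illumina files used an offset of 64, so the offset is a parameter. Function names are hypothetical.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred score Q to the base-call error probability P."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability P to a Phred score Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def decode_quality_string(qual, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual]

print(phred_to_error_prob(20))       # about 1 error in 100 base calls
print(decode_quality_string("I5!"))  # per-base scores for a short read
```

Scores decoded this way are typically used to trim or filter low-quality bases before reads are mapped.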
(a) Why must the inside of a mass spectrometer be kept at a high vacuum?
(b) How are molecular ions formed? What information can be obtained from the mass/charge value of a molecular ion?
(c) Define Ion Trap, ICR, Quadrupole and Octapole.
(a) A researcher is scanning a cDNA microarray and obtains an image with the following characteristics: most of the spots are visible and many are very bright; the background appears to be light gray. The researcher proceeds to the image processing and quantification stages and finds that most spots appear to be characterized by a high average intensity. Discuss what might have happened. What steps would you take to test your hypothesis and correct the situation?
(b) What experiment can be used as an alternative to microarray analysis? What are its advantages over the former?
(c) What is the difference between concordant, discordant and unmapped reads?
There are two hypothetical “one column” multiple sequence alignments. In the first alignment of N sequences, every residue is a tyrosine. In the second alignment, there are N-1 tyrosines and one proline.
(b) Given that the BLOSUM62 Y ↔ Y score is 7, calculate the score of the first alignment as a function of N. This is $S_{Y^N}(N)$.
(c) Given that the BLOSUM62 Y ↔ P score is -4, calculate the score of the second alignment as a function of N. This is $S_{Y^{N-1}P^1}(N)$.
(d) Evaluate and simplify the following expression, representing the fractional difference between the two different sequence alignments.
$f(N) = \frac{S_{Y^N}(N) - S_{Y^{N-1}P^1}(N)}{S_{Y^N}(N)}$
(e) Construct a plot of the expression you derived in part (d), and explain why this scoring behavior is incorrect.
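The column scores in this question can be explored numerically with the sketch below. It assumes the alignments are scored by the sum-of-pairs method (every pair of residues in the column contributes its BLOSUM62 score), which the question does not state explicitly. The function names are hypothetical. Exact fractions are used so the simplified form of the fractional difference can be read off directly.

```python
from fractions import Fraction
from math import comb

def score_all_tyr(n, yy=7):
    """Sum-of-pairs score of a column of n tyrosines: C(n, 2) Y-Y pairs."""
    return comb(n, 2) * yy

def score_one_pro(n, yy=7, yp=-4):
    """Sum-of-pairs score of a column with n-1 tyrosines and one proline."""
    return comb(n - 1, 2) * yy + (n - 1) * yp

def fractional_difference(n):
    """Fractional difference between the two column scores, as an exact fraction."""
    return Fraction(score_all_tyr(n) - score_one_pro(n), score_all_tyr(n))

for n in (2, 5, 10, 50, 100):
    print(n, score_all_tyr(n), score_one_pro(n), fractional_difference(n))
```

Under this assumption the printed values shrink as N grows, which is the behavior the last part of the question asks about; the values can be passed to any plotting tool.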
(a) Differentiate between: 1) a standard sequence consensus; 2) a one-dimensional, ‘regular-expression’ motif; 3) a simple, two-dimensional weight matrix; 4) a profile position-specific scoring matrix (PSSM); and 5) a profile Hidden Markov Model (HMM).
Discuss the pros and cons and the relative power of each, and why and when one would be used over another.
(b) What do log-odds scoring matrices like the BLOSUM50 table have to do with the concept of ‘pseudocounts’ and background frequencies in most types of multiple sequence alignment profiles?
(a) Assume that we have n sequences, each 50 residues long, and that a pairwise alignment of two such sequences takes 1 second of CPU time on our computer. An alignment of four sequences then takes $(2L)^{N-2} = 10^{2N-4} = 10^4$ seconds, where L = 50 is the sequence length and N = 4 is the number of sequences. If we had unlimited computer memory and could wait for the answer until just before the sun burns out in five billion years, what is the largest n that our computer could align?
(b) Outline the whole-genome re-sequencing pipeline for short reads generated by a next-generation sequencing platform.
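For part (a) of this question, the arithmetic can be sketched as follows. It assumes a year of 365.25 days and takes the running time for n sequences to be (2L)^(n-2) seconds with L = 50, as given in the question; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length
TIME_BUDGET = 5e9 * SECONDS_PER_YEAR   # five billion years, in seconds

def alignment_seconds(n, length=50):
    """Running time for aligning n sequences: (2L)^(n-2) seconds."""
    return (2 * length) ** (n - 2)

# Increase n while the next alignment would still finish within the budget.
n = 2
while alignment_seconds(n + 1) <= TIME_BUDGET:
    n += 1

print(n, alignment_seconds(n), TIME_BUDGET)
```

Each additional sequence multiplies the running time by 2L = 100, so the time budget is exhausted after only a handful of sequences.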