# Standard Markov chains

We would like to detect CpG islands in a DNA strand. Inside an island, the probability of a G following a C is higher. We assume the letters form a Markovian process (memory size = 1). To determine whether a given short section comes from an island, we compute its likelihood under two Markov chains (one trained on islands, denoted +, and one on the background, denoted -) and compare the two likelihoods, e.g. via the log-likelihood ratio log(P(X | +) / P(X | -)).
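As a sketch, the decision can be made with a log-likelihood ratio between the two chains. The transition tables below are illustrative stand-ins, not values from the notes; real values would be estimated from annotated island / non-island training data:

```python
import math

# Illustrative first-order transition tables (each row sums to 1).
# Keys: previous letter -> {next letter: probability}.
P_ISLAND = {
    "A": {"A": 0.18, "C": 0.27, "G": 0.43, "T": 0.12},
    "C": {"A": 0.17, "C": 0.37, "G": 0.27, "T": 0.19},
    "G": {"A": 0.16, "C": 0.34, "G": 0.38, "T": 0.12},
    "T": {"A": 0.08, "C": 0.36, "G": 0.38, "T": 0.18},
}
P_BACKGROUND = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.29, "T": 0.21},
    "C": {"A": 0.32, "C": 0.30, "G": 0.08, "T": 0.30},
    "G": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20},
    "T": {"A": 0.18, "C": 0.24, "G": 0.29, "T": 0.29},
}

def log_likelihood_ratio(seq):
    """log P(seq | island) - log P(seq | background) under the two
    first-order Markov chains; a positive score suggests an island."""
    score = 0.0
    for prev, cur in zip(seq, seq[1:]):
        score += math.log(P_ISLAND[prev][cur] / P_BACKGROUND[prev][cur])
    return score
```

Note the C->G entry (high inside the island, low outside): it is what makes CG-rich windows score positive.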

# General case – hidden Markov models

When does a CpG island start and stop in the strand? The analogy used is an occasionally dishonest casino that sometimes switches to a loaded die and later returns to the fair die. We model a hidden layer of states h with transition matrix a, where a(k -> l) = P(h(i+1) = l | h(i) = k), and X is the emitted sequence with emission probabilities e_k(b) = P(x(i) = b | h(i) = k). The most probable path can be found by Viterbi = DP with back pointers. Note that if the string size is m and there are n states, the complexity is O(m*n^2), as for each state we consider all n states at the previous position, and it DOES NOT relate to the alphabet size, as the emissions are known!
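A minimal sketch of Viterbi on the casino HMM; the transition and emission numbers below are assumed for illustration only:

```python
import math

# Hypothetical dishonest-casino HMM: the switch probabilities
# (0.95/0.05, 0.9/0.1) and the loaded-die bias are illustrative.
STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}
TRANS = {"fair": {"fair": 0.95, "loaded": 0.05},
         "loaded": {"fair": 0.1, "loaded": 0.9}}
EMIT = {"fair": {d: 1 / 6 for d in "123456"},
        "loaded": {**{d: 0.1 for d in "12345"}, "6": 0.5}}

def viterbi(obs):
    """Most probable state path; O(m * n^2) for m rolls, n states."""
    V = [{k: math.log(START[k]) + math.log(EMIT[k][obs[0]]) for k in STATES}]
    back = []
    for x in obs[1:]:
        col, ptr = {}, {}
        for k in STATES:
            # best predecessor over all n states of the previous position
            prev = max(STATES, key=lambda j: V[-1][j] + math.log(TRANS[j][k]))
            ptr[k] = prev
            col[k] = V[-1][prev] + math.log(TRANS[prev][k]) + math.log(EMIT[k][x])
        V.append(col)
        back.append(ptr)
    # trace the back pointers from the best final state
    state = max(STATES, key=lambda k: V[-1][k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on long sequences.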

## Posterior state probabilities

We want to calculate the posterior state probability P(h(i) = k | X). We start from the joint probability P(X, h(i) = k). This factors into the prefix joint with the state, f_k(i) = P(x(1)..x(i), h(i) = k), times the suffix conditioned on that state, b_k(i) = P(x(i+1)..x(m) | h(i) = k). We find these by the forward and backward algorithms.

- Q: why do we need the backward alg, and can’t just continue the forward recursion past position i?
- A: because we need the conditional probability of the suffix given the state, P(x(i+1)..x(m) | h(i) = k), which is exactly what backward computes; forward only gives the joint probability of a prefix with the state.

We then calculate the probability of the entire X, P(X) = sum over k of f_k(m), and divide by it: P(h(i) = k | X) = f_k(i) * b_k(i) / P(X).
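A sketch of forward/backward and the resulting posterior, on a hypothetical two-state casino model (all names and probabilities are illustrative):

```python
STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}
TRANS = {"fair": {"fair": 0.95, "loaded": 0.05},
         "loaded": {"fair": 0.1, "loaded": 0.9}}
EMIT = {"fair": {d: 1 / 6 for d in "123456"},
        "loaded": {**{d: 0.1 for d in "12345"}, "6": 0.5}}

def forward(obs):
    """f[i][k] = P(x(1)..x(i), h(i) = k)."""
    f = [{k: START[k] * EMIT[k][obs[0]] for k in STATES}]
    for x in obs[1:]:
        f.append({k: EMIT[k][x] * sum(f[-1][j] * TRANS[j][k] for j in STATES)
                  for k in STATES})
    return f

def backward(obs):
    """b[i][k] = P(x(i+1)..x(m) | h(i) = k)."""
    b = [{k: 1.0 for k in STATES}]
    for x in reversed(obs[1:]):
        b.insert(0, {k: sum(TRANS[k][j] * EMIT[j][x] * b[0][j] for j in STATES)
                     for k in STATES})
    return b

def posterior(obs):
    """P(h(i) = k | X) = f_k(i) * b_k(i) / P(X)."""
    f, b = forward(obs), backward(obs)
    px = sum(f[-1][k] for k in STATES)  # P(X) from the last forward column
    return [{k: f[i][k] * b[i][k] / px for k in STATES}
            for i in range(len(obs))]
```

At every position the posteriors sum to 1, which is a handy sanity check on the implementation.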

## Posterior decoding

Now to decode, we can’t go position by position and take the max-probability state, as the result might contradict the other positions (the pieced-together path can even be impossible under the transition matrix). We define a function of interest g, which is 1 at interesting states and 0 at all other states, and the posterior probability G(i|X) is the sum of the posterior state probabilities over the interesting states: G(i|X) = sum over k of P(h(i) = k | X) * g(k).
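A sketch of posterior decoding given an already-computed posterior table; the `post` values and the "island"/"background" state names below are made up for illustration:

```python
def posterior_decode(post, interesting):
    """post[i][k] = P(h(i) = k | X); returns G(i|X) per position,
    summing posterior probabilities over the interesting states."""
    return [sum(p for k, p in col.items() if k in interesting)
            for col in post]

# hypothetical posterior table for a 3-position sequence
post = [{"island": 0.8, "background": 0.2},
        {"island": 0.4, "background": 0.6},
        {"island": 0.9, "background": 0.1}]
g = posterior_decode(post, {"island"})  # [0.8, 0.4, 0.9]
```

Positions with G(i|X) above a threshold (say 0.5) can then be reported as island positions, without ever committing to a single full path.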

## HMM parameter estimation – known state sequence

We get a multiple sequence alignment and output an HMM model. We determine the match state positions by some heuristic (e.g. more than 50% non-gaps in the position). Then we estimate the parameters by counting transitions:

* In a match position: letter = match, gap = deletion

* In a non-match position: letter = insertion, gap = ignored

Then we estimate each transition by its observed frequency: a(state->nextState) = A(state, nextState) / sum over all nextState' of A(state, nextState'), where A counts the observed transitions. But when the training data is small, this leaves a lot of empty (zero-probability) transitions. Better to use the Laplace rule of adding 1 to every count: a(state->nextState) = (A(state, nextState) + 1) / sum over all nextState' of (A(state, nextState') + 1) (aka Dirichlet priors).
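A sketch of the counting estimate with Laplace pseudocounts; state paths here are plain strings for illustration, and `pseudocount=1` gives exactly the add-one rule:

```python
from collections import Counter

def estimate_transitions(paths, states, pseudocount=1):
    """MLE of a(k -> l) with Laplace smoothing:
    (A_kl + 1) / (sum over l' of A_kl' + |states|)."""
    counts = Counter()
    for path in paths:
        for k, l in zip(path, path[1:]):
            counts[(k, l)] += 1
    a = {}
    for k in states:
        total = sum(counts[(k, l)] for l in states) + pseudocount * len(states)
        a[k] = {l: (counts[(k, l)] + pseudocount) / total for l in states}
    return a
```

With `pseudocount=0` this degenerates to the raw frequency estimate, where a state never observed as a source (like "B" below) would produce a division by zero instead of a uniform row.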

- Q: in slide 17, what does training a profile HMM from unaligned sequences mean? “obtain MA with the resulting profile as before”??

## HMM parameter estimation – unknown state sequence

If the states are unknown, we use a version of the EM algorithm (Baum-Welch). We are given multiple sequences. In the E-step, we compute expected counts:

- Ek(b) – the expected number of times state k emits character b: the sum of the posterior probabilities P(h(i) = k | X) over every position where the character b appears, in every one of the sequences
- Akl – the expected number of k -> l transitions: the sum of the posterior probabilities of that transition over every position of every sequence

In the M-step we re-estimate the parameters from these expected counts exactly as in the known-state case, and iterate.
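A sketch of the E-step on a hypothetical two-state model (all names and numbers are illustrative): build the forward/backward tables per sequence, then accumulate the expected emission counts Ek(b) and transition counts Akl:

```python
# Hypothetical coin-like model, for illustration only.
STATES = ("s", "t")
START = {"s": 0.5, "t": 0.5}
TRANS = {"s": {"s": 0.9, "t": 0.1}, "t": {"s": 0.2, "t": 0.8}}
EMIT = {"s": {"H": 0.7, "T": 0.3}, "t": {"H": 0.4, "T": 0.6}}

def forward(obs):
    f = [{k: START[k] * EMIT[k][obs[0]] for k in STATES}]
    for x in obs[1:]:
        f.append({k: EMIT[k][x] * sum(f[-1][j] * TRANS[j][k] for j in STATES)
                  for k in STATES})
    return f

def backward(obs):
    b = [{k: 1.0 for k in STATES}]
    for x in reversed(obs[1:]):
        b.insert(0, {k: sum(TRANS[k][j] * EMIT[j][x] * b[0][j] for j in STATES)
                     for k in STATES})
    return b

def e_step(seqs):
    """Expected counts E[k][b] and A[k][l], summed over all
    positions of all sequences, weighted by the posteriors."""
    E = {k: {} for k in STATES}
    A = {k: {l: 0.0 for l in STATES} for k in STATES}
    for obs in seqs:
        f, b = forward(obs), backward(obs)
        px = sum(f[-1][k] for k in STATES)  # P(X)
        for i, x in enumerate(obs):
            for k in STATES:
                # posterior P(h(i) = k | X), credited to character x
                E[k][x] = E[k].get(x, 0.0) + f[i][k] * b[i][k] / px
        for i in range(len(obs) - 1):
            for k in STATES:
                for l in STATES:
                    # posterior probability of using transition k -> l at i
                    A[k][l] += (f[i][k] * TRANS[k][l]
                                * EMIT[l][obs[i + 1]] * b[i + 1][l]) / px
    return E, A
```

As a consistency check, the emission counts sum to the total number of positions and the transition counts to the total number of adjacent pairs, since the posteriors at each position sum to 1.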

## Recitation insights

Viterbi for a string against a profile – the original Viterbi shown is good when there’s a string of size m and a single layer of n states: O(m*n^2). In a profile HMM there are L layers, each with three states (M/I/D), and every state has a constant number of ingoing edges, so the DP table has O(m*L) cells, each filled in O(1), giving O(m*L) total. For every character and every layer we compute the best score of ending there in I/D/M, maximizing over the ingoing edges.
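Concretely, the standard profile-HMM Viterbi recurrences (V_M/V_I/V_D = best score of being at layer j, position i, in a match/insert/delete state) look like:

```
V_M(j, i) = e_M_j(x_i) * max{ V_M(j-1, i-1) * a(M_{j-1} -> M_j),
                              V_I(j-1, i-1) * a(I_{j-1} -> M_j),
                              V_D(j-1, i-1) * a(D_{j-1} -> M_j) }
V_I(j, i) = e_I_j(x_i) * max{ V_M(j, i-1) * a(M_j -> I_j),
                              V_I(j, i-1) * a(I_j -> I_j),
                              V_D(j, i-1) * a(D_j -> I_j) }
V_D(j, i) =              max{ V_M(j-1, i) * a(M_{j-1} -> D_j),
                              V_I(j-1, i) * a(I_{j-1} -> D_j),
                              V_D(j-1, i) * a(D_{j-1} -> D_j) }
```

Delete states emit nothing, which is why V_D advances a layer but does not consume a character.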

The same goes for estimating forward/backward probabilities (only with sum instead of max…)

## Questions

- what’s the relation between HMM estimation when state is known, HMM estimation when states are unknown, profile HMM, EM?
- What is training a profile HMM from unaligned sequences? were they aligned before – the only difference I see is the length of the model…
- what’s the connection to previous search techniques we learned? (FASTA et al)
- In the intro to HMM (hmm-2nd-lec-slide 7) they talk about the price of gaps! what do they mean? how does that come to use in the next slides illustrating hmm estimation?