Temporal Probability Models (Markov Models and Particle Filtering)

Forward/Viterbi Algorithm

The Viterbi algorithm is a dynamic programming algorithm for obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMM).

And we have:

$$x^*_{1:t} = \underset{x_{1:t}}{\arg\max}\; P(x_{1:t} \mid e_{1:t})$$

There is a linear-time algorithm for finding the most likely sequence, but it requires a little more thought. It relies on the same Markov property that yielded efficient algorithms for filtering and smoothing. The easiest way to think about the problem is to view each sequence as a path through a graph whose nodes are the possible states at each time step. Now consider the task of finding the most likely path through this graph, where the likelihood of any path is the product of the transition probabilities along the path and the probabilities of the given observations at each state. Let's focus in particular on paths that reach the state $Rain_5 = true$. Because of the Markov property, it follows that the most likely path to the state $Rain_5 = true$ consists of the most likely path to some state at time 4 followed by a transition to $Rain_5 = true$; and the state at time 4 that will become part of the path to $Rain_5 = true$ is whichever maximizes the likelihood of that path. In other words, there is a recursive relationship between the most likely paths to each state $x_{t+1}$ and the most likely paths to each state $x_t$. We can write this relationship as an equation connecting the probabilities of the paths:

$$\max_{x_1 \ldots x_t} P(x_1,\ldots,x_t,X_{t+1} \mid e_{1:t+1}) = \alpha\, P(e_{t+1} \mid X_{t+1}) \,\max_{x_t}\Big(P(X_{t+1} \mid x_t)\,\max_{x_1 \ldots x_{t-1}} P(x_1,\ldots,x_{t-1},x_t \mid e_{1:t})\Big)$$

where the message $m_{1:t} = \max_{x_1 \ldots x_{t-1}} P(x_1,\ldots,x_{t-1},X_t \mid e_{1:t})$ gives the probability of the best path to each state at time $t$.

Thus, the algorithm for computing the most likely sequence is similar to filtering: it runs forward along the sequence, computing the $m_{1:t}$ message at each time step, using the equation above. The progress of this computation is shown in the figure below. At the end, it has the probability of the most likely sequence reaching each of the final states. One can thus easily select the most likely sequence overall (the states outlined in bold). In order to identify the actual sequence, as opposed to just computing its probability, the algorithm also needs to record, for each state, the best state that leads to it; these are indicated by the bold arrows in the figure. The optimal sequence is identified by following these bold arrows backwards from the best final state.

The algorithm we have just described is called the Viterbi algorithm, after its inventor. Like the filtering algorithm, its time complexity is linear in $t$, the length of the sequence. Unlike filtering, which uses constant space, its space requirement is also linear in $t$. This is because the Viterbi algorithm needs to keep the pointers that identify the best sequence leading to each state.
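
To make the forward pass and the back-pointers concrete, here is a minimal Python sketch of the Viterbi algorithm on the Rain/Umbrella model behind the $Rain_5$ example. The transition, sensor, and prior probabilities are the usual textbook values for that model and are purely illustrative; swap in your own parameters for a different HMM.

```python
import numpy as np

# States: 0 = Rain, 1 = NoRain.  Evidence: True = umbrella observed.
# Standard umbrella-world parameters (illustrative values only).
prior = np.array([0.5, 0.5])                      # P(R_0)
transition = np.array([[0.7, 0.3],                # P(R_t | R_{t-1} = Rain)
                       [0.3, 0.7]])               # P(R_t | R_{t-1} = NoRain)
sensor = {True:  np.array([0.9, 0.2]),            # P(umbrella    | R_t)
          False: np.array([0.1, 0.8])}            # P(no umbrella | R_t)

def viterbi(evidence):
    """Return the most likely state sequence for the given evidence list."""
    T = len(evidence)
    m = np.zeros((T, 2))                  # m[t, s]: probability of the best path ending in state s at time t
    back = np.zeros((T, 2), dtype=int)    # back[t, s]: predecessor of s on that best path

    # Initialisation: prior times the likelihood of the first observation.
    m[0] = prior * sensor[evidence[0]]

    # Forward pass: m_{1:t+1} = P(e_{t+1}|X_{t+1}) * max_x P(X_{t+1}|x) m_{1:t}(x).
    for t in range(1, T):
        for s in range(2):
            path_probs = m[t - 1] * transition[:, s]
            back[t, s] = np.argmax(path_probs)
            m[t, s] = sensor[evidence[t]][s] * np.max(path_probs)

    # Backward pass: follow the stored pointers from the best final state.
    states = [int(np.argmax(m[-1]))]
    for t in range(T - 1, 0, -1):
        states.append(int(back[t, states[-1]]))
    return list(reversed(states))

# Umbrella seen on days 1, 2, 4, 5 but not on day 3.
print(viterbi([True, True, False, True, True]))    # -> [0, 0, 1, 0, 0]
```

For this evidence sequence the decoded path is Rain, Rain, NoRain, Rain, Rain; note that the forward pass stores one back-pointer per state per time step, which is exactly the linear space cost discussed above.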

An Example of an HMM

Suppose someone wants to spy on HTTPS connections and infer the sequence of webpages being browsed. How can this be done? By treating the sequence of incoming packet sizes as noisy observations and the contents of the packets (i.e., the webpages being visited) as hidden variables, the eavesdropper can use an HMM to reach that goal!

The transition model can be computed from the links on each webpage: the probability of choosing the next webpage depends on the links present on the current one, i.e., the user performs a random walk between webpages. After accounting for complications such as dynamically generated content, user-specific content, and so on, we can estimate the sensor model P(packet size | webpage) and run the HMM algorithms on the observed traffic.
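
As a rough sketch of how such a model could be set up (not a reconstruction of any published attack), the snippet below builds a random-walk transition matrix from a hypothetical three-page link graph, models P(packet size | webpage) as one Gaussian per page with made-up means and standard deviations, and decodes the most likely page sequence with a log-space Viterbi pass. The link graph, packet sizes, means, and standard deviations are all illustrative assumptions; in practice they would be estimated from real traffic traces.

```python
import numpy as np

# Hypothetical 3-page site: page i links to the pages in links[i].
# The transition model is a uniform random walk over outgoing links.
links = {0: [1, 2], 1: [0, 2], 2: [0]}
n = len(links)
A = np.zeros((n, n))
for page, outgoing in links.items():
    A[page, outgoing] = 1.0 / len(outgoing)

# Sensor model P(packet size | webpage): one Gaussian per page (placeholder parameters).
means = np.array([800.0, 1500.0, 400.0])
stds = np.array([100.0, 200.0, 50.0])

def emission_logp(size):
    # Log-density of the observed packet size under each page's Gaussian.
    return -0.5 * ((size - means) / stds) ** 2 - np.log(stds * np.sqrt(2.0 * np.pi))

def decode(sizes):
    """Log-space Viterbi: most likely page sequence for a list of packet sizes."""
    log_a = np.log(np.where(A > 0, A, 1e-300))       # avoid log(0) for missing links
    log_m = -np.log(n) + emission_logp(sizes[0])     # uniform prior over pages
    back = []
    for size in sizes[1:]:
        scores = log_m[:, None] + log_a              # scores[i, j]: best path ending i -> j
        back.append(scores.argmax(axis=0))
        log_m = scores.max(axis=0) + emission_logp(size)
    path = [int(log_m.argmax())]
    for pointers in reversed(back):
        path.append(int(pointers[path[-1]]))
    return list(reversed(path))

print(decode([780, 1450, 420, 810]))                 # -> [0, 1, 2, 0]
```

Working in log space avoids the numerical underflow that the product of many small probabilities would otherwise cause on long browsing sessions.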

In the following chart, we can see that the error of this algorithm is around 10% (BoG), which is quite shocking. Nowadays, deep learning can decrease the error to nearly 0%! Do you still think you have security?!

 

Authors
Ali Mirzaei Saghezchi
Author
Arman Mohammadi
Author
Arman Babaei
Author
Mahdi Ghaznavi
Supervisor