Hidden Markov Models¶
The Hidden Markov Model (HMM) is a state-based statistical model for sequential data.
When used for classification, a HMM represents an individual class of observation sequences. For example, if we were recognizing spoken digits from the Free Spoken Digit Dataset, we would train a separate HMM for each digit to recognize observation sequences belonging to that class.
HMMs can be used to classify both categorical and numerical sequences.
See also
See [1] for a detailed introduction to HMMs.
Parameters and training¶
A HMM is composed of:
a Markov chain, which models the probability of transitioning between hidden states.
an emission model, which models the probability of emitting an observation from a hidden state.
A HMM \(\lambda\) is defined by the following parameters:
- Initial state distribution \(\boldsymbol{\pi}\): A probability distribution that dictates the probability of the HMM starting in each state.
- Transition probability matrix \(A\): A matrix whose rows are probability distributions that determine how likely the HMM is to transition to each state, given the current state.
Note
Sequentia HMMs are time homogeneous: a single transition matrix governs transitions at every time step.
- Emission probability distributions \(B\): A collection of \(M\) probability distributions (one for each state) that specify the probability of the HMM emitting an observation given the current state.
- For categorical sequences, the emission distribution \(b_m(o^{(t)})\) at state \(m\) is a univariate discrete distribution giving the probability of the observation \(o^{(t)}\) at time \(t\) being one of the \(K\) possible symbols \(\mathcal{S}=\{s_1,s_2,\ldots,s_K\}\).
This collection of state emission distributions can be represented as an \(M \times K\) emission matrix over all states and symbols in \(\mathcal{S}\).
- For numerical sequences, the emission distribution \(b_m(\mathbf{o}^{(t)})\) at state \(m\) is a multivariate continuous distribution of the probability of the observation \(\mathbf{o}^{(t)}\) at time \(t\).
Numerical sequence support in Sequentia assumes unbounded real-valued emissions, which are modelled according to a multivariate Gaussian mixture distribution. A concrete sketch of these parameters is shown below.
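To make these definitions concrete, here is a minimal sketch (plain NumPy arrays, not Sequentia's API) of the parameters of a categorical HMM with \(M = 3\) hidden states and \(K = 4\) symbols.

```python
# Illustrative only: the parameters of a small categorical HMM,
# written out as plain NumPy arrays.
import numpy as np

# Initial state distribution (pi): probability of starting in each state.
pi = np.array([0.6, 0.3, 0.1])

# Transition matrix (A): row m is the distribution over next states,
# given that the current state is m.
A = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
])

# Emission matrix (B): row m is the distribution over the K symbols
# emitted from state m.
B = np.array([
    [0.5, 0.2, 0.2, 0.1],
    [0.1, 0.4, 0.4, 0.1],
    [0.2, 0.1, 0.2, 0.5],
])

# pi, each row of A, and each row of B must be valid distributions.
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```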
HMMs are fitted to observation sequences using the Baum-Welch algorithm (also known as the forward-backward algorithm), which learns all of the parameters described above via expectation-maximization (EM), as sketched below.
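The sketch below shows one EM iteration of Baum-Welch for a categorical HMM. It is purely illustrative, not Sequentia's implementation: it uses unscaled probabilities, which underflow on long sequences, and handles a single training sequence, whereas practical implementations work in log-space (or with scaling) and accumulate statistics over many sequences.

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One EM update on a single categorical sequence `obs`
    (an array of symbol indices)."""
    obs = np.asarray(obs)
    T, M = len(obs), len(pi)

    # E-step: forward (alpha) and backward (beta) probabilities.
    alpha = np.zeros((T, M))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.zeros((T, M))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()  # p(obs | current parameters)

    # Posterior state occupancies (gamma) and transitions (xi).
    gamma = alpha * beta / likelihood
    xi = (alpha[:-1, :, None] * A[None]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood

    # M-step: re-estimate all parameters from the posteriors.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack(
        [gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1
    ) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, likelihood

# Iterate from a random (row-stochastic) initialization.
rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(3))         # 3 states
A = rng.dirichlet(np.ones(3), size=3)  # transition matrix
B = rng.dirichlet(np.ones(4), size=3)  # 4 symbols
for _ in range(20):
    pi, A, B, likelihood = baum_welch_step([0, 1, 1, 2, 3, 2, 1, 0], pi, A, B)
```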
Topologies¶
The nature of the transition matrix determines the topology of the HMM.
Three common types of topology used in sequence modelling are ergodic, left-right and linear.
Ergodic topology: All states have a non-zero probability of transitioning to any state.
Left-right topology: States are ordered such that any state may only transition to itself or to a state ahead of it, never back to a previous state.
Linear topology: Same as left-right, but each state may only transition to itself or to the immediately following state.
Left-right topologies are particularly useful for modelling sequences where ordering must be respected, and the masks below illustrate the structure each topology imposes on the transition matrix.
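These masks are purely illustrative (not part of Sequentia's API, which configures the topology for you); they show which transitions each topology permits in a 4-state transition matrix, with 1 marking an allowed transition.

```python
import numpy as np

M = 4  # number of states

# Ergodic: every state may transition to every state.
ergodic = np.ones((M, M))

# Left-right: a state may transition to itself or any later state.
left_right = np.triu(np.ones((M, M)))

# Linear: a state may transition only to itself or the next state.
linear = np.eye(M) + np.eye(M, k=1)

print(left_right)
# [[1. 1. 1. 1.]
#  [0. 1. 1. 1.]
#  [0. 0. 1. 1.]
#  [0. 0. 0. 1.]]
```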
Making predictions¶
Multiple HMMs, each trained to recognize an individual observation sequence class, can be combined into a single multi-class classifier that scores a new sequence under every HMM and predicts according to posterior maximization, as sketched below.
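As a sketch of the idea (plain NumPy, not the HMM Classifier API), prediction computes the log-likelihood of a sequence under each class's HMM using the forward algorithm, adds the log class prior, and picks the class with the highest score.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Log p(obs | model), computed with the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_prob = np.log(scale)
    alpha = alpha / scale
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        log_prob += np.log(scale)
        alpha = alpha / scale
    return log_prob

def classify(obs, models, priors):
    """Return the class label maximizing the (unnormalized) log posterior.

    `models` maps each label to its HMM parameters (pi, A, B), and
    `priors` maps each label to its class prior probability.
    """
    scores = {
        label: log_likelihood(obs, *params) + np.log(priors[label])
        for label, params in models.items()
    }
    return max(scores, key=scores.get)
```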
See HMM Classifier for more detail on how HMMs can be used for classification.
References