# Hidden Markov Models

The Hidden Markov Model (HMM) is a state-based statistical model for sequence modelling.

When used for classification, each HMM represents a single class of observation sequences. For example, to recognize spoken digits from the Free Spoken Digit Dataset, we would train a separate HMM for each digit, so that each model recognizes observation sequences belonging to its own class.

HMMs can be used to classify both categorical and numerical sequences.

See [1] for a detailed introduction to HMMs.

## Parameters and training

A HMM is composed of:

• a Markov chain, which models the probability of transitioning between hidden states.

• an emission model, which models the probability of emitting an observation from a hidden state.

A HMM $$\lambda$$ is defined by the following parameters:

• Initial state distribution $$\boldsymbol{\pi}$$:
A probability distribution that dictates the probability of the HMM starting in each state.
• Transition probability matrix $$A$$:
A matrix in which each row is a probability distribution that determines how likely the HMM is to transition to each state, given some current state.

Note

Sequentia HMMs are time-homogeneous: the transition probabilities do not change over time.

• Emission probability distributions $$B$$:
A collection of $$M$$ probability distributions (one for each state) that specify the probability of the HMM emitting an observation given some current state.
• For categorical sequences, the emission distribution $$b_m(o^{(t)})$$ at state $$m$$ is a univariate discrete distribution giving the probability that the observation $$o^{(t)}$$ at time $$t$$ is each of the possible symbols $$\mathcal{S}=\{s_0,s_1,\ldots,s_{K-1}\}$$.

This collection of state emission distributions can be modelled as an $$M \times K$$ emission matrix over all states and symbols $$\mathcal{S}$$.

• For numerical sequences, the emission distribution $$b_m(\mathbf{o}^{(t)})$$ at state $$m$$ is a multivariate continuous distribution giving the probability density of the observation $$\mathbf{o}^{(t)}$$ at time $$t$$.

Numerical sequence support in Sequentia assumes unbounded real-valued emissions which are modelled according to a multivariate Gaussian mixture distribution.
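As a rough illustration, the parameters of a small categorical HMM can be written as NumPy arrays. This is a sketch only and not Sequentia's internal representation: the probability values, the variable names and the helper functions `is_valid_hmm` and `sample` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical HMM with M = 2 states and K = 3 symbols (illustrative values).
pi = np.array([0.6, 0.4])             # initial state distribution
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])            # transition matrix, one row per current state
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])       # emission matrix, one row per state


def is_valid_hmm(pi, A, B, tol=1e-8):
    """Check shapes and that pi and every row of A and B sum to one."""
    n_states = len(pi)
    if A.shape != (n_states, n_states) or B.shape[0] != n_states:
        return False
    for rows in (pi[None, :], A, B):
        if (rows < 0).any() or not np.allclose(rows.sum(axis=1), 1.0, atol=tol):
            return False
    return True


def sample(pi, A, B, length, seed=0):
    """Draw a hidden state path and the symbols emitted along it."""
    rng = np.random.default_rng(seed)
    states = np.empty(length, dtype=int)
    symbols = np.empty(length, dtype=int)
    state = rng.choice(len(pi), p=pi)                     # start from pi
    for t in range(length):
        states[t] = state
        symbols[t] = rng.choice(B.shape[1], p=B[state])   # emit from row of B
        state = rng.choice(len(pi), p=A[state])           # move using row of A
    return states, symbols


states, symbols = sample(pi, A, B, length=20)
```

Only `symbols` would be visible to an observer; `states` is the hidden path that the model has to account for.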

HMMs are fitted to observation sequences using the Baum-Welch algorithm, an instance of Expectation-Maximization (EM) that uses the forward-backward procedure to learn all of the parameters described above.
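To make the training procedure more concrete, the following is a minimal sketch of one Baum-Welch iteration for a categorical HMM fitted to a single sequence. It is not Sequentia's implementation: the function names `forward_backward` and `baum_welch_step`, the example parameters and the scaling scheme are assumptions chosen for illustration, and a practical implementation would also handle multiple sequences and numerical edge cases.

```python
import numpy as np


def forward_backward(pi, A, B, obs):
    """Scaled forward and backward passes for one categorical sequence."""
    T, M = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, M)), np.zeros((T, M)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    return alpha, beta, scale


def baum_welch_step(pi, A, B, obs):
    """One EM update; also returns the log-likelihood under the old parameters."""
    obs = np.asarray(obs)
    alpha, beta, scale = forward_backward(pi, A, B, obs)
    # E-step: state posteriors (gamma) and transition posteriors (xi).
    gamma = alpha * beta
    emit_next = B[:, obs[1:]].T * beta[1:]
    xi = (alpha[:-1, :, None] * A[None] * emit_next[:, None, :]
          / scale[1:, None, None])
    # M-step: re-estimate each parameter from expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, np.log(scale).sum()


# Hypothetical starting parameters and observation sequence.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 2, 1, 0, 0, 2, 1, 2]

log_likelihoods = []
for _ in range(10):
    pi, A, B, log_likelihood = baum_welch_step(pi, A, B, obs)
    log_likelihoods.append(log_likelihood)
```

Because each expected transition count in `xi` is multiplied by the current value of `A`, a transition probability that starts at zero stays at zero after every update. EM also guarantees that the values collected in `log_likelihoods` never decrease from one iteration to the next.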

## Topologies

The nature of the transition matrix determines the topology of the HMM.

Three common types of topology used in sequence modelling are ergodic, left-right and linear.

• Ergodic topology: All states have a non-zero probability of transitioning to any state.

• Left-right topology: States are ordered so that each state may only transition to itself or to a later state, never to an earlier one.

• Linear topology: Same as left-right, but each state may only transition to itself or to the next state.

Left-right topologies are particularly useful for modelling sequences where ordering must be respected.
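The three topologies above can be pictured as patterns of permitted (non-zero) entries in the transition matrix. The sketch below builds uniform transition matrices for each pattern using NumPy. The function names `topology_mask` and `uniform_transitions` are hypothetical and not part of Sequentia, and the linear case assumes that self-transitions remain permitted, as in the left-right case.

```python
import numpy as np


def topology_mask(kind, n_states):
    """Return a 0/1 matrix marking which transitions a topology permits."""
    ones = np.ones((n_states, n_states))
    if kind == "ergodic":
        return ones                                   # every transition allowed
    if kind == "left-right":
        return np.triu(ones)                          # self or any later state
    if kind == "linear":
        return np.triu(ones) - np.triu(ones, k=2)     # self or next state only
    raise ValueError(f"unknown topology: {kind}")


def uniform_transitions(kind, n_states):
    """Spread probability evenly over the permitted transitions of each row."""
    mask = topology_mask(kind, n_states)
    return mask / mask.sum(axis=1, keepdims=True)


ergodic = uniform_transitions("ergodic", 3)
left_right = uniform_transitions("left-right", 3)
linear = uniform_transitions("linear", 3)
```

In the left-right and linear matrices every entry below the diagonal is zero, which is what prevents the model from moving back to an earlier state.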

Note

Sequentia will still permit zero probabilities in an ergodic transition matrix, but will issue a warning stating that these probabilities will not be learned.

## Making predictions

Multiple HMMs trained to recognize individual observation sequence classes can be combined to form a single multi-class classifier that makes predictions according to posterior maximization.

See HMM Classifier for more detail on how HMMs can be used for classification.
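As a minimal sketch of posterior maximization, suppose each class HMM has already produced a log-likelihood for the same observation sequence. Combining these with class priors via Bayes' rule gives a posterior over classes, and the prediction is the class with the highest posterior. The function name `classify`, the class labels and the numbers below are hypothetical and do not reflect Sequentia's API.

```python
import numpy as np


def classify(log_likelihoods, priors):
    """Pick the class with the highest posterior given per-class HMM scores.

    log_likelihoods: dict mapping class label -> log P(sequence | class HMM)
    priors:          dict mapping class label -> P(class)
    """
    classes = list(log_likelihoods)
    log_joint = np.array(
        [log_likelihoods[c] + np.log(priors[c]) for c in classes]
    )
    # Normalise in log space so the posteriors sum to one without underflow.
    log_posterior = log_joint - np.logaddexp.reduce(log_joint)
    best = classes[int(np.argmax(log_posterior))]
    return best, dict(zip(classes, np.exp(log_posterior)))


# Hypothetical scores from two digit models for one spoken-digit sequence.
scores = {"zero": -120.0, "one": -118.5}
uniform_pred, uniform_post = classify(scores, {"zero": 0.5, "one": 0.5})
skewed_pred, skewed_post = classify(scores, {"zero": 0.9, "one": 0.1})
```

With uniform priors the decision reduces to picking the HMM with the highest likelihood. A strongly skewed prior can change the prediction, as in the second call.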

References