Hidden Markov Models

The Hidden Markov Model (HMM) is a state-based statistical model for sequence modelling.

When used for classification, a HMM can be used to represent an individual observation sequence class. For example, if we were recognizing spoken digits from the Free Spoken Digit Dataset, we would train a separate HMM for each digit, to recognize observation sequences belonging to that class.

HMMs can be used to classify both categorical and numerical sequences.

See also

See [1] for a detailed introduction to HMMs.

Parameters and training

A HMM is composed of:

  • a Markov chain, which models the probability of transitioning between hidden states.

  • an emission model, which models the probability of emitting an observation from a hidden state.

A HMM \(\lambda\) is defined by the following parameters:

  • Initial state distribution \(\boldsymbol{\pi}\):
    A probability distribution that dictates the probability of the HMM starting in each state.
  • Transition probability matrix \(A\):
    A matrix whose rows represent a probability distribution that determine how likely the HMM is to transition to each state, given some current state.


    Sequentia HMMs are time homogeneous.

  • Emission probability distributions \(B\):
    A collection of \(M\) probability distributions (one for each state) that specify the probability of the HMM emitting an observation given some current state.
    • For categorical sequences, the emission distribution \(b_m(o^{(t)})\) at state \(m\) is a univariate discrete distribution of the probability of the observation \(o^{(t)}\) at time \(t\) being one of the possible symbols \(\mathcal{S}=\{s_0,s_1,\ldots,s_K\}\).

      This collection of state emission distributions can be modelled as an \(M \times K\) transition matrix over all states and symbols \(\mathcal{S}\).

    • For numerical sequences, the emission distribution \(b_m(\mathbf{o}^{(t)})\) at state \(m\) is a multivariate continuous distribution of the probability of the observation \(\mathbf{o}^{(t)}\) at time \(t\).

      Numerical sequence support in Sequentia assumes unbounded real-valued emissions which are modelled according to a multivariate Gaussian mixture distribution.

HMMs are fitted to observation sequences using the Baum-Welch (or forward-backward) algorithm which learns all of the parameters described above via Expectation-Maximization (EM).


The nature of the transition matrix determines the topology of the HMM.

Three common types of topology used in sequence modelling are ergodic, left-right and linear.

  • Ergodic topology: All states have a non-zero probability of transitioning to any state.

  • Left-right topology: States are arranged in a way such that any state may only transition to itself or any state ahead of it, but not to any previous state.

  • Linear topology: Same as left-right, but states are only permitted to transition to the next state.

Left-right topologies are particularly useful for modelling sequences where ordering must be respected.

HMM Topologies


Sequentia will still permit zero probabilities in an ergodic transition matrix, but will issue a warning stating that these probabilities will not be learned.

Making predictions

Multiple HMMs trained to recognize individual observation sequence classes can be combined to form a single multi-class classifier that makes predictions according to posterior maximization.

See HMM Classifier for more detail on how HMMs can be used for classification.