Hidden Markov Model (HMM)

The Hidden Markov Model (HMM) is a state-based statistical model that can be used to represent an individual observation sequence class \(c\). As seen in the diagram below, the rough idea is that each state should correspond to one ‘section’ of the sequence.

[Figure: HMM with each state corresponding to one section of the observation sequence]

A single HMM is modeled by the HMM class.

Parameters and Training

The ‘sections’ in the image above are determined by the parameters of the HMM, explained below.

  • Initial state distribution \(\boldsymbol{\pi}\):
    A discrete probability distribution that dictates the probability of the HMM starting in each state.
  • Transition probability matrix \(A\):
    A matrix whose rows represent a discrete probability distribution that dictates how likely the HMM is to transition to each state, given some current state.
  • Emission probability distributions \(B\):
    A collection of \(N\) continuous multivariate probability distributions (one for each state) that each dictate the probability of the HMM generating an observation \(\mathbf{o}\), given some current state. Recall that we are generally considering multivariate observation sequences – that is, at time \(t\), we have an observation \(\mathbf{o}^{(t)}=\left(o_1^{(t)}, o_2^{(t)}, \ldots, o_D^{(t)}\right)\). The fact that the observations are multivariate necessitates a multivariate emission distribution. Sequentia uses the multivariate Gaussian distribution.
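
As a concrete illustration of the shapes involved (a sketch only, not Sequentia's internal representation), the parameters of a hypothetical HMM with N = 5 states and D = 3 features could be laid out as follows:

import numpy as np

N, D = 5, 3  # number of states and number of features (chosen arbitrarily)

# Initial state distribution: one probability per state, summing to 1
pi = np.full(N, 1.0 / N)

# Transition matrix: each row is a distribution over the next state
A = np.full((N, N), 1.0 / N)

# Gaussian emission parameters: one mean vector and covariance matrix per state
B = [(np.zeros(D), np.eye(D)) for _ in range(N)]

assert np.isclose(pi.sum(), 1.0) and np.allclose(A.sum(axis=1), 1.0)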

In order to learn these parameters, we must train the HMM on examples that are labeled with the class \(c\) that the HMM models. Denote the HMM that models class \(c\) as \(\lambda_c=(\boldsymbol{\pi}_c, A_c, B_c)\). We can use the Baum-Welch algorithm (an application of the Expectation-Maximization algorithm) to fit \(\lambda_c\) and learn its parameters. This fitting is implemented by the fit() function.
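
In each Baum-Welch iteration, the E-step computes the posterior over hidden state sequences \(Q\) under the current estimate \(\lambda_c^{(k)}\), and the M-step re-estimates the parameters by maximizing the expected complete-data log-likelihood:

\[
\lambda_c^{(k+1)} = \mathop{\arg\max}_{\lambda} \; \mathbb{E}_{Q \sim \mathbb{P}(Q \mid O, \lambda_c^{(k)})}\left[\ln \mathbb{P}(O, Q \mid \lambda)\right]
\]

Each iteration is guaranteed not to decrease the likelihood \(\mathbb{P}(O|\lambda_c)\).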

Model Topologies

As we usually wish to preserve the natural ordering of time, we normally want to prevent our HMM from transitioning to previous states (as shown in the figure above). This restriction leads to what is known as a left-right HMM, the most commonly used HMM topology for sequential modeling. Mathematically, a left-right HMM is defined by an upper-triangular transition matrix.
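
For example, a row-normalized upper-triangular transition matrix for a hypothetical 4-state left-right HMM might look like the following (illustrative values only, not ones produced by Sequentia):

import numpy as np

# Entries below the diagonal are zero, so transitions to earlier states are impossible
A = np.array([
    [0.5, 0.3, 0.1, 0.1],
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 1.0],
])

assert np.allclose(A.sum(axis=1), 1.0)  # each row is a valid probability distribution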

If we allow transitions to any state at any time, this HMM topology is known as ergodic.

Note: Ergodicity is mathematically defined as having a transition matrix with no zero entries – that is, every transition has non-zero probability. Using the ergodic topology in Sequentia will still permit zero entries in the transition matrix, but will issue a warning stating that those probabilities will not be learned.

Sequentia offers both topologies, specified by a string parameter topology in the HMM constructor that takes values ‘left-right’ or ‘ergodic’.
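
For instance, using the constructor documented in the API reference below (the label and state count here are arbitrary):

from sequentia.classifiers import HMM

left_right_hmm = HMM(label='class1', n_states=5, topology='left-right')
ergodic_hmm = HMM(label='class1', n_states=5, topology='ergodic')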

Making Predictions

A score for how likely an HMM is to generate an observation sequence is given by the forward algorithm, which calculates the likelihood \(\mathbb{P}(O|\lambda_c)\) of the HMM \(\lambda_c\) generating the observation sequence \(O\).

Note: The likelihood does not account for the fact that a particular observation class may occur more or less frequently than other observation classes. Once an ensemble of HMM objects (represented by an HMMClassifier) is created and configured, this can be accounted for by calculating the joint probability (or un-normalized posterior) \(\mathbb{P}(O, \lambda_c)=\mathbb{P}(O|\lambda_c)\mathbb{P}(\lambda_c)\) and using this score for classification instead. The addition of the prior term \(\mathbb{P}(\lambda_c)\) accounts for some classes occurring more frequently than others.
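
A minimal sketch of this scoring rule, using hypothetical per-class log-likelihoods and priors (the HMMClassifier handles this internally via its prior argument, documented below):

import numpy as np

# Hypothetical log-likelihoods ln P(O|lambda_c) for three classes
log_likelihoods = np.array([-310.2, -305.9, -308.4])

# Hypothetical class priors P(lambda_c), e.g. estimated from class frequencies
priors = np.array([0.2, 0.3, 0.5])

# Un-normalized log-posterior: ln P(O, lambda_c) = ln P(O|lambda_c) + ln P(lambda_c)
log_joint = log_likelihoods + np.log(priors)

best_class = int(np.argmax(log_joint))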

Example

import numpy as np
from sequentia.classifiers import HMM

# Create some sample data
X = [np.random.random((10 * i, 3)) for i in range(1, 4)]

# Create and fit a left-right HMM with random transitions and initial state distribution
hmm = HMM(label='class1', n_states=5, topology='left-right')
hmm.set_random_initial()
hmm.set_random_transitions()
hmm.fit(X)
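
Once fitted, the model can score a new sequence with its forward() method (documented below); for example, with a hypothetical 15-frame test sequence:

# Lower negative log-likelihood means the model explains the sequence better
test_sequence = np.random.random((15, 3))
nll = hmm.forward(test_sequence)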

For more elaborate examples, please have a look at the example notebooks.

API reference

class sequentia.classifiers.hmm.HMM(label, n_states, topology='left-right', random_state=None)[source]

A hidden Markov model representing an isolated temporal sequence class.

Parameters:
label: str

A label for the model, corresponding to the class being represented.

n_states: int

The number of states for the model.

topology: {‘ergodic’, ‘left-right’}

The topology for the model.

random_state: numpy.random.RandomState, int, optional

A random state object or seed for reproducible randomness.

Attributes:
label: str

The label for the model.

n_states: int

The number of states for the model.

n_seqs: int

The number of observation sequences used to train the model.

initial: numpy.ndarray

The initial state distribution of the model.

transitions: numpy.ndarray

The transition matrix of the model.

set_uniform_initial(self)[source]

Sets a uniform initial state distribution.

set_random_initial(self)[source]

Sets a random initial state distribution.

set_uniform_transitions(self)[source]

Sets a uniform transition matrix according to the topology.

set_random_transitions(self)[source]

Sets a random transition matrix according to the topology.
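
For example, a model might be initialized with either the uniform or random variants of these methods before fitting (a brief sketch using the constructor and methods documented here):

from sequentia.classifiers import HMM

hmm = HMM(label='class1', n_states=5, topology='left-right')
hmm.set_uniform_initial()      # equal probability of starting in each state
hmm.set_uniform_transitions()  # equal probability for each transition permitted by the topology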

fit(self, X, n_jobs=1)[source]

Fits the HMM to observation sequences assumed to be labeled as the class that the model represents.

Parameters:
X: List[numpy.ndarray]

Collection of multivariate observation sequences, each of shape \((T \times D)\) where \(T\) may vary per observation sequence.

n_jobs: int

The number of jobs to run in parallel. Setting this to -1 will use all available CPU cores.

forward(self, sequence)[source]

Runs the forward algorithm to calculate the (negative log) likelihood of the model generating an observation sequence.

Parameters:
sequence: numpy.ndarray

An individual sequence of observations of size \((T \times D)\) where \(T\) is the number of time frames (or observations) and \(D\) is the number of features.

Returns:
negative log-likelihood: float

The negative log-likelihood of the model generating the observation sequence.

Ensemble Hidden Markov Model Classifier (HMMClassifier)

Multiple HMMs can be combined to form an ensemble multi-class classifier. To classify a new observation sequence \(O'\), this works by:

  1. Creating and training the HMMs \(\lambda_1, \lambda_2, \ldots, \lambda_C\), one for each of the \(C\) classes.
  2. Calculating the likelihoods \(\mathbb{P}(O'|\lambda_1), \mathbb{P}(O'|\lambda_2), \ldots, \mathbb{P}(O'|\lambda_C)\) of each model generating \(O'\).
    Note: You can also use the un-normalized posterior \(\mathbb{P}(O'|\lambda_c)\mathbb{P}(\lambda_c)\) instead of the likelihood.
  3. Choosing the class represented by the HMM with the highest likelihood – that is, \(c^*=\mathop{\arg\max}_{c\in\{1,\ldots,C\}}{\mathbb{P}(O'|\lambda_c)}\).

These steps are summarized in the diagram below.

[Figure: Ensemble HMM classifier system]
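
The decision rule in step 3 can be sketched as follows, with hypothetical per-class likelihood scores (the HMMClassifier performs this internally):

# Hypothetical likelihoods P(O'|lambda_c) assigned by each class HMM
scores = {'class0': 1.2e-140, 'class1': 3.8e-135, 'class2': 6.1e-138}

# Choose the class whose HMM assigns the highest score to O'
predicted = max(scores, key=scores.get)  # 'class1'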

Example

import numpy as np
from sequentia.classifiers import HMM, HMMClassifier

# Set of possible labels
labels = ['class{}'.format(i) for i in range(5)]

# Create and fit some sample HMMs
hmms = []
for i, label in enumerate(labels):
    hmm = HMM(label=label, n_states=(i + 3), topology='left-right')
    hmm.set_random_initial()
    hmm.set_random_transitions()
    hmm.fit([np.arange((i + j * 20) * 30).reshape(-1, 3) for j in range(1, 4)])
    hmms.append(hmm)

# Create some sample test data and labels
X = [np.random.random((10 * i, 3)) for i in range(1, 4)]
y = ['class0', 'class1', 'class1']

# Create a classifier and calculate predictions and evaluations
clf = HMMClassifier()
clf.fit(hmms)
predictions = clf.predict(X)
accuracy, confusion = clf.evaluate(X, y, labels=labels)
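
The score assigned by each HMM can also be retrieved by passing the return_scores argument documented in the API reference below:

# Return (label, scores) tuples instead of labels only
predictions_with_scores = clf.predict(X, return_scores=True)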

For more elaborate examples, please have a look at the example notebooks.

API reference

class sequentia.classifiers.hmm.HMMClassifier[source]

An ensemble classifier that combines individual HMM objects, which model isolated sequences from different classes.

fit(self, models)[source]

Fits the ensemble classifier with a collection of HMM objects.

Parameters:
models: List[HMM] or Dict[Any, HMM]

A collection of HMM objects to use for classification.

predict(self, X, prior=True, return_scores=False)[source]

Predicts the label for an observation sequence (or multiple sequences) according to maximum likelihood or posterior scores.

Parameters:
X: numpy.ndarray or List[numpy.ndarray]

An individual observation sequence or a list of multiple observation sequences.

prior: bool

Whether to calculate a prior for each model and perform MAP estimation by scoring with the joint probability (or un-normalized posterior) \(\mathbb{P}(O, \lambda_c)=\mathbb{P}(O|\lambda_c)\mathbb{P}(\lambda_c)\).

If this parameter is set to false, then the negative log-likelihoods \(-\ln\mathbb{P}(O|\lambda_c)\) generated from the models’ forward() function are used.

return_scores: bool

Whether to return the scores of each model on the observation sequence(s).

Returns:
prediction(s): str or List[str]

The predicted label(s) for the observation sequence(s).

If return_scores is true, then for each observation sequence, a tuple (label, scores) is returned, consisting of the predicted label (that of the HMM with the best score) and the scores of each HMM.

evaluate(self, X, y, prior=True, labels=None)[source]

Evaluates the performance of the classifier on a batch of observation sequences and their labels.

Parameters:
X: List[numpy.ndarray]

A list of multiple observation sequences.

y: List[str]

A list of labels for the observation sequences.

prior: bool

Whether to calculate a prior for each model and perform MAP estimation by scoring with the joint probability (or un-normalized posterior) \(\mathbb{P}(O, \lambda_c)=\mathbb{P}(O|\lambda_c)\mathbb{P}(\lambda_c)\).

If this parameter is set to false, then the negative log-likelihoods \(-\ln\mathbb{P}(O|\lambda_c)\) generated from the models’ forward() function are used.

labels: List[str]

A list of labels for ordering the axes of the confusion matrix.

Returns:
accuracy: float

The categorical accuracy of the classifier on the observation sequences.

confusion: numpy.ndarray

The confusion matrix representing the discrepancy between predicted and actual labels.
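
As a rough sketch of how such a confusion matrix relates true and predicted labels when ordered by a fixed label list (illustrative only, not Sequentia's implementation):

import numpy as np

labels = ['class0', 'class1', 'class2']
y_true = ['class0', 'class1', 'class1']
y_pred = ['class0', 'class1', 'class2']

# confusion[i, j] counts sequences with true label labels[i] predicted as labels[j]
index = {label: i for i, label in enumerate(labels)}
confusion = np.zeros((len(labels), len(labels)), dtype=int)
for true, pred in zip(y_true, y_pred):
    confusion[index[true], index[pred]] += 1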