# Gaussian Mixture HMM

The Gaussian Mixture HMM is a variant of HMM that uses a multivariate Gaussian mixture model as the emission distribution for each state.

This HMM variant can be used to recognize unbounded real-valued univariate or multivariate sequences.

## Emissions

The emission distribution $$b_m$$ of an observation $$\mathbf{o}^{(t)}$$ at time $$t$$ for state $$m$$ is formed by a weighted mixture of $$K$$ multivariate Gaussian probability density functions, defined as:

$b_m(\mathbf{o}^{(t)}) = \sum_{k=1}^K c_k^{(m)} \mathcal{N}_D\big(\mathbf{o}^{(t)}\ \big|\ \boldsymbol\mu_k^{(m)}, \Sigma_k^{(m)}\big)$

Where:

• $$\mathbf{o}^{(t)}=\left(o_1^{(t)}, o_2^{(t)}, \ldots, o_D^{(t)}\right)$$ is a single observation at time $$t$$, such that $$\mathbf{o}^{(t)}\in\mathbb{R}^D$$.
• $$c_k^{(m)}$$ is a component mixture weight for the $$k^\text{th}$$ Gaussian component of the $$m^\text{th}$$ state, such that $$\sum_{k=1}^K c_k^{(m)} = 1$$ and $$c_k^{(m)}\in[0, 1]$$.
• $$\boldsymbol\mu_k^{(m)}$$ is a mean vector for the $$k^\text{th}$$ Gaussian component of the $$m^\text{th}$$ state, such that $$\boldsymbol\mu_k^{(m)}\in\mathbb{R}^D$$.
• $$\Sigma_k^{(m)}$$ is a covariance matrix for the $$k^\text{th}$$ Gaussian component of the $$m^\text{th}$$ state, such that $$\Sigma_k^{(m)}\in\mathbb{R}^{D\times D}$$ and $$\Sigma_k^{(m)}$$ is symmetric and positive definite, so that the Gaussian density is well defined.
• $$\mathcal{N}_D$$ is the $$D$$-dimensional multivariate Gaussian probability density function.

Using a mixture rather than a single Gaussian allows for more flexible modelling of observations.
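As an illustration of the emission formula, the following minimal sketch evaluates $$b_m(\mathbf{o}^{(t)})$$ for a single state with NumPy. The function names and the parameter values are hypothetical and chosen only for the example; this is not how the library itself evaluates emissions.

```python
import numpy as np


def gaussian_pdf(o, mean, cov):
    """Density of a D-dimensional Gaussian N_D(o | mean, cov)."""
    d = o.shape[0]
    diff = o - mean
    # Solve cov @ z = diff instead of inverting cov explicitly
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    normalizer = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * mahalanobis) / normalizer


def emission_density(o, weights, means, covs):
    """b_m(o) = sum_k c_k * N_D(o | mu_k, Sigma_k) for one state m."""
    return sum(
        c * gaussian_pdf(o, mu, cov)
        for c, mu, cov in zip(weights, means, covs)
    )


# Hypothetical parameters: one state, K=2 components, D=2 features
weights = np.array([0.3, 0.7])                  # c_k, sums to one
means = np.array([[0.0, 0.0], [1.0, 1.0]])      # mu_k
covs = np.array([np.eye(2), 2.0 * np.eye(2)])   # Sigma_k

o = np.array([0.0, 0.0])
density = emission_density(o, weights, means, covs)
```

Each component contributes its own density scaled by its weight, so a state can assign high density to several separate regions of the feature space.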

The component mixture weights, mean vectors and covariance matrices for all states and Gaussian components are updated during training via Expectation-Maximization through the Baum-Welch algorithm.

## Covariance matrix types

The $$K$$ covariance matrices for a state can come in different forms:

• Full: All values are fully learnable independently for each component.

• Diagonal: Only values along the diagonal may be learned independently for each component.

• Spherical: Same as diagonal, with a single value shared along the diagonal for each component.

• Tied: Same as full, with all components sharing the same single covariance matrix.

Estimating a full covariance matrix is not always necessary, particularly when a sufficient number of Gaussian components are used. If training time is limited, a spherical, diagonal or tied covariance matrix may also yield strong results while reducing training time, because there are fewer parameters to estimate.
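To make the difference in parameter counts concrete, the sketch below counts the free covariance values per state for each type, following the descriptions above. The helper name is hypothetical, and the count is a back-of-the-envelope illustration rather than the value returned by n_params().

```python
def covariance_param_count(covariance_type, n_components, n_features):
    """Free covariance values for ONE state, per the descriptions above."""
    k, d = n_components, n_features
    # A symmetric D x D matrix has D * (D + 1) / 2 free entries
    symmetric = d * (d + 1) // 2
    counts = {
        "full": k * symmetric,  # one full matrix per component
        "diag": k * d,          # one diagonal per component
        "spherical": k,         # one variance per component
        "tied": symmetric,      # one full matrix shared by all components
    }
    return counts[covariance_type]


# Example: 3 components over 13 features (a hypothetical MFCC-sized input)
counts = {
    name: covariance_param_count(name, n_components=3, n_features=13)
    for name in ("full", "diag", "spherical", "tied")
}
```

With these example sizes, a full covariance needs 273 values per state, whereas a spherical one needs only 3.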

## API reference

### Class

| Class | Description |
| --- | --- |
| `GaussianMixtureHMM` | A hidden Markov model with multivariate Gaussian mixture emissions. |

### Methods

| Method | Description |
| --- | --- |
| `__init__(*[, n_states, n_components, ...])` | Initializes the GaussianMixtureHMM. |
| `aic(X[, lengths])` | The Akaike information criterion of the model, evaluated with the maximum likelihood of X. |
| `bic(X[, lengths])` | The Bayesian information criterion of the model, evaluated with the maximum likelihood of X. |
| `fit(X[, lengths])` | Fits the HMM to the sequences in X, using the Baum-Welch algorithm. |
| `freeze([params])` | Freezes the trainable parameters of the HMM, preventing them from being updated during the Baum-Welch algorithm. |
| `n_params()` | Retrieves the number of trainable parameters. |
| `score(x)` | Calculates the log-likelihood of the HMM generating a single observation sequence. |
| `set_start_probs([values])` | Sets the initial state probabilities. |
| `set_state_covariances(values)` | Sets the covariance matrices of the state emission distributions. |
| `set_state_means(values)` | Sets the mean vectors of the state emission distributions. |
| `set_state_weights(values)` | Sets the component mixture weights of the state emission distributions. |
| `set_transitions([values])` | Sets the transition probability matrix. |
| `unfreeze([params])` | Unfreezes the trainable parameters of the HMM, allowing them to be updated during the Baum-Welch algorithm. |

class sequentia.models.hmm.variants.GaussianMixtureHMM

A hidden Markov model with multivariate Gaussian mixture emissions.

Examples

Using a GaussianMixtureHMM to learn how to recognize spoken samples of the digit 3.

See load_digits() for more information on the sample dataset used in this example.

import numpy as np
from sequentia.datasets import load_digits
from sequentia.models.hmm import GaussianMixtureHMM

# Seed for reproducible pseudo-randomness
random_state = np.random.RandomState(1)

# Fetch MFCCs of spoken samples for the digit 3
data = load_digits(digits=[3])
train_data, test_data = data.split(test_size=0.2, random_state=random_state)

# Create and train a GaussianMixtureHMM to recognize the digit 3
model = GaussianMixtureHMM(random_state=random_state)
X_train, lengths_train = train_data.X_lengths
model.fit(X_train, lengths_train)

# Calculate the log-likelihood of the first test sample being generated by this model
x, y = test_data[0]
model.score(x)

__init__(*, n_states=5, n_components=3, covariance_type='spherical', topology='left-right', random_state=None, hmmlearn_kwargs={'init_params': 'stmcw', 'params': 'stmcw'})

Initializes the GaussianMixtureHMM.

Parameters:
• n_states (PositiveInt) – Number of states in the Markov chain.

• n_components (PositiveInt) – Number of Gaussian components in the mixture emission distribution for each state.

• covariance_type (Literal['spherical', 'diag', 'full', 'tied']) – Type of covariance matrix in the mixture emission distribution for each state - see Covariance matrix types.

• topology (Literal['ergodic', 'left-right', 'linear'] | None) –

Transition topology of the Markov chain — see Topologies.

• If None, behaves the same as 'ergodic' but with hmmlearn initialization.

• random_state (NonNegativeInt | RandomState | None) – Seed or numpy.random.RandomState object for reproducible pseudo-randomness.

• hmmlearn_kwargs (Dict[str, Any]) – Additional keyword arguments provided to the hmmlearn HMM constructor.

Return type:

GaussianMixtureHMM

aic(X, lengths=None)

The Akaike information criterion of the model, evaluated with the maximum likelihood of X.

Parameters:
• X (Array) –

Univariate or multivariate observation sequence(s).

• Should be a single 1D or 2D array.

• Should have length as the 1st dimension and features as the 2nd dimension.

• Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.

• lengths (Array | None) –

Lengths of the observation sequence(s) provided in X.

• If None, then X is assumed to be a single observation sequence.

• len(X) should be equal to sum(lengths).

Note:

This method requires a trained model — see fit().

Returns:

The Akaike information criterion.

Return type:

float
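For reference, the standard definition of the Akaike information criterion is $$\mathrm{AIC} = 2k - 2\ln\hat{L}$$, where $$k$$ is the number of trainable parameters and $$\ln\hat{L}$$ is the maximized log-likelihood of X. The sketch below applies that textbook formula to hypothetical numbers; it is assumed, not confirmed by this page, that the library follows the same definition.

```python
def akaike_information_criterion(log_likelihood, n_params):
    """Textbook AIC: 2k - 2 ln(L). Lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical values: a model with 10 parameters and log-likelihood -120.5
aic_value = akaike_information_criterion(log_likelihood=-120.5, n_params=10)
```

When comparing candidate models fitted to the same data, the one with the lower AIC is usually preferred.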

bic(X, lengths=None)

The Bayesian information criterion of the model, evaluated with the maximum likelihood of X.

Parameters:
• X (Array) –

Univariate or multivariate observation sequence(s).

• Should be a single 1D or 2D array.

• Should have length as the 1st dimension and features as the 2nd dimension.

• Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.

• lengths (Array | None) –

Lengths of the observation sequence(s) provided in X.

• If None, then X is assumed to be a single observation sequence.

• len(X) should be equal to sum(lengths).

Note:

This method requires a trained model — see fit().

Returns:

The Bayesian information criterion.

Return type:

float
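For reference, the standard definition of the Bayesian information criterion is $$\mathrm{BIC} = k\ln n - 2\ln\hat{L}$$, where $$k$$ is the number of trainable parameters, $$n$$ is the number of observations and $$\ln\hat{L}$$ is the maximized log-likelihood of X. The sketch below applies that textbook formula to hypothetical numbers; it is assumed, not confirmed by this page, that the library follows the same definition.

```python
import math


def bayesian_information_criterion(log_likelihood, n_params, n_observations):
    """Textbook BIC: k ln(n) - 2 ln(L). Lower values indicate a better trade-off."""
    return n_params * math.log(n_observations) - 2 * log_likelihood


# Hypothetical values: 10 parameters, 100 observations, log-likelihood -120.5
bic_value = bayesian_information_criterion(
    log_likelihood=-120.5, n_params=10, n_observations=100
)
```

Compared with the AIC, the penalty term grows with the amount of data, so the BIC tends to favour smaller models on large datasets.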

fit(X, lengths=None)

Fits the HMM to the sequences in X, using the Baum-Welch algorithm.

Parameters:
• X (Array) –

Univariate or multivariate observation sequence(s).

• Should be a single 1D or 2D array.

• Should have length as the 1st dimension and features as the 2nd dimension.

• Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.

• lengths (Array | None) –

Lengths of the observation sequence(s) provided in X.

• If None, then X is assumed to be a single observation sequence.

• len(X) should be equal to sum(lengths).

Returns:

The fitted HMM.

Return type:

GaussianMixtureHMM
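The concatenated layout expected for X and lengths can be sketched with NumPy as follows. The sequences here are randomly generated placeholders, and the variable names are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical sequences with 2 features each and different lengths
sequences = [rng.normal(size=(n, 2)) for n in (4, 6, 3)]

# Stack the sequences along the time axis and record each original length
X = np.concatenate(sequences)                     # shape (13, 2)
lengths = np.array([len(s) for s in sequences])   # [4, 6, 3]

# The original sequences can be recovered by splitting at cumulative offsets
offsets = np.cumsum(lengths)[:-1]
recovered = np.split(X, offsets)
```

The invariant len(X) == sum(lengths) holds by construction, which is what allows the original sequence boundaries to be decoded.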

freeze(params='stmcw')

Freezes the trainable parameters of the HMM, preventing them from being updated during the Baum-Welch algorithm.

Parameters:

params (str) –

A string specifying which parameters to freeze. Can contain a combination of:

• 's' for initial state probabilities,

• 't' for transition probabilities,

• 'm' for emission distribution means,

• 'c' for emission distribution covariances,

• 'w' for emission distribution mixture weights.

Note:

If used, this method should normally be called before fit().

See also:

unfreeze

Unfreezes the trainable parameters of the HMM, allowing them to be updated during the Baum-Welch algorithm.
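One way to picture freezing and unfreezing is as removing characters from, or adding them to, a string of trainable parameter codes, similar to the params string shown in the default hmmlearn_kwargs. The standalone functions below are a hypothetical sketch of that idea and are not the library's implementation.

```python
ALL_PARAMS = "stmcw"  # start probs, transitions, means, covariances, weights


def _validate(params):
    unknown = set(params) - set(ALL_PARAMS)
    if unknown:
        raise ValueError(f"Unknown parameter codes: {sorted(unknown)}")


def freeze(trainable, params=ALL_PARAMS):
    """Return the trainable codes left after freezing those in `params`."""
    _validate(params)
    return "".join(ch for ch in trainable if ch not in params)


def unfreeze(trainable, params=ALL_PARAMS):
    """Return the trainable codes after unfreezing those in `params`."""
    _validate(params)
    return "".join(ch for ch in ALL_PARAMS if ch in trainable or ch in params)


# Freeze the initial state and transition probabilities, then unfreeze one
after_freeze = freeze("stmcw", "st")
after_unfreeze = unfreeze(after_freeze, "s")
```

In this sketch, freezing 's' and 't' leaves only the emission parameters ('m', 'c', 'w') trainable.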

n_params()

Retrieves the number of trainable parameters.

Note:

This method requires a trained model — see fit().

Returns:

Number of trainable parameters.

Return type:

NonNegativeInt

score(x)

Calculates the log-likelihood of the HMM generating a single observation sequence.

Parameters:

x (Array) –

Univariate or multivariate observation sequence.

• Should be a single 1D or 2D array.

• Should have length as the 1st dimension and features as the 2nd dimension.

Note:

This method requires a trained model — see fit().

Returns:

The log-likelihood.

Return type:

float
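A standard way to compute the log-likelihood of a sequence under an HMM is the scaled forward algorithm. The sketch below is a minimal NumPy version with hypothetical parameter values; it takes precomputed emission densities as input and does not claim to mirror the library's internals.

```python
import numpy as np


def forward_log_likelihood(start_probs, transitions, emission_probs):
    """Scaled forward algorithm.

    emission_probs[t, m] is the emission density b_m(o_t) of state m at time t.
    """
    log_likelihood = 0.0
    alpha = start_probs * emission_probs[0]
    for t in range(len(emission_probs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emissions
            alpha = (alpha @ transitions) * emission_probs[t]
        # Rescale to avoid underflow and accumulate the log of the scale
        scale = alpha.sum()
        log_likelihood += np.log(scale)
        alpha = alpha / scale
    return log_likelihood


# Hypothetical 2-state model and a sequence of length 2
start_probs = np.array([0.6, 0.4])
transitions = np.array([[0.7, 0.3], [0.4, 0.6]])
emission_probs = np.array([[0.5, 0.1], [0.2, 0.3]])

log_likelihood = forward_log_likelihood(start_probs, transitions, emission_probs)
```

Rescaling at every step keeps the forward variables in a numerically safe range, which matters for long sequences where raw probabilities underflow.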

set_start_probs(values='random')

Sets the initial state probabilities.

If this method is not called, initial state probabilities are initialized depending on the value of topology provided to __init__():

• If topology was set to 'ergodic', 'left-right' or 'linear', then random probabilities will be assigned according to the topology by calling set_start_probs() with values='random'.

• If topology was set to None, then initial state probabilities will be initialized by hmmlearn.

Parameters:

values (Array | Literal['uniform', 'random']) –

Probabilities or probability type to assign as initial state probabilities.

• If an Array, should be a vector of starting probabilities for each state.

• If 'uniform', there is an equal probability of starting in any state.

• If 'random', the vector of initial state probabilities is sampled from a Dirichlet distribution with unit concentration parameters.

Note:

If used, this method should normally be called before fit().
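The 'uniform' and 'random' options can be illustrated with NumPy as below. The helper names are hypothetical, and the sketch ignores any restrictions a particular topology might place on which states can be starting states.

```python
import numpy as np


def uniform_start_probs(n_states):
    """Equal probability of starting in any state."""
    return np.full(n_states, 1.0 / n_states)


def random_start_probs(n_states, rng):
    """Sample from a Dirichlet distribution with unit concentration parameters."""
    return rng.dirichlet(np.ones(n_states))


rng = np.random.default_rng(0)
uniform_probs = uniform_start_probs(5)
sampled_probs = random_start_probs(5, rng)
```

A Dirichlet distribution with unit concentration parameters is uniform over the probability simplex, so every valid probability vector is equally likely to be drawn.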

set_state_covariances(values)

Sets the covariance matrices of the state emission distributions.

If this method is not called, covariance matrices will be initialized by hmmlearn.

Parameters:

values (Array) – Array of emission distribution covariance values.

Note:

If used, this method should normally be called before fit().
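The array layout depends on the covariance type. The sketch below builds identity-like covariances using the shapes that hmmlearn documents for the covars_ attribute of its Gaussian mixture HMM; it is an assumption that set_state_covariances() expects the same layout, and the helper name is hypothetical.

```python
import numpy as np


def identity_covariances(covariance_type, n_states, n_components, n_features):
    """Unit-variance covariances in the (assumed) hmmlearn layout."""
    eye = np.eye(n_features)
    if covariance_type == "spherical":
        # One variance per component
        return np.ones((n_states, n_components))
    if covariance_type == "diag":
        # One diagonal per component
        return np.ones((n_states, n_components, n_features))
    if covariance_type == "full":
        # One full matrix per component
        return np.tile(eye, (n_states, n_components, 1, 1))
    if covariance_type == "tied":
        # One full matrix per state, shared by its components
        return np.tile(eye, (n_states, 1, 1))
    raise ValueError(f"Unknown covariance type: {covariance_type!r}")


shapes = {
    name: identity_covariances(name, n_states=5, n_components=3, n_features=13).shape
    for name in ("spherical", "diag", "full", "tied")
}
```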

set_state_means(values)

Sets the mean vectors of the state emission distributions.

If this method is not called, mean vectors will be initialized by hmmlearn.

Parameters:

values (Array) – Array of emission distribution mean values.

Note:

If used, this method should normally be called before fit().

set_state_weights(values)

Sets the component mixture weights of the state emission distributions.

If this method is not called, component mixture weights will be initialized by hmmlearn.

Parameters:

values (Array) – Array of emission distribution component mixture weights.

Note:

If used, this method should normally be called before fit().
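Since the mixture weights of each state must lie in [0, 1] and sum to one, a valid weights array can be built by normalizing positive values per state. The sketch below assumes a layout of one row per state and one column per component, which matches hmmlearn's weights_ attribute but is not confirmed by this page.

```python
import numpy as np

n_states, n_components = 5, 3
rng = np.random.default_rng(0)

# Hypothetical positive scores, shifted slightly away from zero
raw = rng.random((n_states, n_components)) + 1e-3

# Normalize each row so the weights of every state sum to one
weights = raw / raw.sum(axis=1, keepdims=True)
```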

set_transitions(values='random')

Sets the transition probability matrix.

If this method is not called, transition probabilities are initialized depending on the value of topology provided to __init__():

• If topology was set to 'ergodic', 'left-right' or 'linear', then random probabilities will be assigned according to the topology by calling set_transitions() with values='random'.

• If topology was set to None, then transition probabilities will be initialized by hmmlearn.

Parameters:

values (Array | Literal['uniform', 'random']) –

Probabilities or probability type to assign as state transition probabilities.

• If an Array, should be a matrix of probabilities where each row must sum to one and represents the probabilities of transitioning out of a state.

• If 'uniform', for each state there is an equal probability of transitioning to any state permitted by the topology.

• If 'random', the vector of transition probabilities for each row is sampled from a Dirichlet distribution with unit concentration parameters, according to the shape of the topology.

Note:

If used, this method should normally be called before fit().
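The 'uniform' and 'random' options can be sketched for a left-right topology as follows. The sketch assumes the common definition in which a state may transition to itself or to any later state, giving an upper-triangular matrix; the helper names are hypothetical.

```python
import numpy as np


def left_right_mask(n_states):
    """Permitted transitions: stay in the same state or move to a later one."""
    return np.triu(np.ones((n_states, n_states), dtype=bool))


def uniform_transitions(mask):
    """Equal probability for every permitted transition out of each state."""
    return mask / mask.sum(axis=1, keepdims=True)


def random_transitions(mask, rng):
    """Each row is sampled from a unit-concentration Dirichlet over permitted states."""
    transitions = np.zeros(mask.shape)
    for i, permitted in enumerate(mask):
        transitions[i, permitted] = rng.dirichlet(np.ones(permitted.sum()))
    return transitions


mask = left_right_mask(3)
uniform_matrix = uniform_transitions(mask)
sampled_matrix = random_transitions(mask, np.random.default_rng(0))
```

In both cases every row sums to one and the entries below the diagonal stay at zero, so the chain can never move backwards.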

unfreeze(params='stmcw')

Unfreezes the trainable parameters of the HMM, allowing them to be updated during the Baum-Welch algorithm.

Parameters:

params (str) –

A string specifying which parameters to unfreeze. Can contain a combination of:

• 's' for initial state probabilities,

• 't' for transition probabilities,

• 'm' for emission distribution means,

• 'c' for emission distribution covariances,

• 'w' for emission distribution mixture weights.

See also:

freeze

Freezes the trainable parameters of the HMM, preventing them from being updated during the Baum-Welch algorithm.

covariance_type

Type of covariance matrix in the emission model mixture distribution for each state.

hmmlearn_kwargs

Additional keyword arguments provided to the hmmlearn HMM constructor.

model

Underlying HMM object from hmmlearn — only set after fit().

n_components

Number of Gaussian components in the emission model mixture distribution for each state.

n_states

Number of states in the Markov chain.

random_state

Seed or numpy.random.RandomState object for reproducible pseudo-randomness.

topology

Transition topology of the Markov chain — see Topologies.