Categorical HMM

The Categorical HMM is a variant of HMM that uses a discrete probability distribution over a finite set of symbols as the emission distribution for each state.

This HMM variant can be used to recognize categorical univariate sequences.


The emission distribution \(b_m\) of an observation \(o^{(t)}\) at time \(t\) for state \(m\) is given by a probability vector:

\[\bigg[\underbrace{\mathbb{P}\big(o^{(t)}=s_0\ |\ q^{(t)}=m\big)}_{p_{m,0}}, \ldots, \underbrace{\mathbb{P}\big(o^{(t)}=s_K\ |\ q^{(t)}=m\big)}_{p_{m,K}}\bigg]\]

Where:


  • \(\mathcal{S}=\{s_0,s_1,\ldots,s_K\}\) is a finite set of observation symbols.
  • \(o^{(t)}\in\mathcal{S}\) is a single observation at time \(t\).
  • \(q^{(t)}\) is a discrete random variable representing the hidden state at time \(t\).
  • \(p_{m,k}=\mathbb{P}\big(o^{(t)}=s_k\ |\ q^{(t)}=m\big)\) is the probability of observing \(s_k\) while in state \(m\).

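The emission distributions for all states can be represented by a single emission matrix with one row per state and one column per observation symbol (with the zero-based indexing used here, \((M+1)\times(K+1)\)):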

\[\begin{split}\begin{bmatrix} p_{0,0} & \cdots & p_{0,K} \\ \vdots & \ddots & \vdots \\ p_{M,0} & \cdots & p_{M,K} \end{bmatrix}\end{split}\]
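
As a small illustration of this layout (not taken from the library), an emission matrix can be held as a NumPy array with one row per state and one column per symbol. The sizes and probabilities below are invented purely for the example.

```python
import numpy as np

# Hypothetical emission matrix for 2 states (rows) and 3 symbols (columns).
# Entry [m, k] plays the role of p_{m,k}: the probability of observing
# symbol s_k while in state m. The numbers are made up for illustration.
emissions = np.array([
    [0.7, 0.2, 0.1],  # state 0
    [0.1, 0.3, 0.6],  # state 1
])

# Each row is a probability vector: non-negative entries that sum to one.
assert np.all(emissions >= 0)
assert np.allclose(emissions.sum(axis=1), 1.0)

# Probability of observing symbol s_2 while in state 1.
p_1_2 = emissions[1, 2]
```

Indexing a column with an observed symbol, such as `emissions[:, 2]`, gives the probability of that symbol under every state at once.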


Observation symbols must be encoded as integers. Consider performing label encoding using sklearn.preprocessing.LabelEncoder.
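
A minimal sketch of such an encoding, assuming scikit-learn is installed. The sequence is invented for the example; `LabelEncoder` assigns integer codes in sorted order of the distinct symbols.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# A made-up DNA sequence of string symbols.
sequence = np.array(["A", "C", "G", "T", "G", "A"])

# Learn the symbol set and map each symbol to an integer code.
# Classes are sorted, so here A=0, C=1, G=2, T=3.
encoder = LabelEncoder()
encoded = encoder.fit_transform(sequence)

# The original symbols can be recovered from the integer codes.
decoded = encoder.inverse_transform(encoded)
```

Fitting the encoder once on every symbol that can occur, and reusing it for training and test data, keeps the integer codes consistent across sequences.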

API reference



A hidden Markov model with univariate categorical emissions.


__init__(*[, n_states, topology, ...])

Initializes the CategoricalHMM.

aic(X[, lengths])

The Akaike information criterion of the model, evaluated with the maximum likelihood of X.

bic(X[, lengths])

The Bayesian information criterion of the model, evaluated with the maximum likelihood of X.

fit(X[, lengths])

Fits the HMM to the sequences in X, using the Baum-Welch algorithm.



class sequentia.models.hmm.variants.CategoricalHMM

A hidden Markov model with univariate categorical emissions.


Using a CategoricalHMM to learn how to recognize DNA sequences from the synthetase gene family.

See load_gene_families() for more information on the sample dataset used in this example.

import numpy as np
from sequentia.datasets import load_gene_families
from sequentia.models.hmm import CategoricalHMM

# Seed for reproducible pseudo-randomness
random_state = np.random.RandomState(1)

# Fetch DNA sequences for the synthetase gene family (no. 4)
data, enc = load_gene_families(families=[4])
train_data, test_data = data.split(test_size=0.2, random_state=random_state)

# Create and train a CategoricalHMM to recognize the synthetase DNA sequences
model = CategoricalHMM(random_state=random_state)
X_train, lengths_train = train_data.X_lengths
model.fit(X_train, lengths_train)

# Calculate the log-likelihood of the first test sample being generated by this model
x, y = test_data[0]
model.score(x)

__init__(*, n_states=5, topology='left-right', random_state=None, hmmlearn_kwargs={'init_params': 'ste', 'params': 'ste'})

Initializes the CategoricalHMM.

  • n_states (PositiveInt) – Number of states in the Markov chain.

  • topology (Literal['ergodic', 'left-right', 'linear'] | None) –

    Transition topology of the Markov chain — see Topologies.

    • If None, behaves the same as 'ergodic' but with hmmlearn initialization.

  • random_state (NonNegativeInt | RandomState | None) – Seed or numpy.random.RandomState object for reproducible pseudo-randomness.

  • hmmlearn_kwargs (Dict[str, Any]) – Additional keyword arguments provided to the hmmlearn HMM constructor.

Return type:


aic(X, lengths=None)

The Akaike information criterion of the model, evaluated with the maximum likelihood of X.

  • X (Array) –

    Univariate observation sequence(s).

    • Should be a single 1D array.

    • Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.

  • lengths (Array | None) –

    Lengths of the observation sequence(s) provided in X.

    • If None, then X is assumed to be a single observation sequence.

    • len(X) should be equal to sum(lengths).


This method requires a trained model — see fit().


The Akaike information criterion.

Return type:


bic(X, lengths=None)

The Bayesian information criterion of the model, evaluated with the maximum likelihood of X.
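
For reference, both information criteria are commonly defined from the maximized log-likelihood, the number of free parameters and, for the BIC, the number of observations. The sketch below uses these textbook definitions with invented numbers. The helper functions are hypothetical, and whether the library computes exactly these forms is an assumption.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Textbook AIC: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood: float, n_params: int, n_observations: int) -> float:
    """Textbook BIC: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_observations) - 2 * log_likelihood

# Hypothetical values, chosen only to show the arithmetic.
log_likelihood = -120.0
n_params = 10
n_observations = 200

aic_value = aic(log_likelihood, n_params)  # 2*10 + 240 = 260.0
bic_value = bic(log_likelihood, n_params, n_observations)
```

For both criteria, lower values indicate a better trade-off between fit and model complexity. The BIC penalizes each parameter more heavily than the AIC once there are more than about seven observations, since ln(n) then exceeds 2.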

  • X (Array) –

    Univariate observation sequence(s).

    • Should be a single 1D array.

    • Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.

  • lengths (Array | None) –

    Lengths of the observation sequence(s) provided in X.

    • If None, then X is assumed to be a single observation sequence.

    • len(X) should be equal to sum(lengths).


This method requires a trained model — see fit().


The Bayesian information criterion.

Return type:


fit(X, lengths=None)

Fits the HMM to the sequences in X, using the Baum-Welch algorithm.

  • X (Array) –

    Univariate observation sequence(s).

    • Should be a single 1D array.

    • Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.

  • lengths (Array | None) –

    Lengths of the observation sequence(s) provided in X.

    • If None, then X is assumed to be a single observation sequence.

    • len(X) should be equal to sum(lengths).
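
The X and lengths convention described above can be sketched with NumPy as follows. The three sequences are invented, integer-encoded examples.

```python
import numpy as np

# Three made-up integer-encoded sequences of different lengths.
sequences = [
    np.array([0, 1, 2, 3]),
    np.array([2, 2, 1]),
    np.array([3, 0, 0, 1, 2]),
]

# Concatenate into a single 1D array and record each sequence's length.
X = np.concatenate(sequences)
lengths = np.array([len(seq) for seq in sequences])

# The documented invariant: len(X) equals sum(lengths).
assert len(X) == lengths.sum()

# The original sequences can be recovered by splitting at the cumulative lengths.
recovered = np.split(X, np.cumsum(lengths)[:-1])
```

Arrays built this way match the shape of the `X` and `lengths` arguments described for this method.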


The fitted HMM.

Return type:



Freezes the trainable parameters of the HMM, preventing them from being updated during the Baum-Welch algorithm.


params (str) –

A string specifying which parameters to freeze. Can contain a combination of:

  • 's' for initial state probabilities,

  • 't' for transition probabilities,

  • 'e' for emission probabilities.


If used, this method should normally be called before fit().

See also


Unfreezes the trainable parameters of the HMM, allowing them to be updated during the Baum-Welch algorithm.


Retrieves the number of trainable parameters.


This method requires a trained model — see fit().


Number of trainable parameters.

Return type:



Calculates the log-likelihood of the HMM generating a single observation sequence.


x (Array) –

Univariate observation sequence.

  • Should be a single 1D array.
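
To make the quantity concrete, the log-likelihood of a sequence under a categorical HMM can be computed with the standard scaled forward algorithm. The sketch below is not the library's implementation. The function name and the toy parameters are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

def forward_log_likelihood(x, start_probs, transitions, emissions):
    """Log-likelihood of an integer-encoded sequence under a categorical HMM,
    computed with the scaled forward algorithm."""
    log_likelihood = 0.0
    # Joint probability of the first symbol and each starting state.
    alpha = start_probs * emissions[:, x[0]]
    for t in range(len(x)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = (alpha @ transitions) * emissions[:, x[t]]
        # Rescale at every step to avoid numerical underflow; the scale
        # factors multiply to the likelihood of the whole sequence.
        scale = alpha.sum()
        log_likelihood += np.log(scale)
        alpha = alpha / scale
    return log_likelihood

# Toy model with 2 states and 2 symbols (all values invented).
start_probs = np.array([0.6, 0.4])
transitions = np.array([[0.7, 0.3],
                        [0.4, 0.6]])
emissions = np.array([[0.5, 0.5],
                      [0.1, 0.9]])

x = np.array([0, 1])
log_likelihood = forward_log_likelihood(x, start_probs, transitions, emissions)
```

For this toy sequence the likelihood works out by hand to 0.2156, so the returned value is its natural logarithm.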


This method requires a trained model — see fit().


The log-likelihood.

Return type:



Sets the initial state probabilities.

If this method is not called, initial state probabilities are initialized depending on the value of topology provided to __init__():

  • If topology was set to 'ergodic', 'left-right' or 'linear', then random probabilities will be assigned according to the topology by calling set_start_probs() with values='random'.

  • If topology was set to None, then initial state probabilities will be initialized by hmmlearn.


values (Array | Literal['uniform', 'random']) –

Probabilities or probability type to assign as initial state probabilities.

  • If an Array, should be a vector of starting probabilities for each state.

  • If 'uniform', there is an equal probability of starting in any state.

  • If 'random', the vector of initial state probabilities is sampled from a Dirichlet distribution with unit concentration parameters.
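
The 'uniform' and 'random' options can be illustrated with NumPy as below. This is a sketch rather than the library's code, and it ignores any restriction a topology may place on which states can be starting states.

```python
import numpy as np

n_states = 5
rng = np.random.RandomState(1)

# 'uniform': every state is equally likely to be the starting state.
uniform_start = np.full(n_states, 1 / n_states)

# 'random': one draw from a Dirichlet distribution with unit concentration
# parameters, which is uniform over all valid probability vectors.
random_start = rng.dirichlet(np.ones(n_states))
```

Either vector is a valid array of starting probabilities: its entries are non-negative and sum to one.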


If used, this method should normally be called before fit().


Sets the state emission distribution of the HMM’s emission model.

If this method is not called, emission probabilities will be initialized by hmmlearn.


values (Array) – Array of emission probabilities.


If used, this method should normally be called before fit().


Sets the transition probability matrix.

If this method is not called, transition probabilities are initialized depending on the value of topology provided to __init__():

  • If topology was set to 'ergodic', 'left-right' or 'linear', then random probabilities will be assigned according to the topology by calling set_transitions() with values='random'.

  • If topology was set to None, then transition probabilities will be initialized by hmmlearn.


values (Array | Literal['uniform', 'random']) –

Probabilities or probability type to assign as state transition probabilities.

  • If an Array, should be a matrix of probabilities where each row must sum to one and represents the probabilities of transitioning out of a state.

  • If 'uniform', for each state there is an equal probability of transitioning to any state permitted by the topology.

  • If 'random', the vector of transition probabilities for each row is sampled from a Dirichlet distribution with unit concentration parameters, according to the shape of the topology.
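
A sketch of how the 'uniform' and 'random' options could look for a left-right chain. It assumes that a left-right topology permits a state to transition only to itself or to a later state, giving an upper-triangular matrix. This shape is an assumption of the example and is not defined in this section.

```python
import numpy as np

n_states = 4
rng = np.random.RandomState(1)

# Assumed left-right mask: state i may stay put or move to any state j >= i.
allowed = np.triu(np.ones((n_states, n_states), dtype=bool))

# 'uniform': equal probability over the transitions permitted in each row.
uniform_transitions = allowed / allowed.sum(axis=1, keepdims=True)

# 'random': each row's permitted entries are one Dirichlet draw with unit
# concentration parameters; forbidden entries stay at zero.
random_transitions = np.zeros((n_states, n_states))
for i in range(n_states):
    permitted = np.flatnonzero(allowed[i])
    random_transitions[i, permitted] = rng.dirichlet(np.ones(len(permitted)))
```

In both matrices every row sums to one. Under this mask the last state can only transition to itself, so its row has a single entry equal to one.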


If used, this method should normally be called before fit().


Unfreezes the trainable parameters of the HMM, allowing them to be updated during the Baum-Welch algorithm.


params (str) –

A string specifying which parameters to unfreeze. Can contain a combination of:

  • 's' for initial state probabilities,

  • 't' for transition probabilities,

  • 'e' for emission probabilities.

See also


Freezes the trainable parameters of the HMM, preventing them from being updated during the Baum-Welch algorithm.


Additional keyword arguments provided to the hmmlearn HMM constructor.


Underlying HMM object from hmmlearn — only set after fit().


Number of states in the Markov chain.


Seed or numpy.random.RandomState object for reproducible pseudo-randomness.


Transition topology of the Markov chain — see Topologies.