Hyper-parameter search methods

To optimize the hyper-parameters of a specific model, hyper-parameter search methods evaluate the model's performance under different configurations (usually in conjunction with cross-validation) and select the best-performing settings.
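
Without the cross-validation bookkeeping, an exhaustive search is a loop over candidate configurations that keeps the one with the best mean validation score. The sketch below shows that loop on scikit-learn's bundled iris data (fixed-size feature vectors, not sequences), using scikit-learn's ParameterGrid, StratifiedKFold and cross_val_score. It is an illustration of the idea only. GridSearchCV automates the same loop and, with refit=True, then refits the best configuration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5)

best_score, best_params = -float("inf"), None
for params in ParameterGrid({"n_neighbors": [1, 3, 5, 7]}):
    # Mean accuracy over the cross-validation folds for this configuration
    score = cross_val_score(KNeighborsClassifier(**params), X, y, cv=cv).mean()
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```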

sklearn.model_selection provides such hyper-parameter search methods, but they do not support sequence data. Sequentia provides modified versions of these methods that do.
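
The incompatibility comes from how sequences are stored. In the examples in this section, X holds the frames of every sequence stacked into one array, lengths records how many frames each sequence contributes, and y has one label per sequence. scikit-learn's splitters treat each row of X as a sample, so they would cut sequences apart and misalign them with y. A sequence-aware split has to choose sequence indices and then gather whole blocks of frames. The following is a minimal sketch of that idea, not Sequentia's implementation; select_sequences is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import KFold


def select_sequences(X, lengths, idx):
    """Hypothetical helper: gather whole sequences idx from the stacked frames in X."""
    ends = np.cumsum(lengths)
    starts = ends - lengths
    X_sel = np.concatenate([X[starts[i]:ends[i]] for i in idx])
    return X_sel, lengths[idx]


# Three sequences with 3, 2 and 4 frames, stacked into one (9, 1) array
X = np.arange(9).reshape(-1, 1)
lengths = np.array([3, 2, 4])
y = np.array([0, 1, 0])  # one label per sequence

# Split over the 3 sequence indices, never over the 9 individual frames
for train_idx, test_idx in KFold(n_splits=3).split(lengths):
    X_train, lengths_train = select_sequences(X, lengths, train_idx)
    X_test, lengths_test = select_sequences(X, lengths, test_idx)
    print(test_idx, X_test.ravel(), lengths_test, y[test_idx])
```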

API reference

Classes/Methods

param_grid(**kwargs)

Generates a hyper-parameter grid for a nested object.

GridSearchCV

Exhaustive search over specified parameter values for an estimator.

RandomizedSearchCV

Randomized search on hyper-parameters.

HalvingGridSearchCV

Search over specified parameter values with successive halving.

HalvingRandomSearchCV

Randomized search on hyper-parameters with successive halving.

Example

Using GridSearchCV with StratifiedKFold to cross-validate a KNNClassifier training pipeline.

import numpy as np

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

from sequentia.datasets import load_digits
from sequentia.models import KNNClassifier
from sequentia.preprocessing import IndependentFunctionTransformer
from sequentia.model_selection import StratifiedKFold, GridSearchCV

EPS: np.float32 = np.finfo(np.float32).eps

# Define model and hyper-parameter search space
search = GridSearchCV(
    # Create a basic pipeline with a KNNClassifier to be optimized
    estimator=Pipeline(
        [
            ("scale", IndependentFunctionTransformer(minmax_scale)),
            ("clf", KNNClassifier(use_c=True, n_jobs=-1))
        ]
    ),
    # Optimize over k, weighting function and window size
    param_grid={
        "clf__k": [1, 2, 3, 4, 5],
        "clf__weighting": [
            None, lambda x: 1 / (x + EPS), lambda x: np.exp(-x)
        ],
        "clf__window": [1.0, 0.75, 0.5, 0.25, 0.1],
    },
    # Use StratifiedKFold cross-validation
    cv=StratifiedKFold(),
    n_jobs=-1,
)

# Load the spoken digit dataset with a train/test set split
data = load_digits()
train_data, test_data = data.split(test_size=0.2, stratify=True)

# Perform cross-validation over accuracy and retrieve the best model
search.fit(train_data.X, train_data.y, lengths=train_data.lengths)
clf = search.best_estimator_

# Calculate accuracy on the test set split
acc = clf.score(test_data.X, test_data.y, lengths=test_data.lengths)
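
Because GridSearchCV is exhaustive, the cost of the search above grows with the product of the list lengths in param_grid. The sketch below counts the candidates with scikit-learn's ParameterGrid, which enumerates the same combinations. The grid is restated without the pipeline, and the fold count assumes that StratifiedKFold keeps its default of five splits.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

EPS = np.finfo(np.float32).eps

# Same search space as the GridSearchCV example
grid = {
    "clf__k": [1, 2, 3, 4, 5],
    "clf__weighting": [None, lambda x: 1 / (x + EPS), lambda x: np.exp(-x)],
    "clf__window": [1.0, 0.75, 0.5, 0.25, 0.1],
}

n_candidates = len(ParameterGrid(grid))
n_splits = 5  # assumed default number of StratifiedKFold splits
print(n_candidates)             # 75
print(n_candidates * n_splits)  # 375 fits, plus one refit of the best model
```

Trimming the value lists is the most direct way to reduce this cost. Alternatively, RandomizedSearchCV or the successive-halving variants evaluate fewer full-size fits.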

Definitions

sequentia.model_selection.param_grid(**kwargs)

Generates a hyper-parameter grid for a nested object.

Examples

Using param_grid() in a grid search to cross-validate over settings for GaussianMixtureHMM, which is a nested model specified in the constructor of an HMMClassifier.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

from sequentia.enums import PriorMode, CovarianceMode, TopologyMode
from sequentia.models import HMMClassifier, GaussianMixtureHMM
from sequentia.preprocessing import IndependentFunctionTransformer
from sequentia.model_selection import param_grid, GridSearchCV, StratifiedKFold

GridSearchCV(
    estimator=Pipeline(
        [
            ("scale", IndependentFunctionTransformer(minmax_scale)),
            ("clf", HMMClassifier(variant=GaussianMixtureHMM)),
        ]
    ),
    param_grid={
        "clf__prior": [PriorMode.UNIFORM, PriorMode.FREQUENCY],
        "clf__model_kwargs": param_grid(
            n_states=[3, 5, 7],
            n_components=[2, 3, 4],
            covariance=[
                CovarianceMode.DIAGONAL, CovarianceMode.SPHERICAL
            ],
            topology=[
                TopologyMode.LEFT_RIGHT, TopologyMode.LINEAR
            ],
        )
    },
    cv=StratifiedKFold(),
)

Parameters:

**kwargs (list[Any]) – Hyper-parameter names and their corresponding lists of candidate values.

Returns:

Hyper-parameter grid for a nested object.
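
Conceptually, the nested grid is the Cartesian product of the keyword-argument value lists, with each combination packed into its own dictionary so that a single outer parameter (such as clf__model_kwargs) can take each combination as one candidate value. The following is a minimal sketch of that idea using itertools.product. expand_nested_grid is a hypothetical stand-in, not Sequentia's implementation, and the string values stand in for the enum members used in the example.

```python
import itertools
from typing import Any

from sklearn.model_selection import ParameterGrid


def expand_nested_grid(**kwargs: list[Any]) -> list[dict[str, Any]]:
    """Hypothetical stand-in for param_grid: one dict per combination."""
    names = list(kwargs)
    return [dict(zip(names, values)) for values in itertools.product(*kwargs.values())]


# Strings stand in for the CovarianceMode/TopologyMode enum members
model_kwargs = expand_nested_grid(
    n_states=[3, 5, 7],
    n_components=[2, 3, 4],
    covariance=["diagonal", "spherical"],
    topology=["left-right", "linear"],
)
print(len(model_kwargs))  # 36
print(model_kwargs[0])  # {'n_states': 3, 'n_components': 2, 'covariance': 'diagonal', 'topology': 'left-right'}

# Each nested dict is one candidate value of the outer parameter
outer_grid = {"clf__prior": ["uniform", "frequency"], "clf__model_kwargs": model_kwargs}
print(len(ParameterGrid(outer_grid)))  # 72
```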

class sequentia.model_selection.GridSearchCV

Exhaustive search over specified parameter values for an estimator.

cv must be a valid splitting method from sequentia.model_selection.

See also

sklearn.model_selection.GridSearchCV

GridSearchCV is a modified version of this class that supports sequences.

__init__(estimator, param_grid, *, scoring=None, n_jobs=None, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)

class sequentia.model_selection.RandomizedSearchCV

Randomized search on hyper-parameters.

cv must be a valid splitting method from sequentia.model_selection.

See also

sklearn.model_selection.RandomizedSearchCV

RandomizedSearchCV is a modified version of this class that supports sequences.

__init__(estimator, param_distributions, *, n_iter=10, scoring=None, n_jobs=None, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score=nan, return_train_score=False)
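
Unlike GridSearchCV, RandomizedSearchCV evaluates only n_iter candidates drawn from param_distributions. Assuming that it samples candidates in the same way as its scikit-learn counterpart, the draw can be previewed with sklearn.model_selection.ParameterSampler. When every value is a list, as in this sketch, candidates are sampled without replacement. scipy.stats distributions can also be passed for continuous parameters.

```python
from sklearn.model_selection import ParameterSampler

# Illustrative search space for a KNNClassifier in a pipeline step named "clf"
param_distributions = {
    "clf__k": [1, 2, 3, 4, 5],
    "clf__window": [1.0, 0.75, 0.5, 0.25, 0.1],
}

# Draw 10 of the 25 possible combinations; random_state makes the draw repeatable
candidates = list(ParameterSampler(param_distributions, n_iter=10, random_state=0))
for params in candidates:
    print(params)
```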

class sequentia.model_selection.HalvingGridSearchCV

Search over specified parameter values with successive halving.

cv must be a valid splitting method from sequentia.model_selection.

See also

sklearn.model_selection.HalvingGridSearchCV

HalvingGridSearchCV is a modified version of this class that supports sequences.

__init__(estimator, param_grid, *, factor=3, resource='n_samples', max_resources='auto', min_resources='exhaust', aggressive_elimination=False, cv=5, scoring=None, refit=True, error_score=nan, return_train_score=True, random_state=None, n_jobs=None, verbose=0)
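
Successive halving trades exhaustiveness for cost. Every candidate starts with a small budget of the chosen resource (with resource='n_samples', the number of training samples). After each round, only the best ceil(n / factor) candidates survive, and the budget for each survivor is multiplied by factor. The function below is a simplified, textbook sketch of that schedule, and halving_schedule is a hypothetical helper. scikit-learn, on which this class is based, chooses min_resources and the number of rounds by its own rules, so the exact numbers it uses will differ.

```python
import math


def halving_schedule(n_candidates, factor=3, min_resources=10, max_resources=1000):
    """Hypothetical helper: (candidates, resources per candidate) for each round.

    Simplified successive halving: keep the top ceil(n / factor) candidates
    after each round and multiply their budget by factor, stopping once a
    single candidate remains or the next budget would exceed max_resources.
    """
    schedule = []
    remaining, resources = n_candidates, min_resources
    while resources <= max_resources:
        schedule.append((remaining, resources))
        if remaining == 1:
            break
        remaining = math.ceil(remaining / factor)
        resources *= factor
    return schedule


# 75 candidates, e.g. a 5 x 3 x 5 grid
for round_, (n, r) in enumerate(halving_schedule(75)):
    print(f"round {round_}: {n} candidates x {r} samples each")
```

HalvingRandomSearchCV applies the same kind of rounds to randomly sampled candidates instead of the full grid.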

class sequentia.model_selection.HalvingRandomSearchCV

Randomized search on hyper-parameters with successive halving.

cv must be a valid splitting method from sequentia.model_selection.

See also

sklearn.model_selection.HalvingRandomSearchCV

HalvingRandomSearchCV is a modified version of this class that supports sequences.

__init__(estimator, param_distributions, *, n_candidates='exhaust', factor=3, resource='n_samples', max_resources='auto', min_resources='smallest', aggressive_elimination=False, cv=5, scoring=None, refit=True, error_score=nan, return_train_score=True, random_state=None, n_jobs=None, verbose=0)