Hyper-parameter search methods
To optimize the hyper-parameters of a specific model, hyper-parameter search methods (often used together with cross-validation) evaluate the model's performance under different configurations and identify the best-performing settings.
sklearn.model_selection provides such hyper-parameter search methods but does not support sequence data. Sequentia provides modified versions of these methods that do.
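scikit-learn's splitters and search methods operate on individual rows of X. The example on this page passes a separate `lengths=` argument to `fit`, which suggests that variable-length sequences are stored as one concatenated array of frames plus a lengths array. Under that assumed layout, a plain row-based split could cut a sequence in half. As a rough, stdlib-only illustration of what supporting sequence data involves, the sketch below selects whole sequences by index. `sequence_bounds` and `select_sequences` are hypothetical helpers, not part of the Sequentia API.

```python
from itertools import accumulate


def sequence_bounds(lengths):
    """Return the (start, end) frame offsets of each sequence in the concatenated data."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def select_sequences(X, lengths, indices):
    """Keep whole sequences (never individual frames) for the given sequence indices."""
    bounds = sequence_bounds(lengths)
    X_sel, lengths_sel = [], []
    for i in indices:
        start, end = bounds[i]
        X_sel.extend(X[start:end])
        lengths_sel.append(lengths[i])
    return X_sel, lengths_sel


# Three sequences of lengths 2, 3 and 1, stored as one concatenated list of frames
X = [[0.1], [0.2], [1.0], [1.1], [1.2], [5.0]]
lengths = [2, 3, 1]
X_train, lengths_train = select_sequences(X, lengths, [0, 2])
print(X_train, lengths_train)  # [[0.1], [0.2], [5.0]] [2, 1]
```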
API reference
Classes/Methods
| Name | Description |
| --- | --- |
| param_grid | Generates a hyper-parameter grid for a nested object. |
| GridSearchCV | Exhaustive search over specified parameter values for an estimator. |
| RandomizedSearchCV | Randomized search on hyper parameters. |
| HalvingGridSearchCV | Search over specified parameter values with successive halving. |
| HalvingRandomSearchCV | Randomized search on hyper parameters with successive halving. |
Example
Using GridSearchCV with StratifiedKFold to
cross-validate a KNNClassifier training pipeline.
```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

from sequentia.datasets import load_digits
from sequentia.models import KNNClassifier
from sequentia.preprocessing import IndependentFunctionTransformer
from sequentia.model_selection import StratifiedKFold, GridSearchCV

EPS: np.float32 = np.finfo(np.float32).eps

# Define model and hyper-parameter search space
search = GridSearchCV(
    # Create a basic pipeline with a KNNClassifier to be optimized
    estimator=Pipeline(
        [
            ("scale", IndependentFunctionTransformer(minmax_scale)),
            ("clf", KNNClassifier(use_c=True, n_jobs=-1)),
        ]
    ),
    # Optimize over k, weighting function and window size
    param_grid={
        "clf__k": [1, 2, 3, 4, 5],
        "clf__weighting": [
            None, lambda x: 1 / (x + EPS), lambda x: np.exp(-x)
        ],
        "clf__window": [1.0, 0.75, 0.5, 0.25, 0.1],
    },
    # Use StratifiedKFold cross-validation
    cv=StratifiedKFold(),
    n_jobs=-1,
)

# Load the spoken digit dataset with a train/test set split
data = load_digits()
train_data, test_data = data.split(test_size=0.2, stratify=True)

# Perform cross-validation over accuracy and retrieve the best model
search.fit(train_data.X, train_data.y, lengths=train_data.lengths)
clf = search.best_estimator_

# Calculate accuracy on the test set split
acc = clf.score(test_data.X, test_data.y, lengths=test_data.lengths)
```
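The `cv=StratifiedKFold()` argument splits the training sequences into folds whose class proportions roughly match those of the full training set, with one label per sequence. GridSearchCV then fits every candidate once per fold, so assuming the default of five splits, the grid above (5 × 3 × 5 = 75 candidates) trains 375 models before refitting the best one. The following stdlib-only sketch illustrates the stratification idea; `stratified_folds` and `split_indices` are hypothetical names, and the round-robin assignment is a simplification, not necessarily how Sequentia or scikit-learn implement it.

```python
from collections import defaultdict


def stratified_folds(y, n_splits=5):
    """Assign sequence indices to n_splits folds, spreading each class evenly.

    Simplified round-robin scheme (an assumption), not the exact algorithm
    used by sequentia.model_selection.StratifiedKFold or scikit-learn.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    position = 0
    for label in sorted(by_class):
        for idx in by_class[label]:
            # Deal the indices of each class across the folds like cards
            folds[position % n_splits].append(idx)
            position += 1
    return folds


def split_indices(y, n_splits=5):
    """Yield (train, test) sequence indices, using each fold once as the test set."""
    folds = stratified_folds(y, n_splits)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train), sorted(test)


# One label per sequence (not per frame)
y = ["a", "a", "b", "b", "a", "b"]
for train, test in split_indices(y, n_splits=2):
    print(train, test)
```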
Definitions
- sequentia.model_selection.param_grid(**kwargs)
Generates a hyper-parameter grid for a nested object.
Examples
Using param_grid() in a grid search to cross-validate over settings for GaussianMixtureHMM, which is a nested model specified in the constructor of an HMMClassifier.
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

from sequentia.enums import PriorMode, CovarianceMode, TopologyMode
from sequentia.models import HMMClassifier, GaussianMixtureHMM
from sequentia.preprocessing import IndependentFunctionTransformer
from sequentia.model_selection import GridSearchCV, StratifiedKFold, param_grid

search = GridSearchCV(
    estimator=Pipeline(
        [
            ("scale", IndependentFunctionTransformer(minmax_scale)),
            ("clf", HMMClassifier(variant=GaussianMixtureHMM)),
        ]
    ),
    param_grid={
        "clf__prior": [PriorMode.UNIFORM, PriorMode.FREQUENCY],
        # Each generated dict is passed as model_kwargs to GaussianMixtureHMM
        "clf__model_kwargs": param_grid(
            n_states=[3, 5, 7],
            n_components=[2, 3, 4],
            covariance=[
                CovarianceMode.DIAGONAL, CovarianceMode.SPHERICAL
            ],
            topology=[
                TopologyMode.LEFT_RIGHT, TopologyMode.LINEAR
            ],
        ),
    },
    cv=StratifiedKFold(),
)
```
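In the example above, the candidate values for `"clf__model_kwargs"` are produced by `param_grid()`, so the outer search treats each keyword dict as one setting for the nested GaussianMixtureHMM. A minimal sketch of such a helper, assuming it returns the Cartesian product of the keyword lists as a list of dicts (`param_grid_sketch` is a hypothetical name, and plain strings stand in for the enum members):

```python
from itertools import product


def param_grid_sketch(**kwargs):
    """Expand keyword lists into every combination, one dict per combination.

    Hypothetical re-implementation for illustration only: assumes the real
    helper yields a list of keyword-argument dicts for the nested model.
    """
    keys = list(kwargs)
    return [dict(zip(keys, values)) for values in product(*kwargs.values())]


grid = param_grid_sketch(n_states=[3, 5], covariance=["diag", "spherical"])
print(len(grid))  # 4
print(grid[0])    # {'n_states': 3, 'covariance': 'diag'}
```

If param_grid() behaves like this sketch, the example above expands to 3 × 3 × 2 × 2 = 36 model_kwargs dicts, which, combined with the two prior modes, gives 72 candidate settings per cross-validation split.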
- class sequentia.model_selection.GridSearchCV
Exhaustive search over specified parameter values for an estimator.
cv must be a valid splitting method from sequentia.model_selection.
See also
sklearn.model_selection.GridSearchCV: GridSearchCV is a modified version of this class that supports sequences.
- __init__(estimator, param_grid, *, scoring=None, n_jobs=None, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)
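Conceptually, an exhaustive grid search scores every combination in param_grid on every cross-validation split and keeps the combination with the best mean score. The stdlib sketch below shows that loop, assuming a hypothetical `score(params, train, test)` callable in place of fitting and scoring a real estimator; `grid_search` and `toy_score` are illustrative names only.

```python
from itertools import product
from statistics import mean


def grid_search(param_grid, folds, score):
    """Evaluate every parameter combination on every fold; return the best one.

    param_grid: dict mapping parameter names to lists of candidate values.
    folds: list of (train, test) splits understood by the score function.
    score: hypothetical callable score(params, train, test) -> float (higher is better).
    """
    keys = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(keys, values))
        # Mean score across all cross-validation splits
        cv_score = mean(score(params, train, test) for train, test in folds)
        if cv_score > best_score:
            best_params, best_score = params, cv_score
    return best_params, best_score


# Toy score that peaks at k=3 and window=0.5 and ignores the data entirely
def toy_score(params, train, test):
    return -abs(params["k"] - 3) - abs(params["window"] - 0.5)


folds = [([0, 1], [2]), ([1, 2], [0]), ([0, 2], [1])]
grid = {"k": [1, 2, 3, 4, 5], "window": [1.0, 0.5, 0.1]}
best, best_score = grid_search(grid, folds, toy_score)
print(best, best_score)  # {'k': 3, 'window': 0.5} 0.0
```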
- class sequentia.model_selection.RandomizedSearchCV
Randomized search on hyper parameters.
cv must be a valid splitting method from sequentia.model_selection.
See also
sklearn.model_selection.RandomizedSearchCV: RandomizedSearchCV is a modified version of this class that supports sequences.
- __init__(estimator, param_distributions, *, n_iter=10, scoring=None, n_jobs=None, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score=nan, return_train_score=False)
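Instead of trying every combination, a randomized search draws `n_iter` settings from `param_distributions` and evaluates only those, so the cost no longer grows with the size of the grid. A minimal sketch of the sampling step, assuming lists are sampled uniformly and continuous distributions are given as hypothetical callables taking a `random.Random` instance (scikit-learn instead expects objects with an `rvs` method, such as scipy.stats distributions, and samples without replacement when every entry is a list; this sketch samples with replacement):

```python
import random


def sample_candidates(param_distributions, n_iter=10, random_state=None):
    """Draw n_iter random parameter settings (hypothetical helper)."""
    rng = random.Random(random_state)
    candidates = []
    for _ in range(n_iter):
        params = {}
        for name, dist in param_distributions.items():
            if callable(dist):
                # Continuous distribution: call it with the shared generator
                params[name] = dist(rng)
            else:
                # Discrete list of values: pick one uniformly
                params[name] = rng.choice(dist)
        candidates.append(params)
    return candidates


distributions = {
    "k": [1, 2, 3, 4, 5],
    "window": lambda rng: rng.uniform(0.1, 1.0),
}
candidates = sample_candidates(distributions, n_iter=4, random_state=0)
for params in candidates:
    print(params)
```

Fixing `random_state` makes the sampled candidates reproducible, which mirrors the purpose of the `random_state` parameter in the signature above.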
- class sequentia.model_selection.HalvingGridSearchCV
Search over specified parameter values with successive halving.
cv must be a valid splitting method from sequentia.model_selection.
See also
sklearn.model_selection.HalvingGridSearchCV: HalvingGridSearchCV is a modified version of this class that supports sequences.
- __init__(estimator, param_grid, *, factor=3, resource='n_samples', max_resources='auto', min_resources='exhaust', aggressive_elimination=False, cv=5, scoring=None, refit=True, error_score=nan, return_train_score=True, random_state=None, n_jobs=None, verbose=0)
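Successive halving evaluates all candidates with a small budget of the chosen resource (by default `resource='n_samples'`), keeps roughly the best `1/factor` of them, multiplies the budget by `factor`, and repeats. The sketch below shows that loop under simplifying assumptions: `evaluate(candidate, n_resources)` is a hypothetical scoring callable, and stopping when the next round would exceed `max_resources` is a simplification of scikit-learn's bookkeeping (which, for instance, computes the number of iterations up front).

```python
import math


def successive_halving(candidates, evaluate, min_resources, max_resources, factor=3):
    """Return the surviving best candidate and the (n_resources, n_candidates) history."""
    survivors = list(candidates)
    n_resources = min_resources
    history = []
    while True:
        history.append((n_resources, len(survivors)))
        # Score every survivor with the current budget, best first (stable on ties)
        scored = [(evaluate(c, n_resources), c) for c in survivors]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        ranked = [c for _, c in scored]
        if len(ranked) == 1 or n_resources * factor > max_resources:
            return ranked[0], history
        # Keep the top 1/factor of candidates and give them more resources
        survivors = ranked[: math.ceil(len(ranked) / factor)]
        n_resources *= factor


# Toy scorer that prefers values close to 5 and ignores the resource budget
best, history = successive_halving(
    candidates=range(1, 10),
    evaluate=lambda c, n_resources: -abs(c - 5),
    min_resources=10,
    max_resources=100,
)
print(best, history)  # 5 [(10, 9), (30, 3), (90, 1)]
```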
- class sequentia.model_selection.HalvingRandomSearchCV
Randomized search on hyper parameters with successive halving.
cv must be a valid splitting method from sequentia.model_selection.
See also
sklearn.model_selection.HalvingRandomSearchCV: HalvingRandomSearchCV is a modified version of this class that supports sequences.
- __init__(estimator, param_distributions, *, n_candidates='exhaust', factor=3, resource='n_samples', max_resources='auto', min_resources='smallest', aggressive_elimination=False, cv=5, scoring=None, refit=True, error_score=nan, return_train_score=True, random_state=None, n_jobs=None, verbose=0)
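Randomized halving first samples a pool of candidates from `param_distributions`, then prunes it round by round as in the halving grid search. scikit-learn documents `n_candidates='exhaust'` as sampling enough candidates that the last iteration uses as many resources as possible. The sketch below only computes the round-by-round schedule. It assumes that `'exhaust'` corresponds to `max_resources // min_resources` first-round candidates and that each round keeps `ceil(n / factor)` candidates; both helper names are hypothetical.

```python
import math


def exhaust_n_candidates(min_resources, max_resources):
    """Assumed first-round pool size so the final round can use about max_resources."""
    return max_resources // min_resources


def halving_schedule(n_candidates, min_resources, max_resources, factor=3):
    """Return (n_candidates, n_resources) for each successive-halving round."""
    schedule = []
    n_resources = min_resources
    while True:
        schedule.append((n_candidates, n_resources))
        if n_candidates <= 1 or n_resources * factor > max_resources:
            return schedule
        n_candidates = math.ceil(n_candidates / factor)
        n_resources *= factor


n = exhaust_n_candidates(min_resources=20, max_resources=540)
print(n)  # 27
print(halving_schedule(n, min_resources=20, max_resources=540))
# [(27, 20), (9, 60), (3, 180), (1, 540)]
```

With these illustrative numbers, the last round evaluates a single candidate with exactly `max_resources`, which is the behaviour `'exhaust'` aims for.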