KNN Classifier¶
The KNN Classifier is a classifier that uses the \(k\)NN algorithm with DTW as a distance measure to identify a \(k\)neighborhood of the most similar training sequences to the sequence being classified.
To classify a sequence \(O'\), the KNNClassifier
works by:
 Calculating the DTW distance between \(O'\) and every training sequence.
 Forming a kneighborhood \(\mathcal{K}'=\left\{O^{(1)},\ldots,O^{(k)}\right\}\) of the \(k\) nearest training sequences to \(O'\).
 Calculating a distance weighting for each sequence in \(\mathcal{K}'\).A uniform weighting of 1 is used by default, meaning that all sequences in \(\mathcal{K}'\) have equal influence on the predicted class. However, custom functions such as \(e^{x}\) (where \(x\) is the DTW distance) can be specified to increase classification weight on training sequences that are more similar to \(O'\).
 Calculating a score for each of the unique classes corresponding to the sequences in \(\mathcal{K}'\).The score for each class is calculated as the sum of the distance weightings of all sequences in \(\mathcal{K}'\) belonging to that class.
 Selecting the class with the highest score.If there is a tie between classes, a class is randomly selected between the tied classes.
API reference¶
Class¶
A knearest neighbor classifier that uses DTW as a distance measure for sequence comparison. 
Methods¶

Initializes the 

Calculates a matrix of DTW distances between the sequences in 

Calculates the DTW distance between two univariate or multivariate sequences. 

Fits the classifier to the sequence(s) in 

Fits the classifier to the sequence(s) in 

Loads and deserializes a fitted KNN estimator. 

Calculates DTW distances between 

Calculates the DTW matrix between two sequences and plots the optimal warping path. 

Calculates DTW weights between 

Predicts classes for the sequence(s) in 

Predicts class probabilities for the sequence(s) in 

Predicts class scores for the sequence(s) in 

Queries the knearest training observation sequences to each sequence in 

Serializes and saves a fitted KNN estimator. 

Calculates accuracy for the sequence(s) in 
 class sequentia.models.knn.classifier.KNNClassifier[source]¶
A knearest neighbor classifier that uses DTW as a distance measure for sequence comparison.
The classifier computes the score for each class as the total of the distance weightings of every sequence belonging to that class, within the DTW kneighborhood of the sequence being classified.
Examples
Using a
KNNClassifier
to classify spoken digits.import numpy as np from sequentia.datasets import load_digits from sequentia.models.knn import KNNClassifier # Seed for reproducible pseudorandomness random_state = np.random.RandomState(1) # Fetch MFCCs of spoken digits data = load_digits() train_data, test_data = data.split(test_size=0.2, random_state=random_state) # Create a HMMClassifier using a class frequency prior clf = KNNClassifier() # Fit the classifier X_train, y_train, lengths_train = train_data.X_y_lengths clf.fit(X_train, y_train, lengths_train) # Predict classes for the test observation sequences X_test, lengths_test = test_data.X_lengths y_pred = clf.predict(X_test, lengths_test)
 __init__(*, k=1, weighting=None, window=1, independent=False, classes=None, use_c=False, n_jobs=1, random_state=None)[source]¶
Initializes the
KNNClassifier
. Parameters:
k (PositiveInt) – Number of neighbors.
weighting (Callable  None) – A callable that specifies how distance weighting should be performed. The callable should accept a
numpy.ndarray
of DTW distances, apply an elementwise weighting transformation to the matrix of DTW distances, then return an equallysizednumpy.ndarray
of weightings. IfNone
, then a uniform weighting of 1 will be applied to all distances.window (ConstrainedFloatValue) –
The size of the Sakoe—Chiba band global constrant as a fraction of the length of the shortest of the two sequences being compared.
A larger window will give more freedom to the DTW alignment, allowing more deviation but leading to potentially slower computation. A window of 1 is equivalent to full DTW computation with no global constraint applied.
A smaller window will restrict the DTW alignment, and possibly speed up the DTW computation. A window of 0 is equivalent to Euclidean distance.
independent (bool) – Whether or not to allow features to be warped independently from each other. See [1] for an overview of independent and dependent dynamic time warping.
classes (Array  None) –
Set of possible class labels.
If not provided, these will be determined from the training data labels.
If provided, output from methods such as
predict_proba()
andpredict_scores()
will follow the ordering of the class labels provided here.
use_c (bool) – Whether or not to use fast pure C compiled functions from dtaidistance to perform the DTW computations.
n_jobs (NegativeInt  PositiveInt) –
Maximum number of concurrently running workers.
If 1, no parallelism is used at all (useful for debugging).
If 1, all CPUs are used.
If < 1,
(n_cpus + 1 + n_jobs)
are used — e.g.n_jobs=2
uses all but one.
random_state (NonNegativeInt  RandomState  None) – Seed or
numpy.random.RandomState
object for reproducible pseudorandomness.
 Return type:
 compute_distance_matrix(X, lengths=None, **kwargs)¶
Calculates a matrix of DTW distances between the sequences in
X
and the training sequences. Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the
lengths
argument for decoding the original sequences.
lengths (Array  None) –
Lengths of the observation sequence(s) provided in
X
.If
None
, thenX
is assumed to be a single observation sequence.len(X)
should be equal tosum(lengths)
.
**kwargs –
Model parameters to temporarily override (for experimentation purposes).
window
: See__init__()
.independent
: See__init__()
.
 Note:
This method requires a trained classifier — see
fit()
. Returns:
DTW distance matrix.
 Return type:
Array
 dtw(A, B, **kwargs)¶
Calculates the DTW distance between two univariate or multivariate sequences.
 Parameters:
A (Array) – The first sequence.
B (Array) – The second sequence.
**kwargs –
Model parameters to temporarily override (for experimentation purposes).
window
: See__init__()
.independent
: See__init__()
.
 Returns:
DTW distance.
 Return type:
float
 fit(X, y, lengths=None)[source]¶
Fits the classifier to the sequence(s) in
X
. Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the
lengths
argument for decoding the original sequences.
y (Array) – Classes corresponding to sequence(s) provided in
X
.lengths (Array  None) –
Lengths of the observation sequence(s) provided in
X
.If
None
, thenX
is assumed to be a single observation sequence.len(X)
should be equal tosum(lengths)
.
 Returns:
The fitted classifier.
 Return type:
 fit_predict(X, y, lengths=None)[source]¶
Fits the classifier to the sequence(s) in
X
and predicts classes forX
. Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the
lengths
argument for decoding the original sequences.
y (Array) – Classes corresponding to sequence(s) provided in
X
.lengths (Array  None) –
Lengths of the observation sequence(s) provided in
X
.If
None
, thenX
is assumed to be a single observation sequence.len(X)
should be equal tosum(lengths)
.
 Returns:
Class predictions.
 Return type:
Array
 classmethod load(path)¶
Loads and deserializes a fitted KNN estimator.
 Parameters:
path (str  Path  IO) – Location to load the serialized estimator from.
 Returns:
Fitted KNN estimator.
See also
save
Serializes and saves a fitted KNN estimator.
 plot_dtw_histogram(X, lengths=None, ax=None, **kwargs)¶
Calculates DTW distances between
X
and training sequences, and plots a distance histogram. Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the
lengths
argument for decoding the original sequences.
lengths (Array  None) –
Lengths of the observation sequence(s) provided in
X
.If
None
, thenX
is assumed to be a single observation sequence.len(X)
should be equal tosum(lengths)
.
ax (matplotlib.axes.Axes  None) – Plot axes. If
None
, new axes are created.**kwargs –
Model parameters to temporarily override (for experimentation purposes).
window
: See__init__()
.independent
: See__init__()
.
 Note:
This method requires a trained classifier — see
fit()
. Returns:
Plot axes.
 Return type:
matplotlib.axes.Axes
 plot_warping_path_1d(a, b, **kwargs)¶
Calculates the DTW matrix between two sequences and plots the optimal warping path.
 Parameters:
a (Array) – The first sequence.
b (Array) – The second sequence.
**kwargs –
Model parameters to temporarily override (for experimentation purposes).
window
: See__init__()
.
 Note:
Only supports univariate sequences.
 Returns:
Plot axes.
 Return type:
matplotlib.axes.Axes
 plot_weight_histogram(X, lengths=None, ax=None, **kwargs)¶
Calculates DTW weights between
X
and training sequences, and plots a weight histogram. Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the
lengths
argument for decoding the original sequences.
lengths (Array  None) –
Lengths of the observation sequence(s) provided in
X
.If
None
, thenX
is assumed to be a single observation sequence.len(X)
should be equal tosum(lengths)
.
ax (matplotlib.axes.Axes  None) – Plot axes. If
None
, new axes are created.**kwargs –
Model parameters to temporarily override (for experimentation purposes).
weighting
: See__init__()
.window
: See__init__()
.independent
: See__init__()
.
 Note:
This method requires a trained classifier — see
fit()
. Returns:
Plot axes.
 Return type:
matplotlib.axes.Axes
 predict(X, lengths=None)[source]¶
Predicts classes for the sequence(s) in
X
. Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the
lengths
argument for decoding the original sequences.
lengths (Array  None) –
Lengths of the observation sequence(s) provided in
X
.If
None
, thenX
is assumed to be a single observation sequence.len(X)
should be equal tosum(lengths)
.
 Note:
This method requires a trained classifier — see
fit()
. Returns:
Class predictions.
 Return type:
Array
 predict_proba(X, lengths=None)[source]¶
Predicts class probabilities for the sequence(s) in
X
.Probabilities are calculated as normalized class scores.
 Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the
lengths
argument for decoding the original sequences.
lengths (Array  None) –
Lengths of the observation sequence(s) provided in
X
.If
None
, thenX
is assumed to be a single observation sequence.len(X)
should be equal tosum(lengths)
.
 Note:
This method requires a trained classifier — see
fit()
. Returns:
Class membership probabilities.
 Return type:
Array
 predict_scores(X, lengths=None)[source]¶
Predicts class scores for the sequence(s) in
X
.Scores are calculated as the class distance weighting sums of all training sequences in the kneighborhood.
 Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the
lengths
argument for decoding the original sequences.
lengths (Array  None) –
Lengths of the observation sequence(s) provided in
X
.If
None
, thenX
is assumed to be a single observation sequence.len(X)
should be equal tosum(lengths)
.
 Note:
This method requires a trained classifier — see
fit()
. Returns:
Class scores.
 Return type:
Array
 query_neighbors(X, lengths=None, sort=True, **kwargs)¶
Queries the knearest training observation sequences to each sequence in
X
. Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the
lengths
argument for decoding the original sequences.
lengths (Array  None) –
Lengths of the observation sequence(s) provided in
X
.If
None
, thenX
is assumed to be a single observation sequence.len(X)
should be equal tosum(lengths)
.
sort (bool) – Whether to sort the neighbors in order of nearest to furthest.
**kwargs –
Model parameters to temporarily override (for experimentation purposes).
k
: See__init__()
.window
: See__init__()
.independent
: See__init__()
.
 Returns:
Knearest neighbors for each sequence in
X
.Indices of the knearest training sequences.
DTW distances of the knearest training sequences.
Corresponding outputs of the knearest training sequences.
 Return type:
Tuple[Array, Array, Array]
 save(path)¶
Serializes and saves a fitted KNN estimator.
 Parameters:
path (str  Path  IO) – Location to save the serialized estimator.
 Note:
This method requires a trained classifier — see
fit()
.
See also
load
Loads and deserializes a fitted KNN estimator.
 score(X, y, lengths, normalize=True, sample_weight=None)[source]¶
Calculates accuracy for the sequence(s) in
X
. Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the
lengths
argument for decoding the original sequences.
y (Array) – Classes corresponding to the observation sequence(s) in
X
.lengths (Array  None) –
Lengths of the observation sequence(s) provided in
X
.If
None
, thenX
is assumed to be a single observation sequence.len(X)
should be equal tosum(lengths)
.
normalize (bool) – See
sklearn.metrics.accuracy_score()
.sample_weight (Array  None) – See
sklearn.metrics.accuracy_score()
.
 Note:
This method requires a trained classifier — see
fit()
. Returns:
Classification accuracy.
 Return type:
float
 classes¶
Set of possible class labels.
 independent¶
Whether or not to allow features to be warped independently from each other.
 k¶
Number of neighbors.
 n_jobs¶
Maximum number of concurrently running workers.
 random_state¶
Seed or
numpy.random.RandomState
object for reproducible pseudorandomness.
 use_c¶
Whether or not to use fast pure C compiled functions from dtaidistance to perform the DTW computations.
 weighting¶
A callable that specifies how distance weighting should be performed.
 window¶
The size of the Sakoe—Chiba band global constrant as a fraction of the length of the shortest of the two sequences being compared.
References