KNN Classifier¶

The KNN Classifier is a classifier that uses the \(k\)-NN algorithm with DTW as a distance measure to identify a \(k\)-neighborhood of the most similar training sequences to the sequence being classified.

To classify a sequence \(O'\), the KNNClassifier works by:

Calculating the DTW distance between \(O'\) and every training sequence.
Forming a k-neighborhood \(\mathcal{K}'=\left\{O^{(1)},\ldots,O^{(k)}\right\}\) of the \(k\) nearest training sequences to \(O'\).
Calculating a distance weighting for each sequence in \(\mathcal{K}'\).

A uniform weighting of 1 is used by default, meaning that all sequences in \(\mathcal{K}'\) have equal influence on the predicted class. However, custom functions such as \(e^{-x}\) (where \(x\) is the DTW distance) can be specified to increase classification weight on training sequences that are more similar to \(O'\).
Calculating a score for each of the unique classes corresponding to the sequences in \(\mathcal{K}'\).

The score for each class is calculated as the sum of the distance weightings of all sequences in \(\mathcal{K}'\) belonging to that class.
Selecting the class with the highest score.

If there is a tie between classes, a class is randomly selected between the tied classes.

API reference¶

Class¶

KNNClassifier

A k-nearest neighbor classifier that uses DTW as a distance measure for sequence comparison.

Methods¶

`__init__`(*[, k, weighting, window, ...])	Initializes the `KNNClassifier`.
`compute_distance_matrix`(X[, lengths])	Calculates a matrix of DTW distances between the sequences in `X` and the training sequences.
`dtw`(A, B, **kwargs)	Calculates the DTW distance between two univariate or multivariate sequences.
`fit`(X, y[, lengths])	Fits the classifier to the sequence(s) in `X`.
`fit_predict`(X, y[, lengths])	Fits the classifier to the sequence(s) in `X` and predicts classes for `X`.
`load`(path)	Loads and deserializes a fitted KNN estimator.
`plot_dtw_histogram`(X[, lengths, ax])	Calculates DTW distances between `X` and training sequences, and plots a distance histogram.
`plot_warping_path_1d`(a, b, **kwargs)	Calculates the DTW matrix between two sequences and plots the optimal warping path.
`plot_weight_histogram`(X[, lengths, ax])	Calculates DTW weights between `X` and training sequences, and plots a weight histogram.
`predict`(X[, lengths])	Predicts classes for the sequence(s) in `X`.
`predict_proba`(X[, lengths])	Predicts class probabilities for the sequence(s) in `X`.
`predict_scores`(X[, lengths])	Predicts class scores for the sequence(s) in `X`.
`query_neighbors`(X[, lengths, sort])	Queries the k-nearest training observation sequences to each sequence in `X`.
`save`(path)	Serializes and saves a fitted KNN estimator.
`score`(X, y, lengths[, normalize, sample_weight])	Calculates accuracy for the sequence(s) in `X`.

class sequentia.models.knn.classifier.KNNClassifier[source]¶

A k-nearest neighbor classifier that uses DTW as a distance measure for sequence comparison.

The classifier computes the score for each class as the total of the distance weightings of every sequence belonging to that class, within the DTW k-neighborhood of the sequence being classified.

Examples

Using a KNNClassifier to classify spoken digits.

import numpy as np
from sequentia.datasets import load_digits
from sequentia.models.knn import KNNClassifier

# Seed for reproducible pseudo-randomness
random_state = np.random.RandomState(1)

# Fetch MFCCs of spoken digits
data = load_digits()
train_data, test_data = data.split(test_size=0.2, random_state=random_state)

# Create a HMMClassifier using a class frequency prior
clf = KNNClassifier()

# Fit the classifier
X_train, y_train, lengths_train = train_data.X_y_lengths
clf.fit(X_train, y_train, lengths_train)

# Predict classes for the test observation sequences
X_test, lengths_test = test_data.X_lengths
y_pred = clf.predict(X_test, lengths_test)

__init__(*, k=1, weighting=None, window=1, independent=False, classes=None, use_c=False, n_jobs=1, random_state=None)[source]¶

Initializes the KNNClassifier.

Parameters:

k (PositiveInt) – Number of neighbors.
weighting (Callable | None) – A callable that specifies how distance weighting should be performed. The callable should accept a numpy.ndarray of DTW distances, apply an element-wise weighting transformation to the matrix of DTW distances, then return an equally-sized numpy.ndarray of weightings. If None, then a uniform weighting of 1 will be applied to all distances.
window (ConstrainedFloatValue) –
The size of the Sakoe—Chiba band global constrant as a fraction of the length of the shortest of the two sequences being compared.
- A larger window will give more freedom to the DTW alignment, allowing more deviation but leading to potentially slower computation. A window of 1 is equivalent to full DTW computation with no global constraint applied.
- A smaller window will restrict the DTW alignment, and possibly speed up the DTW computation. A window of 0 is equivalent to Euclidean distance.
independent (bool) – Whether or not to allow features to be warped independently from each other. See [1] for an overview of independent and dependent dynamic time warping.
classes (Array | None) –
Set of possible class labels.
- If not provided, these will be determined from the training data labels.
- If provided, output from methods such as predict_proba() and predict_scores() will follow the ordering of the class labels provided here.
use_c (bool) – Whether or not to use fast pure C compiled functions from dtaidistance to perform the DTW computations.
n_jobs (NegativeInt | PositiveInt) –
Maximum number of concurrently running workers.
- If 1, no parallelism is used at all (useful for debugging).
- If -1, all CPUs are used.
- If < -1, (n_cpus + 1 + n_jobs) are used — e.g. n_jobs=-2 uses all but one.
random_state (NonNegativeInt | RandomState | None) – Seed or numpy.random.RandomState object for reproducible pseudo-randomness.

Return type:

KNNClassifier

compute_distance_matrix(X, lengths=None, **kwargs)¶

Calculates a matrix of DTW distances between the sequences in X and the training sequences.

Parameters:

X (Array) –
Univariate or multivariate observation sequence(s).
- Should be a single 1D or 2D array.
- Should have length as the 1st dimension and features as the 2nd dimension.
- Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Array | None) –
Lengths of the observation sequence(s) provided in X.
- If None, then X is assumed to be a single observation sequence.
- len(X) should be equal to sum(lengths).
**kwargs –
Model parameters to temporarily override (for experimentation purposes).
- window: See __init__().
- independent: See __init__().

Note:

This method requires a trained classifier — see fit().

Returns:

DTW distance matrix.

Return type:

Array

dtw(A, B, **kwargs)¶

Calculates the DTW distance between two univariate or multivariate sequences.

Parameters:

A (Array) – The first sequence.
B (Array) – The second sequence.
**kwargs –
Model parameters to temporarily override (for experimentation purposes).
- window: See __init__().
- independent: See __init__().

Returns:

DTW distance.

Return type:

float

fit(X, y, lengths=None)[source]¶

Fits the classifier to the sequence(s) in X.

Parameters:

X (Array) –
Univariate or multivariate observation sequence(s).
- Should be a single 1D or 2D array.
- Should have length as the 1st dimension and features as the 2nd dimension.
- Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
y (Array) – Classes corresponding to sequence(s) provided in X.
lengths (Array | None) –
Lengths of the observation sequence(s) provided in X.
- If None, then X is assumed to be a single observation sequence.
- len(X) should be equal to sum(lengths).

Returns:

The fitted classifier.

Return type:

KNNClassifier

fit_predict(X, y, lengths=None)[source]¶

Fits the classifier to the sequence(s) in X and predicts classes for X.

Parameters:

X (Array) –
Univariate or multivariate observation sequence(s).
- Should be a single 1D or 2D array.
- Should have length as the 1st dimension and features as the 2nd dimension.
- Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
y (Array) – Classes corresponding to sequence(s) provided in X.
lengths (Array | None) –
Lengths of the observation sequence(s) provided in X.
- If None, then X is assumed to be a single observation sequence.
- len(X) should be equal to sum(lengths).

Returns:

Class predictions.

Return type:

Array

classmethod load(path)¶

Loads and deserializes a fitted KNN estimator.

Parameters:: path (str | Path | IO) – Location to load the serialized estimator from.
Returns:: Fitted KNN estimator.

KNN Classifier¶

API reference¶

Class¶

Methods¶

Table of Contents

Previous topic

Next topic

This Page