# Dynamic Time Warping k-Nearest Neighbors Classifier (DTWKNN)

Recall that the isolated sequences we are dealing with are represented as multivariate time series of different durations.
Suppose that our sequences are all $$D$$-dimensional. The main requirement of $$k$$-Nearest Neighbor ($$k$$-NN) classifiers is that every example must have the same number of dimensions, and hence lie in the same feature space. This is indeed the case with our $$D$$-dimensional sequences. However, we can't use $$k$$-NN with simple distance metrics such as the Euclidean distance, because we are comparing sequences (which represent ordered collections of points in $$D$$-dimensional space) rather than individual points in $$D$$-dimensional space.

One distance metric that allows us to compare multivariate sequences of different lengths is Dynamic Time Warping (DTW). Coupling this metric with $$k$$-NN creates a powerful classifier that assigns the class of a new observation sequence by looking at the classes of observation sequences with similar patterns.
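To make this concrete, here is a minimal sketch of the classic DTW dynamic programming recurrence, written from scratch for illustration (this is not Sequentia's internal implementation):

```python
import numpy as np

def dtw_distance(x, y):
    """Exact DTW distance between sequences of shapes (n, D) and (m, D)."""
    n, m = len(x), len(y)
    # D[i, j] holds the minimum cumulative cost of aligning x[:i] with y[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between two individual observations
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two sequences of different durations in the same 3-dimensional feature space
x, y = np.random.random((10, 3)), np.random.random((17, 3))
print(dtw_distance(x, y))
```

Because observations at different time indices may be matched to one another, the two sequences do not need to have the same duration.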

However, $$k$$-NN classifiers are non-parametric, instance-based learners: to predict the class of a new observation sequence, we must compare it against every observation sequence that was used to fit the model. To speed up prediction times, we have chosen to use a constrained DTW algorithm that sacrifices some accuracy by calculating an approximate distance, but saves a lot of time. This is the FastDTW algorithm, which has a radius parameter for controlling the constraint imposed on the distance calculation.
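To see the trade-off, the sketch below calls the standalone fastdtw package directly (assumed installed, along with SciPy); larger radii produce distances closer to exact DTW, at the cost of longer computation:

```python
import numpy as np
from scipy.spatial.distance import euclidean
from fastdtw import fastdtw

x = np.random.random((100, 3))
y = np.random.random((120, 3))

# A larger radius widens the search window around the projected warping path,
# improving accuracy but increasing runtime
for radius in (1, 5, 20):
    distance, path = fastdtw(x, y, radius=radius, dist=euclidean)
    print(radius, distance)
```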

This approximate DTW $$k$$-NN classifier is implemented by the DTWKNN class.

## Example

```python
import numpy as np
from sequentia.classifiers import DTWKNN

# Create some sample data
X = [np.random.random((10 * i, 3)) for i in range(1, 4)]
y = ['class0', 'class1', 'class1']

# Create and fit the classifier
clf = DTWKNN(k=1, radius=5)
clf.fit(X, y)

# Predict labels for the training data (just as an example)
clf.predict(X)
```

For more elaborate examples, please have a look at the example notebooks.

## API reference

`class sequentia.classifiers.dtwknn.DTWKNN(k, radius, metric=<function euclidean>)`

A k-Nearest Neighbor classifier that compares differing length observation sequences using the efficient FastDTW dynamic time warping algorithm.

Parameters:

- `k` (int): Number of neighbors.
- `radius` (int): Radius parameter for FastDTW.
- `metric` (callable): Distance metric for FastDTW.
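For example, a different per-observation distance can be passed as the metric, such as SciPy's Manhattan distance (shown purely as an illustration; the default is the Euclidean distance):

```python
from scipy.spatial.distance import cityblock
from sequentia.classifiers import DTWKNN

# Use the Manhattan (city block) distance between individual observations
clf = DTWKNN(k=3, radius=10, metric=cityblock)
```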
`fit(self, X, y)`

Fits the classifier by adding labeled training observation sequences.

Parameters:

- `X` (List[numpy.ndarray]): A list of multiple observation sequences.
- `y` (List[str]): A list of labels for the observation sequences.
`predict(self, X, verbose=True, n_jobs=1)`

Predicts the label for an observation sequence (or multiple sequences).

Parameters:

- `X` (numpy.ndarray or List[numpy.ndarray]): An individual observation sequence or a list of multiple observation sequences.
- `verbose` (bool): Whether to display a progress bar or not.
- `n_jobs` (int): The number of jobs to run in parallel. Setting this to -1 will use all available CPU cores.

Returns:

- `prediction(s)` (str or List[str]): The predicted label(s) for the observation sequence(s).
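For example, continuing from the fitted classifier in the example above:

```python
# Predict a single label for one observation sequence...
label = clf.predict(X[0], verbose=False)

# ...or a list of labels for multiple sequences, using all available CPU cores
labels = clf.predict(X, n_jobs=-1)
```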
`evaluate(self, X, y, labels=None, verbose=True, n_jobs=1)`

Evaluates the performance of the classifier on a batch of observation sequences and their labels.

Parameters:

- `X` (List[numpy.ndarray]): A list of multiple observation sequences.
- `y` (List[str]): A list of labels for the observation sequences.
- `labels` (List[str]): A list of labels for ordering the axes of the confusion matrix.
- `verbose` (bool): Whether to display a progress bar for predictions or not.
- `n_jobs` (int): The number of jobs to run in parallel. Setting this to -1 will use all available CPU cores.

Returns:

- `accuracy` (float): The categorical accuracy of the classifier on the observation sequences.
- `confusion` (numpy.ndarray): The confusion matrix representing the discrepancy between predicted and actual labels.
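For example, evaluating the fitted classifier from the example above on its own training data (purely as an illustration):

```python
# Returns the categorical accuracy and a confusion matrix whose axes
# follow the order of the provided labels
accuracy, confusion = clf.evaluate(X, y, labels=['class0', 'class1'], n_jobs=-1)
```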