Dynamic Time Warping kNearest Neighbors Classifier (DTWKNN
)¶
One distance metric that allows us to compare multivariate sequences of different length is Dynamic Time Warping. Coupling this metric with \(k\)NN creates a powerful classifier that assigns the class of a new observation sequence by looking at the classes of observation sequences with similar patterns.
However, \(k\)NN classifiers suffer from the fact that they are nonparametric, which means that when predicting the class for a new observation sequence, we must look back at every observation sequence that was used to fit the model. To speed up prediction times, we have chosen to use a constrained DTW algorithm that sacrifices accuracy by calculating an approximate distance, but saves a lot of time. This is the FastDTW implementation, which has a radius parameter for controlling the imposed constraint on the distance calculation.
This approximate DTW \(k\)NN classifier is implemented by the DTWKNN
class.
Example¶
1 2 3 4 5 6 7 8 9 10 11 12 13  import numpy as np
from sequentia.classifiers import DTWKNN
# Create some sample data
X = [np.random.random((10 * i, 3)) for i in range(1, 4)]
y = ['class0', 'class1', 'class1']
# Create and fit the classifier
clf = DTWKNN(k=1, radius=5)
clf.fit(X, y)
# Predict labels for the training data (just as an example)
clf.predict(X)

For more elaborate examples, please have a look at the example notebooks.
API reference¶

class
sequentia.classifiers.dtwknn.
DTWKNN
(k, radius, metric=<function euclidean>)[source]¶ A kNearest Neighbor classifier that compares differing length observation sequences using the efficient FastDTW dynamic time warping algorithm.
Parameters:  k: int
Number of neighbors.
 radius: int
Radius parameter for FastDTW.
 metric: callable
Distance metric for FastDTW.

fit
(self, X, y)[source]¶ Fits the classifier by adding labeled training observation sequences.
Parameters:  X: List[numpy.ndarray]
A list of multiple observation sequences.
 y: List[str]
A list of labels for the observation sequences.

predict
(self, X, verbose=True, n_jobs=1)[source]¶ Predicts the label for an observation sequence (or multiple sequences).
Parameters:  X: numpy.ndarray or List[numpy.ndarray]
An individual observation sequence or a list of multiple observation sequences.
 verbose: bool
Whether to display a progress bar or not.
 n_jobs: int
 The number of jobs to run in parallel.Setting this to 1 will use all available CPU cores.
Returns:  prediction(s): str or List[str]
The predicted label(s) for the observation sequence(s).

evaluate
(self, X, y, labels=None, verbose=True, n_jobs=1)[source]¶ Evaluates the performance of the classifier on a batch of observation sequences and their labels.
Parameters:  X: List[numpy.ndarray]
A list of multiple observation sequences.
 y: List[str]
A list of labels for the observation sequences.
 labels: List[str]
A list of labels for ordering the axes of the confusion matrix.
 verbose: bool
Whether to display a progress bar for predictions or not.
 n_jobs: int
 The number of jobs to run in parallel.Setting this to 1 will use all available CPU cores.
Returns:  accuracy: float
The categorical accuracy of the classifier on the observation sequences.
 confusion: numpy.ndarray
The confusion matrix representing the discrepancy between predicted and actual labels.