Dynamic Time Warping k-Nearest Neighbors Classifier (KNNClassifier)

The \(k\)-nearest neighbors (\(k\)-NN) classification algorithm is one of the most commonly used classification algorithms, and perhaps one of the most intuitive.

Before we discuss \(k\)-NN with dynamic time warping for sequence classification, let us recap \(k\)-NN in the usual case of individual points \(\mathbf{x}\in\mathbb{R}^D\) in \(D\)-dimensional Euclidean space.

Recap of general k-NN

Suppose we have a training dataset \(\mathcal{D}_\text{train}=\big\{(\mathbf{x}^{(n)},c^{(n)})\big\}_{n=1}^N\), where \(N\) is the number of training example-label pairs, \(\mathbf{x}^{(n)}\in\mathbb{R}^D\) is a training example and \(c^{(n)}\in\{1,\ldots,C\}\) is its corresponding label.

When classifying a new test example \(\mathbf{x}^{(m)}\in\mathbb{R}^D\), a \(k\)-NN classifier does the following:

  1. Computes the distance from \(\mathbf{x}^{(m)}\) to every training example in \(\mathcal{D}_\text{train}\), using a distance measure such as Euclidean distance, \(d(\mathbf{x}^{(m)},\mathbf{x}^{(n)})=\Vert\mathbf{x}^{(m)}-\mathbf{x}^{(n)}\Vert\).

  2. Assigns \(c^{(m)}\) as the most common label of the \(k\) nearest training examples.
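To make the procedure concrete, the following is a minimal NumPy sketch of this point-based \(k\)-NN classifier (purely illustrative; it is not part of Sequentia, and the function and variable names are arbitrary):

import numpy as np

def knn_predict(x_test, X_train, y_train, k=3):
    # Euclidean distance from the test point to every training example
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k nearest training examples
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of the k nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Example: three training points in 2-D space with labels 0/1
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
print(knn_predict(np.array([0.2, 0.1]), X_train, y_train, k=1))  # -> 0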

The most immediate observation is that every prediction requires searching through the entire training dataset. As you can imagine, the inability to summarize the model with simpler parameters (e.g. the weights of a neural network, or the transition/emission/initial probabilities of an HMM) limits the practical use of \(k\)-NN classifiers, especially on large datasets.

While this classifier is conceptually simple, it can very often outperform more sophisticated machine learning algorithms in various classification tasks, even when looking only at the nearest neighbor (\(k=1\)).

The algorithm rests on the intuition that if \(\mathbf{x}^{(m)}\) has features similar to those of \(\mathbf{x}^{(n)}\), then the two points should lie close together in \(D\)-dimensional Euclidean space.

Extending k-NN to sequences

We can try to extend this intuition to sequences. In general, Sequentia supports multivariate observation sequences, represented as an ordered sequence \(O=\mathbf{o}^{(1)},\ldots,\mathbf{o}^{(T)}\) of observations \(\mathbf{o}^{(t)}\in\mathbb{R}^D\). Importantly, the durations of any two observation sequences \(O^{(n)}\) and \(O^{(m)}\) may differ.

When trying to apply the \(k\)-NN intuition to observation sequences, we can say that two sequences \(O^{(n)}\) and \(O^{(m)}\) which are similar to each other should have a small ‘distance’, and if they are different, they should have a large ‘distance’.

But what sort of ‘distance’ could this be? We need a measure that can compare any two sequences, even of different lengths, and that is small when the sequences are similar and large when they are different. One such measure is Dynamic Time Warping (DTW).
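To give a feel for how DTW compares sequences of unequal length, here is a naive dynamic-programming sketch of the unconstrained DTW distance between two univariate sequences, using an absolute-difference local cost (for illustration only; Sequentia's actual DTW computations use the dtaidistance package, as noted under the use_c parameter below):

import numpy as np

def dtw_distance(a, b):
    # cost[i, j] = minimum cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((len(a) + 1, len(b) + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments
            cost[i, j] = local + min(cost[i - 1, j],      # a[i-1] also matched an earlier b step
                                     cost[i, j - 1],      # b[j-1] also matched an earlier a step
                                     cost[i - 1, j - 1])  # a[i-1] and b[j-1] matched together
    return cost[len(a), len(b)]

# Sequences of different lengths but similar shape give a small distance
print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 3, 4]))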

Given sequence-label pairs \(\mathcal{D}_\text{train}=\big\{(O^{(n)},c^{(n)})\big\}_{n=1}^N\), the rest of the \(k\)-NN algorithm remains unchanged; the only difference is that we now compute DTW distances between sequences rather than Euclidean distances between points. Indeed, coupling \(k\)-NN with DTW yields a powerful sequence classifier.

This \(k\)-NN classifier with the DTW distance measure is implemented by the KNNClassifier class.

Example

import librosa
from sequentia.preprocessing import Compose, Custom, Standardize
from sequentia.classifiers import KNNClassifier
from sequentia.datasets import load_digits

# Load the FSDD dataset and split into training and testing
dataset = load_digits()
train_set, test_set = dataset.split(split_size=0.2, stratify=True, shuffle=True)

# Set MFCC configuration
spec_kwargs = {'sr': 8000, 'n_mfcc': 5, 'n_fft': 1024, 'hop_length': 256, 'power': 2}

# Create preprocessing pipeline
transforms = Compose([
    Custom(lambda x: librosa.feature.mfcc(x.flatten(), **spec_kwargs).T, name='MFCCs', desc='Generate MFCCs'),
    Standardize()
])

# Apply transformations to the training and test set
train_set.X = transforms(train_set.X)
test_set.X = transforms(test_set.X)

# Create and fit a 1-NN classifier using C-compiled DTW functions
clf = KNNClassifier(k=1, classes=range(10), use_c=True).fit(train_set.X, train_set.y)

# Make a single prediction
y0_pred = clf.predict(test_set.X[0])

# Make multiple predictions
y_pred = clf.predict(test_set.X)

# Make multiple predictions and return class scores (with parallelization)
y_pred, y_pred_scores = clf.predict(test_set.X, return_scores=True, n_jobs=-1)

# Calculate accuracy and generate confusion matrix (with parallelization)
accuracy, confusion = clf.evaluate(test_set.X, test_set.y, n_jobs=-1)

For more elaborate examples, please have a look at the example notebooks.

API reference

class sequentia.classifiers.knn.KNNClassifier(k, classes, weighting=None, window=1.0, use_c=False, independent=False, random_state=None)[source]

A k-Nearest Neighbor classifier that uses dynamic time warping as a distance measure for comparing observation sequences of different length.

Parameters
k: int > 0

Number of neighbors.

classes: array-like of str/numeric

The complete set of possible classes/labels.

weighting: callable, optional

A callable that specifies how distance weighting should be performed. The callable should accept a numpy.ndarray of DTW distances, apply an element-wise weighting transformation, then return an equally-sized numpy.ndarray of weighted distances.

If no weighting is chosen then the function lambda x: np.ones_like(x) is used, which weights all of the distances equally.

Examples of weighting functions are:

  • \(e^{-\alpha x}\), specified by lambda x: np.exp(-alpha * x) for some positive \(\alpha\),

  • \(\frac{1}{x}\), specified by lambda x: 1 / x.

A good weighting function should ideally be defined at \(x=0\), in the event that two observation sequences are perfectly aligned and therefore have zero DTW distance.
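For instance, an exponentially decaying weighting with an assumed decay rate of \(\alpha=0.5\) could be supplied as in the following sketch:

import numpy as np
from sequentia.classifiers import KNNClassifier

# Apply the element-wise weighting x -> exp(-0.5 * x) to the DTW distances
clf = KNNClassifier(k=5, classes=range(10), weighting=lambda x: np.exp(-0.5 * x))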

Tip

It may be desirable to restrict DTW distances to a small range if you intend to use a weighting function.

Using the MinMaxScale or Standardize preprocessing transformations to scale your features helps to ensure that DTW distances remain small.

window: 0 ≤ float ≤ 1

The width of the Sakoe-Chiba band global constraint, expressed as a fraction of the length of the longer of the two sequences being compared. A narrower band (i.e. a tighter constraint) speeds up the DTW computation by restricting the maximum temporal deviation from the diagonal of the DTW matrix, but constraining the alignment too heavily may make it poor.

The default value of 1 corresponds to full DTW computation with no global constraint applied.
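As a sketch, restricting alignments to a band spanning 10% of the longer sequence's length would look like this (the value 0.1 is an arbitrary choice for illustration):

from sequentia.classifiers import KNNClassifier

# Constrain DTW alignments to a Sakoe-Chiba band of width 0.1 x (longer sequence length)
clf = KNNClassifier(k=1, classes=range(10), window=0.1)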

use_c: bool

Whether or not to use fast pure C compiled functions (from the dtaidistance package) to perform the DTW computations.

Tip

If you set use_c = True and are receiving an error about a C library not being available, try reinstalling dtaidistance and disabling the cache:

pip install -vvv --upgrade --no-cache-dir --force-reinstall dtaidistance

independent: bool

Whether or not to allow features to be warped independently from each other (i.e. independent rather than dependent DTW).
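For instance, feature-wise (independent) warping can be enabled as in this sketch:

from sequentia.classifiers import KNNClassifier

# Warp each feature dimension independently when computing DTW distances
clf = KNNClassifier(k=1, classes=range(10), independent=True)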

random_state: numpy.random.RandomState, int, optional

A random state object or seed for reproducible randomness.

Attributes
k (property): int > 0

The number of neighbors.

weighting (property): callable

The distance weighting function.

window (property): 0 ≤ float ≤ 1

The width of the Sakoe-Chiba band global constraint as a fraction of the length of the longer of the two sequences.

use_c (property): bool

Whether or not to use fast pure C compiled functions to perform the DTW computations.

encoder_ (property): sklearn.preprocessing.LabelEncoder

The label encoder fitted on the set of classes provided during instantiation.

classes_ (property): numpy.ndarray (str/numeric)

The complete set of possible classes/labels.

X_ (property): list of numpy.ndarray (float)

A list of multiple observation sequences used to fit the classifier.

y_ (property): numpy.ndarray (int)

The encoded labels for the observation sequences used to fit the classifier.

fit(X, y)[source]

Fits the classifier by adding labeled training observation sequences.

Parameters
X: list of numpy.ndarray (float)

A list of multiple observation sequences.

y: array-like of str/numeric

An iterable of labels for the observation sequences.

predict(X, return_scores=False, original_labels=True, verbose=True, n_jobs=1)[source]

Predicts the label for an observation sequence (or multiple sequences).

Parameters
X: numpy.ndarray (float) or list of numpy.ndarray (float)

An individual observation sequence or a list of multiple observation sequences.

return_scores: bool

Whether to return the scores for each class on the observation sequence(s).

original_labels: bool

Whether to inverse-transform the labels to their original encoding.

verbose: bool

Whether to display a progress bar or not.

Note

The progress bar cannot be displayed if both verbose=True and n_jobs > 1.

n_jobs: int > 0 or -1

The number of jobs to run in parallel. Setting this to -1 will use all available CPU cores.

Returns
prediction(s): str/numeric or numpy.ndarray (str/numeric)

The predicted label(s) for the observation sequence(s).

If original_labels is True, then the returned labels are inverse-transformed into their original encoding.

evaluate(X, y, verbose=True, n_jobs=1)[source]

Evaluates the performance of the classifier on a batch of observation sequences and their labels.

Parameters
X: list of numpy.ndarray (float)

A list of multiple observation sequences.

y: array-like of str/numeric

An iterable of labels for the observation sequences.

verbose: bool

Whether to display a progress bar for predictions or not.

n_jobs: int > 0 or -1

The number of jobs to run in parallel. Setting this to -1 will use all available CPU cores.

Returns
accuracy: float

The categorical accuracy of the classifier on the observation sequences.

confusion: numpy.ndarray (int)

The confusion matrix representing the discrepancy between predicted and actual labels.

save(path)[source]

Serializes the KNNClassifier object by pickling its hyper-parameters, variables and training data.

Warning

As a \(k\)-NN classifier must look through every training example during prediction, saving the classifier simply copies all of the training observation sequences and labels.

Parameters
path: str

File path (usually with .pkl extension) to store the serialized KNNClassifier object.

classmethod load(path)[source]

Deserializes a KNNClassifier object which was serialized with the save() function.

Parameters
path: str

File path of the serialized data generated by the save() method.

Returns
deserialized: KNNClassifier

The deserialized DTW \(k\)-NN classifier object.
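A typical round trip with save() and load() might look like the following sketch (the file name is arbitrary):

from sequentia.classifiers import KNNClassifier

# Persist a fitted classifier (this pickles the training sequences and labels too)
clf.save('knn_clf.pkl')

# Restore the classifier later
clf = KNNClassifier.load('knn_clf.pkl')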