KNN Regressor

The KNN Regressor uses the \(k\)-NN algorithm with dynamic time warping (DTW) as the distance measure to identify a \(k\)-neighborhood of the training sequences most similar to the sequence being predicted.

To predict an output \(y'\in\mathbb{R}\) for a sequence \(O'\), the KNNRegressor works by:

1. Calculating the DTW distance between \(O'\) and every training sequence.
2. Forming a \(k\)-neighborhood \(\mathcal{K}'=\left\{O^{(1)},\ldots,O^{(k)}\right\}\) of the \(k\) nearest training sequences to \(O'\).
3. Calculating a distance weighting \(w^{(1)},\ldots,w^{(k)}\) for each sequence in \(\mathcal{K}'\).

   A uniform weighting of 1 is used by default, meaning that all sequences in \(\mathcal{K}'\) have equal influence on the predicted output \(y'\). However, custom functions such as \(e^{-x}\) (where \(x\) is the DTW distance) can be specified to increase the weight on training sequences that are more similar to \(O'\).
4. Calculating \(y'\) as the distance-weighted mean of the outputs \(y^{(1)},\ldots,y^{(k)}\) of the sequences in \(\mathcal{K}'\).

\[y' = \frac{\sum_{i=1}^{k}w^{(i)}y^{(i)}}{\sum_{i=1}^{k}w^{(i)}}\]
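The prediction steps can be sketched in plain NumPy. This is only an illustrative sketch, not the library's implementation: the helper names `dtw_distance` and `knn_dtw_predict` are hypothetical, the sequences are assumed to be univariate, and the local cost (absolute difference) is an assumption.

```python
import numpy as np

def dtw_distance(a, b):
    """Unconstrained DTW distance between two 1D sequences.

    Assumption: absolute difference as the local cost.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def knn_dtw_predict(query, train_seqs, train_outputs, k=3, weighting=None):
    """Hypothetical helper: distance-weighted k-NN regression over DTW distances."""
    # Step 1: DTW distance between the query and every training sequence.
    distances = np.array([dtw_distance(query, s) for s in train_seqs])
    # Step 2: indices of the k nearest training sequences.
    nearest = np.argsort(distances)[:k]
    # Step 3: weights for the neighborhood (uniform weighting of 1 by default).
    d = distances[nearest]
    w = np.ones_like(d) if weighting is None else weighting(d)
    # Step 4: distance-weighted mean of the neighbors' outputs.
    y = np.asarray(train_outputs, dtype=float)[nearest]
    return float(np.sum(w * y) / np.sum(w))

train_seqs = [np.array([0.0, 1.0, 2.0]),
              np.array([0.0, 2.0, 3.0]),
              np.array([5.0, 6.0, 7.0])]
train_outputs = [1.0, 2.0, 10.0]
query = np.array([0.0, 1.0, 2.0])

# Uniform weighting over the two nearest sequences.
uniform = knn_dtw_predict(query, train_seqs, train_outputs, k=2)
# Exponential weighting e^{-x} pulls the prediction towards the closest sequence.
weighted = knn_dtw_predict(query, train_seqs, train_outputs, k=3,
                           weighting=lambda x: np.exp(-x))
print(uniform, weighted)
```

With \(e^{-x}\) weighting, the distant third sequence contributes almost nothing even though it lies inside the neighborhood, which is the effect described in step 3.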

Note

Using a value of \(k\) greater than 1 is highly recommended for regression, to reduce variance.

API reference

Class

KNNRegressor

A k-nearest neighbor regressor that uses DTW as a distance measure for sequence comparison.

Methods

`__init__`(*[, k, weighting, window, ...])	Initializes the `KNNRegressor`.
`compute_distance_matrix`(X[, lengths])	Calculates a matrix of DTW distances between the sequences in `X` and the training sequences.
`dtw`(A, B, **kwargs)	Calculates the DTW distance between two univariate or multivariate sequences.
`fit`(X, y[, lengths])	Fits the regressor to the sequence(s) in `X`.
`load`(path)	Loads and deserializes a fitted KNN estimator.
`plot_dtw_histogram`(X[, lengths, ax])	Calculates DTW distances between `X` and training sequences, and plots a distance histogram.
`plot_warping_path_1d`(a, b, **kwargs)	Calculates the DTW matrix between two sequences and plots the optimal warping path.
`plot_weight_histogram`(X[, lengths, ax])	Calculates DTW weights between `X` and training sequences, and plots a weight histogram.
`predict`(X[, lengths])	Predicts outputs for the sequence(s) in `X`.
`query_neighbors`(X[, lengths, sort])	Queries the k-nearest training observation sequences to each sequence in `X`.
`save`(path)	Serializes and saves a fitted KNN estimator.
`score`(X, y[, lengths, sample_weight])	Calculates the coefficient of determination (R²) for the sequence(s) in `X`.

class sequentia.models.knn.regressor.KNNRegressor

A k-nearest neighbor regressor that uses DTW as a distance measure for sequence comparison.

The regressor computes the output as a distance-weighted average of the outputs of the sequences within the DTW \(k\)-neighborhood of the sequence being predicted.

__init__(*, k=1, weighting=None, window=1, independent=False, use_c=False, n_jobs=1, random_state=None)

Initializes the KNNRegressor.

Parameters:

k (PositiveInt) – Number of neighbors.
weighting (Optional[Callable]) – A callable that specifies how distance weighting should be performed. The callable should accept a numpy.ndarray of DTW distances, apply an element-wise weighting transformation to the matrix of DTW distances, then return an equally-sized numpy.ndarray of weightings. If None, then a uniform weighting of 1 will be applied to all distances.
window (ConstrainedFloatValue) –
The size of the Sakoe–Chiba band global constraint as a fraction of the length of the shorter of the two sequences being compared.
- A larger window will give more freedom to the DTW alignment, allowing more deviation but leading to potentially slower computation. A window of 1 is equivalent to full DTW computation with no global constraint applied.
- A smaller window will restrict the DTW alignment, and possibly speed up the DTW computation. A window of 0 is equivalent to Euclidean distance.
independent (bool) – Whether or not to allow features to be warped independently from each other. See [1] for an overview of independent and dependent dynamic time warping.
use_c (bool) – Whether or not to use fast pure C compiled functions from dtaidistance to perform the DTW computations.
n_jobs (Union[NegativeInt, PositiveInt]) –
Maximum number of concurrently running workers.
- If 1, no parallelism is used at all (useful for debugging).
- If -1, all CPUs are used.
- If < -1, (n_cpus + 1 + n_jobs) CPUs are used, e.g. n_jobs=-2 uses all but one CPU.
random_state (Optional[Union[NonNegativeInt, RandomState]]) – Seed or numpy.random.RandomState object for reproducible pseudo-randomness.

Return type:

KNNRegressor
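As an illustration of the weighting parameter, the sketch below defines two callables that take a numpy.ndarray of DTW distances and return an equally-sized array of weightings. The function names are hypothetical, `inverse_weighting` (with its `eps` guard) is an assumed alternative that is not taken from the library, and `weighted_mean` is only a stand-in for how such weights might be consumed.

```python
import numpy as np

def exponential_weighting(distances):
    """Element-wise e^{-x}: smaller DTW distances receive larger weights."""
    return np.exp(-distances)

def inverse_weighting(distances, eps=1e-8):
    """Hypothetical alternative: inverse distance, with eps guarding against division by zero."""
    return 1.0 / (distances + eps)

def weighted_mean(distances, outputs, weighting=None):
    """Stand-in for the prediction step: row-wise weighted mean of neighbor outputs."""
    w = np.ones_like(distances) if weighting is None else weighting(distances)
    return (w * outputs).sum(axis=1) / w.sum(axis=1)

# Two query sequences, each with a neighborhood of k=3 (illustrative values).
distances = np.array([[0.5, 1.0, 3.0],
                      [0.1, 0.2, 0.3]])
outputs = np.array([[10.0, 20.0, 30.0],
                    [10.0, 20.0, 30.0]])

uniform_pred = weighted_mean(distances, outputs)
exp_pred = weighted_mean(distances, outputs, exponential_weighting)
inv_pred = weighted_mean(distances, outputs, inverse_weighting)
print(uniform_pred, exp_pred, inv_pred)
```

Following the signature above, a callable of this form would be supplied as the weighting argument, for example weighting=exponential_weighting.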

compute_distance_matrix(X, lengths=None, **kwargs)

Calculates a matrix of DTW distances between the sequences in X and the training sequences.

Parameters:

X (Array) –
Univariate or multivariate observation sequence(s).
- Should be a single 1D or 2D array.
- Should have length as the 1st dimension and features as the 2nd dimension.
- Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Optional[Array]) –
Lengths of the observation sequence(s) provided in X.
- If None, then X is assumed to be a single observation sequence.
- len(X) should be equal to sum(lengths).
**kwargs –
Model parameters to temporarily override (for experimentation purposes).
- window: See __init__().
- independent: See __init__().

Note:

This method requires a trained regressor; see fit().

Returns:

DTW distance matrix.

Return type:

Array
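To make the role of the lengths argument concrete, the following NumPy sketch decodes concatenated sequences and builds a matrix of pairwise DTW distances. It is a sketch under assumptions rather than the library's code: `split_sequences`, `dtw_distance` and `distance_matrix` are hypothetical helpers, the local cost is Euclidean, and the matrix orientation (one row per sequence in X, one column per training sequence) is assumed.

```python
import numpy as np

def split_sequences(X, lengths=None):
    """Decode a concatenated array into individual (length, features) sequences."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1D array as a single-feature sequence
    if lengths is None:
        return [X]  # X is assumed to be a single observation sequence
    lengths = np.asarray(lengths)
    assert len(X) == lengths.sum(), "len(X) should be equal to sum(lengths)"
    return np.split(X, np.cumsum(lengths)[:-1])

def dtw_distance(a, b):
    """Unconstrained DTW with a Euclidean local cost (an assumption)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def distance_matrix(X, lengths, X_train, lengths_train):
    """Assumed orientation: rows index sequences in X, columns index training sequences."""
    queries = split_sequences(X, lengths)
    train = split_sequences(X_train, lengths_train)
    return np.array([[dtw_distance(q, t) for t in train] for q in queries])

# Three training sequences of length 3, concatenated.
X_train = np.array([0.0, 1.0, 2.0, 0.0, 2.0, 3.0, 5.0, 6.0, 7.0])
lengths_train = [3, 3, 3]
# Two query sequences of lengths 3 and 4, concatenated.
X = np.array([0.0, 1.0, 2.0, 5.0, 6.0, 7.0, 7.0])
lengths = [3, 4]

D = distance_matrix(X, lengths, X_train, lengths_train)
print(D.shape)
```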

dtw(A, B, **kwargs)

Calculates the DTW distance between two univariate or multivariate sequences.

Parameters:

A (Array) – The first sequence.
B (Array) – The second sequence.
**kwargs –
Model parameters to temporarily override (for experimentation purposes).
- window: See __init__().
- independent: See __init__().

Returns:

DTW distance.

Return type:

float
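The effect of the window override can be illustrated with a small band-constrained DTW in NumPy. This is a minimal sketch and not the computation performed by the library or by dtaidistance: the function `banded_dtw` is hypothetical, it handles univariate sequences only, it uses a squared local cost with a final square root, and it widens the band by the length difference so that a warping path always exists (all of these are assumptions).

```python
import numpy as np

def banded_dtw(a, b, window=1.0):
    """DTW restricted to a Sakoe-Chiba style band.

    window is a fraction of the length of the shorter sequence.
    Assumptions: squared local cost with a final square root, and a band
    measured from both diagonals so unequal lengths still admit a path.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    radius = int(window * min(n, m))
    lo = min(0, m - n) - radius  # smallest allowed offset j - i
    hi = max(0, m - n) + radius  # largest allowed offset j - i
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay at infinity.
        for j in range(max(1, i + lo), min(m, i + hi) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

a = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 2.0])

full = banded_dtw(a, b, window=1.0)    # no effective constraint
narrow = banded_dtw(a, b, window=0.0)  # diagonal only for equal lengths
euclid = float(np.linalg.norm(a - b))
print(full, narrow, euclid)
```

In this sketch, a window of 0 on equal-length sequences forces the diagonal alignment and so reproduces the Euclidean distance, while a window of 1 leaves every cell reachable and can only give an equal or smaller distance.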

fit(X, y, lengths=None)

Fits the regressor to the sequence(s) in X.

Parameters:

X (Array) –
Univariate or multivariate observation sequence(s).
- Should be a single 1D or 2D array.
- Should have length as the 1st dimension and features as the 2nd dimension.
- Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
y (Array) – Outputs corresponding to sequence(s) provided in X.
lengths (Optional[Array]) –
Lengths of the observation sequence(s) provided in X.
- If None, then X is assumed to be a single observation sequence.
- len(X) should be equal to sum(lengths).

Returns:

The fitted regressor.

Return type:

KNNRegressor
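The input layout described above can be prepared with NumPy as follows. This is a sketch under assumptions: `pack_sequences` is a hypothetical helper rather than part of the library, and the data is made up purely for illustration.

```python
import numpy as np

def pack_sequences(sequences):
    """Hypothetical helper: concatenate variable-length sequences along the length axis."""
    arrays = [np.asarray(s, dtype=float) for s in sequences]
    # Promote 1D sequences to (length, 1) so every sequence is (length, features).
    arrays = [s[:, None] if s.ndim == 1 else s for s in arrays]
    lengths = np.array([len(s) for s in arrays])
    X = np.concatenate(arrays, axis=0)
    return X, lengths

# Three multivariate sequences with 2 features and different lengths.
sequences = [np.zeros((4, 2)), np.ones((6, 2)), np.full((3, 2), 2.0)]
y = np.array([0.1, 0.7, 1.3])  # one output per sequence

X, lengths = pack_sequences(sequences)

# Consistency checks matching the parameter descriptions above.
assert len(X) == lengths.sum()
assert len(y) == len(lengths)
print(X.shape, lengths)
```

Arrays prepared this way match the layout that fit(X, y, lengths) expects according to the parameter descriptions: length as the first dimension, features as the second, and one entry in lengths and y per sequence.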

classmethod load(path)

Loads and deserializes a fitted KNN estimator.

Parameters:

path (Union[str, Path, IO]) – Location to load the serialized estimator from.

Returns:

Fitted KNN estimator.
