KNN Regressor

The KNN Regressor uses the \(k\)-NN algorithm with dynamic time warping (DTW) as the distance measure to identify the \(k\)-neighborhood of training sequences that are most similar to the sequence being predicted.

To predict an output \(y'\in\mathbb{R}\) for a sequence \(O'\), the KNNRegressor works by:

  1. Calculating the DTW distance between \(O'\) and every training sequence.
  2. Forming a k-neighborhood \(\mathcal{K}'=\left\{O^{(1)},\ldots,O^{(k)}\right\}\) of the \(k\) nearest training sequences to \(O'\).
  3. Calculating a distance weighting \(w^{(1)},\ldots,w^{(k)}\) for each sequence in \(\mathcal{K}'\).
    A uniform weighting of 1 is used by default, meaning that all sequences in \(\mathcal{K}'\) have equal influence on the predicted output \(y'\). However, custom functions such as \(e^{-x}\) (where \(x\) is the DTW distance) can be specified to increase weight on training sequences that are more similar to \(O'\).
  4. Calculating \(y'\) as the distance-weighted mean of the outputs \(y^{(1)},\ldots,y^{(k)}\) of sequences in \(\mathcal{K}'\).
    \[y' = \frac{\sum_{i=1}^{k}w^{(i)}y^{(i)}}{\sum_{i=1}^{k}w^{(i)}}\]
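
The four steps above can be sketched in plain Python. This is a minimal illustration for univariate sequences, not the library's implementation: the helper names `dtw_distance` and `knn_predict` are hypothetical, the local cost (squared difference with a final square root) is an assumption, and the weighting callable is applied to one distance at a time rather than to an array.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance between two univariate sequences."""
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def knn_predict(query, train_seqs, train_outputs, k, weighting=None):
    """Predict an output for `query` as a distance-weighted mean."""
    # Step 1: DTW distance from the query to every training sequence.
    distances = [dtw_distance(query, seq) for seq in train_seqs]
    # Step 2: indices of the k nearest training sequences.
    nearest = sorted(range(len(train_seqs)), key=distances.__getitem__)[:k]
    # Step 3: one weight per neighbor (uniform weight of 1 by default).
    if weighting is None:
        weights = [1.0] * len(nearest)
    else:
        weights = [weighting(distances[i]) for i in nearest]
    # Step 4: weighted mean of the neighbors' outputs.
    numerator = sum(w * train_outputs[i] for w, i in zip(weights, nearest))
    return numerator / sum(weights)


train_seqs = [[0, 0, 1, 2], [0, 1, 2], [5, 6, 7, 8], [5, 5, 6]]
train_outputs = [1.0, 2.0, 10.0, 12.0]
query = [0, 1, 1, 2]

uniform = knn_predict(query, train_seqs, train_outputs, k=4)
weighted = knn_predict(
    query, train_seqs, train_outputs, k=4, weighting=lambda d: math.exp(-d)
)
print(uniform, weighted)
```

With uniform weights all four neighbors contribute equally, whereas the \(e^{-x}\) weighting pulls the prediction towards the outputs of the two training sequences that align closely with the query.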

Note

Using a value of \(k\) greater than 1 is highly recommended for regression, to reduce variance.

API reference

Class

KNNRegressor

A k-nearest neighbor regressor that uses DTW as a distance measure for sequence comparison.

Methods

__init__(*[, k, weighting, window, ...])

Initializes the KNNRegressor.

compute_distance_matrix(X, *[, lengths])

Calculate a matrix of DTW distances between the sequences in X and the training sequences.

dtw(A, B)

Calculate the DTW distance between two observation sequences.

fit(X, y, *[, lengths])

Fits the regressor to the sequence(s) in X.

fit_predict(X, y, *[, lengths])

Fits the model to the sequence(s) in X and predicts outputs for X.

load(path, /)

Load and deserialize a fitted KNN estimator.

predict(X, *[, lengths])

Predicts outputs for the sequence(s) in X.

query_neighbors(X, *[, lengths, sort])

Query the k-nearest training observation sequences to each sequence in X.

save(path, /)

Serialize and save a fitted KNN estimator.

score(X, y, *[, lengths, sample_weight])

Calculate the predictive coefficient of determination (\(R^2\)) for the sequence(s) in X.


class sequentia.models.knn.regressor.KNNRegressor

A k-nearest neighbor regressor that uses DTW as a distance measure for sequence comparison.

The regressor computes the output as a distance-weighted average of the outputs of the sequences within the DTW \(k\)-neighborhood of the sequence being predicted.

__init__(*, k=1, weighting=None, window=1.0, independent=False, use_c=False, n_jobs=1, random_state=None)

Initializes the KNNRegressor.

Parameters:
  • self (KNNRegressor) –

  • k (int) – Number of neighbors.

  • weighting (Callable[[ndarray[Any, dtype[float64]]], ndarray[Any, dtype[float64]]] | None) –

    A callable that specifies how distance weighting should be performed.

    The callable should accept a numpy.ndarray of DTW distances, apply an element-wise weighting transformation to the matrix of DTW distances, then return an equally-sized numpy.ndarray of weightings.

    If None, then a uniform weighting of 1 will be applied to all distances.

  • window (float) –

    The size of the Sakoe–Chiba band global constraint as a fraction of the length of the shortest of the two sequences being compared.

    • A larger window will give more freedom to the DTW alignment, allowing more deviation but leading to potentially slower computation. A window of 1 is equivalent to full DTW computation with no global constraint applied.

    • A smaller window will restrict the DTW alignment, and possibly speed up the DTW computation. A window of 0 is equivalent to Euclidean distance.

  • independent (bool) – Whether or not to allow features to be warped independently from each other. See [1] for an overview of independent and dependent dynamic time warping.

  • use_c (bool) – Whether or not to use fast pure C compiled functions from dtaidistance to perform the DTW computations.

  • n_jobs (int) –

    Maximum number of concurrently running workers.

    • If 1, no parallelism is used at all (useful for debugging).

    • If -1, all CPUs are used.

    • If < -1, (n_cpus + 1 + n_jobs) are used — e.g. n_jobs=-2 uses all but one.

  • random_state (int | RandomState | None) – Seed or numpy.random.RandomState object for reproducible pseudo-randomness.

Return type:

KNNRegressor
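
The effect of the window parameter can be illustrated with a band-constrained DTW in plain Python. This is a sketch under stated assumptions rather than the library's code: the function name `banded_dtw` is hypothetical, the band radius is taken as the window fraction of the shorter length rounded down, and the radius is widened to the length difference so that an alignment always exists.

```python
import math


def banded_dtw(a, b, window=1.0):
    """DTW distance restricted to a Sakoe-Chiba band (univariate sketch)."""
    n, m = len(a), len(b)
    # Assumption: radius = window fraction of the shorter sequence, widened
    # to |n - m| so the end cell (n, m) always lies inside the band.
    radius = max(int(window * min(n, m)), abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells with |i - j| <= radius are evaluated.
        lo = max(1, i - radius)
        hi = min(m, i + radius)
        for j in range(lo, hi + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


a = [0, 0, 1, 2]
b = [0, 1, 2, 2]
print(banded_dtw(a, b, window=0.0))  # diagonal only: Euclidean distance
print(banded_dtw(a, b, window=1.0))  # full band: warping is unrestricted here
```

For these two equal-length sequences, a window of 0 leaves only the diagonal and so reduces to the Euclidean distance, while a window of 1 covers the whole cost matrix and lets the shifted shapes align exactly. A narrower band also means fewer cells to evaluate, which is where the potential speed-up comes from.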

compute_distance_matrix(X, *, lengths=None)

Calculate a matrix of DTW distances between the sequences in X and the training sequences.

Parameters:
  • self (KNNMixin) –

  • X (ndarray[Any, dtype[float64]]) – Sequence(s).

  • lengths (ndarray[Any, dtype[int64]] | None) –

    Lengths of the sequence(s) provided in X.

    • If None, then X is assumed to be a single sequence.

    • len(X) should be equal to sum(lengths).

Returns:

DTW distance matrix.

Return type:

numpy.ndarray

Notes

This method requires a trained model — see fit().
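
The X and lengths convention shared by this method and by fit(), predict() and query_neighbors() can be sketched as follows. The example assumes X is a concatenation of the observations of all sequences, which is what the requirement that len(X) equals sum(lengths) suggests; the helper `split_sequences` is hypothetical and uses plain lists, though the same slicing applies to NumPy arrays.

```python
from itertools import accumulate


def split_sequences(X, lengths=None):
    """Split concatenated observations X into individual sequences."""
    if lengths is None:
        # Without lengths, X is treated as one single sequence.
        return [X]
    if sum(lengths) != len(X):
        raise ValueError("len(X) must be equal to sum(lengths)")
    ends = list(accumulate(lengths))  # exclusive end index of each sequence
    starts = [0] + ends[:-1]          # start index of each sequence
    return [X[s:e] for s, e in zip(starts, ends)]


# Five observations with one feature each, forming two sequences.
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
for seq in split_sequences(X, lengths=[2, 3]):
    print(seq)
```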

dtw(A, B)

Calculate the DTW distance between two observation sequences.

Parameters:
  • A – The first observation sequence.

  • B – The second observation sequence.

Returns:

DTW distance.

Return type:

numpy.ndarray

Notes

This method requires a trained model — see fit().

fit(X, y, *, lengths=None)

Fits the regressor to the sequence(s) in X.

Parameters:
  • self (KNNRegressor) –

  • X (ndarray[Any, dtype[float64]]) – Sequence(s).

  • y (ndarray[Any, dtype[float64]]) – Outputs corresponding to sequence(s) in X.

  • lengths (ndarray[Any, dtype[int64]] | None) –

    Lengths of the sequence(s) provided in X.

    • If None, then X is assumed to be a single sequence.

    • len(X) should be equal to sum(lengths).

Returns:

The fitted regressor.

Return type:

KNNRegressor

fit_predict(X, y, *, lengths=None)

Fits the model to the sequence(s) in X and predicts outputs for X.

Parameters:
  • self (RegressorMixin) –

  • X (ndarray[Any, dtype[float64]]) – Sequence(s).

  • y (ndarray[Any, dtype[float64]]) – Outputs corresponding to sequence(s) in X.

  • lengths (ndarray[Any, dtype[int64]] | None) –

    Lengths of the sequence(s) provided in X.

    • If None, then X is assumed to be a single sequence.

    • len(X) should be equal to sum(lengths).

Returns:

Output predictions.

Return type:

numpy.ndarray

classmethod load(path, /)

Load and deserialize a fitted KNN estimator.

Parameters:
  • cls (type[KNNMixin]) –

  • path (str | Path | IO) – Location to load the serialized estimator from.

Returns:

Fitted KNN estimator.

Return type:

KNNMixin

See also

save

Serialize and save a fitted KNN estimator.

predict(X, *, lengths=None)

Predicts outputs for the sequence(s) in X.

Parameters:
  • self (KNNRegressor) –

  • X (ndarray[Any, dtype[float64]]) – Sequence(s).

  • lengths (ndarray[Any, dtype[int64]] | None) –

    Lengths of the sequence(s) provided in X.

    • If None, then X is assumed to be a single sequence.

    • len(X) should be equal to sum(lengths).

Returns:

Output predictions.

Return type:

numpy.ndarray

Notes

This method requires a trained regressor — see fit().

query_neighbors(X, *, lengths=None, sort=True)

Query the k-nearest training observation sequences to each sequence in X.

Parameters:
  • self (KNNMixin) –

  • X (ndarray[Any, dtype[float64]]) – Sequence(s).

  • lengths (ndarray[Any, dtype[int64]] | None) –

    Lengths of the sequence(s) provided in X.

    • If None, then X is assumed to be a single sequence.

    • len(X) should be equal to sum(lengths).

  • sort (bool) – Whether to sort the neighbors in order of nearest to furthest.

Returns:

K-nearest neighbors for each sequence in X.

  • Indices of the k-nearest training sequences.

  • DTW distances of the k-nearest training sequences.

  • Corresponding outputs of the k-nearest training sequences.

Return type:

tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Notes

This method requires a trained model — see fit().
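
The shape of the returned tuple can be illustrated with a small sketch that starts from a precomputed distance matrix (one row per query sequence, one column per training sequence). The helper `k_nearest` is hypothetical, and its handling of sort=False, which returns neighbors in training-set order, is an assumption standing in for whatever unordered result the library produces.

```python
import heapq


def k_nearest(distance_matrix, train_outputs, k, sort=True):
    """Return (indices, distances, outputs) of the k nearest training sequences."""
    indices, distances, outputs = [], [], []
    for row in distance_matrix:
        # nsmallest yields the k closest training indices, nearest first.
        idx = heapq.nsmallest(k, range(len(row)), key=row.__getitem__)
        if not sort:
            # Assumption: without sorting, keep training-set order instead.
            idx = sorted(idx)
        indices.append(idx)
        distances.append([row[i] for i in idx])
        outputs.append([train_outputs[i] for i in idx])
    return indices, distances, outputs


D = [[3.0, 1.0, 0.5, 5.0]]          # one query, four training sequences
y_train = [10.0, 20.0, 30.0, 40.0]
print(k_nearest(D, y_train, k=2))
```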

save(path, /)

Serialize and save a fitted KNN estimator.

Parameters:
  • self (KNNMixin) –

  • path (str | Path | IO) – Location to save the serialized estimator.

Return type:

None

Notes

This method requires a trained model — see fit().

See also

load

Load and deserialize a fitted KNN estimator.

score(X, y, *, lengths=None, sample_weight=None)

Calculate the predictive coefficient of determination (\(R^2\)) for the sequence(s) in X.

Parameters:
Returns:

Coefficient of determination.

Return type:

float

Notes

This method requires a trained regressor; see fit().
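
For reference, the coefficient of determination follows the standard definition \(R^2 = 1 - SS_{res}/SS_{tot}\). The sketch below computes it in plain Python with optional per-sample weights. The function `r_squared` is a hypothetical helper, not the library's scoring code, and it assumes the true outputs are not all identical (otherwise the denominator is zero).

```python
def r_squared(y_true, y_pred, sample_weight=None):
    """Weighted coefficient of determination, 1 - SS_res / SS_tot."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total_weight = sum(sample_weight)
    # Weighted mean of the true outputs.
    mean = sum(w * t for w, t in zip(sample_weight, y_true)) / total_weight
    # Residual sum of squares: error of the predictions.
    ss_res = sum(w * (t - p) ** 2 for w, t, p in zip(sample_weight, y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum(w * (t - mean) ** 2 for w, t in zip(sample_weight, y_true))
    return 1.0 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect predictions
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # always predicting the mean
```

A score of 1 corresponds to perfect predictions, a score of 0 to a model that does no better than predicting the mean output, and negative scores to a model that does worse than that baseline.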

independent: bool

Whether or not to allow features to be warped independently from each other.
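
The difference between the two modes can be sketched for multivariate sequences given as lists of feature tuples. In the dependent case a single warping path is shared by all features; in the independent case each feature is warped on its own and the per-feature costs are summed. The helper names are hypothetical, and the squared-difference cost and the summation of per-feature costs are assumptions made for this illustration, not a description of the library's internals.

```python
import math


def _dtw_cost(a, b, local_cost):
    """Cumulative DTW cost between sequences a and b for a given local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_cost(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def dependent_dtw(A, B):
    """One warping path shared by all features."""
    return _dtw_cost(A, B, lambda x, y: sum((p - q) ** 2 for p, q in zip(x, y)))


def independent_dtw(A, B):
    """A separate warping path per feature; per-feature costs are summed."""
    total = 0.0
    for f in range(len(A[0])):
        a = [row[f] for row in A]
        b = [row[f] for row in B]
        total += _dtw_cost(a, b, lambda x, y: (x - y) ** 2)
    return total


# Feature 0 of A lags behind B, while feature 1 of A runs ahead of B.
A = [(0, 0), (0, 1), (1, 2), (2, 2)]
B = [(0, 0), (1, 0), (2, 1), (2, 2)]
print(independent_dtw(A, B), dependent_dtw(A, B))
```

Because the two features here are shifted in opposite directions, no single warping path aligns both at once, so the dependent cost is positive while the independent cost is zero.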

k: int

Number of neighbors.

n_jobs: int

Maximum number of concurrently running workers.
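
The n_jobs rules described for the constructor translate into a worker count as in the sketch below. The helper `effective_workers` is hypothetical, and rejecting n_jobs=0 as well as clamping the result to at least one worker are assumptions of this example.

```python
import os


def effective_workers(n_jobs, n_cpus=None):
    """Translate an n_jobs setting into a number of workers."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        # Assumption: zero has no meaning under this convention.
        raise ValueError("n_jobs must be a non-zero integer")
    if n_jobs > 0:
        return n_jobs
    # Negative values count back from the number of CPUs:
    # -1 uses all CPUs, -2 uses all but one, and so on.
    return max(1, n_cpus + 1 + n_jobs)


for setting in (1, -1, -2):
    print(setting, effective_workers(setting, n_cpus=8))
```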

random_state

Seed or numpy.random.RandomState object for reproducible pseudo-randomness.

use_c: bool

Whether or not to use fast pure C compiled functions from dtaidistance to perform the DTW computations.

weighting: Callable[[ndarray], ndarray] | None

A callable that specifies how distance weighting should be performed.

window: float

The size of the Sakoe–Chiba band global constraint as a fraction of the length of the shortest of the two sequences being compared.

References