KNN Regressor
The KNN Regressor is a regressor that uses the \(k\)-NN algorithm with DTW as a distance measure to identify a \(k\)-neighborhood of the most similar training sequences to the sequence being predicted.
To predict an output \(y'\in\mathbb{R}\) for a sequence \(O'\), the KNNRegressor works by:
- Calculating the DTW distance between \(O'\) and every training sequence.
- Forming a k-neighborhood \(\mathcal{K}'=\left\{O^{(1)},\ldots,O^{(k)}\right\}\) of the \(k\) nearest training sequences to \(O'\).
- Calculating distance weightings \(w^{(1)},\ldots,w^{(k)}\), one for each sequence in \(\mathcal{K}'\). A uniform weighting of 1 is used by default, meaning that all sequences in \(\mathcal{K}'\) have equal influence on the predicted output \(y'\). However, custom functions such as \(e^{-x}\) (where \(x\) is the DTW distance) can be specified to increase the weight on training sequences that are more similar to \(O'\).
- Calculating \(y'\) as the distance-weighted mean of the outputs \(y^{(1)},\ldots,y^{(k)}\) of the sequences in \(\mathcal{K}'\):
\[y' = \frac{\sum_{i=1}^{k}w^{(i)}y^{(i)}}{\sum_{i=1}^{k}w^{(i)}}\]
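The four steps above can be sketched in plain Python. This is a minimal illustration rather than the library's implementation: it assumes univariate sequences, unconstrained DTW with an absolute-difference local cost, and the hypothetical helper names `dtw_distance` and `knn_dtw_predict`.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance between two 1-D sequences (assumed |x - y| local cost)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_dtw_predict(query, train_seqs, train_outputs, k=1, weighting=None):
    """Predict an output for `query` as a distance-weighted mean over its k-neighborhood."""
    # Step 1: DTW distance from the query to every training sequence.
    distances = [dtw_distance(query, seq) for seq in train_seqs]
    # Step 2: indices of the k nearest training sequences.
    nearest = sorted(range(len(train_seqs)), key=lambda i: distances[i])[:k]
    # Step 3: one weight per neighbor (uniform weighting of 1 by default).
    if weighting is None:
        weights = [1.0] * len(nearest)
    else:
        weights = [weighting(distances[i]) for i in nearest]
    # Step 4: distance-weighted mean of the neighbors' outputs.
    numerator = sum(w * train_outputs[i] for w, i in zip(weights, nearest))
    return numerator / sum(weights)


# Toy data, made up for illustration only.
train_seqs = [[0, 1, 2], [0, 1, 1, 2], [5, 6, 7], [9, 9, 9, 9]]
train_outputs = [1.0, 2.0, 10.0, 20.0]
query = [0, 1, 2, 2]

uniform_pred = knn_dtw_predict(query, train_seqs, train_outputs, k=2)
weighted_pred = knn_dtw_predict(
    query, train_seqs, train_outputs, k=3, weighting=lambda x: math.exp(-x)
)
print(uniform_pred, weighted_pred)
```

With the \(e^{-x}\) weighting, the third (distant) neighbor contributes almost nothing, so the weighted prediction stays close to the mean of the two nearest outputs.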
Note
Using a value of \(k\) greater than 1 is highly recommended for regression, to reduce variance.
API reference
Class
KNNRegressor | A k-nearest neighbor regressor that uses DTW as a distance measure for sequence comparison.
Methods
__init__ | Initializes the KNNRegressor.
compute_distance_matrix | Calculate a matrix of DTW distances between the sequences in X and the training sequences.
dtw | Calculate the DTW distance between two observation sequences.
fit | Fits the regressor to the sequence(s) in X.
fit_predict | Fits the model to the sequence(s) in X and predicts outputs for X.
load | Load and deserialize a fitted KNN estimator.
predict | Predicts outputs for the sequence(s) in X.
query_neighbors | Query the k-nearest training observation sequences to each sequence in X.
save | Serialize and save a fitted KNN estimator.
score | Calculate the predictive coefficient of determination (\(R^2\)) for the sequence(s) in X.
- class sequentia.models.knn.regressor.KNNRegressor
A k-nearest neighbor regressor that uses DTW as a distance measure for sequence comparison.
The regressor computes the output as a distance weighted average of the outputs of the sequences within the DTW k-neighborhood of the sequence being predicted.
- __init__(*, k=1, weighting=None, window=1.0, independent=False, use_c=False, n_jobs=1, random_state=None)
Initializes the KNNRegressor.
- Parameters:
self (KNNRegressor) –
k (int) – Number of neighbors.
weighting (Callable[[ndarray[Any, dtype[float64]]], ndarray[Any, dtype[float64]]] | None) –
A callable that specifies how distance weighting should be performed.
The callable should accept a numpy.ndarray of DTW distances, apply an element-wise weighting transformation to the matrix of DTW distances, then return an equally-sized numpy.ndarray of weightings.
If None, then a uniform weighting of 1 will be applied to all distances.
window (float) –
The size of the Sakoe-Chiba band global constraint as a fraction of the length of the shortest of the two sequences being compared.
A larger window will give more freedom to the DTW alignment, allowing more deviation but leading to potentially slower computation. A window of 1 is equivalent to full DTW computation with no global constraint applied.
A smaller window will restrict the DTW alignment, and possibly speed up the DTW computation. A window of 0 is equivalent to Euclidean distance.
independent (bool) – Whether or not to allow features to be warped independently from each other. See [1] for an overview of independent and dependent dynamic time warping.
use_c (bool) – Whether or not to use fast pure C compiled functions from dtaidistance to perform the DTW computations.
n_jobs (int) –
Maximum number of concurrently running workers.
If 1, no parallelism is used at all (useful for debugging).
If -1, all CPUs are used.
If < -1, (n_cpus + 1 + n_jobs) are used, e.g. n_jobs=-2 uses all but one.
random_state (int | RandomState | None) – Seed or numpy.random.RandomState object for reproducible pseudo-randomness.
- Return type:
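As a sketch of what a `weighting` callable might look like under the contract described above (an array of DTW distances in, an equally-sized array of weightings out), the following uses NumPy. The function names `exponential_weighting` and `inverse_weighting`, and the `eps` guard, are illustrative assumptions and not part of the library.

```python
import numpy as np


def exponential_weighting(distances: np.ndarray) -> np.ndarray:
    """Element-wise e^{-x}: smaller DTW distances receive larger weights."""
    return np.exp(-distances)


def inverse_weighting(distances: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Element-wise 1 / (x + eps); eps (an assumption) avoids division by zero."""
    return 1.0 / (distances + eps)


# A made-up matrix of DTW distances: 2 query sequences x 3 neighbors.
distances = np.array([[0.0, 1.0, 2.0], [0.5, 0.5, 4.0]])
exp_weights = exponential_weighting(distances)
inv_weights = inverse_weighting(distances)
print(exp_weights.shape, inv_weights.shape)
```

Given the signature above, such a function would presumably be supplied as `weighting=exponential_weighting` when constructing the regressor, usually together with `k` greater than 1 so that the weights have something to differentiate.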
- compute_distance_matrix(X, *, lengths=None)
Calculate a matrix of DTW distances between the sequences in X and the training sequences.
- Parameters:
- Returns:
DTW distance matrix.
- Return type:
Notes
This method requires a trained model; see fit().
- dtw(A, B)
Calculate the DTW distance between two observation sequences.
- Parameters:
- Returns:
DTW distance.
- Return type:
Notes
This method requires a trained model; see fit().
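To illustrate how the `window` parameter constrains the alignment, here is a rough pure-Python sketch of DTW with a Sakoe-Chiba band. It is not the library's (or dtaidistance's) implementation and rests on several assumptions: univariate sequences, a squared-difference local cost with a final square root, and a band radius of `ceil(window * min(len(a), len(b)))` that is widened to at least the length difference so that an alignment always exists.

```python
import math


def dtw_sakoe_chiba(a, b, window=1.0):
    """DTW distance between 1-D sequences a and b, restricted to a Sakoe-Chiba band."""
    n, m = len(a), len(b)
    # Assumed radius rule: a fraction of the shorter length, never narrower
    # than the length difference (otherwise the end cell is unreachable).
    radius = max(math.ceil(window * min(n, m)), abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells with |i - j| <= radius are evaluated.
        lo = max(1, i - radius)
        hi = min(m, i + radius)
        for j in range(lo, hi + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


a = [0, 0, 1, 2]
b = [0, 1, 2, 2]
full = dtw_sakoe_chiba(a, b, window=1.0)      # no effective constraint
diagonal = dtw_sakoe_chiba(a, b, window=0.0)  # only the diagonal is allowed
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
print(full, diagonal, euclidean)
```

For equal-length sequences, a window of 0 leaves only the diagonal path, which under this cost reduces to the Euclidean distance; a window of 1 lets the two sequences be warped freely onto each other.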
- fit(X, y, *, lengths=None)
Fits the regressor to the sequence(s) in X.
- Parameters:
- Returns:
The fitted regressor.
- Return type:
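The `X` and `lengths` arguments follow the convention this page documents for `score()`: the sequences are concatenated into a single array, and `lengths` gives the length of each one, so `len(X)` should equal `sum(lengths)`. The NumPy sketch below shows how such an input could be split back into individual sequences; `split_sequences` is a hypothetical helper written for illustration, not a library function.

```python
import numpy as np


def split_sequences(X, lengths=None):
    """Split a concatenated array of observations into individual sequences."""
    X = np.asarray(X)
    if lengths is None:
        # Without lengths, X is treated as a single sequence.
        return [X]
    lengths = np.asarray(lengths)
    if len(X) != lengths.sum():
        raise ValueError("len(X) should be equal to sum(lengths)")
    # Sequence boundaries are the cumulative lengths (excluding the final total).
    boundaries = np.cumsum(lengths)[:-1]
    return np.split(X, boundaries)


# Six observations with two features each, forming three sequences.
X = np.arange(12, dtype=float).reshape(6, 2)
lengths = np.array([2, 3, 1])
sequences = split_sequences(X, lengths)
print([seq.shape for seq in sequences])
```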
- fit_predict(X, y, *, lengths=None)
Fits the model to the sequence(s) in X and predicts outputs for X.
- Parameters:
- Returns:
Output predictions.
- Return type:
- classmethod load(path, /)
Load and deserialize a fitted KNN estimator.
- Parameters:
- Returns:
Fitted KNN estimator.
- Return type:
KNNMixin
See also
save
Serialize and save a fitted KNN estimator.
- predict(X, *, lengths=None)
Predicts outputs for the sequence(s) in X.
- Parameters:
- Returns:
Output predictions.
- Return type:
Notes
This method requires a trained regressor; see fit().
- query_neighbors(X, *, lengths=None, sort=True)
Query the k-nearest training observation sequences to each sequence in X.
- Parameters:
- Returns:
K-nearest neighbors for each sequence in X:
Indices of the k-nearest training sequences.
DTW distances of the k-nearest training sequences.
Corresponding outputs of the k-nearest training sequences.
- Return type:
Notes
This method requires a trained model; see fit().
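The three returned quantities (indices, distances and outputs) can be derived from a precomputed DTW distance matrix, as in the NumPy sketch below. It is an illustration only: `query_k_nearest` is a hypothetical function, and the reading of `sort=True` as "order each neighborhood by ascending distance" is an assumption, since the parameter is not described here.

```python
import numpy as np


def query_k_nearest(distance_matrix, train_outputs, k, sort=True):
    """Return (indices, distances, outputs) of the k nearest training sequences per row."""
    # argpartition places the k smallest distances of each row in the first k columns.
    idx = np.argpartition(distance_matrix, k - 1, axis=1)[:, :k]
    dists = np.take_along_axis(distance_matrix, idx, axis=1)
    if sort:
        # Assumed meaning of `sort`: order each neighborhood by ascending distance.
        order = np.argsort(dists, axis=1)
        idx = np.take_along_axis(idx, order, axis=1)
        dists = np.take_along_axis(dists, order, axis=1)
    return idx, dists, train_outputs[idx]


# Made-up distances: 2 query sequences (rows) x 4 training sequences (columns).
distance_matrix = np.array([[3.0, 1.0, 2.0, 0.5], [0.1, 5.0, 0.2, 4.0]])
train_outputs = np.array([10.0, 20.0, 30.0, 40.0])
indices, distances, outputs = query_k_nearest(distance_matrix, train_outputs, k=2)
print(indices, distances, outputs)
```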
- save(path, /)
Serialize and save a fitted KNN estimator.
- Parameters:
- Return type:
None
Notes
This method requires a trained model; see fit().
See also
load
Load and deserialize a fitted KNN estimator.
- score(X, y, *, lengths=None, sample_weight=None)
Calculate the predictive coefficient of determination (\(R^2\)) for the sequence(s) in X.
- Parameters:
self (RegressorMixin) –
y (ndarray[Any, dtype[float64]]) – Outputs corresponding to sequence(s) in X.
lengths (ndarray[Any, dtype[int64]] | None) –
Lengths of the sequence(s) provided in X.
If None, then X is assumed to be a single sequence.
len(X) should be equal to sum(lengths).
sample_weight (ndarray[Any, dtype[float64]] | ndarray[Any, dtype[int64]] | None) – See sklearn.metrics.r2_score().
- Returns:
Coefficient of determination.
- Return type:
Notes
This method requires a trained regressor; see fit().
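For reference, the coefficient of determination compares the residual sum of squares of the predictions against the total sum of squares around the mean of the true outputs: \(R^2 = 1 - SS_\text{res}/SS_\text{tot}\). The sketch below computes the unweighted version in plain Python; `r2_sketch` is a hypothetical name, and the sketch omits `sample_weight` and the special handling of constant targets that a full implementation needs.

```python
def r2_sketch(y_true, y_pred):
    """Unweighted coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
perfect = r2_sketch(y_true, [1.0, 2.0, 3.0])
mean_only = r2_sketch(y_true, [2.0, 2.0, 2.0])
partial = r2_sketch(y_true, [1.5, 2.0, 2.5])
print(perfect, mean_only, partial)
```

A score of 1 indicates perfect predictions, while 0 corresponds to a model that does no better than always predicting the mean output.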
- random_state
Seed or numpy.random.RandomState object for reproducible pseudo-randomness.
References