Pipeline
Before fitting and using a model, it is common to apply a sequence of preprocessing steps to data.
Pipelines can be used to wrap preprocessing transformations as well as a model into a single estimator, making it more convenient to reapply the transformations and make predictions on new data.
The Pipeline class implements this feature and is based on sklearn.pipeline.Pipeline.
API reference
Class
Pipeline | Pipeline of transforms with a final estimator.
Methods
__init__ | Initializes the Pipeline.
fit | Fit the model.
fit_predict | Transform the data, and apply fit_predict with the final estimator.
fit_transform | Fit the model and transform with the final estimator.
inverse_transform | Apply inverse_transform for each step in reverse order.
predict | Transform the data, and apply predict with the final estimator.
predict_proba | Transform the data, and apply predict_proba with the final estimator.
score | Transform the data, and apply score with the final estimator.
transform | Transform the data, and apply transform with the final estimator.
- class sequentia.pipeline.Pipeline
Pipeline of transforms with a final estimator.
Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit. The transformers in the pipeline can be cached using the memory argument.
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a __. A step’s estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer removed by setting it to 'passthrough' or None.
See also
sklearn.pipeline.Pipeline: Pipeline is based on sklearn.pipeline.Pipeline, but adapted to accept and work with sequences. Read more in the User Guide.
Examples
Creating a Pipeline consisting of two transforms and a KNNClassifier, and fitting it to sequences in the spoken digits dataset.

    from sequentia.models import KNNClassifier
    from sequentia.preprocessing import IndependentFunctionTransformer
    from sequentia.pipeline import Pipeline
    from sequentia.datasets import load_digits
    from sklearn.preprocessing import scale
    from sklearn.decomposition import PCA

    # Fetch MFCCs of spoken digits
    digits = load_digits()
    train, test = digits.split(test_size=0.2)

    # Create a pipeline with two transforms and a classifier
    pipeline = Pipeline([
        ('standardize', IndependentFunctionTransformer(scale)),
        ('pca', PCA(n_components=5)),
        ('clf', KNNClassifier(k=1))
    ])

    # Fit the pipeline transforms and classifier to training data
    # (labels go in y; sequence lengths go in the lengths argument)
    pipeline.fit(train.X, train.y, lengths=train.lengths)

    # Apply the transforms to training sequences and make predictions
    # (predict takes no y argument)
    y_train_pred = pipeline.predict(train.X, lengths=train.lengths)

    # Calculate accuracy on test data
    acc = pipeline.score(test.X, test.y, lengths=test.lengths)
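The step contract described above (intermediate steps provide fit and transform, the final step only needs fit, and a step set to 'passthrough' or None is skipped) can be illustrated with a small library-free sketch. MiniPipeline, AddOne and MeanModel are hypothetical names invented for this illustration; they are not part of sequentia or scikit-learn and do not reflect how either library is implemented.

```python
class AddOne:
    """Toy transform: implements both fit and transform."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return [x + 1 for x in X]


class MeanModel:
    """Toy final estimator: only fit is required, predict is optional."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MiniPipeline:
    """Hypothetical, simplified stand-in for a pipeline of named steps."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) tuples

    def _transforms(self):
        # Every step except the last one, skipping removed steps.
        for _name, step in self.steps[:-1]:
            if step is None or step == "passthrough":
                continue
            yield step

    def fit(self, X, y=None):
        # Fit each transform in order, feeding its output to the next step.
        for step in self._transforms():
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # Re-apply the fitted transforms, then delegate to the final step.
        for step in self._transforms():
            X = step.transform(X)
        return self.steps[-1][1].predict(X)


pipe = MiniPipeline([
    ("shift", AddOne()),
    ("skipped", "passthrough"),
    ("model", MeanModel()),
])
pipe.fit([1, 2, 3])
predictions = pipe.predict([10, 20])
```

The 'skipped' step contributes nothing, so the model is fitted on the output of 'shift' alone.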
- __init__(steps, *, memory=None, verbose=False)
Initializes the Pipeline.
- Parameters:
steps (List[Tuple[str, BaseEstimator]]) – Collection of transforms implementing fit/transform that are chained, with the last object being an estimator.
memory (str | Memory | None) – Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time-consuming.
verbose (bool) – If True, the time elapsed while fitting each step will be printed as it is completed.
- Return type:
None
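The note on cloning in the memory parameter can be made concrete with a short sketch. It uses copy.deepcopy as a stand-in for the cloning step, and Standardizer is a hypothetical class; neither is taken from sequentia or scikit-learn, and the real caching machinery is not shown.

```python
import copy


class Standardizer:
    """Hypothetical transformer that learns a mean during fit."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self


original = Standardizer()

# Stand-in for the clone made before fitting when caching is enabled.
fitted_copy = copy.deepcopy(original).fit([1.0, 2.0, 3.0])

# The instance passed in was never fitted; only the copy carries state.
original_is_fitted = hasattr(original, "mean_")
copy_is_fitted = hasattr(fitted_copy, "mean_")
```

This is why fitted state should be read from the pipeline's own steps rather than from the object that was originally passed in.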
- fit(X, y=None, lengths=None, **fit_params)
Fit the model.
Fit all the transformers one after the other and transform the data. Finally, fit the transformed data using the final estimator.
- Parameters:
X (Array) – Univariate or multivariate observation sequence(s).
  - Should be a single 1D or 2D array.
  - Should have length as the 1st dimension and features as the 2nd dimension.
  - Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
y (Array | None) – Outputs corresponding to sequence(s) provided in X. Only required if the final estimator is a supervised model.
lengths (Array | None) – Lengths of the observation sequence(s) provided in X.
  - If None, then X is assumed to be a single observation sequence.
  - len(X) should be equal to sum(lengths).
fit_params – Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
- Returns:
The fitted pipeline.
- Return type:
Pipeline
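The X and lengths convention used by every method on this page can be sketched with plain Python lists. The sequence values below are made up, and lists stand in for the arrays the library expects; the point is only the relationship between the concatenated data and the lengths.

```python
from itertools import accumulate

# Three univariate sequences of different lengths (made-up values).
sequences = [[0.1, 0.2, 0.3], [1.0, 1.1], [2.0, 2.1, 2.2, 2.3]]

# Concatenate along the first (length) dimension and record each length.
X = [value for seq in sequences for value in seq]
lengths = [len(seq) for seq in sequences]

# The documented invariant: len(X) equals sum(lengths).
assert len(X) == sum(lengths)

# Recover the original sequences from X using cumulative end offsets.
ends = list(accumulate(lengths))
starts = [0] + ends[:-1]
recovered = [X[start:end] for start, end in zip(starts, ends)]
```

Omitting lengths corresponds to the single-sequence case, where all of X is treated as one sequence.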
- fit_predict(X, y, lengths=None, **fit_params)
Transform the data, and apply fit_predict with the final estimator.
Call fit_transform on each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its fit_predict method. Only valid if the final estimator implements fit_predict.
- Parameters:
X (Array) – Univariate or multivariate observation sequence(s).
  - Should be a single 1D or 2D array.
  - Should have length as the 1st dimension and features as the 2nd dimension.
  - Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
y (Array) – Outputs corresponding to sequence(s) provided in X.
lengths (Array | None) – Lengths of the observation sequence(s) provided in X.
  - If None, then X is assumed to be a single observation sequence.
  - len(X) should be equal to sum(lengths).
fit_params – Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
- Returns:
Output predictions.
- Return type:
Array
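The s__p naming rule for fit_params amounts to splitting each key at the first double underscore. The helper below is a minimal sketch of that routing idea; route_fit_params, the step names a and b, and the parameters p and q are all hypothetical and are not functions or arguments of sequentia or scikit-learn.

```python
from collections import defaultdict


def route_fit_params(step_names, fit_params):
    """Group 's__p' style keys into {step name: {parameter: value}}."""
    routed = defaultdict(dict)
    for key, value in fit_params.items():
        # Split at the first '__': left is the step, right is the parameter.
        step, sep, param = key.partition("__")
        if not sep or step not in step_names:
            raise ValueError(f"Cannot route parameter {key!r} to a step")
        routed[step][param] = value
    return dict(routed)


# Hypothetical steps 'a' and 'b' receiving parameters 'p' and 'q'.
routed = route_fit_params(["a", "b"], {"a__p": 1, "b__q": 2})
```

A key without a step prefix, or with an unknown step name, is rejected in this sketch instead of being silently dropped.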
- fit_transform(X, lengths=None, **fit_params)
Fit the model and transform with the final estimator.
Fit all the transformers one after the other and transform the data. Then use fit_transform on the transformed data with the final estimator.
- Parameters:
X (Array) – Univariate or multivariate observation sequence(s).
  - Should be a single 1D or 2D array.
  - Should have length as the 1st dimension and features as the 2nd dimension.
  - Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Array | None) – Lengths of the observation sequence(s) provided in X.
  - If None, then X is assumed to be a single observation sequence.
  - len(X) should be equal to sum(lengths).
fit_params – Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
- Returns:
The transformed data.
- Return type:
Array
- inverse_transform(X, lengths=None)
Apply inverse_transform for each step in reverse order.
All estimators in the pipeline must support inverse_transform.
- Parameters:
X (Array) – Univariate or multivariate observation sequence(s).
  - Should be a single 1D or 2D array.
  - Should have length as the 1st dimension and features as the 2nd dimension.
  - Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Array | None) – Lengths of the observation sequence(s) provided in X.
  - If None, then X is assumed to be a single observation sequence.
  - len(X) should be equal to sum(lengths).
- Returns:
The inverse transformed data.
- Return type:
Array
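Why the inverse must run in reverse order can be seen with two toy invertible steps. Scale and Shift are hypothetical classes written for this sketch only; they are not sequentia or scikit-learn transformers.

```python
class Scale:
    """Toy invertible step: multiply by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, X):
        return [x * self.factor for x in X]

    def inverse_transform(self, X):
        return [x / self.factor for x in X]


class Shift:
    """Toy invertible step: add an offset."""

    def __init__(self, offset):
        self.offset = offset

    def transform(self, X):
        return [x + self.offset for x in X]

    def inverse_transform(self, X):
        return [x - self.offset for x in X]


steps = [Scale(2.0), Shift(3.0)]
X = [1.0, 2.0, 4.0]

# Forward pass: scale first, then shift.
Xt = X
for step in steps:
    Xt = step.transform(Xt)

# Inverse pass: undo the shift first, then the scale.
Xr = Xt
for step in reversed(steps):
    Xr = step.inverse_transform(Xr)

# Undoing the steps in forward order does not recover the input.
Xwrong = Xt
for step in steps:
    Xwrong = step.inverse_transform(Xwrong)
```

Only the reversed pass returns the original values, because each inverse has to undo the most recently applied step.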
- predict(X, lengths=None)
Transform the data, and apply predict with the final estimator.
Call transform on each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its predict method. Only valid if the final estimator implements predict.
- Parameters:
X (Array) – Univariate or multivariate observation sequence(s).
  - Should be a single 1D or 2D array.
  - Should have length as the 1st dimension and features as the 2nd dimension.
  - Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Array | None) – Lengths of the observation sequence(s) provided in X.
  - If None, then X is assumed to be a single observation sequence.
  - len(X) should be equal to sum(lengths).
- Returns:
Output predictions.
- Return type:
Array
- predict_proba(X, lengths=None)
Transform the data, and apply predict_proba with the final estimator.
Call transform on each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its predict_proba method. Only valid if the final estimator implements predict_proba.
- Parameters:
X (Array) – Univariate or multivariate observation sequence(s).
  - Should be a single 1D or 2D array.
  - Should have length as the 1st dimension and features as the 2nd dimension.
  - Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Array | None) – Lengths of the observation sequence(s) provided in X.
  - If None, then X is assumed to be a single observation sequence.
  - len(X) should be equal to sum(lengths).
- Returns:
Output probabilities.
- Return type:
Array
- score(X, y=None, lengths=None, sample_weight=None)
Transform the data, and apply score with the final estimator.
Call transform on each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its score method. Only valid if the final estimator implements score.
- Parameters:
X (Array) – Univariate or multivariate observation sequence(s).
  - Should be a single 1D or 2D array.
  - Should have length as the 1st dimension and features as the 2nd dimension.
  - Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
y (Array | None) – Outputs corresponding to sequence(s) provided in X. Must be provided if the final estimator is a model, i.e. not a transform.
lengths (Array | None) – Lengths of the observation sequence(s) provided in X.
  - If None, then X is assumed to be a single observation sequence.
  - len(X) should be equal to sum(lengths).
sample_weight (Any | None) – If not None, this argument is passed as the sample_weight keyword argument to the score method of the final estimator.
- Returns:
Result of calling score on the final estimator.
- Return type:
float
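What sample_weight does depends entirely on the final estimator's own score method. As one possible illustration, the sketch below assumes a classifier whose score is accuracy; that assumption, the weighted_accuracy helper and the labels and weights are all made up for this example and are not taken from the library.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Accuracy in which each sample counts in proportion to its weight."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    # Sum the weights of the correctly predicted samples.
    correct = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]  # the third prediction is wrong

unweighted = weighted_accuracy(y_true, y_pred)
weighted = weighted_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```

Giving the misclassified sample a larger weight lowers the score, which is the effect a weighted metric is meant to capture.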
- transform(X, lengths=None)
Transform the data, and apply transform with the final estimator.
Call transform on each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its transform method. Only valid if the final estimator implements transform.
This also works when the final estimator is None, in which case all prior transformations are applied.
- Parameters:
X (Array) – Univariate or multivariate observation sequence(s).
  - Should be a single 1D or 2D array.
  - Should have length as the 1st dimension and features as the 2nd dimension.
  - Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Array | None) – Lengths of the observation sequence(s) provided in X.
  - If None, then X is assumed to be a single observation sequence.
  - len(X) should be equal to sum(lengths).
- Returns:
The transformed data.
- Return type:
Array
- property classes_
The class labels. Only exists if the last step is a classifier.
- property n_features_in_
Number of features seen during the fit method of the first step.
- property named_steps
Access the steps by name.
Read-only attribute to access any step by given name. Keys are step names and values are the step objects.
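The name-to-step mapping that named_steps exposes can be pictured as a dictionary built from the steps list. The sketch below uses placeholder strings instead of real estimators and a plain dict instead of the object the library returns, so it shows only the shape of the mapping.

```python
# Placeholder strings stand in for the estimator objects of each step.
steps = [
    ("standardize", "scaler object"),
    ("pca", "pca object"),
    ("clf", "classifier object"),
]

# Keys are step names, values are the step objects, in pipeline order.
named_steps = dict(steps)

final_step = named_steps["clf"]
step_names = list(named_steps)
```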