Function Transformer
When preprocessing sequential data, it is often preferable to apply a transformation to each sequence independently rather than applying a single transformation to all of the data collectively.
For example, in speech recognition, suppose we have a dataset of MFCC features extracted from audio recordings of different speakers.
If we are not interested in speaker-focused tasks such as speaker recognition and only want to classify recordings, we need to be able to compare recordings to each other, especially when using algorithms such as KNNClassifier, which rely on distance comparisons.
In this case, we might want to standardize the MFCCs for each recording individually, i.e. center and scale each recording by its own feature means and standard deviations. Each recording is then represented as deviations from zero, a form better suited for comparison because it reduces speaker-specific differences in scale or location.
Another example is signal filters, which should be applied to each sequence independently.
IndependentFunctionTransformer allows such transformations to be defined for arbitrary functions.
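
To make the motivation concrete, the following is a minimal plain-Python sketch of standardizing each recording separately. It is not Sequentia's implementation: the helper `standardize` and the two made-up univariate recordings `rec_a` and `rec_b` are assumptions for illustration. The recordings trace the same shape at a different offset and scale, standing in for speaker-specific differences.

```python
from statistics import fmean, pstdev

def standardize(seq):
    """Center a univariate sequence on zero and scale it to unit variance."""
    mean, std = fmean(seq), pstdev(seq)
    return [(x - mean) / std for x in seq]

# Two hypothetical recordings: same shape, different location and scale.
rec_a = [1.0, 2.0, 3.0, 2.0]
rec_b = [10.0, 30.0, 50.0, 30.0]

# Standardizing each recording on its own removes the offset and scale,
# so the two results coincide up to floating-point error.
za, zb = standardize(rec_a), standardize(rec_b)

# Standardizing the pooled data uses one mean and one standard deviation
# for everything, so the recordings remain far apart.
pooled = standardize(rec_a + rec_b)
pa, pb = pooled[:len(rec_a)], pooled[len(rec_a):]
```

After per-recording standardization a distance-based classifier would treat the two recordings as near-identical, whereas the pooled version still separates them by speaker-like offset.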
API reference
Class
IndependentFunctionTransformer | Constructs a transformer from an arbitrary callable, applying the transform independently to each sequence.
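Methods
__init__ | Initializes the IndependentFunctionTransformer.
fit | Fits the transformer to X.
fit_transform | Fits the transformer to the sequence(s) in X and returns a transformed version of X.
inverse_transform | Applies the inverse transformation to X.
transform | Applies the transformation to X, producing a transformed version of X.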
- class sequentia.preprocessing.transforms.IndependentFunctionTransformer
Constructs a transformer from an arbitrary callable, applying the transform independently to each sequence.
This transform forwards its X and lengths arguments to a user-defined function or function object and returns the result of this function. This is useful for stateless transformations such as taking the log of frequencies, doing custom scaling, etc.
Note: If a lambda is used as the function, then the resulting transformer will not be pickleable.
This works conveniently with functions in sklearn.preprocessing such as scale() or normalize().
- Note:
This is a stateless transform, meaning fit() and fit_transform() will not fit on any data.
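
The pickling caveat can be checked with the standard library alone. In this sketch `math.log` stands in for a named, importable function and `is_picklable` is a hypothetical helper. It is assumed, not verified here, that the same behavior carries over to a transformer that stores the callable as an attribute.

```python
import math
import pickle

def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

# A named, importable function is pickled by reference to its module.
named_ok = is_picklable(math.log)

# A lambda has no importable name, so pickling it fails.
lambda_ok = is_picklable(lambda x: math.log(x))
```

Defining the function with `def` at module level, or wrapping an importable function with `functools.partial`, is the usual way to keep an object that holds the callable picklable.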
See also
sklearn.preprocessing.FunctionTransformer
IndependentFunctionTransformer is based on this class, which applies the callable to the entire input array X as if it were a single sequence. Read more in the User Guide.
Examples
Using an IndependentFunctionTransformer with sklearn.preprocessing.minmax_scale() to scale features to the range [0, 1] independently for each sequence in the spoken digits dataset.

    from sklearn.preprocessing import minmax_scale
    from sequentia.preprocessing import IndependentFunctionTransformer
    from sequentia.datasets import load_digits

    # Fetch MFCCs of spoken digits
    data = load_digits()

    # Create an independent min-max transform
    transform = IndependentFunctionTransformer(minmax_scale)

    # Apply the transform to the data
    Xt = transform.transform(data.X, data.lengths)
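
To illustrate what independence changes, here is a plain-Python sketch that depends on neither library. `minmax` and `minmax_independent` are hypothetical helpers written for this illustration, the data is a made-up univariate array holding two concatenated sequences, and each sequence is assumed to contain at least two distinct values so that the range is non-zero.

```python
def minmax(seq):
    """Scale a univariate sequence linearly onto [0, 1]."""
    lo, hi = min(seq), max(seq)
    return [(x - lo) / (hi - lo) for x in seq]

def minmax_independent(X, lengths):
    """Apply minmax separately to each sequence stored back-to-back in X."""
    out, start = [], 0
    for n in lengths:
        out.extend(minmax(X[start:start + n]))
        start += n
    return out

# Two concatenated sequences of lengths 3 and 2.
X = [0.0, 5.0, 10.0, 100.0, 200.0]
lengths = [3, 2]

whole = minmax(X)                             # one range shared by all of X
independent = minmax_independent(X, lengths)  # one range per sequence
```

In `whole` the first sequence is squeezed into a narrow band near zero because the second sequence dominates the range, while in `independent` each sequence spans the full [0, 1] interval.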
- __init__(func=None, inverse_func=None, *, validate=False, check_inverse=True, kw_args=None, inv_kw_args=None)
Initializes the IndependentFunctionTransformer.
- Parameters:
func (Callable | None) – The callable to use for the transformation. This will be passed the same arguments as transform, with args and kwargs forwarded. If None, then func will be the identity function.
inverse_func (Callable | None) – The callable to use for the inverse transformation. This will be passed the same arguments as inverse transform, with args and kwargs forwarded. If None, then inverse_func will be the identity function.
validate (bool) –
Indicates whether the input X array should be checked before calling func.
If False, there is no input validation.
If True, then X will be converted to a 2-dimensional NumPy array. If the conversion is not possible an exception is raised.
check_inverse (bool) – Whether to check that func followed by inverse_func leads to the original inputs. It can be used for a sanity check, raising a warning when the condition is not fulfilled.
kw_args – Dictionary of additional keyword arguments to pass to func.
inv_kw_args – Dictionary of additional keyword arguments to pass to inverse_func.
- Return type:
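
The round-trip idea behind check_inverse, together with keyword forwarding through kw_args and inv_kw_args, can be sketched in plain Python. This illustrates the concept under simplifying assumptions and is not the library's actual check: `round_trip_ok`, `scale_by` and `unscale_by` are hypothetical names, and the real check may sample the input or use different tolerances.

```python
import math
import warnings

def scale_by(seq, factor=1.0):
    """Hypothetical forward function: multiply every value by factor."""
    return [x * factor for x in seq]

def unscale_by(seq, factor=1.0):
    """Hypothetical inverse function: divide every value by factor."""
    return [x / factor for x in seq]

def round_trip_ok(func, inverse_func, seq, kw_args=None, inv_kw_args=None):
    """Warn and return False if inverse_func(func(seq)) does not recover seq."""
    transformed = func(seq, **(kw_args or {}))
    restored = inverse_func(transformed, **(inv_kw_args or {}))
    ok = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
             for a, b in zip(seq, restored))
    if not ok:
        warnings.warn("func and inverse_func are not inverses of each other")
    return ok

seq = [1.0, 2.0, 4.0]

# Matching keyword arguments: the round trip recovers the input.
consistent = round_trip_ok(scale_by, unscale_by, seq,
                           {"factor": 2.0}, {"factor": 2.0})

# Mismatched keyword arguments: the check fails and a warning is issued.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    inconsistent = round_trip_ok(scale_by, unscale_by, seq,
                                 {"factor": 2.0}, {"factor": 4.0})
```

A warning rather than an exception matches the description above: the check is a sanity aid, so a mismatch is reported without stopping the program.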
- fit(X, lengths=None)
Fits the transformer to X.
- Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Array | None) –
Lengths of the observation sequence(s) provided in X.
If None, then X is assumed to be a single observation sequence.
len(X) should be equal to sum(lengths).
- Returns:
The fitted transformer.
- Return type:
- fit_transform(X, lengths=None)
Fits the transformer to the sequence(s) in X and returns a transformed version of X.
- Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Array | None) –
Lengths of the observation sequence(s) provided in X.
If None, then X is assumed to be a single observation sequence.
len(X) should be equal to sum(lengths).
- Returns:
The transformed data.
- Return type:
Array
- inverse_transform(X, lengths=None)
Applies the inverse transformation to X.
- Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Array | None) –
Lengths of the observation sequence(s) provided in X.
If None, then X is assumed to be a single observation sequence.
len(X) should be equal to sum(lengths).
- Returns:
The inverse transformed array.
- Return type:
Array
- transform(X, lengths=None)
Applies the transformation to X, producing a transformed version of X.
- Parameters:
X (Array) –
Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
lengths (Array | None) –
Lengths of the observation sequence(s) provided in X.
If None, then X is assumed to be a single observation sequence.
len(X) should be equal to sum(lengths).
- Returns:
The transformed array.
- Return type:
Array
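
The X and lengths convention shared by the methods above can be sketched as follows. `split_sequences` is a hypothetical helper written for this illustration, using nested lists in place of arrays; it is not part of the documented API.

```python
from itertools import accumulate

def split_sequences(X, lengths=None):
    """Recover the individual sequences stored back-to-back in X."""
    if lengths is None:
        # No lengths given: X is treated as one observation sequence.
        return [X]
    if sum(lengths) != len(X):
        raise ValueError("len(X) must equal sum(lengths)")
    ends = list(accumulate(lengths))  # exclusive end offset of each sequence
    starts = [0] + ends[:-1]          # start offset of each sequence
    return [X[s:e] for s, e in zip(starts, ends)]

# Five observations with two features each, taken from two sequences
# of lengths 2 and 3 that were concatenated along the first dimension.
X = [[0.1, 1.0], [0.2, 1.1], [0.3, 1.2], [0.4, 1.3], [0.5, 1.4]]

pair = split_sequences(X, lengths=[2, 3])
single = split_sequences(X)
```

Here `pair` holds two sequences with 2 and 3 observations respectively, and `single` holds the whole of `X` as one sequence. Passing lengths that do not sum to `len(X)` raises a `ValueError` in this sketch.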