Function Transformer

When preprocessing sequential data, it is often preferable to apply certain transformations to each sequence independently, rather than applying a single transformation to all of the data at once.

For example, in speech recognition, suppose we have a dataset of MFCC features extracted from audio recordings of different speakers. If we are not interested in speaker-focused tasks such as speaker recognition, and instead only want to classify recordings, we need the recordings to be comparable with each other. This is especially important for algorithms such as KNNClassifier, which rely on distance comparisons.

In this case, we might want to standardize the MFCCs of each recording individually (i.e. center and scale them using feature means and standard deviations computed separately for each recording), so that they are represented as deviations from zero. This form is better suited for comparison, as it reduces speaker-specific differences in the scale or location of the data.
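A minimal NumPy sketch of per-recording standardization follows. It assumes the concatenated X/lengths layout used throughout this page and uses synthetic data in place of real MFCCs; the helper standardize_per_sequence is hypothetical and not part of Sequentia. Each recording ends up with zero mean and unit standard deviation per feature, regardless of its original location and scale.

```python
import numpy as np


def standardize_per_sequence(X, lengths):
    """Hypothetical helper: center and scale each sequence by its own feature statistics."""
    # Row indices where one sequence ends and the next begins
    bounds = np.cumsum(lengths)[:-1]
    # Population std (ddof=0); a feature that is constant within a
    # sequence would cause a division by zero here
    return np.concatenate(
        [(x - x.mean(axis=0)) / x.std(axis=0) for x in np.split(X, bounds)]
    )


rng = np.random.default_rng(0)
# Two synthetic "recordings" with 2 features but different locations and scales
rec_a = rng.normal(loc=5.0, scale=2.0, size=(40, 2))
rec_b = rng.normal(loc=-3.0, scale=0.5, size=(60, 2))
X = np.concatenate([rec_a, rec_b])
lengths = [40, 60]

Xt = standardize_per_sequence(X, lengths)
for x in np.split(Xt, np.cumsum(lengths)[:-1]):
    print(x.mean(axis=0).round(6), x.std(axis=0).round(6))
```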

Another example is signal filtering: a filter should be applied to each sequence independently, so that it does not mix values across the boundary between two consecutive sequences.
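To illustrate why filters should be applied per sequence, here is a small NumPy sketch that uses a centered moving average as a stand-in filter; the helper moving_average and the toy data are illustrative assumptions. Filtering the concatenated array lets the last frames of one sequence pick up values from the next, while filtering each sequence separately does not.

```python
import numpy as np


def moving_average(x, width=3):
    """Centered moving average along the time axis, zero-padded at the edges."""
    kernel = np.ones(width) / width
    # Filter each feature (column) separately
    return np.column_stack([np.convolve(col, kernel, mode="same") for col in x.T])


# Two hypothetical single-feature sequences: all zeros, then all tens
seq_a = np.zeros((5, 1))
seq_b = np.full((5, 1), 10.0)
X = np.concatenate([seq_a, seq_b])
lengths = [5, 5]

# Filtering the concatenation treats both sequences as one continuous signal
joint = moving_average(X)

# Filtering each sequence separately keeps values from crossing the boundary
bounds = np.cumsum(lengths)[:-1]
independent = np.concatenate([moving_average(x) for x in np.split(X, bounds)])

# The last frame of seq_a picks up values from seq_b only in the joint case
print(joint[4, 0], independent[4, 0])
```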

IndependentFunctionTransformer allows such per-sequence transformations to be defined using arbitrary functions.

API reference

Class

IndependentFunctionTransformer

Constructs a transformer from an arbitrary callable, applying the transform independently to each sequence.

Methods

__init__([func, inverse_func, validate, ...])

Initializes the IndependentFunctionTransformer.

fit(X[, lengths])

Fits the transformer to X.

fit_transform(X[, lengths])

Fits the transformer to the sequence(s) in X and returns a transformed version of X.

inverse_transform(X[, lengths])

Applies the inverse transformation to X.

transform(X[, lengths])

Applies the transformation to X, producing a transformed version of X.


class sequentia.preprocessing.transforms.IndependentFunctionTransformer

Constructs a transformer from an arbitrary callable, applying the transform independently to each sequence.

This transform applies a user-defined function or function object to each sequence in X (as delimited by lengths) independently, and returns the results concatenated into a single array. This is useful for stateless transformations such as taking the log of frequencies or custom scaling. Note: if a lambda is used as the function, the resulting transformer will not be pickleable.
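The per-sequence behaviour can be sketched in plain NumPy as a split-apply-concatenate loop. This is an illustrative assumption about the mechanics, not Sequentia's actual implementation, and apply_independently and scale_by_max are hypothetical names. Named functions are used instead of lambdas, in line with the pickling note above.

```python
import numpy as np


def apply_independently(func, X, lengths=None, kw_args=None):
    """Hypothetical helper: apply func to each sequence in X and concatenate the results."""
    X = np.asarray(X)
    kw_args = kw_args or {}
    # With no lengths, treat X as a single sequence
    if lengths is None:
        lengths = [len(X)]
    if len(X) != sum(lengths):
        raise ValueError("len(X) must equal sum(lengths)")
    # Split at the cumulative sequence boundaries, transform, and re-stack
    bounds = np.cumsum(lengths)[:-1]
    return np.concatenate([func(x, **kw_args) for x in np.split(X, bounds)])


def scale_by_max(x):
    # Custom scaling: divide each feature by its largest absolute value within the sequence
    return x / np.abs(x).max(axis=0)


X = np.array([[1.0], [3.0], [7.0], [0.0], [9.0]])
lengths = [3, 2]

# Element-wise functions such as np.log1p give the same result with or without splitting
Xt = apply_independently(np.log1p, X, lengths)
# Functions that compute per-sequence statistics do depend on the split
Xs = apply_independently(scale_by_max, X, lengths)
print(Xs.ravel())
```

Using np.log1p rather than np.log avoids -inf for zero-valued frequencies.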

This works conveniently with functions in sklearn.preprocessing such as scale() or normalize().
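Whether per-sequence application changes the result depends on the function. normalize() rescales each row (frame) on its own, while scale() standardizes each column (feature) using statistics over all rows it receives. The NumPy functions below are simplified stand-ins that assume the defaults of those two sklearn functions (row-wise L2 normalization, and column-wise zero-mean/unit-variance scaling); they are not the sklearn implementations, and per_sequence is a hypothetical helper.

```python
import numpy as np


def scale_like(x):
    # Simplified stand-in for sklearn.preprocessing.scale defaults:
    # standardize each feature (column) with its mean and population std
    return (x - x.mean(axis=0)) / x.std(axis=0)


def normalize_like(x):
    # Simplified stand-in for sklearn.preprocessing.normalize defaults:
    # rescale each frame (row) to unit L2 norm
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def per_sequence(func, X, lengths):
    # Hypothetical helper: apply func to each sequence and re-concatenate
    bounds = np.cumsum(lengths)[:-1]
    return np.concatenate([func(x) for x in np.split(X, bounds)])


rng = np.random.default_rng(1)
# Two sequences of 5 frames with 3 features; the second is offset by +4
X = rng.normal(size=(10, 3)) + np.repeat([[0.0], [4.0]], 5, axis=0)
lengths = [5, 5]

# Row-wise: splitting into sequences does not change the result
same_for_normalize = np.allclose(normalize_like(X), per_sequence(normalize_like, X, lengths))
# Column-wise: per-sequence statistics differ from statistics over all of X
same_for_scale = np.allclose(scale_like(X), per_sequence(scale_like, X, lengths))
print(same_for_normalize, same_for_scale)  # True False
```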

Note:

This is a stateless transform, meaning fit() and fit_transform() will not fit on any data.

See also

sklearn.preprocessing.FunctionTransformer

IndependentFunctionTransformer is based on this class, which instead applies the callable to the entire input array X as if it were a single sequence. Read more in the User Guide.

Examples

Using an IndependentFunctionTransformer with sklearn.preprocessing.minmax_scale() to scale features to the range [0, 1] independently for each sequence in the spoken digits dataset.

from sklearn.preprocessing import minmax_scale
from sequentia.preprocessing import IndependentFunctionTransformer
from sequentia.datasets import load_digits

# Fetch MFCCs of spoken digits
data = load_digits()

# Create an independent min-max transform
transform = IndependentFunctionTransformer(minmax_scale)

# Apply the transform to the data
Xt = transform.transform(data.X, data.lengths)

__init__(func=None, inverse_func=None, *, validate=False, check_inverse=True, kw_args=None, inv_kw_args=None)

Initializes the IndependentFunctionTransformer.

Parameters:
  • func (Callable | None) – The callable to use for the transformation. It is applied to each sequence in X separately, with the entries of kw_args forwarded as keyword arguments. If None, then func will be the identity function.

  • inverse_func (Callable | None) – The callable to use for the inverse transformation. It is applied to each sequence passed to inverse_transform separately, with the entries of inv_kw_args forwarded as keyword arguments. If None, then inverse_func will be the identity function.

  • validate (bool) –

    Indicates whether the input X array should be checked before calling func.

    • If False, there is no input validation.

    • If True, then X will be converted to a 2-dimensional NumPy array. If the conversion is not possible an exception is raised.

  • check_inverse (bool) – Whether to check that func followed by inverse_func leads to the original inputs. This can be used as a sanity check, raising a warning when the condition is not fulfilled.

  • kw_args – Dictionary of additional keyword arguments to pass to func.

  • inv_kw_args – Dictionary of additional keyword arguments to pass to inverse_func.

Return type:

IndependentFunctionTransformer
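The check_inverse option amounts to a round-trip sanity check. The following is a minimal sketch of the idea, assuming the check compares each sequence with np.allclose; the actual check may differ (scikit-learn's FunctionTransformer, for instance, only checks a subset of rows), and check_round_trip and identity are hypothetical helpers.

```python
import warnings

import numpy as np


def identity(x):
    return x


def check_round_trip(func, inverse_func, X, lengths):
    """Warn and return False if inverse_func(func(x)) does not recover some sequence x."""
    bounds = np.cumsum(lengths)[:-1]
    for x in np.split(X, bounds):
        if not np.allclose(inverse_func(func(x)), x):
            warnings.warn("inverse_func does not invert func on the provided data", UserWarning)
            return False
    return True


X = np.array([[0.5], [2.0], [3.0], [1.0], [4.0]])
lengths = [3, 2]

# log1p and expm1 are inverses (up to floating-point error), so no warning is raised
ok = check_round_trip(np.log1p, np.expm1, X, lengths)

# Squaring with an identity inverse does not recover X, so a warning is raised
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bad = check_round_trip(np.square, identity, X, lengths)

print(ok, bad, len(caught))  # True False 1
```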

fit(X, lengths=None)

Fits the transformer to X.

Parameters:
  • X (Array) –

    Univariate or multivariate observation sequence(s).

    • Should be a single 1D or 2D array.

    • Should have length as the 1st dimension and features as the 2nd dimension.

    • Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.

  • lengths (Array | None) –

    Lengths of the observation sequence(s) provided in X.

    • If None, then X is assumed to be a single observation sequence.

    • len(X) should be equal to sum(lengths).

Returns:

The fitted transformer.

Return type:

IndependentFunctionTransformer
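The concatenated X/lengths layout described in the parameters above can be illustrated with plain NumPy: sequences are stacked along the first (time) axis, and lengths records how many rows belong to each one, so the originals can be recovered by splitting at the cumulative boundaries. The toy arrays are illustrative only.

```python
import numpy as np

# Three hypothetical multivariate sequences with 2 features and different lengths
sequences = [np.ones((4, 2)), np.zeros((2, 2)), np.full((3, 2), 5.0)]

# Encode: stack along the time axis and record each sequence's length
X = np.concatenate(sequences)
lengths = np.array([len(x) for x in sequences])
assert len(X) == lengths.sum()

# Decode: split at the cumulative boundaries (here, rows 4 and 6)
decoded = np.split(X, np.cumsum(lengths)[:-1])
assert all(np.array_equal(a, b) for a, b in zip(decoded, sequences))
```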

fit_transform(X, lengths=None)

Fits the transformer to the sequence(s) in X and returns a transformed version of X.

Parameters:
  • X (Array) –

    Univariate or multivariate observation sequence(s).

    • Should be a single 1D or 2D array.

    • Should have length as the 1st dimension and features as the 2nd dimension.

    • Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.

  • lengths (Array | None) –

    Lengths of the observation sequence(s) provided in X.

    • If None, then X is assumed to be a single observation sequence.

    • len(X) should be equal to sum(lengths).

Returns:

The transformed data.

Return type:

Array

inverse_transform(X, lengths=None)

Applies the inverse transformation to X.

Parameters:
  • X (Array) –

    Univariate or multivariate observation sequence(s).

    • Should be a single 1D or 2D array.

    • Should have length as the 1st dimension and features as the 2nd dimension.

    • Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.

  • lengths (Array | None) –

    Lengths of the observation sequence(s) provided in X.

    • If None, then X is assumed to be a single observation sequence.

    • len(X) should be equal to sum(lengths).

Returns:

The inverse transformed array.

Return type:

Array

transform(X, lengths=None)

Applies the transformation to X, producing a transformed version of X.

Parameters:
  • X (Array) –

    Univariate or multivariate observation sequence(s).

    • Should be a single 1D or 2D array.

    • Should have length as the 1st dimension and features as the 2nd dimension.

    • Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.

  • lengths (Array | None) –

    Lengths of the observation sequence(s) provided in X.

    • If None, then X is assumed to be a single observation sequence.

    • len(X) should be equal to sum(lengths).

Returns:

The transformed array.

Return type:

Array