Introduction to Preprocessing

Sequentia provides a number of useful preprocessing methods for sequential data.

Additionally, the provided Compose class makes it possible to apply multiple transformations in sequence.
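As a rough illustration of what chaining transformations means, the sketch below applies plain NumPy functions in order to each sequence. The names `center`, `scale` and `compose` are hypothetical stand-ins written for this example; they are not Sequentia's Compose class or its built-in transformations, and sequences are assumed to be 2D arrays of shape (observations, features).

```python
import numpy as np

def center(x):
    """Subtract the per-feature mean from one (T, D) observation sequence."""
    return x - x.mean(axis=0)

def scale(x):
    """Divide each feature by its largest absolute value (guarding against zero)."""
    peak = np.abs(x).max(axis=0)
    return x / np.where(peak == 0, 1.0, peak)

def compose(*steps):
    """Return a function that applies each step, in order, to one sequence."""
    def apply(x):
        for step in steps:
            x = step(x)
        return x
    return apply

# Three sequences of different lengths, each with 3 features.
rng = np.random.default_rng(0)
X = [rng.random((t, 3)) for t in (5, 8, 6)]

pipeline = compose(center, scale)
X_out = [pipeline(x) for x in X]
```

Each sequence keeps its own length, since every step operates on one sequence at a time.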

Note

The existing preprocessing methods in sequentia.preprocessing are currently only applicable to numpy.ndarray objects (individually or in lists), and therefore cannot be applied as transformations for torch.Tensor objects.

Unfortunately, this means that the preprocessing methods can only be used to preprocess data for sequentia.classifiers.knn.KNNClassifier and sequentia.classifiers.hmm.HMMClassifier, and not for sequentia.classifiers.rnn.DeepGRU.

You can attempt to apply these transformations to torch.Tensor objects by bypassing validation when calling the transformation:

import torch
from sequentia.preprocessing import Center
x = torch.rand(5, 3)
x = Center()(x, validate=False)  # skip input validation

However, this is unlikely to work because of differences between numpy.ndarray and torch.Tensor.

Each of the transformations follows a similar interface, based on the abstract Transform class:

class sequentia.preprocessing.Transform

Base class representing a single transformation.

__call__(X, validate=True)

Applies the transformation to the observation sequence(s).

Parameters
X: numpy.ndarray (float) or list of numpy.ndarray (float)

An individual observation sequence or a list of multiple observation sequences.

validate: bool

Whether or not to validate the input sequences.

Returns
transformed: numpy.ndarray (float) or list of numpy.ndarray (float)

The transformed input observation sequence(s).

transform(x)

Applies the transformation to a single observation sequence.

Parameters
x: numpy.ndarray (float)

An individual observation sequence.

Returns
transformed: numpy.ndarray (float)

The transformed input observation sequence.
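The relationship between __call__ and transform described above can be sketched as follows: transform handles exactly one sequence, while __call__ accepts either a single sequence or a list and returns the same kind of object. This is a minimal stand-alone illustration using a hypothetical `SketchCenter` class; it does not subclass Transform, and its simplified input check is an assumption, not Sequentia's actual validation.

```python
import numpy as np

class SketchCenter:
    """Hypothetical stand-in showing the single-vs-list calling convention."""

    def transform(self, x):
        # Operates on one (T, D) observation sequence.
        return x - x.mean(axis=0)

    def __call__(self, X, validate=True):
        if validate:
            # Simplified check; Sequentia's own validation is not reproduced here.
            seqs = X if isinstance(X, list) else [X]
            if not all(isinstance(x, np.ndarray) and x.ndim == 2 for x in seqs):
                raise TypeError("expected 2D numpy.ndarray sequence(s)")
        if isinstance(X, list):
            return [self.transform(x) for x in X]
        return self.transform(X)

center = SketchCenter()

# A single sequence in, a single array out.
single = center(np.arange(6, dtype=float).reshape(3, 2))

# A list of sequences in, a list of arrays out.
many = center([np.ones((4, 2)), np.zeros((2, 2))])
```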

fit(X, validate=True)

Fit the transformation on the provided observation sequence(s) (without transforming them).

Parameters
X: numpy.ndarray (float) or list of numpy.ndarray (float)

An individual observation sequence or a list of multiple observation sequences.

validate: bool

Whether or not to validate the input sequences.

fit_transform(X, validate=True)

Fit the transformation with the provided observation sequence(s) and transform them.

Parameters
X: numpy.ndarray (float) or list of numpy.ndarray (float)

An individual observation sequence or a list of multiple observation sequences.

validate: bool

Whether or not to validate the input sequences.

Returns
transformed: numpy.ndarray (float) or list of numpy.ndarray (float)

The transformed input observation sequence(s).

is_fitted()

Check whether or not the transformation is fitted on some observation sequence(s).

Returns
fitted: bool

Whether or not the transformation is fitted.

unfit()

Unfit the transformation by resetting the parameters to their default settings.
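
To make the fit, transform and unfit lifecycle concrete, here is a small sketch of a transformation with learned parameters. `SketchMinMax` is a hypothetical class written for illustration, assuming min-max scaling as the fitted operation; it mirrors the method names documented above but is not part of Sequentia and does not subclass Transform.

```python
import numpy as np

class SketchMinMax:
    """Hypothetical fittable transformation: rescales features using fitted minima and maxima."""

    def __init__(self):
        self.unfit()

    def fit(self, X, validate=True):
        # `validate` is accepted for interface parity but unused in this sketch.
        seqs = X if isinstance(X, list) else [X]
        stacked = np.vstack(seqs)  # pool observations from all sequences
        self.min_ = stacked.min(axis=0)
        self.max_ = stacked.max(axis=0)

    def transform(self, x):
        if not self.is_fitted():
            raise RuntimeError("call fit() before transform()")
        span = self.max_ - self.min_
        return (x - self.min_) / np.where(span == 0, 1.0, span)

    def fit_transform(self, X, validate=True):
        self.fit(X, validate)
        if isinstance(X, list):
            return [self.transform(x) for x in X]
        return self.transform(X)

    def is_fitted(self):
        return self.min_ is not None and self.max_ is not None

    def unfit(self):
        # Reset the learned parameters to their defaults.
        self.min_ = None
        self.max_ = None

scaler = SketchMinMax()
X = [np.array([[0.0, 10.0], [2.0, 20.0]]), np.array([[4.0, 30.0]])]

X_scaled = scaler.fit_transform(X)
fitted_after_fit = scaler.is_fitted()

scaler.unfit()
fitted_after_unfit = scaler.is_fitted()
```

In this sketch the parameters are fitted across all provided sequences together, so a later call to transform on a new sequence reuses the same minima and maxima until unfit is called.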