Constant Trimming (TrimConstants)

Many datasets of sequential data pad observation sequences with zeros (or another constant value) so that machine learning algorithms receive sequences of equal length. Although this has the advantage of allowing the sequences to be represented as a single matrix, the added padding may degrade the performance of the algorithms.

As the algorithms implemented by Sequentia support variable-length sequences out of the box, padding is unnecessary and can be removed with this method.
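To illustrate the padding problem, here is a minimal NumPy-only sketch. The helper `pad_to_length` is hypothetical and not part of Sequentia: it zero-pads variable-length sequences so they stack into one 3D array, and the padding is then stripped again by dropping every all-zero row.

```python
import numpy as np

def pad_to_length(sequences, length):
    """Hypothetical helper: zero-pad each (T_i, D) sequence at the end to (length, D)."""
    return np.stack([
        np.vstack((seq, np.zeros((length - len(seq), seq.shape[1]))))
        for seq in sequences
    ])

rng = np.random.default_rng(0)
# Three sequences of different lengths, each with 3 features.
# Adding 0.1 guarantees that no real observation is all zeros.
sequences = [rng.random((n, 3)) + 0.1 for n in (5, 8, 12)]

# Padding lets all sequences share one (3, 12, 3) array...
padded = pad_to_length(sequences, 12)
print(padded.shape)  # (3, 12, 3)

# ...but the shorter sequences now end in rows of zeros.
# Dropping every all-zero row recovers the original lengths.
unpadded = [seq[~np.all(seq == 0, axis=1)] for seq in padded]
print([len(seq) for seq in unpadded])  # [5, 8, 12]
```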

Warning

This preprocessing method does not only remove leading and trailing constant observations at the start or end of a sequence; it also removes any constant observations that occur elsewhere in the sequence.
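To make the difference concrete, the NumPy sketch below compares an edges-only trim with trimming every constant observation, which is the behaviour this warning describes. Both helpers are hypothetical, and treating an observation as constant only when all of its features equal the constant is an assumption.

```python
import numpy as np

def trim_edges(x, constant=0.0):
    """Hypothetical: drop constant rows only at the start and end of x."""
    keep = ~np.all(x == constant, axis=1)
    if not keep.any():
        return x[:0]  # every row is constant
    first = np.argmax(keep)                  # index of the first non-constant row
    last = len(keep) - np.argmax(keep[::-1]) # one past the last non-constant row
    return x[first:last]

def trim_everywhere(x, constant=0.0):
    """Hypothetical: drop every row whose features all equal the constant."""
    return x[~np.all(x == constant, axis=1)]

x = np.array([
    [0.0, 0.0],  # leading padding
    [0.3, 0.5],
    [0.0, 0.0],  # a genuine all-zero observation inside the sequence
    [0.7, 0.2],
    [0.0, 0.0],  # trailing padding
])

print(len(trim_edges(x)))       # 3: the interior zero row survives
print(len(trim_everywhere(x)))  # 2: the interior zero row is removed too
```

If interior constant observations carry meaning in your data, an edges-only approach like `trim_edges` may be the safer choice.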

API reference

class sequentia.preprocessing.TrimConstants(constant=0)

Trim constant observations from the input observation sequence(s).

Parameters
constant: float

The constant value to trim from the observation sequence(s).
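Padding is not always zero. As a sketch, assuming an observation is trimmed only when every feature equals `constant`, the NumPy snippet below trims a sequence padded with a sentinel value of -1. The function `trim_constants` is a hypothetical stand-in, not Sequentia's implementation.

```python
import numpy as np

def trim_constants(x, constant=0.0):
    """Hypothetical stand-in: keep only rows that are not entirely `constant`."""
    return x[~np.all(x == constant, axis=1)]

x = np.array([
    [-1.0, -1.0],  # padding
    [ 0.0,  0.0],  # real observation: zero is not the padding value here
    [-1.0,  4.0],  # only partially equal to the constant, so it is kept
    [-1.0, -1.0],  # padding
])

trimmed = trim_constants(x, constant=-1.0)
print(trimmed.shape)  # (2, 2): rows [0, 0] and [-1, 4] remain
```

This sketch uses exact equality, so values that are only approximately equal to the constant (for example after floating-point arithmetic) would be kept; `np.isclose` could be swapped in if that matters.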

Examples

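>>> import numpy as np
>>> from sequentia.preprocessing import TrimConstants
>>> # Create some sample data: random sequences padded with 4 zero rows on each side
>>> z = np.zeros((4, 3))
>>> x = lambda i: np.vstack((z, np.random.random((10 * i, 3)), z))
>>> X = [x(i) for i in range(1, 4)]
>>> # Trim the data
>>> X = TrimConstants()(X)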
transform(x)

Applies the transformation to a single observation sequence.

Parameters
x: numpy.ndarray (float)

An individual observation sequence.

Returns
transformed: numpy.ndarray (float)

The transformed input observation sequence.
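As a rough sketch of what `transform` does to one sequence, assuming constant observations are rows whose every feature equals the constant, the hypothetical function `transform_one` below builds a boolean row mask and keeps the non-constant rows. Applied to sequences built as in the example above (with a seeded generator for reproducibility), it leaves the 10, 20 and 30 random rows.

```python
import numpy as np

def transform_one(x, constant=0.0):
    """Hypothetical single-sequence trim, shaped like TrimConstants.transform."""
    x = np.asarray(x, dtype=float)
    is_constant = np.all(x == constant, axis=1)  # one boolean per observation (row)
    return x[~is_constant]

def make_sequence(i, rng):
    """Recreate the example data: 10*i random rows between 4 zero rows on each side."""
    z = np.zeros((4, 3))
    return np.vstack((z, rng.random((10 * i, 3)), z))

rng = np.random.default_rng(42)
X = [make_sequence(i, rng) for i in range(1, 4)]

print([len(x) for x in X])                 # [18, 28, 38]
print([len(transform_one(x)) for x in X])  # [10, 20, 30]
```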

__call__(X, validate=True)

Applies the transformation to the observation sequence(s).

Parameters
X: numpy.ndarray (float) or list of numpy.ndarray (float)

An individual observation sequence or a list of multiple observation sequences.

validate: bool

Whether or not to validate the input sequences.

Returns
transformed: numpy.ndarray (float) or list of numpy.ndarray (float)

The transformed input observation sequence(s).
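The dispatch described for `__call__` can be sketched as follows: a single 2D array is transformed directly, while a list is transformed element-wise and returned as a list. The functions `trim` and `apply_transform` and the validation checks are hypothetical illustrations; Sequentia's own validation may check different things.

```python
import numpy as np

def trim(x, constant=0.0):
    """Hypothetical single-sequence trim: drop rows that are entirely `constant`."""
    return x[~np.all(x == constant, axis=1)]

def apply_transform(X, transform=trim, validate=True):
    """Hypothetical sketch of __call__: accept one sequence or a list of sequences."""
    is_single = isinstance(X, np.ndarray)
    sequences = [X] if is_single else list(X)
    if validate:
        # Illustrative check only: require each sequence to be a 2D array
        for x in sequences:
            if not isinstance(x, np.ndarray) or x.ndim != 2:
                raise ValueError("each observation sequence must be a 2D numpy.ndarray")
    transformed = [transform(np.asarray(x, dtype=float)) for x in sequences]
    # Mirror the input type: array in, array out; list in, list out
    return transformed[0] if is_single else transformed

seq = np.array([[0.0, 0.0], [1.0, 2.0], [0.0, 0.0]])
print(apply_transform(seq).shape)                      # (1, 2)
print([s.shape for s in apply_transform([seq, seq])])  # [(1, 2), (1, 2)]
```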