Constant Trimming (TrimConstants)
Datasets of sequential data often pad observation sequences with zeros or other constant values so that machine learning algorithms receive sequences of equal length. Although this makes it possible to represent the sequences as a matrix, the added padding may affect the performance of the machine learning algorithms.
As the algorithms implemented by Sequentia focus on supporting variable-length sequences out of the box, padding is not necessary, and can be removed with this method.
Note that this preprocessing method removes constant observations not only from the start or end of a sequence, but from anywhere in the sequence.
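The behaviour described above can be illustrated with a minimal NumPy sketch. It is not Sequentia's implementation: the function name `trim_constants` is hypothetical, and it assumes that an observation counts as constant only when every one of its features equals the constant value.

```python
import numpy as np

def trim_constants(X, const=0.0):
    """Hypothetical helper: drop every observation (row) whose features all equal `const`."""
    X = np.asarray(X, dtype=float)
    # Boolean mask that is True for rows made up entirely of the constant
    # (assumption: a row with only some features equal to `const` is kept).
    is_constant = np.all(X == const, axis=1)
    return X[~is_constant]

# Zero padding at both ends, plus one all-zero observation in the middle.
X = np.array([
    [0.0, 0.0],
    [1.0, 2.0],
    [0.0, 0.0],
    [3.0, 0.0],
    [0.0, 0.0],
])
trimmed = trim_constants(X)
```

Under this assumption, the all-zero row in the middle is removed along with the padding, while the row `[3.0, 0.0]` is kept because only one of its features is zero.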
Trim constant observations from the input observation sequence(s).
- const: float
The constant value to trim from the observation sequence(s).
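>>> import numpy as np
>>> from sequentia.preprocessing import TrimConstants
>>> # Create some sample data: random observations padded with four rows of zeros at each end
>>> z = np.zeros((4, 3))
>>> x = lambda i: np.vstack((z, np.random.random((10 * i, 3)), z))
>>> X = [x(i) for i in range(1, 4)]
>>> # Trim the data
>>> X = TrimConstants()(X)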
Applies the transformation to a single observation sequence.
- X: numpy.ndarray (float)
An individual observation sequence.
The transformed input observation sequence.
Applies the transformation to the observation sequence(s).
- X: numpy.ndarray (float) or list of numpy.ndarray (float)
An individual observation sequence or a list of multiple observation sequences.
- validate: bool
Whether or not to validate the input sequences.
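How a transformation of this kind might accept either a single sequence or a list of sequences can be sketched as follows. The helper `trim_all` and its validation rule (each sequence must be a two-dimensional `numpy.ndarray`) are assumptions made for illustration; the checks Sequentia performs when `validate` is set may differ.

```python
import numpy as np

def trim_all(X, const=0.0, validate=True):
    """Hypothetical helper: trim one sequence or a list of sequences."""
    single = isinstance(X, np.ndarray)
    sequences = [X] if single else list(X)
    if validate:
        # Assumed validation rule, not necessarily the library's own.
        for x in sequences:
            if not isinstance(x, np.ndarray) or x.ndim != 2:
                raise TypeError("each sequence must be a 2D numpy.ndarray")
    # Remove every row whose features all equal `const`.
    trimmed = [x[~np.all(x == const, axis=1)] for x in sequences]
    # Return the same kind of container that was passed in.
    return trimmed[0] if single else trimmed

# Three sequences of different lengths, each padded with four zero rows per end.
z = np.zeros((4, 3))
X = [np.vstack((z, np.ones((10 * i, 3)), z)) for i in range(1, 4)]
lengths = [len(x) for x in trim_all(X)]
```

The sketch returns an array when given an array and a list when given a list, so callers do not need to unwrap the result themselves.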