Zero-trimming (trim_zeros)

Removes zero-observations from an observation sequence.

Datasets of sequential data often pad observation sequences with zeros so that machine learning algorithms receive sequences of equal length. Although this makes it possible to represent the sequences as a single array, the added zeros are not real observations and may degrade the performance of the machine learning algorithms.
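To make the padding scheme concrete, the following NumPy sketch pads three variable-length sequences with trailing zeros so that they can be stacked into a single 3D array. The helper name `pad_sequences` is hypothetical and is not part of Sequentia. It only illustrates where the extra zero-observations come from.

```python
import numpy as np

def pad_sequences(sequences):
    """Zero-pad 2D sequences (length x features) at the end to a common length.

    Hypothetical helper for illustration; not part of Sequentia.
    """
    max_len = max(len(x) for x in sequences)
    n_features = sequences[0].shape[1]
    # Allocate a zero-filled (n_sequences, max_len, n_features) array
    padded = np.zeros((len(sequences), max_len, n_features))
    for i, x in enumerate(sequences):
        # Copy each sequence into the front; the remaining rows stay zero
        padded[i, :len(x)] = x
    return padded

rng = np.random.default_rng(0)
sequences = [rng.random((n, 2)) for n in (3, 5, 2)]
padded = pad_sequences(sequences)
print(padded.shape)  # (3, 5, 2)

# Zero-observations added purely for padding: (5 - 3) + 0 + (5 - 2) = 5
n_padding = sum(padded.shape[1] - len(x) for x in sequences)
print(n_padding)  # 5
```

Every padded row is indistinguishable from a genuine all-zero observation, which is why a model trained on the stacked array may be affected by it.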

Since the algorithms implemented by Sequentia support variable-length sequences out of the box, zero padding is unnecessary and can be removed with this method.

NOTE: This preprocessing method removes not only leading and trailing zero-observations at the start and end of a sequence, but also any zero-observations that occur elsewhere in the sequence.
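The difference between trimming only the ends of a sequence and removing every zero-observation can be sketched in NumPy as follows. This assumes that a zero-observation is a row whose features are all zero. Both helper names are hypothetical, and neither is claimed to be Sequentia's implementation.

```python
import numpy as np

def is_zero_row(x):
    # Boolean mask: True where every feature of the observation is zero
    return np.all(x == 0, axis=1)

def trim_ends_only(x):
    """Hypothetical: drop zero-observations at the start and end only."""
    nonzero = np.flatnonzero(~is_zero_row(x))
    if nonzero.size == 0:
        # Every observation is zero, so return an empty (0, n_features) array
        return x[:0]
    return x[nonzero[0]:nonzero[-1] + 1]

def trim_all(x):
    """Hypothetical: drop every zero-observation, wherever it occurs."""
    return x[~is_zero_row(x)]

x = np.array([
    [0.0, 0.0],
    [1.0, 2.0],
    [0.0, 0.0],  # interior zero-observation
    [3.0, 0.0],  # not all-zero, so it is kept
    [0.0, 0.0],
])

print(trim_ends_only(x).shape)  # (3, 2): interior zero row kept
print(trim_all(x).shape)        # (2, 2): interior zero row removed
```

Under this assumption, an observation such as `[3.0, 0.0]` that contains some zero features is not a zero-observation and survives trimming.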

For further information, please see the preprocessing tutorial notebook.

Example

import numpy as np
from sequentia.preprocessing import trim_zeros

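# Create three sample sequences with 3 features and 10, 20 and 30 random
# observations, each padded with 4 zero-observations at the start and end
z = np.zeros((4, 3))
def x(i):
    return np.vstack((z, np.random.random((10 * i, 3)), z))
X = [x(i) for i in range(1, 4)]

# Zero-trim the data, removing the padding
X = trim_zeros(X)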

API reference

sequentia.preprocessing.trim_zeros(X)

Trim zero-observations from the input observation sequence(s).

Parameters:
X: numpy.ndarray or List[numpy.ndarray]

An individual observation sequence or a list of multiple observation sequences.

Returns:
trimmed: numpy.ndarray or List[numpy.ndarray]

The zero-trimmed input observation sequence(s).
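The parameter and return types above indicate that a single array yields a single array and a list yields a list. The NumPy sketch below shows that dispatch, assuming the same all-zero-row definition of a zero-observation. `trim_zeros_sketch` is a hypothetical name and is not the library function.

```python
from typing import List, Union

import numpy as np

Sequences = Union[np.ndarray, List[np.ndarray]]

def trim_zeros_sketch(X: Sequences) -> Sequences:
    """Hypothetical stand-in mirroring the documented input/output types."""
    def trim(x: np.ndarray) -> np.ndarray:
        # Keep only observations (rows) with at least one non-zero value
        return x[~np.all(x == 0, axis=1)]

    if isinstance(X, np.ndarray):
        # An individual observation sequence
        return trim(X)
    # A list of observation sequences, possibly of different lengths
    return [trim(x) for x in X]

# Same shape of data as the example above, but with all-ones observations
# so that no genuine observation can be all-zero by chance
z = np.zeros((4, 3))
X = [np.vstack((z, np.ones((10 * i, 3)), z)) for i in range(1, 4)]

single = trim_zeros_sketch(X[0])
many = trim_zeros_sketch(X)
print(single.shape)                # (10, 3)
print([x.shape[0] for x in many])  # [10, 20, 30]
```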