Datasets

Sequentia provides a selection of sample sequential datasets for quick experimentation.

Each dataset follows the interface described below.

API reference

Utility wrapper for a generic sequential dataset.

`__init__`(X[, y, lengths, classes])	Initializes a `SequentialDataset`.
`copy`()	Creates a copy of the dataset.
`iter_by_class`()	Subsets the observation sequences by class.
`load`(path)	Loads a stored dataset in `.npz` format.
`save`(path[, compress])	Stores the dataset in `.npz` format.
`split`([test_size, train_size, random_state, ...])	Splits the dataset into two partitions (train/test).

`X`	Observation sequences.
`X_lengths`	Observation sequences and corresponding lengths.
`X_y`	Observation sequences and corresponding outputs.
`X_y_lengths`	Observation sequences and corresponding outputs and lengths.
`classes`	Set of unique classes in `y`.
`idxs`	Observation sequence start and end indices.
`lengths`	Lengths corresponding to `X`.
`y`	Outputs corresponding to `X`.

class sequentia.utils.SequentialDataset

Utility wrapper for a generic sequential dataset.

__init__(X, y=None, lengths=None, classes=None)

Parameters:

X (Array) –
Univariate or multivariate observation sequence(s).
- Should be a single 1D or 2D array.
- Should have length as the 1st dimension and features as the 2nd dimension.
- Should be a concatenated sequence if multiple sequences are provided, with respective sequence lengths being provided in the lengths argument for decoding the original sequences.
y (Optional[Array]) – Outputs corresponding to sequence(s) provided in X.
lengths (Optional[Array]) –
Lengths of the observation sequence(s) provided in X.
- If None, then X is assumed to be a single observation sequence.
- len(X) should be equal to sum(lengths).
classes (Optional[Array]) –
Set of possible class labels (only if y was provided with categorical values).
- If not provided, these will be determined from the training data labels.

Return type:

SequentialDataset
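The concatenated layout described for X and lengths can be illustrated with plain NumPy. This is a minimal sketch rather than Sequentia's own code: the toy data and the local variables `starts`, `ends` and `idxs` are assumptions made for illustration, and the exact layout of the library's `idxs` property may differ.

```python
import numpy as np

# Three toy sequences with 2 features each, of lengths 3, 5 and 2.
sequences = [np.full((n, 2), fill_value=i, dtype=float) for i, n in enumerate((3, 5, 2))]

# Concatenate along the first (length) dimension and record each length.
X = np.concatenate(sequences)                     # shape (10, 2)
lengths = np.array([len(s) for s in sequences])   # one length per sequence
y = np.array([0, 1, 0])                           # one output per sequence

# The constraint stated above: len(X) should equal sum(lengths).
assert len(X) == lengths.sum()

# Start and end offsets of each sequence within X (illustrative only).
ends = lengths.cumsum()
starts = ends - lengths
idxs = np.stack([starts, ends], axis=1)

# Decode the original sequences from the concatenated array.
decoded = [X[start:end] for start, end in idxs]
```

Arrays prepared this way match the shapes expected by the signature above, so they could be passed as `SequentialDataset(X, y=y, lengths=lengths)`.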

copy()

Creates a copy of the dataset.

iter_by_class()

Subsets the observation sequences by class.

Raises:

AttributeError - If y was not provided to __init__(), or is not categorical.

Returns:

Generator iterating over classes, yielding one tuple per class.

Return type:

Iterator[Tuple[Array, Array, int]]
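To make the idea of subsetting sequences by class concrete, here is a rough NumPy sketch. `iter_by_class_sketch` is a hypothetical helper, not Sequentia's implementation, and the order of the yielded tuple (sequences, lengths, class) is an assumption inferred from the return type above.

```python
from typing import Iterator, Tuple

import numpy as np


def iter_by_class_sketch(
    X: np.ndarray, y: np.ndarray, lengths: np.ndarray
) -> Iterator[Tuple[np.ndarray, np.ndarray, int]]:
    """Hypothetical helper: group concatenated sequences by class label."""
    ends = np.cumsum(lengths)
    starts = ends - lengths
    for c in np.unique(y):
        mask = y == c
        # Collect the sequences labelled c and re-concatenate them.
        parts = [X[s:e] for s, e in zip(starts[mask], ends[mask])]
        yield np.concatenate(parts), lengths[mask], int(c)


# Toy data: three univariate sequences of lengths 3, 5 and 2.
X = np.arange(10, dtype=float).reshape(-1, 1)
lengths = np.array([3, 5, 2])
y = np.array([0, 1, 0])

groups = {c: (X_c, lengths_c) for X_c, lengths_c, c in iter_by_class_sketch(X, y, lengths)}
```

Each entry of `groups` holds the concatenated sequences of one class together with their lengths, so the subset can be decoded in the same way as the full dataset.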

classmethod load(path)

Loads a stored dataset in .npz format.

Parameters: path (Union[str, Path, IO]) – Location of the stored dataset to load.
Returns: The loaded dataset.
Return type: SequentialDataset
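For readers unfamiliar with the `.npz` format, the following sketch shows a round trip using NumPy alone. It is not Sequentia's `save`/`load` code: the key names `X`, `y` and `lengths` inside the archive are assumptions, and an in-memory buffer stands in for a file path.

```python
import io

import numpy as np

# Toy data: three univariate sequences of lengths 3, 5 and 2.
X = np.arange(10, dtype=float).reshape(-1, 1)
lengths = np.array([3, 5, 2])
y = np.array([0, 1, 0])

# Write a compressed .npz archive to an in-memory buffer.
# The key names used here are illustrative assumptions.
buffer = io.BytesIO()
np.savez_compressed(buffer, X=X, y=y, lengths=lengths)

# Rewind and read the arrays back.
buffer.seek(0)
with np.load(buffer) as data:
    X_loaded = data["X"]
    y_loaded = data["y"]
    lengths_loaded = data["lengths"]
```

`np.savez` would write the same archive without compression, which is presumably the trade-off that the `compress` argument of `save` controls.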