Datasets

Sequentia provides a selection of sample sequential datasets for quick experimentation.

Each dataset follows the interface described below.

API reference

Utility wrapper for a generic sequential dataset.

`__init__`(X[, y, lengths, classes])	Initialize a `SequentialDataset`.
`copy`()	Create a copy of the dataset.
`iter_by_class`()	Subset the observation sequences by class.
`load`(path, /)	Load a stored dataset in `.npz` format.
`save`(path, /, *[, compress])	Store the dataset in `.npz` format.
`split`(*[, test_size, train_size, ...])	Split the dataset into two partitions (train/test).

`X`	Observation sequences.
`X_lengths`	Observation sequences and corresponding lengths.
`X_y`	Observation sequences and corresponding outputs.
`X_y_lengths`	Observation sequences and corresponding outputs and lengths.
`classes`	Set of unique classes in `y`.
`idxs`	Observation sequence start and end indices.
`lengths`	Lengths corresponding to `X`.
`y`	Outputs corresponding to `X`.

class sequentia.datasets.base.SequentialDataset

Utility wrapper for a generic sequential dataset.

__init__(X, y=None, *, lengths=None, classes=None)

Parameters:

X (ndarray[Any, dtype[float64]] | ndarray[Any, dtype[int64]]) – Sequence(s).
y (ndarray[Any, dtype[float64]] | ndarray[Any, dtype[int64]] | None) – Outputs corresponding to sequence(s) in X.
lengths (ndarray[Any, dtype[int64]] | None) – Lengths of the sequence(s) provided in X.
- If None, then X is assumed to be a single sequence.
- len(X) should be equal to sum(lengths).
classes (list[int] | None) – Set of possible class labels (only if y was provided with categorical values). If not provided, these will be determined from the training data labels.

Return type:

SequentialDataset
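
The relationship between X and lengths can be illustrated with a minimal, dependency-free sketch. It does not call Sequentia: plain Python lists stand in for NumPy arrays, and `start_end_indices` is a hypothetical helper introduced only for illustration. The sketch assumes that multiple sequences are stored back to back in X, which is what the constraint len(X) == sum(lengths) suggests.

```python
from itertools import accumulate

# Three sequences with 2, 3 and 1 observations (one feature each),
# stored back to back in a single container, as assumed for X.
X = [[0.1], [0.2], [1.0], [1.1], [1.2], [2.0]]
lengths = [2, 3, 1]

# The documented constraint: len(X) should equal sum(lengths).
assert len(X) == sum(lengths)


def start_end_indices(lengths):
    """Hypothetical helper (not part of Sequentia): (start, end) pairs per sequence."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Recover the individual sequences by slicing X with the index pairs.
idxs = start_end_indices(lengths)
sequences = [X[start:end] for start, end in idxs]
```

Here `idxs` holds one half-open (start, end) pair per sequence, so slicing X with each pair returns that sequence's observations.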

copy()

Create a copy of the dataset.

iter_by_class()

Subset the observation sequences by class.

Returns:

Generator iterating over classes, yielding the observation sequences belonging to the class, their lengths, and the class label.

Return type:

Generator[tuple[numpy.ndarray, numpy.ndarray, int]]

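
As a rough illustration of subsetting by class, the sketch below groups concatenated sequences by label in plain Python. It is not Sequentia's implementation: `iter_by_class_sketch` is a hypothetical function, lists stand in for NumPy arrays, and the yielded tuple order (sequences, lengths, class) is an assumption based on the documented return type.

```python
from itertools import accumulate


def iter_by_class_sketch(X, y, lengths):
    """Hypothetical stand-in: yield (X_subset, subset_lengths, label) per class."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    for c in sorted(set(y)):
        X_c, lengths_c = [], []
        for start, end, label in zip(starts, ends, y):
            if label == c:
                # Keep this sequence's observations and its length.
                X_c.extend(X[start:end])
                lengths_c.append(end - start)
        yield X_c, lengths_c, c


# Three sequences of lengths 2, 3 and 1, labelled 0, 1 and 0.
X = [[0.1], [0.2], [1.0], [1.1], [1.2], [2.0]]
lengths = [2, 3, 1]
y = [0, 1, 0]

groups = list(iter_by_class_sketch(X, y, lengths))
```

Each yielded subset stays in the same concatenated layout as the full dataset, so it can be passed on together with its own lengths.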

classmethod load(path, /)

Load a stored dataset in .npz format.

Parameters:

path (str | Path | IO) – Location to load the dataset from.

Returns:

The loaded dataset.

Return type:

SequentialDataset