Datasets
Sequentia provides a selection of sample sequential datasets for quick experimentation.
Each dataset follows the interface described below.
API reference
Class
- SequentialDataset – Utility wrapper for a generic sequential dataset.
Methods
- __init__ – Initialize a SequentialDataset.
- copy – Create a copy of the dataset.
- iter_by_class – Subset the observation sequences by class.
- load – Load a stored dataset in .npz format.
- save – Store the dataset in .npz format.
- split – Split the dataset into two partitions (train/test).
Properties
- X – Observation sequences.
- X_lengths – Observation sequences and corresponding lengths.
- X_y – Observation sequences and corresponding outputs.
- X_y_lengths – Observation sequences and corresponding outputs and lengths.
- classes – Set of unique classes in y.
- idxs – Observation sequence start and end indices.
- lengths – Lengths corresponding to X.
- y – Outputs corresponding to X.
- class sequentia.datasets.base.SequentialDataset
Utility wrapper for a generic sequential dataset.
- __init__(X, y=None, *, lengths=None, classes=None)
Initialize a SequentialDataset.
- Parameters:
self (SequentialDataset) –
X (ndarray[Any, dtype[float64]] | ndarray[Any, dtype[int64]]) – Sequence(s).
y (ndarray[Any, dtype[float64]] | ndarray[Any, dtype[int64]] | None) – Outputs corresponding to sequence(s) in X.
lengths (ndarray[Any, dtype[int64]] | None) – Lengths of the sequence(s) provided in X. If None, then X is assumed to be a single sequence. len(X) should be equal to sum(lengths).
classes – Set of possible class labels (only if y was provided with categorical values). If not provided, these will be determined from the labels in y.
- Return type:
None
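The X/lengths convention above can be illustrated with a short NumPy sketch. It assumes the sequences are stacked row-wise in a single array, which is what len(X) == sum(lengths) implies. check_lengths is a hypothetical helper, not part of Sequentia, and Sequentia's own validation may differ.

```python
import numpy as np

def check_lengths(X, lengths=None):
    """Hypothetical helper: resolve and validate sequence lengths.

    lengths=None means X is treated as one sequence; otherwise the
    lengths must add up to the number of rows in X.
    """
    if lengths is None:
        # The whole array is a single sequence.
        return np.array([len(X)], dtype=np.int64)
    lengths = np.asarray(lengths, dtype=np.int64)
    if lengths.sum() != len(X):
        raise ValueError(f"sum(lengths)={lengths.sum()} but len(X)={len(X)}")
    return lengths

# Three sequences of lengths 3, 2 and 4, stacked row-wise into one
# array with two features per observation.
X = np.random.default_rng(0).normal(size=(9, 2))
lengths = check_lengths(X, [3, 2, 4])
```

Omitting lengths here returns a single length of 9, matching the documented single-sequence behaviour.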
- copy()
Create a copy of the dataset.
- Returns:
Dataset copy.
- Return type:
- Parameters:
self (SequentialDataset) –
- iter_by_class()
Subset the observation sequences by class.
- Returns:
Generator iterating over classes, yielding:
  - X subset of sequences belonging to the class.
  - Lengths corresponding to the X subset.
  - Class used to subset X.
- Return type:
- Raises:
AttributeError – If y was not provided to __init__().
TypeError – If y was provided but was not categorical.
- Parameters:
self (SequentialDataset) –
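As a rough picture of what subsetting by class involves, the NumPy sketch below groups stacked sequences by label and yields the same three items listed above. It assumes y holds one label per sequence and that sequences are stacked row-wise in X. The function is a hypothetical stand-in, not Sequentia's implementation, and it skips the AttributeError/TypeError checks.

```python
import numpy as np

def iter_by_class(X, y, lengths):
    """Hypothetical sketch: yield (X subset, lengths subset, class) per class.

    Assumes len(y) == len(lengths), i.e. one label per sequence.
    """
    y = np.asarray(y)
    lengths = np.asarray(lengths, dtype=np.int64)
    ends = np.cumsum(lengths)
    starts = ends - lengths
    for c in np.unique(y):
        mask = y == c
        # Concatenate the rows of every sequence labelled c.
        X_c = np.concatenate([X[s:e] for s, e in zip(starts[mask], ends[mask])])
        yield X_c, lengths[mask], c

X = np.arange(9).reshape(-1, 1)  # 9 observations, 1 feature
groups = {
    int(c): (X_c.ravel().tolist(), lens.tolist())
    for X_c, lens, c in iter_by_class(X, [0, 1, 0], [3, 2, 4])
}
```

Here the first and third sequences (labelled 0) are concatenated into one seven-row subset with lengths [3, 4].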
- classmethod load(path, /)
Load a stored dataset in .npz format.
See numpy.load().
- Parameters:
- Returns:
The loaded dataset.
- Return type:
See also
save – Stores the dataset in .npz format.
- save(path, /, *, compress=True)
Store the dataset in .npz format.
See numpy.savez() and numpy.savez_compressed().
- Parameters:
compress (bool) – Whether to compress the dataset (numpy.savez_compressed()) or not (numpy.savez()).
self (SequentialDataset) –
- Return type:
None
See also
load – Loads a stored dataset in .npz format.
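Because save and load defer to NumPy's .npz helpers, a plain NumPy round trip shows the underlying mechanism. The array names X, y and lengths used as archive keys below are an assumption; the exact keys Sequentia writes are not documented here. An in-memory buffer stands in for a file path.

```python
import io
import numpy as np

X = np.arange(12, dtype=np.float64).reshape(6, 2)
y = np.array([0, 1], dtype=np.int64)
lengths = np.array([4, 2], dtype=np.int64)

# Write a compressed archive to an in-memory buffer.
buffer = io.BytesIO()
np.savez_compressed(buffer, X=X, y=y, lengths=lengths)
buffer.seek(0)

# np.load returns a lazily read NpzFile; read every array before closing.
with np.load(buffer) as data:
    restored = {key: data[key] for key in data.files}
```

Swapping np.savez_compressed for np.savez gives the uncompressed variant; np.load reads either.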
- split(*, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=False)
Split the dataset into two partitions (train/test).
See sklearn.model_selection.train_test_split().
- Parameters:
self (SequentialDataset) –
test_size (int | float | None) – Size of the test partition.
train_size (int | float | None) – Size of the training partition.
random_state (int | RandomState | None) – Seed or numpy.random.RandomState object for reproducible pseudo-randomness.
shuffle (bool) – Whether or not to shuffle the data before splitting. If shuffle=False, then stratify must be False.
stratify (bool) – Whether or not to stratify the partitions by class label.
- Returns:
Dataset partitions.
- Return type:
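A sequential dataset has to be split at the level of whole sequences, so that no sequence is cut between partitions. The NumPy-only sketch below is a simplified stand-in for sklearn's train_test_split applied to sequence indices. It assumes sklearn's size conventions (a float is a fraction rounded up, an int is a count) and, when shuffling is off, puts the trailing sequences in the test set. It does not implement stratification. split_sequences is hypothetical, not Sequentia's code.

```python
import numpy as np

def split_sequences(n_sequences, test_size=0.25, random_state=None, shuffle=True):
    """Hypothetical sketch: split whole sequences (never individual rows).

    Returns (train_indices, test_indices) into the list of sequences.
    """
    if isinstance(test_size, float):
        n_test = int(np.ceil(test_size * n_sequences))
    else:
        n_test = int(test_size)
    order = np.arange(n_sequences)
    if shuffle:
        # Permute sequence indices so each partition gets a random subset.
        order = np.random.default_rng(random_state).permutation(n_sequences)
    n_train = n_sequences - n_test
    return np.sort(order[:n_train]), np.sort(order[n_train:])

# Four sequences, no shuffling: the last sequence becomes the test set.
train_idx, test_idx = split_sequences(4, test_size=0.25, shuffle=False)
```

The returned sequence indices can then be mapped to row ranges of X through the sequences' start and end indices.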
- property X: ndarray[Any, dtype[float64]] | ndarray[Any, dtype[int64]]
Observation sequences.
- Returns:
Observation sequences.
- Return type:
- property X_lengths: dict[str, ndarray[Any, dtype[float64]] | ndarray[Any, dtype[int64]]]
Observation sequences and corresponding lengths.
- Returns:
Mapping with keys: "X" for observation sequences, "lengths" for lengths.
- Return type:
- property X_y: dict[str, ndarray[Any, dtype[float64]] | ndarray[Any, dtype[int64]]]
Observation sequences and corresponding outputs.
- Returns:
Mapping with keys: "X" for observation sequences, "y" for outputs.
- Return type:
- Raises:
AttributeError – If y was not provided to __init__().
- property X_y_lengths: dict[str, ndarray[Any, dtype[float64]] | ndarray[Any, dtype[int64]]]
Observation sequences and corresponding outputs and lengths.
- Returns:
Mapping with keys: "X" for observation sequences, "y" for outputs, "lengths" for lengths.
- Return type:
- Raises:
AttributeError – If y was not provided to __init__().
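Since the keys of these mappings match the parameter names X, y and lengths, they lend themselves to keyword unpacking into an estimator-style method. The fit function below is hypothetical and only stands in for such a method; check the signature of the estimator you are using before relying on this pattern.

```python
import numpy as np

def fit(X, y, *, lengths=None):
    """Hypothetical estimator-style fit, used only to show ** unpacking."""
    lengths = np.array([len(X)]) if lengths is None else np.asarray(lengths)
    return {"n_sequences": len(lengths), "n_labels": len(y)}

# A mapping shaped like the X_y_lengths property.
data = {
    "X": np.arange(10, dtype=np.float64).reshape(5, 2),
    "y": np.array([0, 1]),
    "lengths": np.array([3, 2]),
}
summary = fit(**data)
```

The same call works with a mapping shaped like X_lengths for methods that take no y.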
- property classes: ndarray[Any, dtype[int64]] | None
Set of unique classes in y.
- Returns:
Unique classes if y is categorical, otherwise None.
- Return type:
numpy.ndarray | None
- property idxs: ndarray[Any, dtype[int64]]
Observation sequence start and end indices.
- Returns:
Start and end indices for each sequence in X.
- Return type:
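Start and end indices follow directly from the lengths through a cumulative sum. This NumPy sketch assumes the end index is exclusive (Python slice convention), so X[start:end] recovers one sequence; that convention and the helper name sequence_idxs are assumptions, not taken from Sequentia's source.

```python
import numpy as np

def sequence_idxs(lengths):
    """Hypothetical helper: (start, end) row indices for each sequence.

    end is exclusive, so X[start:end] recovers a single sequence.
    """
    lengths = np.asarray(lengths, dtype=np.int64)
    ends = np.cumsum(lengths)
    starts = ends - lengths
    return np.column_stack((starts, ends))

X = np.arange(9).reshape(-1, 1)  # 9 observations, 1 feature
idxs = sequence_idxs([3, 2, 4])
sequences = [X[start:end] for start, end in idxs]
```

For lengths [3, 2, 4] this gives the index pairs (0, 3), (3, 5) and (5, 9).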
- property lengths: ndarray[Any, dtype[int64]]
Lengths corresponding to X.
- Returns:
Lengths for each sequence in X.
- Return type:
- property y: ndarray[Any, dtype[float64]] | ndarray[Any, dtype[int64]]
Outputs corresponding to X.
- Returns:
Sequence outputs.
- Return type:
- Raises:
AttributeError – If y was not provided to __init__().