Datasets
Sequentia provides a selection of sample sequential datasets for quick experimentation.
Each dataset follows the interface described below.
API reference
Class
SequentialDataset | Utility wrapper for a generic sequential dataset.
Methods
__init__(X, y=None, lengths=None, classes=None) | Initializes a SequentialDataset.
copy() | Creates a copy of the dataset.
iter_by_class() | Subsets the observation sequences by class.
load(path) | Loads a stored dataset in .npz format.
save(path, compress=True) | Stores the dataset in .npz format.
split(test_size=None, train_size=None, random_state=None, shuffle=True, stratify=False) | Splits the dataset into two partitions (train/test).
Properties
X | Observation sequences.
X_lengths | Observation sequences and corresponding lengths.
X_y | Observation sequences and corresponding outputs.
X_y_lengths | Observation sequences and corresponding outputs and lengths.
classes | Set of unique classes in y.
idxs | Observation sequence start and end indices.
lengths | Lengths corresponding to X.
y | Outputs corresponding to X.
- class sequentia.utils.SequentialDataset
Utility wrapper for a generic sequential dataset.
- __init__(X, y=None, lengths=None, classes=None)
Initializes a SequentialDataset.
- Parameters:
X (Array) – Univariate or multivariate observation sequence(s).
Should be a single 1D or 2D array.
Should have length as the 1st dimension and features as the 2nd dimension.
Should be a concatenated sequence if multiple sequences are provided, with the respective sequence lengths provided in the lengths argument for decoding the original sequences.
y (Optional[Array]) – Outputs corresponding to the sequence(s) provided in X.
lengths (Optional[Array]) – Lengths of the observation sequence(s) provided in X.
If None, then X is assumed to be a single observation sequence.
len(X) should be equal to sum(lengths).
classes (Optional[Array]) – Set of possible class labels (only if y was provided with categorical values).
If not provided, these will be determined from the training data labels.
- Return type:
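As a rough illustration of the concatenated layout that X and lengths describe, the following NumPy-only sketch packs three sequences into one array and decodes them again. The variable names (sequences, decoded) are hypothetical, and the sketch does not call Sequentia itself.

```python
import numpy as np

# Three multivariate sequences with 2 features each and differing lengths.
sequences = [np.ones((3, 2)), np.zeros((5, 2)), np.full((2, 2), 7.0)]

# Concatenate along the length (1st) dimension, recording each length.
X = np.concatenate(sequences, axis=0)
lengths = np.array([len(seq) for seq in sequences])

# The invariant stated above: len(X) must equal sum(lengths).
assert len(X) == lengths.sum()

# Decode the original sequences by splitting at the cumulative lengths.
decoded = np.split(X, np.cumsum(lengths)[:-1])
```

Arrays prepared this way should be in the form that the X and lengths arguments expect, with one entry in y per sequence if outputs are supplied.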
- iter_by_class()
Subsets the observation sequences by class.
- Raises:
AttributeError – If y was not provided to __init__(), or is not categorical.
- Returns:
Generator iterating over classes, yielding:
X subset of sequences belonging to the class.
Lengths corresponding to the X subset.
Class used to subset X.
- Return type:
Iterator[Tuple[Array, Array, int]]
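To make the yielded triple concrete, here is a minimal NumPy sketch of what subsetting by class could look like. The function iter_by_class_sketch is a hypothetical stand-in written for illustration; it is not the library's implementation and assumes integer class labels with one label per sequence.

```python
import numpy as np

def iter_by_class_sketch(X, y, lengths):
    """Yield (X_subset, lengths_subset, class) for each unique class in y."""
    # Recover the individual sequences from the concatenated array.
    sequences = np.split(X, np.cumsum(lengths)[:-1])
    for c in np.unique(y):
        mask = y == c
        # Re-concatenate only the sequences labelled with class c.
        X_c = np.concatenate([seq for seq, keep in zip(sequences, mask) if keep])
        yield X_c, lengths[mask], int(c)

X = np.arange(10).reshape(-1, 1)   # 10 time steps, 1 feature
lengths = np.array([3, 5, 2])      # three sequences
y = np.array([0, 1, 0])            # one class label per sequence

subsets = {c: (X_c, lens) for X_c, lens, c in iter_by_class_sketch(X, y, lengths)}
```

Each subset is itself a concatenated array with matching lengths, so it can be passed on in the same (X, lengths) form as the full dataset.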
- classmethod load(path)
Loads a stored dataset in .npz format.
See numpy.load().
- Parameters:
path (Union[str, Path, IO]) – Location to load the dataset from.
- Returns:
The loaded dataset.
- Return type:
SequentialDataset
See also
save: Stores the dataset in .npz format.
- save(path, compress=True)
Stores the dataset in .npz format.
See numpy.savez() and numpy.savez_compressed().
- Parameters:
path (Union[str, Path, IO]) – Location to store the dataset.
compress (bool) – Whether or not to compress the dataset.
See also
load: Loads a stored dataset in .npz format.
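The .npz round trip that save and load refer to can be sketched directly with NumPy. The key names used inside the archive (X, y, lengths) are assumptions made for this example and may differ from what Sequentia actually writes; an in-memory buffer stands in for a file path.

```python
import io
import numpy as np

X = np.arange(10, dtype=float).reshape(-1, 1)
lengths = np.array([3, 5, 2])
y = np.array([0, 1, 0])

# An in-memory buffer stands in for a path on disk.
buffer = io.BytesIO()

# Compressed variant; numpy.savez would write the same archive uncompressed.
np.savez_compressed(buffer, X=X, y=y, lengths=lengths)  # key names are assumed

# Rewind the buffer and read the arrays back.
buffer.seek(0)
with np.load(buffer) as data:
    X_loaded, y_loaded, lengths_loaded = data["X"], data["y"], data["lengths"]
```

Presumably the compress flag selects between numpy.savez_compressed() and numpy.savez(), since both are referenced by save.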
- split(test_size=None, train_size=None, random_state=None, shuffle=True, stratify=False)
Splits the dataset into two partitions (train/test).
See sklearn.model_selection.train_test_split().
- Parameters:
test_size (Optional[Union[NonNegativeInt, ConstrainedFloatValue]]) – Size of the test partition.
train_size (Optional[Union[NonNegativeInt, ConstrainedFloatValue]]) – Size of the train partition.
random_state (Optional[Union[NonNegativeInt, RandomState]]) – Seed or numpy.random.RandomState object for reproducible pseudo-randomness.
shuffle (bool) – Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be False.
stratify (bool) – Whether or not to stratify the partitions by class labels.
- Returns:
Dataset partitions.
- Return type:
Tuple[SequentialDataset, SequentialDataset]
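The point of a sequence-aware split is that whole sequences, not individual time steps, are assigned to each partition. The sketch below shows that idea with NumPy only. The function split_sketch is hypothetical, it omits stratification, and it is not how the library (which defers to scikit-learn) implements split.

```python
import numpy as np

def split_sketch(X, y, lengths, test_size=0.25, random_state=0):
    """Partition whole sequences (not individual time steps) into train/test."""
    sequences = np.split(X, np.cumsum(lengths)[:-1])
    n = len(sequences)
    # Shuffle sequence indices reproducibly, then cut off the test partition.
    order = np.random.RandomState(random_state).permutation(n)
    n_test = int(np.ceil(test_size * n))
    test_idx, train_idx = order[:n_test], order[n_test:]

    def gather(idx):
        # Rebuild a concatenated (X, y, lengths) triple for the chosen sequences.
        return np.concatenate([sequences[i] for i in idx]), y[idx], lengths[idx]

    return gather(train_idx), gather(test_idx)

X = np.arange(12).reshape(-1, 1)
lengths = np.array([3, 4, 2, 3])
y = np.array([0, 1, 0, 1])

(X_train, y_train, len_train), (X_test, y_test, len_test) = split_sketch(X, y, lengths)
```

Each partition keeps the len(X) == sum(lengths) invariant, so it could be wrapped in its own dataset object.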
- property X: Array
Observation sequences.
- property X_lengths: Tuple[Array, Array]
Observation sequences and corresponding lengths.
- property X_y: Tuple[Array, Array]
Observation sequences and corresponding outputs.
- Raises:
AttributeError – If y was not provided to __init__().
- property X_y_lengths: Tuple[Array, Array, Array]
Observation sequences and corresponding outputs and lengths.
- Raises:
AttributeError – If y was not provided to __init__().
- property classes: Optional[Array]
Set of unique classes in y. If y is not categorical, then None.
- property idxs: Array
Observation sequence start and end indices.
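Start and end indices follow directly from the sequence lengths. The sketch below derives them with a cumulative sum. The two-column layout (one start/end row per sequence, with exclusive ends) is an assumption about how such indices are typically arranged, not a confirmed description of this property.

```python
import numpy as np

lengths = np.array([3, 5, 2])

# End index of each sequence is the running total of lengths (exclusive).
ends = np.cumsum(lengths)
# Start index of each sequence is the end of the previous one.
starts = ends - lengths

# One (start, end) row per sequence; this layout is an assumption.
idxs = np.stack([starts, ends], axis=1)
```

With indices in this form, X[start:end] recovers an individual sequence from the concatenated array.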
- property lengths: Array
Lengths corresponding to X.
- property y: Array
Outputs corresponding to X.
- Raises:
AttributeError – If y was not provided to __init__().