Dataset
All toy datasets in the sequentia.datasets module return an object of this type.
API reference
- class sequentia.datasets.Dataset(X, y, classes, random_state=None)
Represents a generic dataset.
- Parameters
- X: array-like
Data instances.
- y: array-like
Labels corresponding to data instances.
- classes: array-like
The complete set of possible classes/labels.
- random_state: numpy.random.RandomState, int, optional
A random state object or seed for reproducible randomness.
- data()
Fetch the instances and labels.
- Returns
- X: array-like
Data instances.
- y: array-like
Labels corresponding to data instances.
- iter_by_class()
Generator for iterating through instances partitioned by class.
- Returns
- instances: generator yielding (instances, class)
Instances belonging to each class.
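The partitioning that iter_by_class() describes can be illustrated with a small pure-Python sketch. This is not the library's implementation: the function name iter_by_class_sketch is hypothetical, plain lists stand in for array-likes, and yielding an empty group for a class with no instances is an assumption.

```python
def iter_by_class_sketch(X, y, classes):
    """Yield (instances, class) pairs, one pair per entry in `classes`.

    Hypothetical stand-in for Dataset.iter_by_class(), not the library code.
    """
    for c in classes:
        # Collect every instance whose label matches the current class.
        # Assumption: a class with no instances yields an empty list.
        yield [x for x, label in zip(X, y) if label == c], c


# Four variable-length sequences with two observed labels and one unused class.
X = [[0.1, 0.2], [0.3], [0.4, 0.5, 0.6], [0.7]]
y = ["a", "b", "a", "b"]

groups = {c: instances for instances, c in iter_by_class_sketch(X, y, ["a", "b", "c"])}
```

Each yielded pair keeps the (instances, class) order given in the Returns entry above.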
- split(split_size, stratify=True, shuffle=True)
Splits the dataset into two smaller Dataset objects.
- Parameters
- split_size: 0 < float < 1
Proportion of instances to be allocated to the second split.
- stratify: bool
Whether or not to stratify the split on the labels, so that each split has a similar distribution of labels.
- shuffle: bool
Whether or not to shuffle the data before partitioning it.
- Returns
- split_1: Dataset
First dataset split.
- split_2: Dataset
Second dataset split.
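To make the interaction of split_size, stratify and shuffle concrete, here is a minimal pure-Python sketch of a stratified two-way split. It is an illustration under stated assumptions, not the library's code: split_sketch and its seed parameter are hypothetical, plain lists stand in for Dataset objects, and rounding the per-class count with round() is an assumed choice.

```python
import random
from collections import defaultdict


def split_sketch(X, y, split_size, stratify=True, shuffle=True, seed=None):
    """Return ((X1, y1), (X2, y2)); split 2 gets about `split_size` of the data.

    Hypothetical stand-in for Dataset.split(), not the library code.
    """
    rng = random.Random(seed)
    indices = list(range(len(y)))

    if stratify:
        # Group indices by label so each class is divided separately.
        by_label = defaultdict(list)
        for i in indices:
            by_label[y[i]].append(i)
        groups = list(by_label.values())
    else:
        groups = [indices]

    first, second = [], []
    for group in groups:
        if shuffle:
            rng.shuffle(group)
        # Assumption: the size of the second split is rounded per group.
        n_second = round(len(group) * split_size)
        cut = len(group) - n_second
        first.extend(group[:cut])
        second.extend(group[cut:])

    def pick(idx):
        return [X[i] for i in idx], [y[i] for i in idx]

    return pick(first), pick(second)


X = list(range(10))
y = ["a"] * 6 + ["b"] * 4
(X1, y1), (X2, y2) = split_sketch(X, y, split_size=0.5, seed=0)
```

With stratify=True, both splits here keep the original 6:4 ratio of "a" to "b" labels (3:2 in each half); with stratify=False, the ratio in each split depends on the shuffle.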