`Dataset`

All toy datasets in the sequentia.datasets module are returned as objects of this type.

API reference

class sequentia.datasets.Dataset(X, y, classes, random_state=None)

Represents a generic dataset.

Parameters

X: array-like: Data instances.
y: array-like: Labels corresponding to data instances.
classes: array-like: The complete set of possible classes/labels.
random_state: numpy.random.RandomState, int, optional: A random state object or seed for reproducible randomness.
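To make the relationship between the four constructor parameters concrete, here is a minimal sketch of a hypothetical ToyDataset container. It is not sequentia's implementation. Two assumptions are worth flagging. First, the standard library's random.Random stands in for numpy.random.RandomState. Second, the length and label checks are inferred from the parameter descriptions (labels correspond to instances; classes is the complete label set) and are not documented behaviour of the library.

```python
import random
from typing import Hashable, List, Optional, Sequence, Union

# Assumption: random.Random stands in for numpy.random.RandomState.
RandomStateLike = Optional[Union[int, random.Random]]


def to_rng(random_state: RandomStateLike = None) -> random.Random:
    """Turn None, an int seed, or an existing generator into a generator."""
    if random_state is None:
        return random.Random()  # fresh, unseeded generator: not reproducible
    if isinstance(random_state, random.Random):
        return random_state  # reuse the caller's generator unchanged
    if isinstance(random_state, int):
        return random.Random(random_state)  # seeded: same seed, same sequence
    raise TypeError("random_state must be None, an int, or a random.Random")


class ToyDataset:
    """Hypothetical stand-in for Dataset: stores its inputs and checks them."""

    def __init__(
        self,
        X: Sequence,
        y: Sequence[Hashable],
        classes: Sequence[Hashable],
        random_state: RandomStateLike = None,
    ) -> None:
        # Assumed invariant: exactly one label per instance.
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        # Assumed invariant: classes is the complete set of possible labels.
        unknown = set(y) - set(classes)
        if unknown:
            raise ValueError(f"labels not in classes: {unknown!r}")
        self.X = list(X)
        self.y = list(y)
        self.classes: List[Hashable] = list(classes)
        self.random_state = to_rng(random_state)


# Class "c" is a possible label even though no instance currently carries it.
ds = ToyDataset([[0.1], [0.4], [0.9]], ["a", "b", "a"], ["a", "b", "c"], random_state=0)
```

Passing the same integer seed twice yields two generators that produce identical sequences. This is the reproducibility the random_state parameter is meant to provide.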

Fetch the instances and labels.

Returns

The data instances and their corresponding labels.

iter_by_class()

Generator for iterating through instances partitioned by class.

Returns

instances: generator yielding (instances, class): Instances belonging to each class.
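The following standalone generator works on plain lists and illustrates what "instances partitioned by class" means. It is a sketch, not sequentia's code. It assumes that iteration follows the order of classes and that a class with no instances yields an empty list.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def iter_by_class(
    X: Sequence[T], y: Sequence[Hashable], classes: Sequence[Hashable]
) -> Iterator[Tuple[List[T], Hashable]]:
    """Yield (instances, class) pairs, one per class, in the order of classes."""
    for c in classes:
        # Collect every instance whose label equals the current class;
        # a class with no instances yields an empty list (an assumption).
        instances = [x for x, label in zip(X, y) if label == c]
        yield instances, c


X = ["seq0", "seq1", "seq2", "seq3"]
y = ["up", "down", "up", "down"]
for instances, label in iter_by_class(X, y, ["up", "down", "flat"]):
    print(label, instances)
# up ['seq0', 'seq2']
# down ['seq1', 'seq3']
# flat []
```

Each yielded pair groups all instances that share one label. This is convenient when some step must be repeated per class, for example fitting one model per class.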

split(split_size, stratify=True, shuffle=True)

Splits the dataset into two smaller Dataset objects.

Parameters

split_size: 0 < float < 1: Proportion of instances to be allocated to the second split.
stratify: bool: Whether or not to stratify the split on the labels, so that each split has a similar distribution of labels.
shuffle: bool: Whether or not to shuffle the data before partitioning it.
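The following sketch uses a hypothetical split_indices helper to show how split_size, stratify and shuffle interact. The helper returns index lists for the two partitions. The rounding rule (the second split's share is rounded up with math.ceil, per class when stratifying) is an assumption of this sketch, not the library's documented rule. So is the behaviour with stratify=True and shuffle=False, where the last share of each class, in original order, goes to the second split.

```python
import math
import random
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def split_indices(
    y: Sequence[Hashable],
    split_size: float,
    stratify: bool = True,
    shuffle: bool = True,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (first, second) index lists; second gets ~split_size of the data."""
    if not 0 < split_size < 1:
        raise ValueError("split_size must satisfy 0 < split_size < 1")
    rng = rng if rng is not None else random.Random()

    # One group per label when stratifying, otherwise a single group.
    groups: Dict[Hashable, List[int]] = {}
    for i, label in enumerate(y):
        groups.setdefault(label if stratify else None, []).append(i)

    first: List[int] = []
    second: List[int] = []
    for indices in groups.values():
        if shuffle:
            rng.shuffle(indices)  # randomise which instances land in each split
        # Assumption: the second split's share is rounded up per group.
        n_second = math.ceil(split_size * len(indices))
        cut = len(indices) - n_second
        first.extend(indices[:cut])
        second.extend(indices[cut:])
    return first, second


y = ["a"] * 8 + ["b"] * 8
first, second = split_indices(y, 0.25, rng=random.Random(0))
print(len(first), len(second))  # 12 4
```

With stratification, each class contributes the same proportion (here 2 of 8) to the second split, so both partitions keep the original 50/50 label balance.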

Returns