Dataset

All toy datasets in the sequentia.datasets module return an object of this type.

API reference

class sequentia.datasets.Dataset(X, y, classes, random_state=None)

Represents a generic dataset.

Parameters
X: array-like

Data instances.

y: array-like

Labels corresponding to data instances.

classes: array-like

The complete set of possible classes/labels.

random_state: numpy.random.RandomState, int, optional

A random state object or seed for reproducible randomness.

data()

Fetch the instances and labels.

Returns
X: array-like

Data instances.

y: array-like

Labels corresponding to data instances.
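As an illustration of the constructor arguments and the return value of data(), here is a minimal sketch. SketchDataset is a hypothetical stand-in written for this example, not the library's own class; it only assumes the documented signature (X, y, classes, random_state) and that data() returns the instances and labels as a pair. The example data (three variable-length sequences with two features each) is also an assumption.

```python
import numpy as np


class SketchDataset:
    """Hypothetical stand-in that mirrors the documented constructor and data()."""

    def __init__(self, X, y, classes, random_state=None):
        self.X = X
        self.y = np.asarray(y)
        self.classes = list(classes)
        # Accept either an existing RandomState object or an integer seed (or None).
        if isinstance(random_state, np.random.RandomState):
            self.random_state = random_state
        else:
            self.random_state = np.random.RandomState(random_state)

    def data(self):
        # Return the instances together with their labels.
        return self.X, self.y


# Assumed example data: three variable-length sequences with two features each.
X = [np.zeros((4, 2)), np.ones((6, 2)), np.full((5, 2), 2.0)]
y = ["a", "b", "a"]

dataset = SketchDataset(X, y, classes=["a", "b"], random_state=0)
X_out, y_out = dataset.data()
```

With the real class, the equivalent call would presumably be X, y = dataset.data() on any object returned by the sequentia.datasets module.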

iter_by_class()

Generator for iterating through instances partitioned by class.

Returns
instances: generator yielding (instances, class)

Instances belonging to each class.
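The per-class iteration described here can be sketched as a plain generator. The function below is a hypothetical stand-alone helper, not the library's implementation; it assumes only that each yielded item is an (instances, class) pair, as the return description states. Whether the library yields classes with no instances is not documented, so this sketch simply yields an empty list for them.

```python
import numpy as np


def iter_by_class(X, y, classes):
    """Hypothetical helper: yield (instances, class) for each class in turn."""
    y = np.asarray(y)
    for c in classes:
        # Positions of all instances labelled with class c.
        idx = np.flatnonzero(y == c)
        yield [X[i] for i in idx], c


# Assumed example data: four single-feature instances in two classes.
X = [[0.1], [0.2], [0.3], [0.4]]
y = ["a", "b", "a", "b"]

grouped = {c: instances for instances, c in iter_by_class(X, y, ["a", "b"])}
```

A generator suits this access pattern because each class's instances are only gathered when the caller asks for the next pair.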

split(split_size, stratify=True, shuffle=True)

Splits the dataset into two smaller Dataset objects.

Parameters
split_size: 0 < float < 1

Proportion of instances to be allocated to the second split.

stratify: bool

Whether or not to stratify the split on the labels, such that each split has a similar distribution of labels.

shuffle: bool

Whether or not to shuffle the data before partitioning it.

Returns
split_1: Dataset

First dataset split.

split_2: Dataset

Second dataset split.
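To make the roles of split_size, stratify and shuffle concrete, the following sketch computes the index sets for the two splits using NumPy only. split_indices is a hypothetical helper written for this example; the library's own splitting logic is not shown in this reference and may differ, for instance in how it rounds the per-class counts. The only behaviour taken from the documentation is that split_size is the proportion allocated to the second split.

```python
import numpy as np


def split_indices(y, split_size, stratify=True, shuffle=True, random_state=None):
    """Hypothetical helper: return (first, second) index arrays for a two-way split."""
    if not 0 < split_size < 1:
        raise ValueError("split_size must be strictly between 0 and 1")
    y = np.asarray(y)
    rng = np.random.RandomState(random_state)

    if stratify:
        # One group of indices per label, so each split keeps a similar label mix.
        groups = [np.flatnonzero(y == c) for c in np.unique(y)]
    else:
        # Without stratification, treat all instances as a single group.
        groups = [np.arange(len(y))]

    first, second = [], []
    for idx in groups:
        if shuffle:
            idx = rng.permutation(idx)
        # split_size is the share of each group sent to the second split.
        n_second = int(round(split_size * len(idx)))
        cut = len(idx) - n_second
        first.extend(idx[:cut])
        second.extend(idx[cut:])
    return np.array(first, dtype=int), np.array(second, dtype=int)


# Assumed example labels: eight instances of "a" and four of "b".
y = ["a"] * 8 + ["b"] * 4
idx_1, idx_2 = split_indices(y, split_size=0.25, random_state=0)
```

With stratification, the second split here receives a quarter of each class, so both splits keep the original 2:1 ratio of "a" to "b".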