Source: texcla/data.py#L0
Dataset
Dataset.labels
Dataset.num_classes
Dataset.__init__
__init__(self, X, y, tokenizer=None)
Encapsulates all pieces of data to run an experiment. This is basically a bag of items that makes it easy to serialize and deserialize everything as a unit.
Args:
- X: The raw model inputs. This can be set to None if you don't want to serialize this value when you save the dataset.
- y: The raw output labels.
- tokenizer: The optional test indices to use. Ideally, these should be generated once and reused across experiments to make results comparable. generate_test_indices can be used to generate the indices the first time.
- **kwargs: Additional key-value items to store.
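The "bag of items" idea described above can be illustrated with a minimal sketch. DatasetSketch below is a hypothetical stand-in written for this example, not the texcla class: only the constructor signature is taken from the documentation, while to_json, from_json, and the use of JSON are assumptions made to keep the example self-contained.

```python
import json


class DatasetSketch:
    """Hypothetical stand-in that mirrors the documented signature only."""

    def __init__(self, X, y, tokenizer=None):
        self.X = X                  # raw model inputs, or None to skip saving them
        self.y = y                  # raw output labels
        self.tokenizer = tokenizer  # kept in memory; not serialized in this sketch

    def to_json(self):
        # Assumption: JSON is used here purely for illustration.
        return json.dumps({"X": self.X, "y": self.y})

    @classmethod
    def from_json(cls, payload):
        # Rebuild the whole bag of items from one serialized unit.
        return cls(**json.loads(payload))


ds = DatasetSketch(X=["good movie", "bad movie", "great plot"], y=[1, 0, 1])
restored = DatasetSketch.from_json(ds.to_json())

# Passing X=None leaves the raw inputs out of what gets saved.
labels_only = DatasetSketch.from_json(DatasetSketch(X=None, y=ds.y).to_json())
```

Keeping inputs and labels in one object means a single save and load round trip restores everything an experiment needs, and leaving X as None keeps the saved payload small when the raw inputs are stored elsewhere.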