Source: texcla/data.py#L0


Dataset

Dataset.labels

Dataset.num_classes


Dataset.__init__

__init__(self, X, y, tokenizer=None)

Encapsulates all the pieces of data needed to run an experiment. It is essentially a bag of items that can be serialized and deserialized as a single unit.

Args:

  • X: The raw model inputs. Set this to None if you don't want this value serialized when the dataset is saved.
  • y: The raw output labels.
  • tokenizer: The optional tokenizer to store alongside the data, so that it is saved and restored together with the rest of the dataset.
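To illustrate the "bag of items" idea, here is a minimal sketch that round-trips a dataset through Python's `pickle` module. `SketchDataset`, its `save`/`load` methods, and the way `labels` and `num_classes` are derived are all hypothetical; this is not texcla's actual implementation or serialization format.

```python
import os
import pickle
import tempfile


class SketchDataset(object):
    """Hypothetical stand-in for texcla's Dataset; not the library's real code."""

    def __init__(self, X, y, tokenizer=None):
        self.X = X                  # raw inputs; None means "don't serialize them"
        self.y = y                  # raw output labels
        self.tokenizer = tokenizer  # any picklable tokenizer object, or None

    @property
    def labels(self):
        # Assumption: the distinct labels in sorted order.
        return sorted(set(self.y))

    @property
    def num_classes(self):
        # Assumption: one class per distinct label.
        return len(self.labels)

    def save(self, path):
        # Pickle the whole object so X, y and the tokenizer travel as one unit.
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load(path):
        # Restore every attribute at once from the pickled file.
        with open(path, "rb") as f:
            return pickle.load(f)


if __name__ == "__main__":
    ds = SketchDataset(X=None, y=["spam", "ham", "spam"])
    path = os.path.join(tempfile.mkdtemp(), "dataset.pkl")
    ds.save(path)
    restored = SketchDataset.load(path)
    print(restored.labels, restored.num_classes)  # ['ham', 'spam'] 2
```

Because the object is pickled as a whole, everything stored on it is saved and restored together, which is what makes a single file easy to reuse across experiments.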