Source: texcla/experiment.py#L0


create_experiment_folder

create_experiment_folder(base_dir, model, lr, batch_size)

Creates a folder for the experiment under base_dir, named from the model, the learning rate, and the batch size.
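The exact folder layout is not documented here, but the idea can be sketched with the standard library (the naming scheme below, including the timestamp suffix, is an assumption for illustration, not the library's implementation):

```python
import os
import time

def make_experiment_dir(base_dir, model_name, lr, batch_size):
    # Build a folder name from the hyper-parameters plus a timestamp,
    # so repeated runs with the same settings never collide.
    name = "{}_lr{}_bs{}_{}".format(model_name, lr, batch_size, int(time.time()))
    path = os.path.join(base_dir, name)
    os.makedirs(path)
    return path
```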

copy_called_file

copy_called_file(exp_path)

Copies the calling script into exp_path, so the run can be reproduced later.

create_callbacks

create_callbacks(exp_path, patience)

Creates the training callbacks for the experiment, including early stopping with the given patience.
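The patience mechanism behind early stopping can be illustrated with a minimal, dependency-free sketch (the class and its structure are illustrative only, not the library's callbacks):

```python
class EarlyStopper:
    """Signals a stop when the monitored loss fails to improve for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def update(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience
```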

train

train(model, word_encoder_model, lr=0.001, batch_size=64, epochs=50, patience=10, \
    base_dir="experiments", **fit_args)

Trains the model, recording the experiment under base_dir; extra keyword arguments in fit_args are forwarded to the underlying fit call.

load_csv

load_csv(data_path=None, text_col="text", class_col="class", limit=None)

Loads texts and labels from the text_col and class_col columns of a CSV file, optionally capped at the first limit rows.
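A stand-alone sketch of this kind of loader, using only the standard library (the function name is illustrative, not the library's code):

```python
import csv

def load_texts_and_labels(data_path, text_col="text", class_col="class", limit=None):
    """Reads texts and labels from a CSV file; `limit` caps the number of rows."""
    texts, labels = [], []
    with open(data_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            if limit is not None and i >= limit:
                break
            texts.append(row[text_col])
            labels.append(row[class_col])
    return texts, labels
```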

process_save

process_save(X, y, tokenizer, proc_data_path, max_len=400, save_tokenizer=True)

Processes the text with the tokenizer and saves the result as a Dataset.
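The core of such processing is mapping tokens to ids and padding or truncating every sequence to max_len. A toy sketch, assuming a whitespace tokenizer and right-padding (both assumptions for illustration, not the library's scheme):

```python
def texts_to_padded_ids(texts, vocab, max_len, pad_id=0):
    """Maps whitespace tokens to ids, then right-pads/truncates each row to max_len."""
    out = []
    for text in texts:
        # Drop out-of-vocabulary tokens; a real tokenizer may map them to an UNK id.
        ids = [vocab[tok] for tok in text.split() if tok in vocab]
        ids = ids[:max_len]
        out.append(ids + [pad_id] * (max_len - len(ids)))
    return out
```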


setup_data

setup_data(X, y, tokenizer, proc_data_path, **kwargs)

Sets up the data: processes it with the tokenizer and saves it to proc_data_path.

Args:

  • X: text data
  • y: data labels
  • tokenizer: A Tokenizer instance
  • proc_data_path: Path for the processed data

split_data

split_data(X, y, ratio=(0.8, 0.1, 0.1))

Splits data into a training, validation, and test set.

Args:

  • X: text data
  • y: data labels
  • ratio: the ratio for splitting. Default: (0.8, 0.1, 0.1)

Returns:

The split data, as X_train, X_val, X_test, y_train, y_val, y_test
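A minimal sketch of a ratio-based split like this, under the assumption that the data is shuffled before slicing (function name and seeding are illustrative, not the library's implementation):

```python
import random

def ratio_split(X, y, ratio=(0.8, 0.1, 0.1), seed=0):
    """Shuffles the data and splits it into train/validation/test sets by `ratio`."""
    assert abs(sum(ratio) - 1.0) < 1e-9
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_train = int(len(X) * ratio[0])
    n_val = int(len(X) * ratio[1])
    # Remaining items after the train and validation slices form the test set.
    parts = [idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]]
    (X_train, y_train), (X_val, y_val), (X_test, y_test) = [
        ([X[i] for i in ids], [y[i] for i in ids]) for ids in parts
    ]
    return X_train, X_val, X_test, y_train, y_val, y_test
```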


setup_data_split

setup_data_split(X, y, tokenizer, proc_data_dir, **kwargs)

Sets up the data, splitting it into training, validation, and test sets before processing.

Args:

  • X: text data
  • y: data labels
  • tokenizer: A Tokenizer instance
  • proc_data_dir: Directory for the split and processed data

load_data_split

load_data_split(proc_data_dir)

Loads a split dataset.

Args:

  • proc_data_dir: Directory with the split and processed data

Returns:

(Training Data, Validation Data, Test Data)
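The load step can be sketched as reading one serialized file per split from the directory; the file names and the use of pickle below are assumptions for illustration, not the library's on-disk format:

```python
import os
import pickle

def load_split(proc_data_dir):
    """Loads the train/val/test datasets saved by a setup_data_split-style step."""
    splits = []
    for name in ("train", "val", "test"):
        with open(os.path.join(proc_data_dir, name + ".bin"), "rb") as f:
            splits.append(pickle.load(f))
    # Returned in the documented order: (Training Data, Validation Data, Test Data).
    return tuple(splits)
```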