HyperParameters¶
Hyperparameters for the Transformer model and the AdamW optimizer.
- dim (int): Dimension of embedding. Default is 64.
- n_layers (int): Number of Transformer layers. Default is 6.
- n_heads (int): Number of attention heads. Default is 8.
- output_hidden_dim (int): Hidden layer dimension of output MLP head. Default is 128.
- output_forward_dim (int): Dimension to squeeze the embedding before concatenation. Default is 8.
- multiple_of (int): Hidden dimension will be a multiple of this value. Default is 32.
- dropout (float): Dropout ratio. Default is 0.0.
- weight_decay (float): Weight decay parameter in AdamW optimizer. Default is 0.1.
- beta1 (float): Beta1 parameter in AdamW optimizer. Default is 0.9.
- beta2 (float): Beta2 parameter in AdamW optimizer. Default is 0.95.
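The `multiple_of` parameter is typically used to round the feed-forward hidden dimension up so it divides evenly for hardware efficiency. A minimal sketch of that rounding, assuming the common Llama-style FFN sizing (`4*dim` shrunk to two thirds, then rounded up); the library's exact formula may differ:

```python
def round_to_multiple(hidden_dim: int, multiple_of: int) -> int:
    """Round hidden_dim up to the nearest multiple of multiple_of."""
    return multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

# Assumed Llama-style sizing: propose 4*dim, shrink to 2/3, then round up.
dim, multiple_of = 64, 32
proposed = int(2 * (4 * dim) / 3)                 # 170
print(round_to_multiple(proposed, multiple_of))   # 192
```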
TrainSettings¶
Training settings and configurations.
- out_dir (str): Output directory for checkpoints and predictions. Default is "out".
- log_interval (int): Interval of iterations for logging to the terminal. Default is 1.
- eval_only (bool): If True, the script exits after the first evaluation. Default is False.
- wandb_log (bool): Enable logging with Weights & Biases. Default is False.
- wandb_project (str): Weights & Biases project name. Default is "TabularTransformer".
- wandb_run_name (str): Weights & Biases run name. Default is "run".
- min_cat_count (float): Minimum relative frequency for a category to remain a valid class; rarer categories are relabeled as UNKNOWN. Default is 0.02.
- apply_power_transform (bool): Apply power transform to numerical columns. Default is True.
- unk_ratio_default (float): Default percentage of tabular values to be randomly masked as unknown during training. Default is 0.2.
- dataset_seed (int): Seed for dataset loader. Default is 42.
- torch_seed (int): Seed for PyTorch. Default is 1377.
- dataset_device (str): Device on which the tokenized dataset is held. Default is "cpu".
- device (str): Training device (e.g., 'cpu', 'cuda'). Default is "cuda".
- dtype (Literal): PyTorch data type for training ('float32', 'bfloat16', 'float16'). Default is "bfloat16".
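To illustrate what `min_cat_count` implies for a categorical column, here is a hypothetical helper (not part of the library's API) that relabels categories whose relative frequency falls below the threshold:

```python
from collections import Counter

def relabel_rare(values, min_cat_count=0.02, unknown="UNKNOWN"):
    """Replace categories rarer than min_cat_count (as a fraction of rows)
    with a single UNKNOWN label. Hypothetical helper illustrating the
    documented behavior; the library's internal handling may differ."""
    counts = Counter(values)
    total = len(values)
    keep = {v for v, c in counts.items() if c / total >= min_cat_count}
    return [v if v in keep else unknown for v in values]

# "c" appears in 1% of 100 rows, below the 2% default, so it is relabeled.
data = ["a"] * 60 + ["b"] * 39 + ["c"]
print(sorted(set(relabel_rare(data))))   # ['UNKNOWN', 'a', 'b']
```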
TrainParameters¶
Parameters for the training process.
- max_iters (int): Total number of training iterations. Default is 100000.
- batch_size (int): Batch size per iteration. Default is 128.
- output_dim (int): Output dimension of the model. Default is 1.
- loss_type (Literal): Type of loss function ('BINCE', 'MULCE', 'MSE', 'SUPCON'). Default is 'BINCE'.
    - BINCE: torch.nn.functional.binary_cross_entropy_with_logits
    - MULCE: torch.nn.functional.cross_entropy
    - MSE: torch.nn.functional.mse_loss
    - SUPCON: Supervised Contrastive Loss, see arXiv:2004.11362
- eval_interval (int): Interval of iterations between evaluations. Default is 100.
- eval_iters (int): Number of iterations to run during evaluation. Default is 100.
- validate_split (float): Proportion of training data used for validation. Default is 0.2.
- unk_ratio (Dict[str, float]): Unknown ratio for specific columns; overrides unk_ratio_default. Default is {}.
- learning_rate (float): Learning rate for the optimizer. Default is 5e-4.
- transformer_lr (float): Learning rate for the transformer part; overrides learning_rate if set. Default is None.
- output_head_lr (float): Learning rate for the output head; overrides learning_rate if set. Default is None.
- warmup_iters (int): Number of iterations for learning rate warm-up. Default is 1000.
- lr_scheduler (Literal): Type of learning rate scheduler ('constant', 'cosine'). Default is 'cosine'.
- checkpoint (str): Checkpoint file name for saving and loading. Default is "ckpt.pt".
- input_checkpoint (str): Input checkpoint file for resuming training; overrides checkpoint if set.
- output_checkpoint (str): Output checkpoint file name for saving; overrides checkpoint if set.
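The interplay of warmup_iters, max_iters and lr_scheduler can be sketched as a standard warm-up-then-cosine-decay schedule. This is an assumption about the shape of the schedule, not the library's actual implementation (in particular, the decay floor is assumed to be 0):

```python
import math

def get_lr(it, learning_rate=5e-4, warmup_iters=1000,
           max_iters=100_000, lr_scheduler="cosine", min_lr=0.0):
    """Sketch of a warm-up + cosine decay schedule matching the
    documented parameters; hypothetical, for illustration only."""
    if it < warmup_iters:
        # Linear warm-up from 0 to learning_rate over warmup_iters steps.
        return learning_rate * it / warmup_iters
    if lr_scheduler == "constant":
        return learning_rate
    # Cosine decay from learning_rate down to min_lr over the remainder.
    ratio = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # goes 1 -> 0
    return min_lr + coeff * (learning_rate - min_lr)
```

If transformer_lr or output_head_lr is set, the corresponding parameter group would presumably use that value in place of learning_rate while following the same schedule shape.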