FAQ
Why do I need TabularTransformer?
TabularTransformer provides a convenient deep learning framework designed specifically for the tabular data domain. For deep learning practitioners it is easy to get started with, whereas tree-based models rely on different methodologies that may be less familiar. Additionally, TabularTransformer can offer competitive performance on large-scale tabular data.
Can the TabularTransformer outperform XGBoost or LightGBM?
It depends on the dataset's size and complexity. For datasets with fewer than 50,000 samples, tree-based models such as XGBoost and LightGBM often perform better. When working with millions of samples, however, the TabularTransformer can potentially outperform them.
The key to unlocking the full potential of the TabularTransformer lies in the quality and size of the dataset. It is crucial to invest most of your time (roughly 90%) in curating the data and deeply understanding the domain problem. Ensuring diversity in the tabular features and maintaining high data quality are essential, as hyperparameter optimization alone offers only limited performance improvements.
Would the TabularTransformer help me win a Kaggle competition?
It can certainly help, but winning a competition often involves some luck. To earn a medal, it is usually important to ensemble different models rather than relying on a single one. A deep understanding of the competition task and data is critical; sometimes a good model alone isn't enough to secure a win.
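As a minimal sketch of what ensembling can look like, the snippet below blends predicted probabilities from two models with a weighted average. The arrays are dummy placeholders standing in for the outputs of, say, a TabularTransformer and a gradient-boosted model, and the blend weight is an illustrative value you would normally tune on a validation set rather than a recommended setting.

```python
import numpy as np

# Hypothetical predicted probabilities from two different models on the
# same validation rows (e.g., a TabularTransformer and LightGBM). In
# practice these would come from each model's prediction method.
preds_transformer = np.array([0.82, 0.14, 0.67, 0.35])
preds_gbm = np.array([0.78, 0.22, 0.59, 0.41])

# Weighted-average blend; the weight is illustrative and would normally
# be chosen by optimizing a metric on held-out data.
weight = 0.5
ensemble_preds = weight * preds_transformer + (1 - weight) * preds_gbm

print(ensemble_preds)  # [0.8  0.18 0.63 0.38]
```

Blending models of different families tends to help precisely because their errors are only partially correlated: a deep tabular model and a gradient-boosted tree model often disagree on different rows, so averaging can cancel some of each model's mistakes.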