Google Research has unveiled TabFM, a foundation model designed to perform classification and regression on tabular data without any dataset-specific training, aiming to simplify how organisations build predictive models on structured data. The model was announced on June 30 and is positioned to streamline analytics workflows that traditionally rely on extensive feature engineering, hyperparameter tuning and repeated training cycles for each new dataset.
TabFM reframes tabular prediction as an in-context learning problem, treating labelled examples and the rows to be predicted as a single input context and generating outputs in a single forward pass. This approach allows the system to work with new tables without updating model weights, while still requiring historical rows with known labels to define the task at inference time.
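To illustrate the calling pattern this describes, here is a minimal sketch in which labelled rows and the rows to be predicted are passed in together and nothing is fitted or stored between calls. It is not TabFM: a simple nearest-neighbour vote stands in for the pretrained model's forward pass, and the function name `predict_in_context`, the parameter `k` and the churn-style sample table are all hypothetical.

```python
import math
from collections import Counter


def predict_in_context(context_rows, context_labels, query_rows, k=3):
    """Predict labels for query_rows using only the labelled context rows.

    Stand-in for a pretrained model's forward pass (assumption: a k-nearest-
    neighbour vote replaces the real model). Nothing is fitted or stored
    between calls, so a new table only needs a new context.
    """
    predictions = []
    for query in query_rows:
        # Rank the labelled context rows by Euclidean distance to the query row.
        ranked = sorted(
            zip(context_rows, context_labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        # Majority vote among the k closest labelled rows.
        votes = Counter(label for _, label in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Hypothetical table: (monthly_spend, support_tickets) -> label.
context_rows = [(20.0, 5.0), (25.0, 4.0), (22.0, 6.0),
                (80.0, 0.0), (90.0, 1.0), (85.0, 0.0)]
context_labels = ["churn", "churn", "churn", "stay", "stay", "stay"]
query_rows = [(23.0, 5.0), (88.0, 1.0)]

print(predict_in_context(context_rows, context_labels, query_rows))
```

Switching to a different table in this sketch means passing a different context, not retraining anything, which is the property the in-context framing is meant to provide.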
Under the hood, the architecture applies alternating row and column attention over raw table data, compresses each row into a dense vector and then uses a dedicated Transformer to perform in-context learning on these compressed representations. Google trained TabFM entirely on hundreds of millions of synthetic datasets generated using structural causal models, a strategy intended to overcome what its researchers describe as a critical scarcity of diverse, high-quality open-source tabular datasets.
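The article does not describe Google's data generator, but the general idea of sampling a synthetic table from a structural causal model can be sketched as follows. Everything here is an assumption made for illustration: the function name `sample_scm_dataset`, the 50% chance of an edge between variables, the `tanh` mechanism and the noise level are hypothetical choices, not details from Google.

```python
import math
import random


def sample_scm_dataset(n_rows, n_features, seed=0):
    """Sample one synthetic table from a small random structural causal model.

    Variables are generated in a fixed causal order. Each one is a nonlinear
    function of a random subset of earlier variables plus Gaussian noise.
    The last variable is treated as the (regression) target.
    """
    rng = random.Random(seed)
    n_vars = n_features + 1

    # For each variable, choose parents only among earlier variables,
    # which guarantees the causal graph is acyclic.
    parent_weights = []
    for j in range(n_vars):
        chosen = [p for p in range(j) if rng.random() < 0.5]
        parent_weights.append({p: rng.uniform(-2.0, 2.0) for p in chosen})

    rows, targets = [], []
    for _ in range(n_rows):
        values = []
        for j in range(n_vars):
            # Weighted sum of parent values, squashed, plus independent noise.
            signal = sum(w * values[p] for p, w in parent_weights[j].items())
            values.append(math.tanh(signal) + rng.gauss(0.0, 0.1))
        rows.append(values[:-1])
        targets.append(values[-1])
    return rows, targets


rows, targets = sample_scm_dataset(n_rows=5, n_features=3, seed=42)
for row, target in zip(rows, targets):
    print([round(v, 3) for v in row], round(target, 3))
```

Varying the seed yields a different causal graph and therefore a different table, which is how a generator of this kind could in principle produce a large and varied pretraining corpus.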
In benchmark evaluations, TabFM ranks first on TabArena, a “living” leaderboard that scores models using an Elo rating system across 38 classification and 13 regression datasets, spanning sample sizes from hundreds to 150,000 rows. A stronger ensemble configuration layers additional techniques, including non-negative least squares, singular value decomposition features and Platt scaling, on top of the base model’s predictions.
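For readers unfamiliar with Elo ratings, the sketch below shows the standard pairwise update, in which a model gains rating when it beats an opponent on a dataset and gains more when the win was unexpected. The article does not say how TabArena computes its scores, so the K-factor of 32, the starting ratings and the function names are assumptions for illustration only.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated ratings after one comparison.

    score_a is 1.0 if A wins, 0.0 if A loses and 0.5 for a tie.
    k (assumed 32 here) controls how far a single result moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's change mirrors A's, so the sum of the two ratings is preserved.
    new_b = rating_b - k * (score_a - exp_a)
    return new_a, new_b


# Two hypothetical models start level; model A wins on one dataset.
model_a, model_b = update_elo(1000.0, 1000.0, score_a=1.0)
print(model_a, model_b)
```

Repeating this update over many model pairs and datasets produces a single ranking, which is why a leaderboard can compare methods across classification and regression tasks on one scale.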
As part of the launch, Google has made TabFM’s weights available on Hugging Face under a non-commercial licence, while publishing usage code and samples on GitHub under the Apache 2.0 licence. Google also plans to integrate the model directly into BigQuery within the coming weeks, so that users can run zero-shot classification and regression through a single AI.PREDICT SQL command, using the same interface they rely on for standard analytical queries.
For data teams, the combination of pre-trained weights, open tooling and native BigQuery access is intended to lower the operational overhead of deploying tabular machine-learning models, particularly for use cases such as churn prediction and fraud detection, where the relevant data already sits in enterprise datasets.