The evolution of model training, through our eyes
When we started Predibase, our vision was simple: democratize deep learning. At the companies we had worked at, like Google, Uber, and Apple, we saw that deep learning models (particularly pretrained neural networks) were incredibly powerful, but training them was still the domain of expert data scientists and PhDs, far outside the grasp of most engineers.
We had an idea of how to change that, inspired by my co-founder's open source project Ludwig, which allowed anyone to declaratively define and train compositional neural networks. It was incredibly cool technology, and it made training the latest models much easier.
And then large language models came along and democratized deep learning even further.
LLMs felt magical because they came pre-loaded with a huge amount of knowledge, enabling developers to generate quality outputs with zero training. But as people have gotten more sophisticated with LLMs, a new trend has emerged: "post-training".
So what exactly is post-training, why is it called that, and why do we think it matters?
What is post-training?
When I studied machine learning over a decade ago, the workflow was simple: you trained a model (typically many models via hyperparameter sweeps) and then ran predictions.
Pre-trained DL models like BERT introduced a new paradigm that LLMs took to the next level: a separation between pre-training and post-training.
Put simply, pre-training is the process that produces the initial model weights for any base LLM, whether it's Llama 4 or GPT-4o. It trains the model to complete a task, typically next-token prediction for auto-regressive models (including most of the popular transformer-based ones): given the patterns the model has seen so far, predict which word, token, or frame should come next.
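To make that objective concrete, here's a minimal sketch of the causal language modeling loss in PyTorch. It's purely illustrative: real pre-training adds batching, padding, and masking details, and runs this over trillions of tokens.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """Standard causal LM objective: predict token t+1 from tokens up to t.

    logits:    (batch, seq_len, vocab_size) raw model outputs
    token_ids: (batch, seq_len) the input token ids
    """
    # Shift so each position's prediction is scored against the *next* token.
    pred = logits[:, :-1, :]      # predictions for positions 0 .. T-2
    target = token_ids[:, 1:]     # the token that actually came next
    return F.cross_entropy(
        pred.reshape(-1, pred.size(-1)),
        target.reshape(-1),
    )
```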
Pre-trained LLMs are incredibly powerful but also frustratingly general technology.
Once you have your base model, however, there is a set of additional training procedures you can apply to further improve, adapt, and specialize it for specific tasks or new domains. These are collectively referred to as post-training.
How do people post-train today?
There are a few well-established techniques in post-training:
Continued pre-training – as the name suggests, this continues the same pre-training process that created the base model in the first place, but typically on a new domain or on data the model did not see during its original pre-training phase.
Supervised fine-tuning – this is what most people think of when they consider post-training or fine-tuning; it teaches the model to specialize in a task by giving it lots (typically hundreds or thousands) of examples of desired inputs and outputs (there's a minimal sketch of this after the list).
Reinforcement fine-tuning (read about RFT here) – a newer technique that teaches a model to adapt to a task or domain by incentivizing the right behavior through reward functions rather than labeled data.
Direct preference optimization (DPO) – a fine-tuning technique that lets users provide preferred answers (usually expressed as preference pairs of a less ideal and a more ideal answer) to guide the model toward the kinds of answers they want to see more often (its loss is sketched after the list as well).
Reinforcement learning from human feedback (RLHF) – in some ways, RLHF is a combination of several of the techniques above. In this process, users typically provide preference data, which is used to train a reward model that then guides the subsequent behavior of the LLM.
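To ground the supervised fine-tuning item above, here's a hypothetical bare-bones SFT loop. The model name, toy dataset, and hyperparameters are placeholder assumptions for illustration, not Predibase defaults, and the tokenization glosses over some edge cases.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A toy labeled dataset of desired (input, output) pairs; a real run would
# use hundreds or thousands of examples.
examples = [
    {"prompt": "Summarize: The meeting covered Q3 revenue and hiring plans.",
     "completion": " Q3 revenue and hiring plans were discussed."},
]

def make_batch(example):
    # Concatenate prompt + completion, then mask the prompt tokens so the
    # loss is only computed on the tokens we want the model to learn to emit.
    prompt_ids = tokenizer(example["prompt"], return_tensors="pt").input_ids
    full_ids = tokenizer(example["prompt"] + example["completion"],
                         return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100   # -100 is ignored by the loss
    return full_ids, labels

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for example in examples:
    input_ids, labels = make_batch(example)
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

And since DPO's objective is compact, here's a small sketch of its preference loss, assuming you've already computed the summed log-probability of each chosen and rejected response under both the policy being trained and a frozen reference model.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO pushes the policy to rank the preferred answer above the rejected
    # one by a larger margin than the reference model does.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```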
Why post-train?
Pre-trained models are powerful, but they are also blunt instruments. They’ve been trained on massive, general-purpose datasets to predict the next token. But they are not tuned to your product, your users, or your domain. That’s where post-training comes in.
Post-training allows you to take a general LLM and push it in a more useful direction:
✅ Specialization. Adapt the model to your domain-specific language (medical, legal, technical, etc.).
✅ Alignment to specific tasks. Make the model behave the way you want it to: generate concise answers, follow instructions, summarize in a specific way.
✅ Performance optimization. Speed up inference with smaller, better-tuned weights, improve accuracy, and reduce hallucinations.
✅ Data efficiency. With techniques like RFT, you can get high-performing models from just a few labeled examples.
With post-training, your model stops being a generalist and starts becoming yours. For most production use cases it is no longer a nice-to-have, it is a requirement.
Where is post-training headed next?
Today, post-training is often thought of as a process separate from deploying a model into production. But I believe AI workloads in the future will tie these two concepts together much more tightly: you deploy a general model into the world, then use data collected from live production traffic and human feedback to post-train a model that is specialized for your task.
Predibase already has both of these pieces as an inference and post-training platform. Customers can deploy any model immediately inside the platform, and also fine-tune those models, run RFT on them, or continue pre-training them.
Today these processes are typically done separately. But over time, I see the market heading toward marrying the inference and post-training workflows into a flywheel where models get better as a function of being used.
Want to try post-training on your own model? Start here
If this post made you rethink how you fine-tune AI, subscribe.
Got thoughts? I read every comment—drop yours below.