Less Labeling, More Learning: The New Era of Fine-Tuning with RFT
The Fine-Tuning Evolution: From Guesswork to Reinforcement Fine-Tuning
Hey Builders,
Let’s talk AI. Not the hype—the details.
At the end of 2024, we sat in a board meeting reflecting on the AI trends defining 2025. Agentic AI was dominating the conversation then.
But our audience, the builders, weren’t asking for better agent orchestration frameworks; there were already great open-source frameworks for that. What they mostly wanted was better model quality. This was especially true for those who were building with agents.
Here’s the problem: when AI models don’t generalize well, you chain together three 90%-accurate LLM calls in an agentic flow, and suddenly your use case accuracy drops to roughly 73%.
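A quick back-of-the-envelope check makes the compounding explicit (the 90% per-call figure and the three-step chain are just the illustrative numbers from above):

```python
# Accuracy of a chain of independent LLM calls compounds multiplicatively,
# assuming every step must succeed for the end-to-end result to be correct.

per_call_accuracy = 0.90   # accuracy of a single LLM call (example figure)
num_calls = 3              # number of chained calls in the agentic flow

end_to_end = per_call_accuracy ** num_calls
print(f"End-to-end accuracy: {end_to_end:.1%}")  # -> 72.9%
```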
So what do engineers do? They start with prompt engineering roulette—where adding the word "please" mysteriously changes the output. Eventually, the serious ones move to Supervised Fine-Tuning (SFT), layering more labeled data until gradient descent does its thing.
The AI Fine-Tuning Myth That’s Finally Dying
The #1 problem with fine-tuning? It used to mean one thing:
Collect thousands of labeled examples → fine-tune with supervised learning → pray for better accuracy.
That worked… but only for teams that had endless data and time.
We at Predibase bet on fine-tuning early—so much so that we made T-shirts that said:
"The Future is Fine-Tuned." And we were right—custom models fine-tuned on task-specific data consistently outperform generic prompt-engineered models.
The problem is that most teams don’t have perfect labeled data lying around.
They’re stuck in a loop:
❌ Manually labeling thousands of examples (slow, expensive)
❌ Prompt-engineering hacks (unpredictable, fragile)
❌ Waiting on more data (which may never come)
We watched AI teams struggle with this over and over. Then Reinforcement Fine-Tuning (RFT) came along and flipped the script.
Why RFT is blowing up 💥
Instead of brute-force memorization, RFT teaches models to reason by optimizing against reward functions. In practice, that lets you:
✅ Train a model with just a few dozen data points, i.e., encode domain intuition without needing 1K+ labeled examples
✅ Evaluate "chains of thought" per data point instead of only static outputs
✅ Teach the model to reason rather than memorize, rewarding correctness, formatting, and logic instead of pure pattern matching (a minimal reward-function sketch follows this list)
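To make the reward-function idea concrete, here’s a minimal sketch, assuming a hypothetical task where the model must return JSON with a "reasoning" field and a verifiable "answer" field. The function name, fields, and weights are illustrative, not Predibase’s API; the point is that formatting and correctness get scored programmatically instead of being hand-labeled:

```python
import json

def reward(prompt: str, completion: str, expected_answer: str) -> float:
    """Score one generated completion for a hypothetical task where the model
    must return JSON like {"reasoning": "...", "answer": "..."}."""
    score = 0.0

    # Formatting: reward valid JSON that contains the expected fields.
    try:
        parsed = json.loads(completion)
        score += 0.2
        if "reasoning" in parsed and "answer" in parsed:
            score += 0.2
    except json.JSONDecodeError:
        return score  # unparseable output earns nothing further

    # Correctness: reward matching the verifiable ground-truth answer.
    if str(parsed.get("answer", "")).strip() == expected_answer.strip():
        score += 0.6

    return score  # in [0, 1]; the RFT loop trains the model to maximize it
```

Because the score is computed from the output itself (plus a verifiable label where one exists), a handful of prompts and a well-designed reward can stand in for thousands of labeled examples.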
From our initial experiments, we’ve come up with the following diagram to help inform when and where to use reinforcement fine-tuning:
We are seeing this first-hand:
🚀 Checkr streamlined background checks with fine-tuned AI
📞 Convirza improved call analytics using RFT
📊 Teams are hitting production-grade accuracy with fewer examples
This means models learn the right behavior—not just copy-paste past responses. The results we see are insane.
I am committed to building the platform that helps organizations go from generalized intelligence to specialized models customized for their data and tasks. RFT has changed the game in how we think this will be done, especially for tasks like code generation that are easy to verify automatically but hard to collect high-quality labeled data for.
Until next time—keep fine-tuning (the right way).
Dev 🔍
If this post made you rethink how you fine-tune AI, subscribe.
Got thoughts? I read every comment—drop yours below.
That’s it for now. See you in the details.