Less Data, More Power: RFT is Here, and It is Changing the Game for AI Training
Why Every ML Engineer Should Care About Reinforcement Fine-Tuning
Four years ago, when we started Predibase, we had a simple but ambitious goal: make it easy for any developer to build and deploy models that truly fit the task. Since then we have seen one major roadblock again and again: getting high-quality labeled data.
Today we are launching the first End-to-End Reinforcement Fine-Tuning Platform and eliminating the need for massive labeled datasets.
This is not an incremental improvement; it is a leap forward in how AI models are trained and deployed. Watching our platform outperform much larger models, guided by reward functions instead of complex prompts, is a glimpse of where specialized AI is headed.
This launch is foundational for us. At the very start, we knew model fine-tuning had to be simpler, faster, and more accessible. But over the years, I’ve seen firsthand how many teams, no matter how talented, struggled with the sheer effort required to get great results.
Why RFT and Why Now?
We all got a taste of managed RFT when OpenAI teased their early work, and DeepSeek-R1 proved that reinforcement learning can supercharge LLMs with minimal data. But until now, getting RFT into production has been complex and out of reach for most teams, especially when they lack labeled data.
⚡ That changes today. Our RFT platform lets you fine-tune and deploy enterprise-grade AI models without needing thousands of labeled examples.
How It Works
Reinforcement Fine-Tuning takes a different approach from traditional supervised fine-tuning (SFT). Instead of relying solely on labeled data, RFT trains models against reward functions, allowing them to learn dynamically and refine their outputs over time (see the sketch after this list). This is a huge improvement for:
✅ Code generation (where correctness can be objectively verified)
✅ Complex reasoning tasks (where factual accuracy and step-by-step logic matter)
✅ RAG workflows (where ensuring the right answer is retrieved is key)
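To make the reward-function idea concrete, here is a minimal sketch of what a verifiable reward for code generation could look like: score a completion on whether it parses and runs. The function name, signature, and scoring scheme are illustrative assumptions, not our platform's actual reward-function API.

```python
import os
import subprocess
import sys
import tempfile

def reward_generated_code(prompt: str, completion: str) -> float:
    """Hypothetical reward for code generation (illustrative, not the
    Predibase API): 0.0 if the completion does not parse, 0.5 if it
    parses, 1.0 if it also runs without error."""
    # No reward if the generated code is not even valid Python.
    try:
        compile(completion, "<generated>", "exec")
    except SyntaxError:
        return 0.0

    # Partial credit for syntactically valid code.
    score = 0.5

    # Full credit if the snippet also executes cleanly in a subprocess.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "candidate.py")
        with open(path, "w") as f:
            f.write(completion)
        try:
            result = subprocess.run(
                [sys.executable, path], capture_output=True, timeout=10
            )
            if result.returncode == 0:
                score = 1.0
        except subprocess.TimeoutExpired:
            pass  # Code that hangs keeps only the partial-credit score.

    return score
```

Because the reward comes from verifiable signals rather than a gold label, the model can improve from its own sampled outputs, which is exactly why so few hand-labeled examples are needed.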
Now you can:
🚀 Train models without massive labeled datasets
⚡ Fine-tune with just a few dozen examples
🎯 Achieve 20%+ higher accuracy than GPT-4 on specialized tasks
🛠 Deploy high-performance models seamlessly
The Engineering Behind It
Best of both worlds – a supervised fine-tuning warm-up combined with GRPO (Group Relative Policy Optimization) for reinforcement learning (see the sketch after this list).
LoRAX-powered serving for continuous training – instantly loads fine-tuned checkpoints with near-zero latency and evaluates at each step
Streaming micro-batch optimization – keeps GPU utilization at nearly 100%
Secure reward function execution – lets you customize RFT safely in an isolated environment
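To give a feel for the GRPO step mentioned above: each prompt gets a group of sampled completions, each completion gets a reward, and a completion's advantage is its reward relative to the rest of its group, so no separate value model is needed. A minimal sketch of that group-relative normalization, not our training code:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages, the core idea behind GRPO (simplified sketch).

    `rewards` has shape (num_prompts, group_size): one scalar reward per
    sampled completion. Each completion is scored against the other samples
    for the same prompt by normalizing with the group mean and std.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each, rewards in [0, 1].
rewards = torch.tensor([[0.0, 0.5, 1.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))  # Above-average samples get positive advantages.
```

These advantages then weight a clipped policy-gradient update on the sampled tokens, which is what nudges the model toward the completions your reward function prefers.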
How We Prove It: The PyTorch-to-Triton Model
To prove the power of RFT, we fine-tuned a small, single-GPU model (Qwen2.5-Coder-32B-Instruct) to translate PyTorch to Triton, something even large foundation models struggle with. Then we benchmarked kernel correctness against larger foundation models, including DeepSeek-R1, Claude 3.7 Sonnet, and OpenAI o1.
🔥 Results: Our RFT model outperformed OpenAI o1 and DeepSeek-R1 by 66%+ in correctness despite being significantly smaller.
We ran our benchmarks on the KernelBench dataset, which contains 250 well-defined tasks designed to assess an LLM’s ability to transpile code into a valid, efficient kernel, and our model delivered remarkable results.
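For readers who haven't written GPU kernels before, here is a toy illustration of the kind of translation the task demands: an elementwise add expressed as ordinary PyTorch and as a hand-rolled Triton kernel. This pair is my own illustrative example, not an item from KernelBench.

```python
import torch
import triton
import triton.language as tl

# PyTorch version: one line, dispatched to a prewritten CUDA kernel.
def add_pytorch(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y

# Triton version: the model must emit an explicit, correct GPU kernel.
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # Guard against out-of-bounds accesses.
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add_triton(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Even in this trivial case the Triton side has to get indexing, masking, and launch configuration exactly right, which is why kernel correctness is such a clean, automatically checkable reward signal for RFT.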
I’m very proud of what my team has built, and even more excited for what this unlocks for developers and enterprises.
Until next time—keep fine-tuning (the right way).
Dev 🔍
PS: I’d love for you to try it out and let me know what you think:
Join me for the live webinar on March 27 - I’ll be demoing how to build custom models with RFT.
If this post made you rethink how you fine-tune AI, subscribe.
Got thoughts? I read every comment—drop yours below.