NVIDIA researchers on Friday won a key Kaggle competition that many in the field treat as a real-time pulse check on humanity’s progress toward artificial general intelligence (AGI).
Ivan Sorokin and Jean-Francois Puget, two members of the Kaggle Grandmasters of NVIDIA (KGMoN), came in first on the Kaggle ARC Prize 2025 public leaderboard with a 27.64% score by building a solution evaluated on the same dataset behind the ARC-AGI-2 benchmark.
The team, which called itself NVARC, fine-tuned a 4B-parameter model variant that outperformed far larger, costlier models on the same benchmark at just 20 cents per task. The result showcased not only state-of-the-art accuracy but also a breakthrough in scalable, economical AGI-style reasoning.
The ARC-AGI benchmark measures how well AI systems perform abstract reasoning and then generalize from only a few examples using grid-based visual puzzles. ARC-AGI-2 is a harder, updated version that removes overlap with public training data. It’s explicitly designed to withstand shortcuts and brute-force memorization, making it a sharper test of true systematic abstraction.
The ARC-AGI benchmark has become one of the most closely watched indicators of real progress toward general reasoning in AI. Unlike typical machine learning benchmarks, ARC-AGI tasks can’t be solved through scale, memorization, or pattern scraping. Each puzzle is a tiny grid with only a handful of examples, forcing systems to infer abstract rules and apply them to a brand-new test case. Scores on the more difficult ARC-AGI-2 are widely viewed as a proxy for how well an AI system can learn from almost nothing.
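To make the task format concrete, here is a toy ARC-style puzzle in plain Python. The grids, the rule set, and the brute-force solver are invented for illustration and are far simpler than real ARC-AGI-2 tasks; the point is only the structure: a hidden transformation must be inferred from a handful of example pairs and then applied to an unseen test input.

```python
# Toy ARC-style task: grids are 2D lists of color indices (0-9).
# The hidden rule must be inferred from the example pairs alone,
# then applied to the unseen test input. (Illustrative, not real ARC data.)

def flip_h(grid):        # mirror each row left-to-right
    return [row[::-1] for row in grid]

def transpose(grid):     # swap rows and columns
    return [list(col) for col in zip(*grid)]

CANDIDATE_RULES = {"flip_h": flip_h, "transpose": transpose}

task = {
    "train": [
        {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
        {"input": [[4, 5], [0, 0]], "output": [[5, 4], [0, 0]]},
    ],
    "test": {"input": [[7, 8], [9, 1]]},
}

def solve(task):
    # Pick the rule consistent with every demonstration pair.
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(p["input"]) == p["output"] for p in task["train"]):
            return name, rule(task["test"]["input"])
    return None, None

name, prediction = solve(task)
print(name, prediction)  # flip_h [[8, 7], [1, 9]]
```

Real ARC tasks draw from an open-ended space of transformations, so enumeration over a fixed rule list like this does not work; that gap is exactly what the benchmark measures.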
That’s why the Kaggle ARC Prize 2025 leaderboard matters: It’s the most open, reproducible arena where researchers test AGI-style reasoning under strict compute and time limits.
The winning NVIDIA NVARC solution wasn’t powered by giant models or brute-force search. Instead, it leaned on three ideas any developer can appreciate: synthetic data, test-time training, and disciplined engineering.
Heavyweight LLM reasoning methods—chain-of-thought, tool use, even RL-style agents—couldn’t fit inside Kaggle’s tight runtime limits. So NVARC flipped the strategy: Move all complex reasoning offline into a synthetic data pipeline, and train smaller models capable of running fast during evaluation.
Using staged puzzle generation, concept decomposition, and progressively stronger open-weight models, the team built a diverse synthetic corpus of ARC-style tasks. Final models only needed to recognize and adapt patterns, rather than execute full program-search logic. Test-time training learns each puzzle’s specifics from its tiny example set—a technique that has become essential for leading ARC-AGI performance.
The result was a compact, cost-efficient ensemble that outperformed much larger systems and set a new bar on ARC-AGI-2, showing how synthetic data and adaptive learning can push reasoning forward.
To build these winning solutions, the team leveraged the NVIDIA NeMo suite of tools, including NeMo RL for scalable reinforcement learning and NeMo Skills for streamlining synthetic data generation (SDG) pipelines.
Learn more about the technical details in NVARC’s writeup on Kaggle and watch this interview with ARC.
