Reinforcemect Learning

Learn how to Evaluate LLMs and Algorithms — The Right Way

Never miss a brand new edition of , our weekly newsletter featuring a top-notch collection of editors’ picks, deep dives, community news, and more. Subscribe today! All of the labor it takes to integrate large language...

Benchmarking Tabular Reinforcement Learning Algorithms

posts, we explored Part I of the seminal book by Sutton and Barto (*). In that section, we delved into the three fundamental techniques underlying nearly every modern Reinforcement Learning (RL)...

The best way to Train LLMs to “Think” (o1 & DeepSeek-R1)

In September 2024, OpenAI released its o1 model, trained on large-scale reinforcement learning, giving it “advanced reasoning” capabilities. Unfortunately, the small print of how they pulled this off were never shared publicly. Today, nevertheless,...

How LLMs Work: Reinforcement Learning, RLHF, DeepSeek R1, OpenAI o1, AlphaGo

Welcome to part 2 of my LLM deep dive. If you happen to’ve not read Part 1, I highly encourage you to ascertain it out first.  Previously, we covered the primary two major stages of...

Reinforcement Learning with PDEs

Previously we discussed applying reinforcement learning to Extraordinary Differential Equations (ODEs) by integrating ODEs inside gymnasium. ODEs are a strong tool that may describe a wide selection of systems but are limited to a...

Recent posts

Popular categories

ASK ANA