on Real-World Problems is Hard
Reinforcement learning looks straightforward in controlled settings: well-defined states, dense rewards, stationary dynamics, unlimited simulation. Most benchmark results are produced under those assumptions.
Observations are partial and noisy, rewards...
As an alternative of a single, centralized computing cluster, 10 billion parameter models have emerged, trained on globally distributed computing hardware. It is alleged that that is the primary time that a 10B large...