Zencoder has hired a handful of search engine veterans to help it build a tool that can analyze large codebases and figure out what is and isn't relevant. This detailed context reduces hallucinations and improves the quality of code that large language models can produce, says Filev: "We call it repo grokking."
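Zencoder hasn't published how repo grokking works, but the core idea (rank a repository's files by relevance to the task at hand and hand only the best matches to the model) can be sketched with a toy bag-of-words similarity. Everything below, including the function names, is illustrative rather than Zencoder's actual approach.

```python
# Hypothetical sketch of relevance-based context selection over a repo.
# A real system would use learned embeddings and code-aware parsing;
# this toy version just compares word overlap.
import math
import re
from collections import Counter

def _bow(text: str) -> Counter:
    # Split identifiers and prose into lowercase word tokens.
    return Counter(re.findall(r"[a-zA-Z]+", text.lower()))

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_context(task: str, repo_files: dict[str, str], top_k: int = 3) -> list[str]:
    """Return the paths of the files most relevant to the task description."""
    task_vec = _bow(task)
    ranked = sorted(
        repo_files,
        key=lambda path: _cosine(task_vec, _bow(path + " " + repo_files[path])),
        reverse=True,
    )
    return ranked[:top_k]

repo = {
    "api/users.py": "def list_users(page): ...",
    "billing/invoice.py": "def create_invoice(customer): ...",
}
print(select_context("add pagination to the users endpoint", repo, top_k=1))
```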
Cosine also thinks context is key. But it draws on that context to create a new kind of data set. The company has asked dozens of coders to record what they were doing as they worked through hundreds of different programming tasks. "We asked them to write down everything," says Pullen: "Why did you open that file? Why did you scroll halfway through? Why did you close it?" They also asked coders to annotate finished pieces of code, marking up sections that would have required knowledge of other pieces of code or specific documentation to write.
Cosine then takes all that information and generates a large synthetic data set that maps the typical steps coders take, and the sources of information they draw on, to finished pieces of code. It uses this data set to train a model to figure out what breadcrumb trail it would need to follow to produce a particular program, and then how to follow it.
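Cosine hasn't published its data format, but a single synthetic example presumably pairs a task with the recorded trail of steps, the sources consulted, and the finished code. The sketch below shows one hypothetical shape such a record could take; all field names are assumptions, not Cosine's schema.

```python
# Hypothetical shape of one "breadcrumb trail" training record.
from dataclasses import dataclass, field

@dataclass
class Breadcrumb:
    action: str   # e.g. "open_file", "scroll", "read_docs"
    target: str   # file path or documentation page
    reason: str   # the coder's annotation: why this step was taken

@dataclass
class TrainingExample:
    task: str                                              # the programming task, in plain language
    trail: list[Breadcrumb] = field(default_factory=list)  # steps taken, in order
    sources: list[str] = field(default_factory=list)       # other code / docs the finished code depended on
    final_code: str = ""                                    # the finished program

# The model would be trained to predict the trail, and ultimately the code, from the task.
example = TrainingExample(
    task="Add pagination to the /users endpoint",
    trail=[
        Breadcrumb("open_file", "api/users.py", "locate the existing handler"),
        Breadcrumb("read_docs", "docs/pagination.md", "check the team's pagination convention"),
    ],
    sources=["api/users.py", "docs/pagination.md"],
    final_code="...",
)
```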
Poolside, based in San Francisco, is also creating a synthetic data set that captures the process of coding, but it leans more heavily on a technique called RLCE (reinforcement learning from code execution). Cosine uses this too, but to a lesser degree.
RLCE is analogous to the technique used to make chatbots like ChatGPT slick conversationalists, known as RLHF, or reinforcement learning from human feedback. With RLHF, a model is trained to produce text that's more like the kind human testers say they favor. With RLCE, a model is trained to produce code that's more like the kind that does what it is supposed to do when it is run (or executed).
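The essence of that execution-based feedback can be sketched as a reward function that runs a candidate program against checks and scores it by how much of the expected behavior it reproduces. The sketch below is a minimal illustration under those assumptions, not Poolside's or Cosine's implementation, and a real system would sandbox the execution and plug the reward into an RL training loop.

```python
# Minimal sketch of an execution-based reward: run the generated code and
# score it by the fraction of input/expected-output checks it passes.
def execution_reward(generated_code: str, tests: list[tuple[str, object]]) -> float:
    namespace: dict = {}
    try:
        exec(generated_code, namespace)   # "execute" step; real systems isolate this in a sandbox
        solution = namespace["solution"]  # assumes the candidate defines a function named `solution`
    except Exception:
        return 0.0                        # code that doesn't even run earns no reward
    passed = 0
    for test_input, expected in tests:
        try:
            if solution(test_input) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(tests)

# Example: score a candidate that is supposed to reverse a string.
candidate = "def solution(s):\n    return s[::-1]"
print(execution_reward(candidate, [("abc", "cba"), ("", "")]))  # 1.0
```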
Gaming the system
Cosine and Poolside both say they are inspired by the approach DeepMind took with its game-playing model AlphaZero. AlphaZero was given the steps it could take (the moves in a game) and then left to play against itself over and over again, figuring out via trial and error which sequences of moves were winning ones and which weren't.
"They let it explore moves at every possible turn, simulate as many games as you can throw compute at. That led all the way to beating Lee Sedol," says Pengming Wang, a founding scientist at Poolside, referring to the Korean Go grandmaster whom DeepMind's AlphaGo beat in 2016. Before Poolside, Wang worked at Google DeepMind on applications of AlphaZero beyond board games, including FunSearch, a version trained to solve advanced math problems.
When that AlphaZero approach is applied to coding, the steps involved in producing a piece of code (the breadcrumbs) become the available moves in a game, and a correct program becomes winning that game. Left to play by itself, a model can improve far faster than a human could. "A human coder tries and fails one failure at a time," says Kant. "Models can try things 100 times at once."
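To make that analogy concrete, here is a toy sketch of such a self-play-style loop: sample a batch of candidate programs for a task, execute each one, and keep the ones that "win" (pass the checks) as training signal. The candidate generator is a stub standing in for the model being trained; nothing here reflects either company's actual code.

```python
# Toy sketch of "try 100 things at once": generate candidates, run them,
# and treat the correct ones as won games to learn from.
import random

def generate_candidates(task: str, n: int) -> list[str]:
    # Stub: a real system would sample n programs from the model for the task.
    pool = [
        "def solution(s):\n    return s[::-1]",
        "def solution(s):\n    return s.upper()",
        "def solution(s):\n    return ''.join(reversed(s))",
    ]
    return [random.choice(pool) for _ in range(n)]

def self_play_round(task: str, tests: list[tuple[str, object]], n: int = 100) -> list[str]:
    """Attempt n candidates and return the ones that pass every check."""
    winners = []
    for code in generate_candidates(task, n):
        namespace: dict = {}
        try:
            exec(code, namespace)  # real systems would sandbox and parallelize this
            if all(namespace["solution"](x) == y for x, y in tests):
                winners.append(code)  # a correct program counts as a won game
        except Exception:
            continue
    return winners

wins = self_play_round("reverse a string", [("abc", "cba")])
print(f"{len(wins)} winning programs out of 100 attempts")
```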
