
AI2 is developing a large language model optimized for science


PaLM 2. GPT-4. The list of text-generating AI models practically grows by the day.

Most of these models are walled behind APIs, making it impossible for researchers to see exactly what makes them tick. But increasingly, community efforts are yielding open source AI that's as sophisticated as, if not more so than, its commercial counterparts.

The latest of these efforts is the Open Language Model, a large language model set to be released by the nonprofit Allen Institute for AI Research (AI2) sometime in 2024. Open Language Model, or OLMo for short, is being developed in collaboration with AMD and the Large Unified Modern Infrastructure consortium, which provides supercomputing power for training and education, as well as Surge AI and MosaicML, which are providing data and training code.

"The research and technology communities need access to open language models to advance this science," Hanna Hajishirzi, the senior director of NLP research at AI2, told TechCrunch in an email interview. "With OLMo, we're working to close the gap between public and private research capabilities and knowledge by building a competitive language model."

One might wonder — this reporter included — why AI2 felt the need to develop an open language model when there are already several to choose from (see Bloom, Meta's LLaMA, etc.). The way Hajishirzi sees it, while the open source releases to date have been valuable and even boundary-pushing, they've missed the mark in various ways.

AI2 sees OLMo as a platform, not just a model — one that will allow the research community to take each component AI2 creates and either use it themselves or seek to improve it. Everything AI2 makes for OLMo will be openly available, Hajishirzi says, including a public demo, training data set and API, and documented with "very limited" exceptions under "suitable" licensing.

"We're building OLMo to create greater access for the AI research community to work directly on language models," Hajishirzi said. "We believe the broad availability of all aspects of OLMo will enable the research community to take what we're creating and work to improve it. Our ultimate goal is to collaboratively build the best open language model in the world."

OLMo's other differentiator, according to Noah Smith, senior director of NLP research at AI2, is a focus on enabling the model to better leverage and understand textbooks and academic papers as opposed to, say, code. There have been other attempts at this, like Meta's infamous Galactica model. But Hajishirzi believes that AI2's work in academia and the tools it's developed for research, like Semantic Scholar, will help make OLMo "uniquely suited" for scientific and academic applications.

"We believe OLMo has the potential to be something really special in the field, especially in a landscape where many are rushing to cash in on interest in generative AI models," Smith said. "AI2's unique ability to act as third-party experts gives us an opportunity to work not only with our own world-class expertise but to collaborate with the strongest minds in the industry. As a result, we think our rigorous, documented approach will set the stage for building the next generation of safe, effective AI technologies."

That's a nice sentiment, to be sure. But what about the thorny ethical and legal issues around training — and releasing — generative AI? Debate is raging over the rights of content owners (among other affected stakeholders), and countless nagging issues have yet to be settled in the courts.

To allay concerns, the OLMo team plans to work with AI2's legal department and to-be-determined outside experts, stopping at "checkpoints" in the model-building process to reassess privacy and intellectual property rights issues.

"We hope that through an open and transparent dialogue about the model and its intended use, we can better understand how to mitigate bias and toxicity, and shine a light on open research questions within the community, ultimately leading to one of the strongest models available," Smith said.

What about the potential for misuse? Models, which are often toxic and biased to begin with, are ripe for abuse by bad actors intent on spreading disinformation and generating malicious code.

Hajishirzi said that AI2 will use a combination of licensing, model design and selective access to the underlying components to "maximize the scientific benefits while reducing the risk of harmful use." To guide policy, OLMo has an ethics review committee with internal and external advisors (AI2 wouldn't say who, exactly) that will provide feedback throughout the model creation process.

We'll see to what extent that makes a difference. For now, a lot is up in the air, including many of the model's technical specs. (AI2 did reveal that it will have around 70 billion parameters, parameters being the parts of the model learned from historical training data.) Training is set to begin on LUMI's supercomputer in Finland, the fastest supercomputer in Europe as of January, in the coming months.

AI2 is inviting collaborators to help contribute to — and critique — the model development process. Those interested can contact the OLMo project organizers here.
