Goodfire Develops First ‘Visualization Tool’ to Interpret and Modify LLM Black Box


(Photo = Goodfire)

Artificial intelligence (AI) startup Goodfire has launched the world’s first tool for understanding and editing the behavior of AI models. The tool aims to improve and debug model behavior by providing a user interface that decomposes large language models (LLMs) and explains the internal decision-making behind their outputs.

VentureBeat reported on the 15th (local time) that AI startup Goodfire raised $7 million (about 9.5 billion won) in a seed round to help develop AI system debugging tools.

The investment was led by Lightspeed Venture Partners, with participation from Menlo Ventures, South Park Commons, Work-Bench, Juniper Ventures, Midos Ventures, Bluebird Capital and various angel investors.

Goodfire aims to address the growing complexity of generative AI models.

The inner workings of AI models are often described as ‘black boxes’, making it difficult to understand why an LLM generates a particular response to a given prompt.

The startup’s approach to AI explainability is known as “mechanistic interpretability,” which involves understanding how AI models reason and make decisions at the most granular level.

To this end, the company emphasized that it has brought together experts in AI interpretability and startup scaling.

Co-founders Eric Ho, CEO of Goodfire, and Dan Balsam, CTO, previously founded the AI-based job search and recruiting startup RippleMatch, while co-founder and chief scientist Tom McGrath previously worked as a senior researcher at Google DeepMind.

The company explains that the tool allows developers to effectively map the “brain” of an AI model, much as neuroscientists use imaging to understand what is happening inside the human brain. Using mechanistic interpretability techniques, the company says developers can identify which neurons correspond to different tasks, concepts, and decisions.

Once the brain is mapped, the tool tries to identify which neurons in the model are causing unwanted behavior, essentially visualizing the AI’s ‘brain’ to make it easier to pinpoint the parts that are causing problems.

Finally, developers can use the Goodfire tool’s controls to perform ‘surgery’, removing or enhancing specific features to change the model’s behavior.

“Like a neurosurgeon carefully manipulating specific brain regions, users can improve the model’s functionality, eliminate issues, and fix bugs,” Ho said. “By making AI models more interpretable and modifiable, we’re paving the way for safer, more reliable, and more helpful AI technologies.”
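The ‘surgery’ described above corresponds, roughly, to feature steering: nudging a model’s internal activations along a direction associated with a learned feature. The sketch below illustrates the general idea in PyTorch. The GPT-2-style module path, and the names `add_steering_hook`, `layer_idx`, `feature_direction`, and `strength`, are hypothetical placeholders for illustration, not Goodfire’s actual product or API.

```python
# A minimal feature-steering sketch, assuming a PyTorch model with a
# GPT-2-style layout (model.transformer.h[i]) and a feature direction
# already extracted, e.g. by a sparse autoencoder. Hypothetical names only.
import torch

def add_steering_hook(model, layer_idx, feature_direction, strength=5.0):
    """Attach a forward hook that adds (or, with negative strength, subtracts)
    a scaled feature direction to one layer's residual-stream output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * feature_direction.to(hidden.dtype)
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return model.transformer.h[layer_idx].register_forward_hook(hook)

# Hypothetical usage: amplify a feature during generation, then detach the
# hook so the model returns to its original behavior.
# handle = add_steering_hook(model, layer_idx=12, feature_direction=direction, strength=8.0)
# output_ids = model.generate(**inputs)
# handle.remove()
```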

Meanwhile, research on understanding the inner workings of LLMs has intensified recently. Google DeepMind previously released a new ‘JumpReLU SAE’ architecture that can identify and track individual features of a neural network to explore the inside of a large language model (LLM), and OpenAI and Anthropic have also released methods for interpreting how LLMs work using sparse autoencoders (SAEs).
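For context, a sparse autoencoder of the general kind mentioned above can be sketched in a few lines of PyTorch: it expands a layer’s activations into a much wider, mostly-zero feature space whose individual dimensions tend to be easier to interpret, then reconstructs the original activations. The dimensions, ReLU activation, and L1 penalty below are illustrative defaults, not DeepMind’s JumpReLU formulation.

```python
# A minimal sparse autoencoder (SAE) sketch with illustrative defaults.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_features: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature activations
        reconstruction = self.decoder(features)
        return features, reconstruction

def sae_loss(activations, features, reconstruction, l1_coeff: float = 1e-3):
    """Reconstruction error plus an L1 penalty that pushes each activation
    to be explained by only a handful of features."""
    mse = (reconstruction - activations).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity
```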

However, Goodfire is the first tool to offer an interface for visualizing and modifying how LLMs work. The company is currently accepting sign-ups through a waiting list.

Reporter Park Chan cpark@aitimes.com
