Summary: Opinion piece for the general TDS audience. I argue that AI is more transparent than humans in tangible ways. Claims of AI being a “black box” lack perspective and comparison to the opacity in studies of human intelligence, which in some ways lag behind studies of artificial intelligence.
You, reader, are a black box. Your mind is mysterious. I can’t know what you’re thinking. I can’t know what you’ll do, and I can’t know whether your words are honest and whether you justify your actions truthfully and without pretext. We learn to understand and trust humans through years of introspection and experience interacting with others. But experience also tells us that understanding is limited to those with similar-enough life backgrounds, and that trust is unwarranted for those with motivations contrary to our own.
Artificial intelligence, while still mysterious, is crystal clear by comparison. I can probe an AI for its equivalent of thoughts and motivations and know that I’m getting the real thing. Further, the AI equivalent of “life background”, its training data, and its equivalent of “motivations”, its training goal, are mostly if not entirely known and open to scrutiny and evaluation. While we still lack years of experience with modern AI systems, I argue that opacity is not the problem; on the contrary, the relative transparency of AI systems to inspection, their “white box” nature, is generally a foundation for understanding and trust.
You may have heard of AI as a “black box” in two senses. AI like OpenAI’s ChatGPT or Anthropic’s Claude are black boxes because you can’t inspect their code or parameters (black box access). In the more general sense, even if you could inspect those things (white box access), they might be of little help in understanding how the AI operates to any generalizable extent. You could follow every instruction that defines ChatGPT and gain no more insight than if you merely read its output, a corollary to the Chinese room argument. A (human) mind, however, is more opaque than even restricted-access AI. Physical barriers and ethical constraints limit interrogation of the mechanisms of human thought, and our models of the brain’s architecture and components are incomplete, so the human mind is more of a black box (albeit an organic, carbon-based, “natural” one) than even the proprietary, closed-source AI models. Let’s compare what current science tells us about the internal workings of the human brain on the one hand and AI models on the other.
As of 2025, the only static neural structures that have been fully mapped, those of a fly, have but a tiny fraction of the complexity of the human brain. Functionally, experiments using functional magnetic resonance imaging (fMRI) can pinpoint neural activity down to volumes of about 1 mm³ of brain matter. Figure 2 shows an example of the neural structure captured as part of an fMRI study. The required hardware includes a machine costing at least $200,000, regular access to liquid helium, and a supply of very patient humans willing to hold still while a tonne of superconductor spins inches from their heads. While fMRI studies can establish that, for instance, the processing of visual depictions of faces and houses is associated with certain brain regions, much of what we know about the functions of the brain comes from literal accidents, which are of course not ethically scalable. Ethical, less invasive experimental approaches provide relatively low signal-to-noise ratios.

Open source models (white box access), including large language models (LLMs), are routinely sliced and diced (virtually) and otherwise interrogated in far more invasive ways than are possible on humans with even the most expensive fMRI machine and sharpest scalpel, and all of this on consumer gaming hardware. Every bit of every neural connection can be inspected and logged, repeatedly and consistently, under an enormous space of inputs. The AI does not tire in the process, nor is it affected in any way. This level of access, control, and repeatability lets us extract a large amount of signal and perform much finer-grained analysis. Controlling what an AI is observing lets us connect familiar concepts to components and processes inside and outside of an AI in useful ways (the code sketches after this list illustrate the first two):
- Associate neural activity with concepts, akin to fMRI. We can tell whether an AI is “thinking” about a particular concept. How well can we tell when a human is thinking about a particular concept? Figs. 1 and 3 are two renderings of concepts from GemmaScope, which annotates the internals of Google’s Gemma 2 model with concepts.
- Determine the importance of particular inputs to outputs. We can tell whether a specific part of a prompt was important in producing an AI’s response. Can we tell whether a human’s decision was swayed by a particular concern?
- Attribute the conveyance of concepts to paths through an AI. This means we can trace exactly where in a neural network a concept traveled from the input words to the eventual outputs. Fig. 4 shows an example of such a path trace for the grammatical concept of subject-number agreement. Can we do the same for humans?
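To make the first of these concrete, here is a minimal sketch of reading an open model’s internal activations, assuming PyTorch and the Hugging Face transformers library. GPT-2 is used as a small stand-in for a model like Gemma 2, and the “concept direction” is a random placeholder for a vector that would, in practice, come from a trained linear probe or a sparse autoencoder such as those behind GemmaScope.

```python
# A minimal sketch of reading an open model's internals, assuming PyTorch
# and Hugging Face transformers. GPT-2 is a small stand-in for models like
# Gemma 2; the "concept direction" below is a random placeholder for a
# vector that would come from a trained probe or sparse autoencoder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The houses on this street all have red doors."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Every intermediate activation is exposed; the model is not changed
    # (and does not "tire") by being observed.
    outputs = model(**inputs, output_hidden_states=True)

# One tensor per layer, each of shape (batch, tokens, hidden_dim).
layer_8 = outputs.hidden_states[8][0]

# Hypothetical concept direction (placeholder; see lead-in above).
concept_direction = torch.randn(layer_8.shape[-1])
concept_direction /= concept_direction.norm()

# How strongly does each token's activation point along the concept?
scores = layer_8 @ concept_direction
for token_id, score in zip(inputs["input_ids"][0], scores):
    print(f"{tokenizer.decode(int(token_id)):>12s}  {score.item():+.3f}")
```

Nothing here depends on the model’s cooperation or self-report: the activations are simply read out, as often and under as many inputs as we care to try.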

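The second bullet, input attribution, can be sketched just as briefly. The example below uses gradient-times-input saliency, one of many attribution methods and chosen here only for brevity, to ask which prompt tokens mattered most for the model’s next-word prediction; the prompt and the method choice are illustrative, not a recommendation.

```python
# A minimal sketch of input attribution for the same GPT-2 stand-in:
# gradient-times-input saliency over the prompt's token embeddings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# Work on the embeddings directly so we can take gradients w.r.t. the input.
embeddings = model.get_input_embeddings()(inputs["input_ids"])
embeddings = embeddings.detach().requires_grad_(True)

logits = model(inputs_embeds=embeddings).logits
next_token = logits[0, -1].argmax()
logits[0, -1, next_token].backward()

# Saliency per prompt token: |gradient * embedding|, summed over dimensions.
saliency = (embeddings.grad[0] * embeddings[0]).sum(-1).abs()
print("Predicted next token:", tokenizer.decode(int(next_token)))
for tok, s in zip(inputs["input_ids"][0], saliency):
    print(f"{tokenizer.decode(int(tok)):>10s}  {s.item():.3f}")
```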
Humans can, of course, self-report answers to the first two questions above. You can ask a hiring manager what they were thinking about when they read your résumé or which factors were important in their decision to offer you a job (or not). Unfortunately, humans lie, they often do not themselves know the reasons for their actions, or they are biased in ways they are not aware of. While this is also the case for generative AI, interpretability methods in the AI space do not depend on the AI’s answers, truthful, unbiased, self-aware, or otherwise. We do not need to trust the AI’s outputs in order to tell whether it is thinking about a particular concept. We literally read it off a (virtual) probe stuck onto its neurons. For open source models, this is trivial, laughably so considering what it takes to get this kind of data (ethically) out of a human.
What about closed-source, “black box access” AI? Much can be inferred from black box access alone. Models’ lineage is known, and so is their general architecture. Their basic components are standard. They can be interrogated at a rate no human would put up with, and in a more controlled and reproducible manner; repeatability under chosen inputs is often a workable substitute for open access, as the sketch below illustrates. Parts of models can be inferred, or their semantics copied, via “distillation”. So black box access is not an absolute impediment to understanding and trust, but the most immediate way to make AI more transparent is to allow open access to its entire specification, despite current trends among the most prominent AI builders.
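Even without access to weights, controlled and repeatable interrogation is straightforward. The sketch below assumes the OpenAI Python SDK, an API key in the environment, and an illustrative model name and hiring prompt; it probes a black box model by systematically ablating parts of the input and comparing responses.

```python
# A minimal sketch of controlled, repeatable black-box interrogation,
# assuming the OpenAI Python SDK and an API key in OPENAI_API_KEY.
# The model name and the hiring prompt are illustrative stand-ins; the
# point is systematic querying under controlled input variations.
from openai import OpenAI

client = OpenAI()

base_resume = "5 years of Python, MSc in statistics, led a team of 3."
variants = {
    "baseline": base_resume,
    "no_degree": base_resume.replace("MSc in statistics, ", ""),
    "no_leadership": base_resume.replace(", led a team of 3", ""),
}

for name, resume in variants.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # stand-in model name
        temperature=0,         # reduce run-to-run variation
        messages=[{
            "role": "user",
            "content": "Should we interview this candidate? "
                       "Answer yes or no.\nResume: " + resume,
        }],
    )
    print(f"{name:>13s}: {response.choices[0].message.content.strip()}")
```

Comparing answers across such variants gives a crude, output-only analogue of the attribution question above, without any access to the model’s internals.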
Humans may be the more complex thinking machines, so the above comparisons may not seem fair. And we are more inclined to feel that we understand and can trust humans because of our years of experience being human and interacting with other (presumed) humans. Our experience with various AIs is growing rapidly, and so are their capabilities. While the sizes of the top-performing models are also growing, their general architectures have been stable. There is no indication that we will lose the kind of transparency into their operation described above, even as they reach and eventually surpass human capabilities. There is also no indication that exploration of the human brain is likely to yield a breakthrough significant enough to render it the less opaque intelligence. AI is not, and likely will not become, the black box that popular sentiment says it is.
Piotr Mardziel, head of AI, RealmLabs.AI.
Sophia Merow and Saurabh Shintre contributed to this post.