This is Part 2 of a series. Part 1, “What Is a Knowledge Graph — and Why It Matters,” is available here.
In Part 1, we described how structured knowledge enabled healthcare’s progress. This article examines why healthcare, more than any other industry, was able to build that structure at scale.
Healthcare is the most mature industry in the use of knowledge graphs, for a few fundamental reasons. At its core, medicine is grounded in empirical science (biology, chemistry, pharmacology), which makes it possible to establish a shared understanding of the types of things that exist, how they interact, and what causes what. In other words, healthcare lends itself naturally to ontology.
The industry also benefits from a deep culture of shared controlled vocabularies. Scientists and clinicians are natural librarians. By necessity, they meticulously list and categorize everything they can find, from genes to diseases. This emphasis on classification is reinforced by a commitment to empirical, reproducible observation, where data must be comparable across institutions, studies, and time.
Finally, there are structural forces that have accelerated maturity: strict regulation; strong pre-competitive collaboration; sustained public funding; and open data standards. All of these factors incentivize shared standards and reusable knowledge rather than isolated, proprietary models.
Together, these factors created the conditions for healthcare to build durable, shared semantic infrastructure, allowing knowledge to accumulate across institutions, generations, and technologies.
Ontologies
Humans have always tried to understand how the world works. When we observe and record the same thing repeatedly, and agree that it is true, we develop a shared understanding of reality. This process is formalized in science through the scientific method: scientists develop a hypothesis, conduct an experiment, and evaluate the results empirically. In this way, humans have been developing an implicit medical ontology for thousands of years.
Ötzi the Iceman, discovered in 1991 but who lived 5,300 years ago, was found with an antibacterial fungus in his leggings, likely used to treat his whipworm infection (Kirsch and Ogas 4). Even then, people had some understanding that plants could be used to treat ailments.
Eventually, scientists realized that it wasn’t the plant itself that was treating the ailment, but compounds inside the plant, and that they could modify the molecular structure of those compounds in the lab to make them stronger or more effective. This was the beginning of organic chemistry, and how Bayer invented Aspirin (by tweaking willow bark) and Heroin (by tweaking opium from poppies) (Hager 75; Kirsch and Ogas 69). This added a new class to the ontology: compounds. With each new scientific breakthrough, our understanding of the natural world evolved, and we updated our ontology accordingly.

Over time, medicine developed a layered ontology, where each new class didn’t replace the previous one but extended it. The ontology grew to include pathogens after scientists Fritz Schaudinn and Erich Hoffmann discovered that the underlying cause of syphilis was a bacterium called Treponema pallidum. We learned that microbes could be found almost everywhere and that some of them could kill bacteria, as with the mold that produces penicillin, so microbes were added to our ontology.

We learned that DNA contains genes, which encode proteins, which interact with biological processes and risk factors. Every major advance in medicine added new classes of things to our shared understanding of reality and forced us to reason about how those classes interact. Long before computers, healthcare had already built a layered ontology. Knowledge graphs didn’t introduce this way of thinking; they merely gave it a formal, computational substrate.
Today, we have ontologies for anatomy (Uberon), genes (Gene Ontology), chemical compounds (ChEBI), and hundreds of other domains. Repositories such as BioPortal and the OBO Foundry provide access to well over a thousand biomedical ontologies.
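To make that computational substrate concrete, here is a minimal sketch using Python’s rdflib library. The namespace and property names are invented for illustration; a production graph would reuse IRIs from published ontologies like the ones named above.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

# Invented example namespace; real biomedical graphs reuse IRIs from
# published ontologies (Gene Ontology, ChEBI, UniProt, and so on).
EX = Namespace("http://example.org/biomed/")

g = Graph()
g.bind("ex", EX)

# Each class corresponds to a layer that a scientific advance added to
# the ontology; new layers extend rather than replace the old ones.
for cls in ["Compound", "Pathogen", "Gene", "Protein", "BiologicalProcess"]:
    g.add((EX[cls], RDF.type, RDFS.Class))

# Instances plus the cross-layer relationships we reason over
# (BRCA1 really does encode a protein involved in DNA repair).
g.add((EX.BRCA1, RDF.type, EX.Gene))
g.add((EX.BRCA1_protein, RDF.type, EX.Protein))
g.add((EX.DNA_repair, RDF.type, EX.BiologicalProcess))
g.add((EX.BRCA1, EX.encodes, EX.BRCA1_protein))
g.add((EX.BRCA1_protein, EX.participatesIn, EX.DNA_repair))

print(g.serialize(format="turtle"))
```

Running this prints the graph in Turtle, another W3C syntax, showing how classes, instances, and relationships live side by side in one structure.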
Controlled vocabularies
Once a class of things was defined, medicine immediately began naming and cataloging every instance it could find. Scientists are great at cataloging and defining instances of classes. De Materia Medica, often considered the first pharmacopoeia, was completed around 70 CE; it described about 600 plants and about 1,000 medicines. When chemists began working with organic compounds in the lab, they created thousands of new molecules that needed to be cataloged. In response, the first volume of Beilstein’s Handbook of Organic Chemistry was released in 1881. This handbook catalogued all known organic compounds, their reactions and properties, and grew to contain millions of entries.

This pattern repeats throughout the history of medicine. Each time our understanding of the natural world improved and a new class was added to the ontology, scientists began cataloging all the instances of that class. Following Louis Pasteur’s finding in 1861 that germs cause disease, people began cataloging all the pathogens they could find. In 1923, the first edition of Bergey’s Manual of Determinative Bacteriology was published, containing a few thousand unique bacterial species.

The same pattern repeated with the discovery of genes, proteins, risk factors, and adverse effects. Today, we have rich controlled vocabularies for conditions and procedures (SNOMED CT), diseases (ICD-11), adverse effects (MedDRA), drugs (RxNorm), compounds (ChEBI and PubChem), proteins (UniProt), and genes (NCBI Gene). Most large pharma companies work with dozens of these third-party controlled vocabularies.
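As a small illustration of what working with one of these vocabularies looks like, the sketch below resolves a free-text drug name to its RxNorm concept identifier (RxCUI) using the National Library of Medicine’s public RxNav API. The endpoint is taken from the published RxNav documentation; the exact identifier returned depends on the current RxNorm release.

```python
import requests

# Ask the NLM's RxNav service for the RxNorm concept matching a drug name.
resp = requests.get(
    "https://rxnav.nlm.nih.gov/REST/rxcui.json",
    params={"name": "aspirin"},
    timeout=10,
)
resp.raise_for_status()

# The response nests the identifier(s) under idGroup -> rxnormId.
rxcuis = resp.json().get("idGroup", {}).get("rxnormId", [])
print(rxcuis)  # a list like ['1191'], the RxCUI for aspirin
```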
Somewhat confusingly, ontologies and controlled vocabularies are often blended in practice. Large controlled vocabularies frequently contain instances from multiple classes together with a lightweight semantic model (an ontology) that relates them. SNOMED CT, for instance, includes instances of diseases, symptoms, procedures, and clinical findings, as well as formally defined relationships such as “is a” and “finding site.” In doing so, it combines a controlled vocabulary with ontological structure, effectively functioning as a knowledge graph in its own right.
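A toy version of that blend, sketched with rdflib, might look like the following. The readable names and the findingSite property are invented for clarity; real SNOMED CT concepts are numeric codes served under http://snomed.info/id/.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDFS, SKOS

# Invented namespace standing in for a SNOMED CT-style terminology.
EX = Namespace("http://example.org/sct-style/")

g = Graph()

# Controlled-vocabulary side: a concept with a preferred label.
g.add((EX.MyocardialInfarction, SKOS.prefLabel, Literal("Myocardial infarction")))

# Ontology side: formally defined relationships between concepts.
g.add((EX.MyocardialInfarction, RDFS.subClassOf, EX.HeartDisease))   # "is a"
g.add((EX.MyocardialInfarction, EX.findingSite, EX.HeartStructure))  # "finding site"

print(g.serialize(format="turtle"))
```

The vocabulary gives you the names; the relationships give you the structure, which is exactly why SNOMED CT behaves like a knowledge graph.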
Regulations
Following a mass poisoning in 1937 that killed 107 people due to an improperly prepared “elixir,” the US government gave the Food and Drug Administration (FDA) increased regulatory powers (Kirsch and Ogas 97). The Federal Food, Drug, and Cosmetic Act of 1938 set requirements for how drugs should be labeled and required that drug manufacturers submit safety data and a statement of “intended use” to the FDA. This helped the US largely avoid the thalidomide tragedy of the late 1950s in Europe, where a tranquilizer was prescribed to pregnant women to treat anxiety, trouble sleeping, and morning sickness, despite never having been tested on pregnant women. Thalidomide caused the “largest anthropogenic medical disaster ever,” in which thousands of women suffered miscarriages and more than 10,000 babies were born with severe deformities.
While the US largely avoided this thanks to FDA reviewer caution, the episode also exposed gaps in the system. The Kefauver-Harris Amendments to the Federal Food, Drug, and Cosmetic Act in 1962 required proof that drugs were both safe and effective. The increased strength of the FDA in 1938, and again in 1962, forced healthcare to standardize the meaning of terms. Drug companies were forced to agree on indications (what is the drug intended for), conditions (what does the drug treat), adverse effects (what other conditions have been associated with this drug), and clinical outcomes. Increased regulatory pressure also required replicable, well-controlled studies for all claims made about a drug. Regulation didn’t just demand safer drugs; it demanded shared meaning.
Observational data
These regulatory changes didn’t just affect approval processes; they fundamentally reshaped how medical observations were generated, structured, and compared. To make clinical evidence comparable, reviewable, and replicable, data standards for clinical trials became codified through organizations like the Clinical Data Interchange Standards Consortium (CDISC). CDISC defines how clinical observations, endpoints, and populations must be represented for regulatory review. Likewise, the FDA turned the shared terminologies cataloged in controlled vocabularies from best practice to mandatory.
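As a rough sketch (not an official CDISC example), an adverse-event record shaped like the SDTM AE domain pairs the clinician’s verbatim wording with MedDRA-coded terms, which is what makes observations comparable across studies. The variable names below are standard SDTM fields; the values are invented.

```python
# One adverse-event observation, shaped like CDISC's SDTM AE domain.
adverse_event = {
    "STUDYID": "ABC-123",                    # study identifier (invented)
    "DOMAIN": "AE",                          # SDTM adverse-event domain code
    "USUBJID": "ABC-123-0042",               # unique subject identifier (invented)
    "AETERM": "splitting headache",          # verbatim term as reported
    "AEDECOD": "Headache",                   # MedDRA preferred term (coded)
    "AEBODSYS": "Nervous system disorders",  # MedDRA system organ class
}

# Because AEDECOD and AEBODSYS come from a controlled vocabulary,
# records from different trials and sponsors can be aggregated directly.
print(adverse_event["AEDECOD"])
```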
Pre-competitive collaboration
One of the enabling factors that has led healthcare to dominate in knowledge graphs is pre-competitive collaboration. Much of the work of healthcare is grounded in natural sciences like biology and chemistry, which are treated as a public good. Companies still compete on products, but most consider a large portion of their research “pre-competitive.” Organizations like the Pistoia Alliance facilitate this collaboration by providing neutral forums to align on shared semantics and infrastructure (see the data standards section below).
Public funding
Public funding has been essential to building healthcare’s knowledge infrastructure. Governments and public research institutions have invested heavily in the creation and maintenance of ontologies, controlled vocabularies, and large-scale observational data that no single company could afford to build alone. Agencies such as the National Institutes of Health (NIH) fund many of these assets as public goods, leaving healthcare with a rich, open knowledge base ready to be connected and reasoned over using knowledge graphs.
Data standards
Healthcare also embraced open data standards early, ensuring shared knowledge could be represented and reused across systems and vendors. Standards from the World Wide Web Consortium (W3C), such as RDF, OWL, and SPARQL, made medical knowledge machine-readable and interoperable, allowing semantic models to be shared independently of any single system or vendor. By anchoring meaning in open standards rather than proprietary schemas, healthcare enabled knowledge graphs to operate as shared, long-lived infrastructure rather than isolated implementations. Standards ensured that meaning could survive system upgrades, vendor changes, and decades of technological churn.
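Here is a minimal sketch of those standards in action: RDF statements anchored to ChEBI’s stable IRI for acetylsalicylic acid (aspirin), queried with SPARQL via rdflib. The ex:treats property is invented for illustration; the ChEBI IRI and the W3C machinery are real.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDFS

CHEBI = Namespace("http://purl.obolibrary.org/obo/")
EX = Namespace("http://example.org/vocab/")  # invented property namespace

g = Graph()
aspirin = CHEBI["CHEBI_15365"]  # ChEBI's identifier for acetylsalicylic acid

# Any system that emits RDF against this shared IRI can merge its
# statements with ours; no proprietary schema negotiation required.
g.add((aspirin, RDFS.label, Literal("acetylsalicylic acid")))
g.add((aspirin, EX.treats, EX.Pain))

# SPARQL, another W3C standard, queries the graph vendor-independently.
results = g.query("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE { ?compound rdfs:label ?label }
""")
for row in results:
    print(row.label)  # -> acetylsalicylic acid
```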
Conclusion
None of these factors alone explains healthcare’s maturity; it is their interaction over decades (ontology shaping vocabularies, regulation enforcing evidence, funding sustaining shared infrastructure, and standards enabling reuse) that made knowledge graphs inevitable rather than optional. Long before modern AI, healthcare invested in agreeing on what things mean and how observations should be interpreted. In the final part of this series, we’ll explore why most other industries lack these conditions, and what they can realistically borrow from healthcare’s path.
About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs and the evolving role of semantics in AI systems.
Bibliography
Hager, Thomas. Ten Drugs: How Plants, Powders, and Pills Have Shaped the History of Medicine. Harry N. Abrams, 2019.
Isaacson, Walter. The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race. Simon & Schuster, 2021.
Kirsch, Donald R., and Ogi Ogas. The Drug Hunters: The Improbable Quest to Discover New Medicines. Arcade, 2017.
