What Is a Knowledge Graph — and Why It Matters

Summary

Imagine you are living in the first half of the 19th century, and you are feeling an almost paralyzing pain in your abdomen. You now have a choice. You can learn to live with that pain for the rest of your life (which may be only weeks or months, depending on what is causing it), or you can venture to the doctor, a nightmarish experience potentially involving torturous treatments like bloodletting, laxatives, induced vomiting, or downing vials of mercury (Hager 52).

There is little knowledge about how diseases spread, so going into a crowded hospital could mean exposure to smallpox and cholera (Kirsch and Ogas 80). If you are unlucky enough to need surgery (or have a physician prescribe an unnecessary one; again, there is almost no knowledge of disease pathways), there will be no anesthesia. Finding the best surgeon likely means finding the fastest one, who can work as quickly as possible to minimize the time orderlies need to restrain you while you're shrieking and writhing on the table. If you survive the surgery, you still have a significant chance of dying of an infection, since there's no knowledge of germ theory and therefore no aseptic techniques (Kirsch and Ogas 45). And if you're a pregnant woman, you can expect the maternity ward to be even worse. Nearly 15 percent of babies born in the UK in the mid-19th century died at birth.

Compare that with the medical care provided in any developed country today, and let's just say we've come a long way. The infant mortality rate in developed countries is now lower than 6 per 1,000 live births, or 0.6 percent. Average life expectancy in developed countries is generally higher than 80, compared with about 40 in the mid-19th century. We have drugs or other treatments for virtually all of the most common diseases, and humanity is curing more every day. The future looks even more promising, especially with the increasing capabilities of AI and the funding behind them. The Chan Zuckerberg Initiative (CZI), for example, aims to help scientists cure, prevent, or manage all diseases by the end of the 21st century.

How has healthcare made this progress? And why does healthcare continue to attract disproportionate investment in AI today? It's not simply better data; it's better structure around knowledge. Long before computers, medicine began developing shared understandings of diseases and causal relationships, controlled vocabularies to catalog real-world entities, and data standards to ensure observations were empirical and replicable. Taken together, these frameworks form what we would now recognize as a knowledge graph.

At a high level, knowledge graphs solve a recurring set of problems that become unavoidable as domains scale:

Search and retrieval across fragmented systems, formats, and terminologies
Discovery and design in complex, interconnected systems
Reuse and repurposing of existing knowledge and assets
Decision support under uncertainty, with explainable reasoning
Advice and personalization grounded in domain semantics
Governance, traceability, and regulatory compliance

Mature domain knowledge graphs in healthcare are the reason drugs can be designed to target specific diseases, why your doctor knows about the negative side effects of a drug in Japan even if it goes by a different name there, and why physicians can aggregate and learn from observations from millions of clinical encounters and experiments, often in real time.

In this three-part series, I hope to provide some context and insights around how knowledge graphs (and their precedents) have worked in healthcare, explain how healthcare became the industry leader in knowledge graphs, and share some potential lessons for other industries grappling with similar challenges.

What’s a knowledge graph?

An ontology defines classes and the relationships between classes; it's the theory underpinning the knowledge graph. In medicine, classes are things like pathogens, diseases, and drugs. The ontology defines the constraints and causal assumptions for how these things relate. For example, pathogens are organisms and can cause diseases. Drugs are chemical substances that can target pathogens and, potentially, inhibit diseases. The ontology deals with classes rather than instances: it doesn't tell you which pathogens cause which diseases or which drugs inhibit which pathogens.
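To make the class-level idea concrete, here is a minimal sketch, assuming a toy ontology that is nothing more than a set of classes plus relation types constrained by the class of their subject and object. The class and relation names mirror the examples above; `Relation` and `is_allowed` are hypothetical helpers, not part of any standard ontology language such as OWL.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    """A relation type between two ontology classes (its domain and range)."""
    name: str
    subject_class: str
    object_class: str

# Hypothetical toy ontology mirroring the examples in the text.
CLASSES = {"Pathogen", "Disease", "Drug"}
RELATIONS = {
    rel.name: rel
    for rel in (
        Relation("causes", "Pathogen", "Disease"),
        Relation("targets", "Drug", "Pathogen"),
        Relation("inhibits", "Drug", "Disease"),
    )
}

# Every relation must connect classes the ontology actually declares.
assert all({r.subject_class, r.object_class} <= CLASSES for r in RELATIONS.values())

def is_allowed(subject_class: str, relation: str, object_class: str) -> bool:
    """True if the ontology permits a statement of this shape."""
    rel = RELATIONS.get(relation)
    return (
        rel is not None
        and rel.subject_class == subject_class
        and rel.object_class == object_class
    )

print(is_allowed("Pathogen", "causes", "Disease"))  # True
print(is_allowed("Disease", "causes", "Drug"))      # False
```

Note that nothing here names a specific pathogen or drug; the sketch only says which kinds of statements are meaningful, which is the role the text assigns to the ontology.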

The instances are defined in controlled vocabularies, which are catalogs of instances of the classes defined in the ontology. For example, there are thousands of known pathogens that can cause diseases in humans: everything from viruses to bacteria to parasites. There are also thousands of drugs and thousands of diseases. These instances are cataloged and maintained by experts and are regularly updated as we learn more about them. Some controlled vocabularies in healthcare are known as ‘omics’ because they cover fields whose names end with the suffix “omics,” such as genomics, proteomics, and metabolomics.
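A controlled vocabulary can be sketched as a catalog of instances, each with a stable identifier, a preferred label, and the ontology class it belongs to. The identifiers below (`PATH:0001` and so on) are hypothetical placeholders, not codes from any real terminology, and the two helper functions are illustrative only.

```python
# Hypothetical controlled vocabulary: stable IDs, preferred labels, and ontology classes.
ONTOLOGY_CLASSES = {"Pathogen", "Disease", "Drug"}

VOCABULARY = {
    "PATH:0001": {"label": "Treponema pallidum", "class": "Pathogen"},
    "DIS:0001": {"label": "syphilis", "class": "Disease"},
    "DRUG:0001": {"label": "Salvarsan", "class": "Drug"},
}

def invalid_terms(vocab, classes):
    """IDs of terms whose class is not declared in the ontology."""
    return [term_id for term_id, term in vocab.items() if term["class"] not in classes]

def instances_of(vocab, cls):
    """Sorted labels of every cataloged instance of one ontology class."""
    return sorted(term["label"] for term in vocab.values() if term["class"] == cls)

# Adding a term is a vocabulary update; the ontology itself stays unchanged.
VOCABULARY["PATH:0002"] = {"label": "Vibrio cholerae", "class": "Pathogen"}

print(invalid_terms(VOCABULARY, ONTOLOGY_CLASSES))  # []
print(instances_of(VOCABULARY, "Pathogen"))          # ['Treponema pallidum', 'Vibrio cholerae']
```

The design choice worth noticing is the separation: experts can keep extending the catalog while the class definitions stay fixed.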

The way we learn more about the world is through observation, and in healthcare those observations are treated as evidence. Clinical trials and laboratory experiments produce observational data that justify, refine, or refute claims about how entities in our controlled vocabularies relate to one another. How do we know that the pathogen Treponema pallidum causes the disease syphilis? Because scientists did an experiment, measured the outcome, and produced evidence. How do we know that Salvarsan targets and destroys Treponema pallidum and cures syphilis? Because scientists ran clinical studies and measured the results of treating syphilis patients with Salvarsan.
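One way to picture the evidence layer is to attach observations directly to each assertion, so a claim always carries the studies that support or refute it. This is a minimal sketch, assuming hypothetical study identifiers; the `support_ratio` tally is deliberately naive, since real evidence grading weighs study design, size, and quality rather than counting results.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str     # hypothetical study or experiment identifier
    supports: bool  # True if the observation supports the claim, False if it refutes it

@dataclass
class Assertion:
    subject: str
    relation: str
    obj: str
    evidence: list[Evidence] = field(default_factory=list)

    def support_ratio(self) -> float:
        """Naive share of supporting observations; 0.0 when there is no evidence yet."""
        if not self.evidence:
            return 0.0
        return sum(e.supports for e in self.evidence) / len(self.evidence)

claim = Assertion(
    "Salvarsan", "targets", "Treponema pallidum",
    [Evidence("study-A", True), Evidence("study-B", True), Evidence("study-C", False)],
)
print(round(claim.support_ratio(), 2))  # 0.67
```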

Connecting entities like this creates a graph. Entities in a graph are often called nodes, and the connections are called edges. Graphs can contain millions of nodes and edges, and with this structure, patterns begin to emerge. For example, you can identify the most important or influential nodes in a graph, distinguish clusters of nodes that are densely connected, or find the shortest paths between different entities. These techniques (often called graph analytics) are widely used in medicine as part of what's known as network medicine to identify disease mechanisms and potential therapeutic targets (Barabási, Gulbahce, and Loscalzo 2011). This is all possible with a graph, but since we have an ontology, we have more than just a graph. We have a knowledge graph.
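Two of the analytics mentioned above, finding influential nodes and finding shortest paths, can be sketched in plain Python over a small undirected graph. The node names are hypothetical, and the normalization in `degree_centrality` (degree divided by n − 1) is one common convention; libraries such as NetworkX provide `degree_centrality` and `shortest_path` functions for real workloads.

```python
from collections import defaultdict, deque

# Hypothetical edges between genes, proteins, a biological process, a disease, and a drug.
EDGES = [
    ("GeneA", "ProteinA"), ("ProteinA", "ProcessX"), ("ProteinB", "ProcessX"),
    ("ProcessX", "DiseaseD"), ("DrugK", "ProteinB"), ("GeneB", "ProteinB"),
]

def build_graph(edges):
    """Undirected adjacency sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Each node's degree divided by (n - 1), so values fall between 0 and 1."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in graph.get(node, ()):
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None

g = build_graph(EDGES)
print(shortest_path(g, "DrugK", "DiseaseD"))  # ['DrugK', 'ProteinB', 'ProcessX', 'DiseaseD']
```

Here the path from the drug to the disease runs through a shared biological process, which is the kind of mechanism network medicine looks for.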

Connections in a knowledge graph represent explicit assertions about the world: facts. The knowledge graph isn't just saying that Salvarsan is connected to Treponema pallidum. It's saying that Salvarsan targets Treponema pallidum. It also states that Treponema pallidum causes syphilis. These two facts, combined with the logic encoded in the ontology, enable the knowledge graph to infer a new relationship or fact, namely that Salvarsan inhibits syphilis. This is known as reasoning, or the ability to derive “logical consequences from a set of facts or axioms.” Knowledge graphs excel at this because they make both the facts and the rules for combining them explicit.
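The inference described above can be sketched as a single forward-chaining rule over subject–predicate–object triples. This is an illustrative simplification, assuming a hand-written rule ("if a drug targets a pathogen and that pathogen causes a disease, the drug inhibits the disease"); production knowledge graphs usually express such logic with RDF/OWL reasoners or rule languages rather than Python.

```python
# Facts as (subject, predicate, object) triples, taken from the example in the text.
FACTS = {
    ("Salvarsan", "targets", "Treponema pallidum"),
    ("Treponema pallidum", "causes", "syphilis"),
}

def infer(facts):
    """Apply one hypothetical rule until no new facts appear:
    ?drug targets ?pathogen AND ?pathogen causes ?disease => ?drug inhibits ?disease."""
    facts = set(facts)
    while True:
        new = {
            (drug, "inhibits", disease)
            for (drug, p1, pathogen) in facts if p1 == "targets"
            for (pathogen2, p2, disease) in facts if p2 == "causes" and pathogen2 == pathogen
        } - facts
        if not new:
            return facts
        facts |= new

inferred = infer(FACTS)
print(sorted(inferred - FACTS))  # [('Salvarsan', 'inhibits', 'syphilis')]
```

Because the rule is explicit, the derived fact can always be traced back to the two facts that produced it.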

Medicine has been using this knowledge management structure for many years. Scientists do experiments and learn new things. The findings of those experiments lead to updates in the controlled vocabularies and/or in the relationships between entities in the controlled vocabularies. Gene X is related to protein Y, which is involved in biological process Z. As the number of entities and relationships grows, so does our knowledge. Sometimes, but much less frequently, the ontology changes. A substantial change in an ontology is not just an incremental increase in knowledge, but often a change in the way we understand the world.

Healthcare is the leader in knowledge graphs because it excels in all three of these layers. It has spent decades refining causal models for how the natural world works; meticulously cataloging millions of diseases, drugs, proteins, and everything else relevant to medicine; and conducting empirical, replicable experiments with standardized data outputs. These foundations were reinforced by strong regulatory pressure that mandated standardization and comparability of evidence, widespread pre-competitive collaboration and public funding, and early adoption of open, vendor-neutral semantic standards. Combined, these factors created the conditions in which knowledge graphs could thrive as core infrastructure rather than experimental technology.

What problems do knowledge graphs solve?

Once you have entities mapped together, validated with real-world evidence, and grounded in causal pathways, you have a knowledge graph, and you can do all kinds of cool stuff. I'll go through some of the most prominent use cases of knowledge graphs in healthcare today and how they might apply to other domains.

Search

Probably the most common use case for knowledge graphs is search. Modern healthcare requires the ability to retrieve relevant, connected context across heterogeneous and multimodal data. Suppose you work at a large pharmaceutical company and you want to know everything about a given drug. You need to repurpose this drug, assess its safety risk, or compare it with a competitor. Or perhaps the FDA asked you for details about it. You'd have to search relational databases for experimental data, content management systems for clinical trial reports, and multiple third-party databases for established public or industry knowledge. Not only is the data scattered across disconnected systems and in different formats (relational, text, slides, audio), but the drug may also go by different names. The company may have outsourced clinical trials to a UK company that called it by its generic name, for example.
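The naming problem can be sketched with a synonym table that maps every known name to one canonical concept, so a single query reaches records in every system. The concept ID `DRUG:0042`, the source names, and the records are hypothetical; the names themselves are real (acetaminophen is the US generic name, paracetamol the UK one, and Tylenol a US brand).

```python
# Hypothetical synonym table: every known name maps to one canonical concept ID.
SYNONYMS = {
    "acetaminophen": "DRUG:0042",  # US generic name
    "paracetamol": "DRUG:0042",    # generic name used in the UK
    "tylenol": "DRUG:0042",        # a US brand name
}

# Hypothetical records from three disconnected systems, each with its own naming field.
SOURCES = {
    "lab_db": ("compound", [{"compound": "Acetaminophen", "assay": "solubility"}]),
    "trial_reports": ("drug", [{"drug": "Paracetamol", "report": "UK trial summary"}]),
    "safety_db": ("product", [{"product": "Tylenol", "event": "liver enzyme elevation"},
                              {"product": "Ibuprofen", "event": "GI bleeding"}]),
}

def resolve(name):
    """Map any known name, case-insensitively, to its canonical ID (None if unknown)."""
    return SYNONYMS.get(name.strip().lower())

def find_everything(query):
    """Return (source, record) pairs about the same concept, whatever it is called locally."""
    concept = resolve(query)
    if concept is None:
        return []
    return [
        (source, record)
        for source, (name_field, records) in SOURCES.items()
        for record in records
        if resolve(record[name_field]) == concept
    ]

print(len(find_everything("paracetamol")))  # 3
```

In a real knowledge graph the synonym table would be a curated vocabulary rather than a dictionary, but the principle is the same: search over concepts, not strings.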

As generative AI has become more widely adopted, retrieval has emerged as a critical capability in every industry. Large Language Models (LLMs) were trained on lots of data, but not your data, so the ability to retrieve relevant internal context is crucial when working with these models. We now call this context engineering: “the art and science of filling the context window with just the right information at each step of an agent’s trajectory,” as described by Lance Martin of LangChain.
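Graph-based context retrieval can be sketched as collecting the triples within a few hops of an entity and rendering them as plain sentences for a prompt. This is a minimal sketch under stated assumptions: the triples and entity names are hypothetical, the hop limit is arbitrary, and no particular LLM API is assumed; the returned string would simply be placed into whatever prompt the application builds.

```python
from collections import deque

# Hypothetical triples; DrugY's triple is unrelated to DrugX and should not be retrieved.
TRIPLES = [
    ("DrugX", "inhibits", "ProteinP"),
    ("ProteinP", "involved_in", "Inflammation"),
    ("DrugX", "has_adverse_event", "Nausea"),
    ("DrugY", "inhibits", "ProteinQ"),
]

def neighborhood(triples, seed, hops):
    """Triples reachable within `hops` steps of `seed`, following edges in either direction."""
    frontier = deque([(seed, 0)])
    seen_nodes = {seed}
    selected = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for triple in triples:
            s, _, o = triple
            if node in (s, o):
                if triple not in selected:
                    selected.append(triple)
                other = o if node == s else s
                if other not in seen_nodes:
                    seen_nodes.add(other)
                    frontier.append((other, depth + 1))
    return selected

def to_context(triples):
    """Render triples as short sentences suitable for an LLM context window."""
    return "\n".join(f"{s} {p.replace('_', ' ')} {o}." for s, p, o in triples)

print(to_context(neighborhood(TRIPLES, "DrugX", hops=2)))
# DrugX inhibits ProteinP.
# DrugX has adverse event Nausea.
# ProteinP involved in Inflammation.
```

The hop limit is the main design lever: too few hops and the model misses mechanisms, too many and the context window fills with loosely related facts.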

Healthcare is uniquely well positioned to take advantage of this new era of AI because of its longstanding investment in knowledge graphs. Tasks like filing regulatory reports are much easier if you can retrieve the relevant internal context, evidence, and facts. Some companies, like Weave, are using knowledge graphs to do exactly this. They use the graph to retrieve the relevant information and an LLM to summarize it and answer the regulatory questions, enabling automated report generation.
Large financial organizations like Morgan Stanley, Bloomberg, HSBC, and JPMorgan Chase are also using knowledge graphs to unify data silos and build research assistants and advanced search capabilities for their employees and clients.

Discovery and Design

By understanding the way different entities interact, both in theory and in the lab, scientists working in drug discovery can design drugs with a specific purpose in mind. Rather than testing different compounds blindly, hoping to find something useful, drug hunters can now work backwards from a desired outcome (such as lowering blood pressure) to identify candidate compounds, while accounting for patient differences (genetics, age, sex), interconnected systems, and potential adverse effects, all while complying with regulatory constraints. Many of the world's largest pharmaceutical companies, including AbbVie, AstraZeneca, GSK, Pfizer, Merck, Novartis, Novo Nordisk, Roche, and Sanofi, use knowledge graphs for drug discovery. There are also companies that focus exclusively on curating healthcare knowledge graphs for drug discovery, like BioRelate and BenevolentAI.
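Working backwards from an outcome can be sketched as a two-step graph query: find the proteins that drive the desired outcome, keep compounds that act on any of them, and drop compounds linked to adverse effects that matter for a given patient group. Every name and link below is hypothetical, and real discovery pipelines score binding, selectivity, and pharmacokinetics rather than using simple set membership.

```python
# Hypothetical knowledge: proteins driving an outcome, compound targets, and adverse effects.
OUTCOME_DRIVERS = {"lower_blood_pressure": {"ProteinA", "ProteinB"}}
COMPOUND_TARGETS = {
    "Compound1": {"ProteinA"},
    "Compound2": {"ProteinB", "ProteinC"},
    "Compound3": {"ProteinC"},
}
ADVERSE_EFFECTS = {"Compound1": {"kidney_injury"}, "Compound2": set(), "Compound3": set()}

def candidates(outcome, avoid_effects):
    """Compounds hitting at least one driver of `outcome` with none of `avoid_effects`."""
    drivers = OUTCOME_DRIVERS.get(outcome, set())
    return sorted(
        compound
        for compound, targets in COMPOUND_TARGETS.items()
        if targets & drivers and not (ADVERSE_EFFECTS.get(compound, set()) & avoid_effects)
    )

# For a hypothetical patient group where kidney injury is an unacceptable risk:
print(candidates("lower_blood_pressure", avoid_effects={"kidney_injury"}))  # ['Compound2']
```

The `avoid_effects` parameter stands in for the patient differences mentioned above; changing it changes the candidate list without touching the underlying knowledge.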

This same kind of problem appears in many other industries. Banks often need to create financial products (e.g., structured notes) that achieve a desired outcome (e.g., higher yield with limited downside) while accounting for interconnected systems, mitigating adverse effects, and complying with regulatory constraints. Likewise, public policy practitioners often need to create interventions that achieve a desired outcome (e.g., reducing poverty) while accounting for different local contexts (e.g., geography, culture, climate), interconnected systems, and potential adverse effects.

Repurposing

Rather than designing an entirely new drug to achieve an outcome, it is sometimes easier to repurpose an existing one. When Dr. David Fajgenbaum was diagnosed with a rare immune disorder while still in medical school, he was told he had weeks to live, and a priest was called in to read him his last rites. While there was not enough time to design a new drug, there was time to repurpose something off the shelf. That's exactly what he did. He found a drug originally meant to prevent organ transplant rejection and used it on himself. His disease has been in remission for 11 years; he finished medical school and started the nonprofit Every Cure to “ensure that patients don’t suffer while potential treatments hide in plain sight.” Every Cure uses, among other techniques, knowledge graphs.

Drug repurposing is about taking an existing product, understanding its underlying structure, and safely applying it in a new context. Public policy follows the same pattern. Practitioners identify interventions that worked in one context, understand why they worked, and reapply them elsewhere. Likewise, many companies are sitting on a gold mine of data, collected for some purpose long forgotten. By understanding the meaning and context of that data, they can repackage and reuse it for different purposes.
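A very simple repurposing heuristic ranks existing drugs by how much their known targets overlap with the proteins implicated in a disease. This sketch uses Jaccard similarity as a stand-in scoring method; it is an assumption for illustration, not a description of how Every Cure or any other organization scores candidates, and all drug, protein, and disease names are hypothetical.

```python
# Hypothetical data: proteins each existing drug acts on, and proteins implicated in a disease.
DRUG_TARGETS = {
    "DrugA": {"P1", "P2"},
    "DrugB": {"P3"},
    "DrugC": {"P2", "P4", "P5"},
}
DISEASE_PROTEINS = {"RareDiseaseZ": {"P2", "P4"}}

def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_repurposing(disease):
    """Existing drugs with any target overlap, best overlap first."""
    implicated = DISEASE_PROTEINS[disease]
    scores = {drug: jaccard(targets, implicated) for drug, targets in DRUG_TARGETS.items()}
    return sorted(
        ((drug, score) for drug, score in scores.items() if score > 0),
        key=lambda item: item[1],
        reverse=True,
    )

print([drug for drug, _ in rank_repurposing("RareDiseaseZ")])  # ['DrugC', 'DrugA']
```

Target overlap only nominates candidates; safety data and clinical evidence in the rest of the graph decide whether a candidate is worth testing.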

Decision support

Healthcare professionals often rely on decision support systems to help them make decisions that involve many interconnected factors and incomplete data (Yang et al.; Al Khatib et al.; Zhang et al.). Every day, physicians must decide how to diagnose and treat their patients based on limited, evolving information. An individual patient's electronic health records (EHRs) can be sparse and have limited predictive power (Yang et al.). Knowledge graphs give the physician the ability to connect EHRs with controlled vocabularies (diseases, symptoms, drugs), observational data from previous studies, and, increasingly, patient-generated data from wearables (Al Khatib et al.).

This helps the physician make more informed diagnoses and treatment recommendations by grounding decisions in what is known from related cases, populations, and clinical evidence, while still accounting for the specific context of the patient. These tools are especially useful because the underlying reasoning can be made explicit and explainable, in contrast to many black-box AI solutions. Companies like Evidently are building decision support tools, powered by knowledge graphs and AI, that connect patient data across EHRs with existing clinical insights to help clinical practitioners make better, more informed, and explainable decisions in real time.
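Explainability can be sketched by having a check return not just a verdict but the chain of facts that produced it. In this minimal example, a proposed drug is checked against a patient's conditions through a drug-class contraindication; the drugs, classes, conditions, and relation names are all hypothetical, and real clinical decision support draws on curated interaction and contraindication sources.

```python
# Hypothetical triples linking a drug, its drug class, and a condition.
TRIPLES = {
    ("DrugM", "member_of", "ClassN"),
    ("ClassN", "contraindicated_in", "ConditionK"),
}

def check_prescription(drug, patient_conditions, triples):
    """Return (ok, explanations); each explanation is the list of facts behind a warning."""
    explanations = []
    # Contraindications may be stated on the drug itself or on any class it belongs to.
    classes = {o for s, p, o in triples if s == drug and p == "member_of"} | {drug}
    for cls in sorted(classes):
        for s, p, o in sorted(triples):
            if s == cls and p == "contraindicated_in" and o in patient_conditions:
                path = [(drug, "member_of", cls)] if cls != drug else []
                path.append((s, p, o))
                explanations.append(path)
    return (not explanations, explanations)

ok, why = check_prescription("DrugM", {"ConditionK"}, TRIPLES)
print(ok)   # False
print(why)  # [[('DrugM', 'member_of', 'ClassN'), ('ClassN', 'contraindicated_in', 'ConditionK')]]
```

The returned path is the explanation: a clinician can see exactly which two facts triggered the warning and challenge either one.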
Other industries are also using knowledge graphs to power decision support tools. The MITRE Corporation, an R&D organization, publishes MITRE ATT&CK, a knowledge graph of adversary tactics and techniques used for decision support in cybersecurity operations. OpenCorporates is an open legal-entity knowledge graph used by companies like Encompass for decision support in due diligence.

Recommender systems

While decision support focuses on diagnostic accuracy, safety, and adherence to clinical guidelines, recommender systems in healthcare focus on personalizing and prioritizing options for patients. These systems often rely on patient-centric knowledge graphs (sometimes called Individualized Knowledge Graphs or Personalized Health Knowledge Graphs) to integrate medical history, EHR data, reference knowledge, and data from wearables. Rather than determining whether a clinical decision is correct, recommender systems surface and rank relevant options, such as treatment plans, lifestyle interventions, follow-up actions, or care pathways, that are most appropriate for a specific patient at a given moment.
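A recommender over a patient-centric graph can be sketched as two steps: filter out options that conflict with the patient's constraints, then rank the rest by how many of the patient's conditions they address. The patient profile, option names, and scoring rule are hypothetical simplifications; production recommenders typically learn rankings rather than counting matches.

```python
# Hypothetical patient-centric data: conditions and things to avoid.
PATIENT = {"conditions": {"type2_diabetes", "hypertension"}, "avoid": {"high_impact_exercise"}}

# Hypothetical reference knowledge: which conditions each option addresses, plus tags.
OPTIONS = {
    "walking_program": {"addresses": {"type2_diabetes", "hypertension"}, "tags": set()},
    "sodium_reduction": {"addresses": {"hypertension"}, "tags": set()},
    "running_plan": {"addresses": {"type2_diabetes", "hypertension"},
                     "tags": {"high_impact_exercise"}},
    "sleep_hygiene": {"addresses": set(), "tags": set()},
}

def recommend(patient, options, top_k=3):
    """Filter conflicting options, then rank by number of the patient's conditions addressed."""
    scored = []
    for name, option in options.items():
        if option["tags"] & patient["avoid"]:
            continue  # filtered out entirely, not just down-ranked
        score = len(option["addresses"] & patient["conditions"])
        if score > 0:
            scored.append((score, name))
    scored.sort(key=lambda item: (-item[0], item[1]))  # higher score first, then by name
    return [name for _, name in scored[:top_k]]

print(recommend(PATIENT, OPTIONS))  # ['walking_program', 'sodium_reduction']
```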

Other industries use recommender systems powered by knowledge graphs and semantic technology even more than healthcare does. Almost everything you buy and everything you watch is fed to you via recommendation systems. Online retailers like Amazon use them to suggest things you might want to buy, streaming services like Netflix use them to serve up your next binge-watch, and LinkedIn uses them to recommend jobs to candidates and candidates to recruiters.

Governance

Healthcare is a highly regulated industry. Drug companies must comply with regulations requiring them to monitor and assess any potential adverse effects of their drugs, a practice called pharmacovigilance. They also store individuals' health data, which is highly private and sensitive, and need to comply with regulations covering it, like the California Consumer Privacy Act (CCPA) or the General Data Protection Regulation (GDPR). To do this, they focus on data lineage: the systematic tracking of how data is generated, transformed, and used across systems. Knowledge graphs facilitate good data governance by connecting domain knowledge to knowledge about the organization itself, such as business processes, org structure, ownership, roles, and policies. Organizations can then trace how data moves through systems, identify who is responsible for it, understand which teams are allowed to use it and for what purposes, and enforce governance rules (Oliveira et al.).
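Lineage-based governance can be sketched as a graph walk: follow "derived from" links upstream to the original sources, then consult organizational knowledge (owners and permitted purposes) attached to those sources. The dataset names, owners, and purposes are hypothetical, and the rule that every source must permit a purpose is one possible policy, chosen here as a default-deny example.

```python
# Hypothetical lineage graph: each dataset lists the datasets it was derived from.
DERIVED_FROM = {
    "safety_dashboard": ["adverse_event_table"],
    "adverse_event_table": ["ehr_extract", "call_center_logs"],
    "ehr_extract": [],
    "call_center_logs": [],
}
# Hypothetical organizational knowledge: owners and permitted purposes of source data.
OWNERS = {"ehr_extract": "clinical-data-team", "call_center_logs": "support-ops"}
ALLOWED_PURPOSES = {
    "ehr_extract": {"pharmacovigilance"},
    "call_center_logs": {"pharmacovigilance", "quality_review"},
}

def upstream(dataset):
    """Every dataset this one depends on, directly or indirectly."""
    seen, stack = set(), list(DERIVED_FROM.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(DERIVED_FROM.get(current, []))
    return seen

def sources_of(dataset):
    """Root datasets (nothing further upstream) feeding this one, including itself if a root."""
    return {d for d in upstream(dataset) | {dataset} if not DERIVED_FROM.get(d)}

def responsible_owners(dataset):
    """Teams accountable for the source data behind a dataset."""
    return {OWNERS[s] for s in sources_of(dataset) if s in OWNERS}

def may_use(dataset, purpose):
    """Allowed only if every source dataset permits the purpose (unknown sources deny)."""
    return all(purpose in ALLOWED_PURPOSES.get(s, set()) for s in sources_of(dataset))

print(may_use("safety_dashboard", "pharmacovigilance"))  # True
print(may_use("safety_dashboard", "quality_review"))     # False
```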

Financial services firms, like those in healthcare, rely on knowledge graph approaches to support enterprise data governance. Recent research proposes extending these same foundations to AI governance by linking data, policies, and decisions in a unified semantic layer. In regulated environments, governance is not a secondary concern; it is the mechanism by which trust, accountability, and explainability are enforced at scale.

Conclusion

Knowledge graphs are not a new invention, nor are they a side effect of modern AI. They are a way of organizing knowledge that allows meaning to be shared, evidence to accumulate, and reasoning to remain explicit as understanding evolves. By separating theory (ontologies), instances (controlled vocabularies), and evidence (observational data), knowledge graphs make it possible to build systems that do more than store facts: they support discovery, explanation, reuse, and trust.

Long before large language models, healthcare invested heavily in defining shared concepts, cataloging the natural world, and standardizing how observations are documented and evaluated. Over time, these practices created dense, interconnected knowledge structures that could be extended, queried, and reasoned over as new discoveries emerged. Modern AI systems are powerful precisely because they are now being layered on top of this foundation, not because they replace it.

In the next part of this series, I'll look more closely at how healthcare became the global leader in knowledge graph maturity. That story includes regulatory pressure, pre-competitive collaboration, public funding of shared knowledge, and early commitment to open standards. In the final part, I'll step back from healthcare entirely and explore what other industries (finance, policy, manufacturing, energy, and others) can learn from this trajectory as they try to build AI-ready systems of their own.

The central claim is simple: progress at scale depends less on smarter models than on better structure. Healthcare learned this lesson early. Others are now being forced to learn it quickly.

About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs and the evolving role of semantics in AI systems.

Bibliography

Al Khatib, Hassan S., et al. “Patient-centric knowledge graphs: a survey of current methods, challenges, and applications.” 7 (2024): 1388479.

Barabási, Albert-László, Natali Gulbahce, and Joseph Loscalzo. “Network medicine: a network-based approach to human disease.” Nature Reviews Genetics 12.1 (2011): 56-68. doi: 10.1038/nrg2918. PMID: 21164525; PMCID: PMC3140052.

Hager, Thomas. . Harry N. Abrams, 2019.

Isaacson, Walter. . Simon & Schuster, 2021.

Kirsch, Donald R., and Ogi Ogas. . Arcade, 2017.

Oliveira, Miguel AP, et al. “Semantic Modelling of Organizational Knowledge as a Basis for Enterprise Data Governance 4.0–Application to a Unified Clinical Data Model.” (2023).

Rajabi, E.; Kafaie, S. Knowledge Graphs and Explainable AI in Healthcare. 2022, , 459. https://doi.org/10.3390/info13100459

Yang, Carl, et al. “A review on knowledge graphs for healthcare: Resources, applications, and promises.” (2023).

Zhang, Yong, Ming Sheng, Rui Zhou, Ye Wang, Guangjie Han, Han Zhang, Chunxiao Xing, and Jing Dong. “HKGB: An Inclusive, Extensible, Intelligent, Semi-auto-constructed Knowledge Graph Framework for Healthcare with Clinicians’ Expertise Incorporated.” Information Processing & Management (2020). https://doi.org/10.1016/j.ipm.2020.102324.