Palantir Foundry — The Data Operating System That’s Not Talked About Enough
Treating Data Like Code

The world of data engineering is filled with debates, with different schools of thought vying for supremacy. There are plenty of discussions around data modeling, e.g., Inmon vs. Kimball, but also around tooling: whether Fivetran + dbt are rent-seeking, or the everlasting debate of Snowflake vs. Databricks. Today I want to shed some light on a tool I have rarely seen discussed in LinkedIn posts, Medium articles, or Hacker News threads.

That tool is Palantir Foundry. Some may have heard of it already, but what exactly is Foundry? While Palantir is primarily known for its work with the federal government, it also has a commercial offering: Foundry, the Data Operating System. First and foremost, it’s designed to be a complete end-to-end data platform, a mesh of services containing everything you need to manage your data, from ingestion to transformation, storage, visualization, and machine learning. But that is only the tip of the iceberg.

Foundry takes a radically different approach from traditional cataloging and schema design, focusing on semantics and kinetics. You create an Ontology, defining objects, their properties, and the links between them. It connects your data to the real world, serving as a digital twin and as a shared language between all the stakeholders of an organization. This allows business users, for example, to build applications on top of the Ontology and fully operationalize their data, thinking of it not as data frames but as real-world objects.

The Ontology is such a groundbreaking concept that it could be the subject of an article on its own. In this article, however, I want to focus on three features that make Palantir Foundry stand out to me.

Before we dive into the features of Palantir Foundry, it’s important to note that I have yet to use Foundry personally. My knowledge is based on conversations with users, videos, and documentation.

It’s also worth mentioning that there are valid reasons why Palantir Foundry is talked about less than other data platforms. For one, it’s not an off-the-shelf SaaS product with pay-per-use pricing. Instead, it requires expensive licenses that may be out of reach for some organizations. Moreover, Palantir hasn’t always been transparent about its product capabilities, although that is changing as the company continues to improve its documentation and communication with users.

With that said, let’s take a closer look at three unique features of Palantir Foundry.

The first feature is treating data like code. Software developers use version control systems to coordinate their work on a codebase, which lets multiple engineers safely contribute to the same codebase without interfering with one another’s work. Foundry approaches data the way software developers approach code. Using branching, it enables many people to work with the same data and make changes without disrupting others’ progress. You can diverge from the main branch and work on data inside your own branch. Once you’re satisfied with the results, you merge your branch back into the main branch.
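To make the branching idea concrete, here is a minimal sketch in plain Python. It is not Foundry’s API: the `BranchedDataset` class, its methods, and the column names are all hypothetical, and the "merge" simply lets the target branch take over the feature branch’s state. It only illustrates that work on a branch leaves the main branch untouched until you merge.

```python
from copy import deepcopy


class BranchedDataset:
    """Toy dataset whose rows are tracked per branch (hypothetical, not Foundry's API)."""

    def __init__(self, rows):
        self.branches = {"main": list(rows)}

    def create_branch(self, name, source="main"):
        # A new branch starts as an independent copy of its source branch.
        self.branches[name] = deepcopy(self.branches[source])

    def write(self, branch, rows):
        # Writes only touch the named branch.
        self.branches[branch] = list(rows)

    def merge(self, branch, target="main"):
        # Simplest possible merge: the target takes over the branch's state.
        self.branches[target] = deepcopy(self.branches[branch])


orders = BranchedDataset([{"id": 1, "amt": 10.0}, {"id": 2, "amt": 5.5}])
orders.create_branch("rename-amount")

# Rename a column on the feature branch only.
renamed = [{"id": r["id"], "amount": r["amt"]} for r in orders.branches["rename-amount"]]
orders.write("rename-amount", renamed)

main_before_merge = deepcopy(orders.branches["main"])  # still uses "amt"
orders.merge("rename-amount")
main_after_merge = orders.branches["main"]  # now uses "amount"
```

A real system would store versions far more efficiently than by copying rows; the copy keeps the sketch short.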

Let us imagine you’re working on a feature branch and want to commit a change to the main branch, e.g., renaming a column in a Spark job. You can use Foundry’s Compare feature to compare the output dataset on the feature branch with the dataset on your main branch. The comparison includes many helpful dataset statistics, such as size, file count, and row count. You can even compare at the column level: the share of NULL values, min/max, or mean. In this use case, that lets you make sure your changes are purely metadata-related.
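The kind of column-level comparison described above can be sketched in a few lines of plain Python. This is not how Foundry implements Compare; the function names and sample rows are made up for illustration. Matching statistics are also only a strong hint, not proof, that the underlying values are identical.

```python
from statistics import mean


def column_stats(rows, column):
    """Row count, share of NULLs, min, max, and mean for one column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_share": (len(values) - len(present)) / len(values) if values else 0.0,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


def is_pure_rename(main_rows, branch_rows, old_name, new_name):
    """True if the renamed column has identical statistics on both branches."""
    return column_stats(main_rows, old_name) == column_stats(branch_rows, new_name)


# Hypothetical data: the feature branch renames "amt" to "amount".
main_rows = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": None}, {"id": 3, "amt": 4.0}]
branch_rows = [{"id": r["id"], "amount": r["amt"]} for r in main_rows]

stats = column_stats(branch_rows, "amount")
rename_only = is_pure_rename(main_rows, branch_rows, "amt", "amount")
```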

Once you are sure your changes are ready to be merged, you create a pull request, which shows you many useful things. First, you can see, as you’re used to, the files that were changed by the commit. But now the magic begins! Foundry provides an impact analysis showing you the affected datasets, their schema changes, and whether the affected datasets pass predefined health checks.

Impact analysis

But there is more! Since Foundry captures the lineage of every dataset, each pull request comes with a pipeline review that visualizes the changes made to the pipeline.

Lineage review
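Conceptually, an impact analysis like this is a walk over the lineage graph: start at the dataset you changed and collect everything built from it. The sketch below shows that idea in plain Python. The dataset names and the `LINEAGE` mapping are invented, and it says nothing about how Foundry actually stores or traverses lineage.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets built directly from it.
LINEAGE = {
    "raw_orders": ["clean_orders"],
    "clean_orders": ["daily_revenue", "customer_orders"],
    "customer_orders": ["churn_features"],
    "daily_revenue": [],
    "churn_features": [],
}


def downstream(lineage, changed):
    """Return every dataset reachable from `changed`, in breadth-first order."""
    seen, order, queue = {changed}, [], deque([changed])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Datasets a reviewer would want to check after a change to "clean_orders".
affected = downstream(LINEAGE, "clean_orders")
```

A schema diff or a health check would then be run against each entry in `affected`.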

These features help not only the people working on the pipelines but also those reviewing pull requests, making it easier to ensure that no unexpected changes slip through.

The second feature is Foundry Functions, a service that lets users interact with objects from the Ontology through TypeScript functions, similar in spirit to AWS Lambda functions.

One of the key advantages of Foundry Functions is that it generates interfaces for objects imported from the Ontology. This makes it easy to interact with those objects by accessing their properties, traversing links between them, or aggregating collections. Typical use cases are:

  • Custom aggregations for dashboards.
  • The calculation of custom metrics.
  • Even complex edits to the Ontology itself.
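To give a feel for what working with objects instead of rows looks like, here is a small sketch of a custom metric. It is written in plain Python purely for illustration; the real service, as described above, uses TypeScript with interfaces generated from the Ontology. The `Airline` and `Flight` object types, their properties, and the link between them are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Flight:
    flight_number: str
    delay_minutes: int


@dataclass
class Airline:
    name: str
    flights: list = field(default_factory=list)  # link: Airline -> Flight


def average_delay(airline):
    """Custom metric: mean delay across all flights linked to an airline."""
    if not airline.flights:
        return 0.0
    return sum(f.delay_minutes for f in airline.flights) / len(airline.flights)


airline = Airline(
    "Example Air",
    [Flight("EX100", 10), Flight("EX200", 30), Flight("EX300", 20)],
)
metric = average_delay(airline)
```

The function never mentions a table, a join, or a column. It reads properties and follows a link, which is the point of building on typed objects.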

You can reuse functions across multiple services, such as Foundry Workshop, which allows users to build no-code applications.

Besides running unit tests automatically on commit, Foundry Functions also provides a live preview tab that lets users test their functions in real time, making the development process iterative and easy.

Preview Tab

Foundry Functions brings collaboration in an enterprise to a whole new level. It allows regular software engineers to work directly with the data provided by data engineers without thinking about columns and rows in a table.

The third feature is Quiver, a service that provides advanced analytical and dashboarding capabilities on top of your Ontology. You can build dashboards that can be shared across your organization and even embedded in other services. It is similar to products like Tableau or Power BI.

What makes it stand out to me is that it has two modes.

First is Canvas mode, which will be very familiar to anyone who has used traditional BI tools: it gives you an empty view that you can fill with visuals. Then there is Graph mode, which is where it gets interesting.

Let us imagine you inherit ownership of a dashboard whose creator has already left the company. The dashboard displays some numbers and KPIs, and you’re tasked with changing some of them. This is where Graph mode comes into play. It provides a dependency graph of your analyses, showing the lineage of each visual the dashboard contains, so that you can understand why a particular visual shows the data it does. I’ve definitely been in such a situation, and I can’t stress enough how handy this feature would have been.

Graph Mode showing the lineage of a dashboard
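The idea behind such a dependency graph can be sketched as follows: every visual and intermediate step records what it is computed from, so any number on the dashboard can be traced back to its inputs. This is a toy model in plain Python with invented node names, not a description of Quiver’s internals.

```python
# Hypothetical dashboard: each node lists the nodes it is computed from.
DEPENDS_ON = {
    "revenue_chart": ["monthly_revenue"],
    "monthly_revenue": ["filtered_orders"],
    "filtered_orders": ["orders", "region_filter"],
    "orders": [],
    "region_filter": [],
}


def explain(node, graph, depth=0):
    """Return indented lines tracing a node back to its inputs."""
    lines = ["  " * depth + node]
    for parent in graph.get(node, []):
        lines.extend(explain(parent, graph, depth + 1))
    return lines


trace = explain("revenue_chart", DEPENDS_ON)
print("\n".join(trace))
```

Reading the trace top to bottom answers the inherited-dashboard question: the chart shows monthly revenue, which comes from orders narrowed by a region filter.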

Foundry is a huge product, and there is a lot more to talk about. I hope you enjoyed the article and found the features as exciting and unique as I do.
