The Great Data Closure: Why Databricks and Snowflake Are Hitting Their Ceiling


Introduction

How big can a data company really grow?

This week, what would have been news a year ago was not news. Snowflake invested in AtScale, a provider of semantic layer services, in a strategic investment late in the waning company's history. An odd move, given the commitment to the Open Semantic Interchange, or "OSI" (yet another acronym, or YAA), which appears to be MetricFlow masquerading as something else.

Meanwhile, Databricks, the AI and Data company, invested in AI winner and all-round VC paramour Lovable — the rapidly growing vibe-coding company from Sweden.

Starting a venture arm is a tried-and-tested route for large companies. Everybody from Walmart and Hitachi to banks like JPMorgan and Goldman Sachs, and naturally the hyperscalers — MSFT, GOOG — have venture arms (though strangely not AWS).

The advantages are clear. An investment in a round can come with a right of first refusal. It gives both parties influence over complementary roadmap features, as well as clear distribution benefits. "Synergy" is the word used in boardrooms, though it is the friendlier, less insidious younger brother of the central cost cutting so prevalent in PE-backed rather than venture-backed businesses.

It should therefore come as no surprise to see Databricks branching out beyond data. After all (and Ali has been very open about this), the team understands that the way to grow the company is through new use cases, most notably AI. While Dolly was a flop, the jury is still out on the partnership with OpenAI. AI/BI, as well as Databricks Apps, are promising initiatives designed to bring more friends into the tent — beyond the core SYSADMIN cluster administrators.

Snowflake, meanwhile, may be trying a similar tack, with differing levels of success. Aside from Streamlit, it is not clear what value its acquisitions are really bringing. Openflow, which is Apache NiFi under the hood, has not been well received. Rather, it is the internal developments, such as the embedding of dbt Core into the Snowflake platform, that appear to be gaining more traction.

In this article, we'll dive into the different factors at play and make some predictions for 2026. Let's get stuck in!

Growth through use cases

Databricks has a problem. A big problem. And that problem is equity.

As the fourth-largest privately held company in the world, at the tender age of 12 its employees require liquidity. And liquidity is expensive (see this excellent article).

To make good on its internal commitments, Databricks needed perhaps $5bn+ when it did this raise. The amount it needs per year is significant. It is therefore simply not an option to stop raising money without firing employees and cutting costs.

The growth is staggering. In the latest Series L (!) the company cites 55% year-on-year growth, leading to a valuation of over $130bn. The company must continue to raise money to pay its opex and equity, but there is another constraint, which is valuation. At this point Databricks' ability to raise money is practically a bellwether for the industry, and so everyone involved (the list is enormous) has a vested interest in keeping things up.

Source: previous article

The dream is to keep growing the company, as this will sustain the valuation — valuations are tied to revenue growth. Which brings us back to use cases.

The clear use cases, as shown here, are roughly:

  • Big data processing and Spark
  • Within this, machine learning workloads
  • AI workloads
  • Data warehousing
  • Ingestion, or Lakeflow (Arcion, we suspect, was perhaps a bit early)
  • Business Intelligence
  • Applications

It's worth noting that these sectors are all forecast to grow at around 15–30% all in, per the overwhelming majority of market reports (an example here). This reflects the underlying demand for more data, more automation, and more efficiency, which I believe is ultimately justified, especially in the age of AI.

This would appear to indicate, therefore, that the bottom or "floor" for Databricks would be growth of around 15–30%, and with it perhaps a 40% haircut to valuation multiples (assuming a linear correlation; yes, yes, assumptions, assumptions — some more info here), barring of course any exogenous shocks to the system such as OpenAI going out of business, or war.
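
To make that haircut concrete, here is the arithmetic under the same crude linear assumption. Every figure is one already quoted in this piece, and the sketch is illustrative rather than any kind of financial model:

```python
# Illustrative only: assumes the revenue multiple scales linearly with the growth rate.
# All figures are the rough ones quoted in this article, not from any filing.
current_growth = 0.55          # ~55% growth cited at the Series L
floor_growth = 0.30            # top of the 15-30% market-growth range
current_multiple = 130 / 4.8   # ~$130bn valuation on ~$4.8bn of revenue, roughly 27x

floor_multiple = current_multiple * (floor_growth / current_growth)
haircut = 1 - floor_multiple / current_multiple

print(f"Implied multiple at floor growth: {floor_multiple:.1f}x")   # ~14.8x
print(f"Implied haircut to the multiple:  {haircut:.0%}")           # ~45%, in the ballpark of the 40% above
```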

The bull case lies in the two A's: AI use cases and Applications.

AI as a way out

If Databricks can successfully partner with the model providers and become the de facto engine for hosting models and running the associated workflows, it could be massive.

Handkerchief maths — revenue is $4.8bn of run-rate revenue growing at 55%. Say the core grows at 30% in steady state; we're missing 25 percentage points. 25% of $4.8bn is $1.2bn. Where can this come from? Supposedly AI products and warehousing are already over $2bn (see here). What happens next year when Databricks is at $6bn and needs to grow 50%, and therefore needs $3bn? Is the business going to double the AI part?
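
Spelling that handkerchief maths out (same rough figures as above, purely illustrative):

```python
# Rough revenue-gap arithmetic using the figures quoted above.
arr = 4.8             # $bn, current annualised revenue
target_growth = 0.55  # the growth rate the valuation story relies on
core_growth = 0.30    # assumed steady-state growth of the core business

gap_this_year = arr * (target_growth - core_growth)
print(f"Gap the new lines of business must fill this year: ${gap_this_year:.1f}bn")  # ~$1.2bn

# Next year, at roughly $6bn of revenue and a (say) 50% growth target:
arr_next, target_next = 6.0, 0.50
print(f"Total new revenue required next year: ${arr_next * target_next:.1f}bn")      # ~$3.0bn
```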

Confluent is a benchmark. It is the largest Kafka/stream-processing company, with revenue of about $1.1bn annualised. It grows at about 25% year-on-year but traded at about 8x revenue and sold to IBM for $11bn, so about 10x revenue. Even with its loyal fanbase and strong adoption for AI use cases (see for example Marketecture from Sean Falconer), it would still struggle to put another $250m of annual growth on every year.

Applications are another story. The people who build data-intensive applications are not the ones who generally build internal-facing products, a task often borne by in-house teams of software engineers or consultants. These are teams that already know how to do this, and know how to do it well, with existing technology specifically designed for the purpose, namely core engineering primitives like React, Postgres (self-hosted), and FastAPI.

A data engineer could log in to Lovable, spin up Neon Postgres, a declarative Spark ETL pipeline, and a front-end in Databricks. They could. But will they want to add this to their ever-increasing backlog? I'm not sure.
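
For a sense of scale, the "declarative Spark ETL pipeline" part of that stack is not much code. Here is a minimal plain-PySpark sketch of the sort of job involved; the bucket, columns, and table names are made up for illustration, and Databricks' own declarative pipeline tooling would wrap something similar:

```python
# Minimal sketch of the kind of Spark ETL step a data engineer would wire up.
# Paths, columns, and table names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily").getOrCreate()

orders = (
    spark.read.json("s3://example-bucket/raw/orders/")      # hypothetical landing zone
    .withColumn("order_date", F.to_date("created_at"))
)

daily = (
    orders.groupBy("order_date")
    .agg(F.count("*").alias("orders"), F.sum("amount").alias("revenue"))
)

daily.write.mode("overwrite").saveAsTable("analytics.orders_daily")  # hypothetical target table
```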

The point is that the core business is not growing fast enough to sustain the current valuation, so additional lines of business are required. Databricks is like a golden goose at the craps table that keeps managing not to roll the unutterable number. It will now keep making more and more bets, while everyone else around the table continues to profit.

Databricks is topped out as a data-only company.

We've written before about ways they could have moved out of this. Spark Structured Streaming was an obvious option, but that ship has sailed, and it is companies like Aiven and Ververica that are now in pole position for the Flink race.

📚 Read: What not to miss in Real-time Data and AI in 2025 📚

Becoming a model-serving company or an "AI cloud" also seems a tall order. CoreWeave, Lambda, and of course Nebius are all on course to genuinely challenge the hyperscalers here.

An AI cloud is fundamentally driven by the ready availability of GPU-optimised compute. This doesn't just mean leasing EC2 instances from Jeff Bezos. It means sliding into Jensen Huang's DMs and buying a ton of GPUs.

Nebius has about 20,000 GPUs, with another 30,000 on the way — this Yahoo report thinks the numbers are higher. All the AI clouds lease space in data centres as well as building their own. Inference, unlike Spark, is not a commodity, because of the immense software, hardware, and logistical challenges involved.

Let us not forget that Nebius owns just over 25% of ClickHouse — both teams being very software-engineering-led and Russian: the Yandex Alumni Club.

If there is one thing we have learned, it is that it is easier to go up the value chain than down it. I wrote about this funnel perhaps two years ago now, but it seems truer than ever.

Snowflake easily eats into dbt. Databricks has easily eaten into Snowflake's warehouse revenue. Microsoft will eat into Databricks'. And in turn, with raw data centre power, NVIDIA and Meta partnerships, and an army of the best developers in the business, Nebius can eat into the hyperscalers.

Data warehousing under attack

With every passing day, proprietary data warehousing platforms seem increasingly unlikely to be the technical end state for AI and data infrastructure.

Salesforce is increasing its levies, databases are supporting cross-query capabilities, and CDOs are running DuckDB inside Snowflake itself.

Even Bill Inmon acknowledges that the warehousing companies missed the warehousing!

Convenient as they are, there is a scale at which enterprises, and even late-stage start-ups, demand greater openness, greater flexibility, and cheaper compute.

At Orchestra we've seen this first-hand. The businesses looking at technologies such as Iceberg are overwhelmingly large. From the biggest telecom providers to the Booking.coms of this world (who happen to use and love Snowflake; more on this later), traditional data warehousing is unlikely to keep dominating the share of budget it has held for the last decade.
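
Part of the appeal is that once the data sits in an open table format, any engine can read it. A minimal sketch using DuckDB's Iceberg extension, assuming a hypothetical table path and omitting credential and catalog setup:

```python
# Minimal sketch: querying an Iceberg table directly with DuckDB, no warehouse required.
# The S3 path is hypothetical; credential and catalog configuration are omitted.
import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg; LOAD iceberg;")
con.execute("INSTALL httpfs; LOAD httpfs;")

result = con.execute("""
    SELECT order_date, SUM(revenue) AS revenue
    FROM iceberg_scan('s3://example-bucket/warehouse/analytics/orders_daily')
    GROUP BY order_date
    ORDER BY order_date
""").df()

print(result.head())
```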

There are a few ways Snowflake has also tried to expand its core offering:

  • Support for managed Iceberg; an open compute engine
  • Data cataloging (Select Star)
  • Applications (Streamlit)
  • Spark and other types of compute, like containers
  • AI agents for analysts, AKA Snowflake Intelligence
  • Transformation (i.e. dbt)

Ironically for a proprietary engine provider, it would appear that Iceberg, as well as AI, is a big growth avenue. See more from TT here.

Snowflake customers adore it.

Data Pangea

I believe the definitions of the pioneers, early adopters, late adopters, and laggards are changing.

Pioneers now have a stack with a heavy real-time component and an AI-first approach. This is likely to revert to machine learning as people realise AI is not a hammer for every nail.

These companies want to partner with a few large vendors, and have a high appetite for building as well as buying software. They will have at least one vendor each in the streaming/AI, query engine, and analytics spaces. A good example is Booking.com, or perhaps Fresha, which uses Snowflake, StarRocks, and Kafka (I loved the article below).

📚 Read: Exploring how modern streaming tools power the next generation of analytics with StarRocks. 📚

Early adopters will have the standard analytics stack and then one other area. They lack the scale to fully buy in to an enterprise-wide data and AI strategy, so they focus on the use cases that work for them: automation, reporting.

The old "early adopters" would have had the Andreessen Horowitz data stack. That, I'm afraid, is no longer cool, or in. That was the old architecture. The late adopters have that stack now.

The laggards? Who knows. They will probably go with whoever their CTO knows best. Be it Informatica (see this incredible Reddit post), Fabric, or perhaps even GCP!

The next step: chaos for smaller vendors

A lot of companies are changing tack. Secoda was acquired by Atlassian; Select Star was acquired by Snowflake. Arch.dev, the creators of Meltano, shut down and passed the project to Matatika. From the big companies to the small, slowing revenue growth combined with massive pressure from bloated VC rounds makes building a "Modern Data Stack"-style company an untenable approach.

📚 Read: The Final Voyage of the Modern Data Stack | Can the Context Layer for AI provide catalogs with the last chopper out of Saigon? 📚

What happens when the Databricks and Snowflake growth numbers finally begin to slow, as we argue here they must?

What would happen if there were a big exogenous market shock, or OpenAI ran out of cash faster than expected?

What happens as Salesforce increases taxes and, as a result, tools like Fivetran and dbt increase in price?

A perfect storm for migrations and re-architecting is brewing. Data infrastructure is amazingly sticky, which means that in difficult times, companies raise prices. EC2 spot instances have not really changed much in price over time, and neither has data infra compute — and yet even AWS is raising the price of GPUs.

The marginal cost of onboarding an additional tool is becoming very high. We used to build everything ourselves because it was the only way. But having one tool for every problem doesn't work either.

Image: the author's own

We should not forget that Parkinson's law applies to IT budgets too. Whatever the budget is, the budget will get spent. Imagine you had a tool that helped you automate more things with AI while reducing your warehouse bill and your BI licences (typically a big 25–50% P&L budget line) — what do you do?

You don't pat yourself on the back — you spend it. You spend it on more stuff, doing more stuff. You will likely push your Databricks and Snowflake bill back up. But you'll have more to show for it.

Consolidation is driving funds back towards a few centres of gravity: Snowflake, Databricks, GCP, AWS, and Microsoft (and, to a lesser extent, Palantir). This spells chaos for many smaller vendors.

Conclusion — brace for simpler architecture

The Salesforce Tax is a pivotal moment in our industry. Companies like Salesforce, SAP, and ServiceNow all hold an immense amount of data and have enough clout to keep it there.

Any data person who has done a migration from Salesforce to NetSuite knows that migrating these tools can be the biggest, most expensive, and most painful move of their professional career.

Salesforce charging infrastructure service providers fees will raise prices, and that, combined with the increasingly precarious house of cards we see in AI and data, points towards massive consolidation.

ServiceNow's acquisition of Data.World, I believe, offers some clarity on why we will see data teams make more use of existing tooling, simplifying architecture in the process. Data.World is a provider of knowledge graphs and ontologies. By mapping the ServiceNow data schema to an ontology, a gargantuan task, ServiceNow could end up with half-decent AI and agents running inside ServiceNow.

AgentForce and Data360 are Salesforce's attempt at the same, and supposedly already have $1.4bn in revenue, though we suspect a lot of legacy revenue is bundled in there too.

These providers do not actually want data flowing out to AI use cases in Snowflake or Databricks. They want the procurement specialists, finance professionals, and marketing gurus staying in their platforms — and they have the means to make them stay.

This is not financial advice, and it is not a crazy prediction. Predicting that Snowflake and Databricks will end up growing more in line with the analyst consensus is hardly difficult.

But the idea that the biggest data companies' growth is on the verge of slowing is difficult. It challenges the rhetoric. It challenges the AI-maximalist discourse.

We are entering the era of the Great Data Closure. While the AI maximalists dream of a borderless future, the reality is a heavy ceiling built by the incumbents' gravity. In this new landscape, the winner is not the one with the best set of tools, but the ones who make the most of what they have.

About Me

I'm the CEO of Orchestra. We help data people build, run, and monitor their pipelines easily.

You can find me on LinkedIn here.

