The aim of this article is not to answer the question: “Which one is ‘better’ — Import or Direct Lake?”, since that’s impossible to answer, as there is no single solution to “rule them all”… While Import should (still) be the default choice in most cases, there are specific scenarios in which you might decide to take the Direct Lake path. The main goal of the article is to explain how Direct Lake mode works behind the scenes and to shed more light on various Direct Lake concepts.
If you want to learn more about how Import (and DirectQuery) compare to Direct Lake, and when to choose one over the other, I strongly encourage you to watch the following video: https://www.youtube.com/watch?v=V4rgxmBQpk0
Now, we can start…
I don’t know about you, but when I watch movies and see some breathtaking scenes, I always wonder — how did they do THIS?! What kind of tricks did they pull out of their sleeves to make it work like that?
And I have the same feeling when watching Direct Lake in action! For those of you who may not have heard about the new storage mode for Power BI semantic models, or are wondering what Direct Lake and Allen Iverson have in common, I encourage you to start by reading my previous article.
The aim of this one is to demystify what happens behind the scenes, how this “thing” actually works, and to give you a hint about some nuances to keep in mind when working with Direct Lake semantic models.
Sounds like a dream, right? So, let’s take a look at the different concepts that make this dream come true…
Framing (aka Direct Lake “refresh”)
The most common question I hear from clients these days is — how do we refresh a Direct Lake semantic model? It’s a fair question. They’ve been relying on Import mode for years, and Direct Lake promises “Import mode-like performance”… So, there must be a similar process in place to keep the data up to date, right?
Well, ja-in… (What the heck is this now, I hear you wondering 😀). Germans have a perfect word (one of many, to be honest) for something that can be both “Yes” and “No” (ja = YES, nein = NO). Chris Webb already wrote a great blog post on the topic, so I won’t repeat what’s written there (go and read Chris’s blog, it is one of the best resources for learning Power BI). My idea is to illustrate the process happening in the background and emphasize some nuances that might be affected by your decisions.
But, first things first…
Syncing the data
When you create a lakehouse in Microsoft Fabric, two additional objects are automatically provisioned — a SQL Analytics Endpoint for querying the data in the lakehouse (yes, you can write T-SQL to READ the data from the lakehouse), and a default semantic model, which includes the tables from the lakehouse. Now, what happens when a new table arrives in the lakehouse? Well, it depends :)
If you open the Settings window for the SQL Analytics Endpoint and go to the Default Power BI semantic model property, you’ll see the following option:
This setting lets you define what happens when a new table arrives in the lakehouse. By default, this table WILL NOT be automatically included in the default semantic model. And that’s the first point relevant for “refreshing” the data in Direct Lake mode.
At this moment, I have 4 delta tables in my lakehouse: DimCustomer, DimDate, DimProduct, and FactOnlineSales. Since I disabled auto-sync between the lakehouse and the semantic model, there are currently no tables in the default semantic model!

This means I first have to add the data to my default semantic model. When I open the SQL Analytics Endpoint and choose to create a new report, I’ll be prompted to add the data to the default semantic model:

Okay, let’s examine what happens when a new table arrives in the lakehouse. I’ve added a new table to the lakehouse: DimCurrency.

But when I choose to create a report on top of the default semantic model, there is no DimCurrency table available:

I’ve now enabled the auto-sync option, and after a few minutes, the DimCurrency table appeared in the default semantic model objects view:

So, this sync option lets you decide whether a new table from the lakehouse will be automatically added to the semantic model or not.
Syncing = Adding new tables to a semantic model
But what happens with the data itself? Meaning, if the data in the delta table changes, do we need to refresh the semantic model, as we had to when using Import mode, to have the latest data available in our Power BI reports?
It’s the right time to introduce the concept of framing. Before that, let’s quickly examine how our data is stored under the hood. I’ve already written about the Parquet file format in detail, so here it’s just important to keep in mind that our delta table DimCustomer consists of one or more parquet files (in this case, two parquet files), while the delta_log enables versioning — tracking all the changes that happened to the DimCustomer table.

I’ve created a super basic report to examine how framing works. The report shows the name and email address of the customer Aaron Adams:

I’ll now go and change the email address in the data source, from aaron48 to aaron048:

Let’s reload the data into the Fabric lakehouse and check what happened to the DimCustomer table in the background:

A new parquet file appeared, and at the same time, a new version has been created in the delta_log.
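If you want to confirm this without browsing the underlying files, you can also inspect the delta_log from a Fabric notebook. Here is a minimal PySpark sketch, assuming the notebook is attached to the same lakehouse (DESCRIBE HISTORY simply reads the versions recorded in the delta_log):

# Minimal sketch, assuming a Fabric notebook attached to the lakehouse
# that contains the DimCustomer delta table.
history = spark.sql("DESCRIBE HISTORY DimCustomer")

# Each reload of the table shows up here as a new version.
history.select("version", "timestamp", "operation").show(truncate=False)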
When I go back to my report and hit the Refresh button…

This happened because my default setting for semantic model refresh was configured to detect changes in the delta table and automatically update the semantic model:

Now, what would happen if I disable this option? Let’s check… I’ll set the email address back to aaron48 and reload the data in the lakehouse. First, there is a new version of the file in the delta_log, the same as in the previous case:

And, if I query the lakehouse via the SQL Analytics Endpoint, you’ll see the latest data included (aaron48):

But if I go to the report and hit Refresh… I still see aaron048!

Since I disabled the automatic propagation of the latest data from the lakehouse (OneLake) to the semantic model, I have only two options available to keep my semantic model (and, consequently, my report) up to date:
- Enable the “Keep your Direct Lake data up to date” option again
- Refresh the semantic model manually. When I say manually, it can be truly manual, by clicking the Refresh now button, or by executing the refresh programmatically (e.g. using Fabric notebooks or REST APIs) as part of an orchestration pipeline, as shown in the sketch below
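For the programmatic route, here is a minimal sketch using Semantic Link in a Fabric notebook. The model and workspace names are placeholders for this example, and you should double-check the current sempy documentation for the exact parameters:

import sempy.fabric as fabric

# Trigger a refresh (i.e. a reframe) of the Direct Lake semantic model
# once all upstream delta tables have been loaded into the lakehouse.
# "Sales model" and "Demo workspace" are placeholder names.
fabric.refresh_dataset(
    dataset="Sales model",
    workspace="Demo workspace",
    refresh_type="full",
)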
Why would you want to keep this option disabled (like I did in the latest example)? Well, your semantic model usually consists of multiple tables, representing the serving layer for the end user. And you don’t necessarily want the data in the report to be updated in sequence (table by table), but rather only once the entire semantic model has been refreshed and synced with the source data.
This process of keeping the semantic model in sync with the latest version of the delta table is known as framing.

In the illustration above, you see the files currently “framed” in the context of the semantic model. Once a new file enters the lakehouse (OneLake), here is what must happen in order to have the latest file included in the semantic model.
The semantic model has to be “reframed” to include the latest data. This process has multiple implications that you should be aware of. First, and most important, whenever framing occurs, all the data currently stored in memory (we’re talking about cache memory) is dumped out of the cache. This is of paramount importance for the next concept that we’re going to discuss — transcoding.
Next, there is no “real” data refresh happening with framing…
Since the Direct Lake “refresh” is only a metadata refresh, it’s usually a low-intensity operation that shouldn’t consume too much time or resources. Even if you have a billion-row table, don’t forget — you are not refreshing a billion rows in your semantic model — you refresh only the metadata about that huge table…
Transcoding — Your on-demand cache magic
Nice, now that you know how to sync data from a lakehouse with your semantic model (syncing), and how to include the latest “data about data” in the semantic model (framing), it’s time to understand what really happens behind the scenes once you put your semantic model into action!
That is the selling point of Direct Lake, right? The performance of Import mode, but without copying the data. So, let’s examine the concept of transcoding…
Let me stop here and put this in the context of Import mode:
- Loading data into memory (cache) is what ensures the blazing-fast performance of Import mode
- In Import mode, unless you have enabled the large semantic model storage format, the entire semantic model is stored in memory (it must fit within the memory limits), whereas in Direct Lake mode, only the columns needed by the query are stored in memory!
To put it simply: bullet point one means that once Direct Lake columns are loaded into memory, it is exactly the same as Import mode (the only potential difference may be the way data is sorted by VertiPaq vs. how it’s sorted in the delta table)! Bullet point two means that the cache memory footprint of a Direct Lake semantic model can be significantly lower, or in the worst case the same, as that of Import mode (I promise to show you soon). Obviously, this lower memory footprint comes at a price, and that’s the waiting time for the first load of a visual containing data that must be “transcoded” on demand from OneLake to the semantic model.
Before we dive into examples, you might be wondering: how does this thing work? How can data stored in a delta table be read by the Power BI engine the same way as if it were stored in Import mode?
The answer is: there is a process called transcoding, which happens on the fly when a Power BI query requests the data. This is not too expensive a process, because the data in Parquet files is stored very similarly to the way VertiPaq (the columnar database behind Power BI and AAS) stores the data. On top of that, if your data is written to delta tables using the V-Order algorithm (Microsoft’s proprietary algorithm for reshuffling and sorting the data to achieve better read performance), transcoding makes the data from delta tables look exactly the same as if it were stored in the proprietary format of AAS.
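For completeness, here is a hedged PySpark sketch of how you could be explicit about V-Order when (re)writing a delta table from a Fabric notebook. The staging table name is purely hypothetical, and the exact configuration flag may differ between Fabric runtime versions, so check the official documentation:

# Ask Spark to apply V-Order when writing the parquet files of the delta table,
# so that transcoding into VertiPaq stays as cheap as possible.
# (In Fabric, V-Order is typically enabled by default for lakehouse writes.)
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# "staging_FactOnlineSales" is a hypothetical source table used for illustration.
df = spark.read.table("staging_FactOnlineSales")

(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("FactOnlineSales"))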
Let me now show you how paging works in reality. For this example, I’ll be using a healthcare dataset provided by Greg Beaumont (MIT license; go and visit Greg’s GitHub, it’s full of wonderful resources). The fact table contains ca. 220 million rows, and my semantic model is a well-designed star schema.
Import vs Direct Lake
The idea is the following: I have two identical semantic models (same data, same tables, same relationships, etc.) — one is in Import mode, while the other is in Direct Lake.

I’ll now open Power BI Desktop and connect to each of these semantic models to create the same report on top of them. I need the Performance Analyzer tool in Power BI Desktop to capture the queries and analyze them later in DAX Studio.
I’ve created a very basic report page, with just one table visual, which shows the total number of records per year. In both reports, I’m starting from a blank page, as I want to make sure that nothing is retrieved from the cache, so let’s compare the first run of each visual:

As you may notice, Import mode performs slightly better during the first run, probably because of the transcoding overhead of “paging” the data for the first time in Direct Lake mode. I’ll now add a year slicer to both reports, switch between different years, and compare performance again:

There is basically no difference in performance (the numbers were additionally tested using the Benchmark feature in DAX Studio)! This means that once a column from the Direct Lake semantic model is paged into memory, it behaves exactly the same as in Import mode.
However, what happens if we include an additional column in the scope? Let’s test the performance of both reports once I put the Total Drug Cost measure in the table visual:

And this is a scenario where Import easily outperforms Direct Lake! Don’t forget, in Import mode the entire semantic model was loaded into memory, whereas in Direct Lake only the columns needed by the query were loaded into memory. In this example, since Total Drug Cost wasn’t part of the original query, it wasn’t loaded into memory. Once the user included it in the report, Power BI had to spend some time transcoding this data on the fly from OneLake to VertiPaq and paging it into memory.
Memory footprint
Okay, we also mentioned that the memory footprint of Import vs Direct Lake semantic models may vary significantly. Let me quickly show you what I’m talking about. I’ll first check the Import mode semantic model details using VertiPaq Analyzer in DAX Studio:

As you may see, the size of the semantic model is almost 4.3 GB! And the most expensive columns…

The “Tot_Drug_Cost” and “65 or Older Total” columns take almost 2 GB of the entire model! So, in theory, even if nobody ever uses these columns in the report, they will still take their fair share of RAM (unless you enable the large semantic model format option).
I’ll now analyze the Direct Lake semantic model using the same approach:

Oh, wow, the memory footprint is 4x lower! Let’s quickly check the most expensive columns in the model…

Let’s briefly stop here and examine the results displayed in the illustration above. The “Tot_Drug_Cst” column takes practically the entire memory of this semantic model — since we used it in our table visual, it was paged into memory. But look at all the other columns, including “65 or Older Total”, which previously consumed 650 MB in Import mode! It’s now 2.4 KB! It’s just metadata! As long as we don’t use this column in the report, it won’t consume any RAM.
This means that when we talk about memory limits in Direct Lake, we’re referring to a per-query memory limit! Only if the query exceeds the memory limit of your Fabric capacity SKU will it fall back to DirectQuery (of course, assuming that your configuration follows the default fallback behavior setup):

This is a key difference between the Import and Direct Lake modes. Going back to our previous example, my Direct Lake report would work just fine with the lowest F SKU (F2).
“You’re hot then you’re cold… You’re in then you’re out…”
There is a famous song by Katy Perry, “Hot N Cold”, where the chorus goes: “You’re hot then you’re cold… You’re in then you’re out…” This perfectly summarizes how columns are treated in Direct Lake mode! The last concept I want to introduce to you is the column “temperature”.
This concept is of paramount importance when working with Direct Lake mode, because based on the column temperature, the engine decides which column(s) stay in memory and which are kicked back out to OneLake.
Marc Lelijveld already wrote a great article on the topic, so I won’t repeat all the details that Marc perfectly explained. Here, I just want to show you how to check the temperature of specific columns in your Direct Lake semantic model, and share some tips and tricks on how to keep the “fire” burning :)
-- Check which column dictionaries are currently "warm" (recently accessed)
SELECT DIMENSION_NAME
, COLUMN_ID
, DICTIONARY_SIZE
, DICTIONARY_TEMPERATURE
, DICTIONARY_LAST_ACCESSED
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
ORDER BY DICTIONARY_TEMPERATURE DESC
The above query against the DISCOVER_STORAGE_TABLE_COLUMNS DMV can give you a quick hint of how the concept of “Hot N Cold” works in Direct Lake:

As you may notice, the engine keeps the dictionaries of relationship columns “warm”, because of filter propagation. There are also the columns that we used in our table visual: Yr, Tot Drug Cst, and Tot Clms. If I don’t do anything with my report, the temperature will slowly decrease over time. But let’s perform some actions in the report and check the temperature again:

I’ve added the Total Claims measure (based on the Tot Clms column) and changed the year on the slicer. Let’s take a look at the temperature now:

Oh, wow, these three columns have a temperature 10x higher than the columns not used in the report. This way, the engine ensures that the most frequently used columns stay in cache memory, so that report performance is the best possible for the end user.
Now, the fair question would be: what happens once all my end users go home at 5 PM, and nobody touches the Direct Lake semantic model until the next morning?
Well, the first user will have to “sacrifice” for all the others and wait a little bit longer for the first run, after which everyone can benefit from having “warm” columns ready in the cache. But what if the first user is your manager or the CEO?! No bueno :)
I have good news — there is a trick to pre-warm the cache by loading the most frequently used columns upfront, as soon as your data is refreshed in OneLake. My friend Sandeep Pawar wrote a step-by-step tutorial on how to do it (Semantic Link to the rescue), and you should definitely consider implementing this technique if you want to avoid a bad experience for the first user.
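The general idea is simple: right after the data is refreshed, fire a lightweight DAX query that touches the columns your reports use the most, so they get transcoded and paged into memory before the first real user arrives. Here is a minimal sketch with Semantic Link; the model, workspace, and table names are placeholders for this example (Sandeep’s tutorial covers a more complete and dynamic approach):

import sempy.fabric as fabric

# Placeholder names for this example.
dataset = "Healthcare model"
workspace = "Demo workspace"

# A cheap DAX query that touches the most frequently used columns,
# forcing them to be transcoded and paged into memory ahead of time.
# 'Medicare Part D' is a placeholder table name.
warm_up_query = """
EVALUATE
ROW (
    "Rows", COUNTROWS ( 'Medicare Part D' ),
    "Drug Cost", SUM ( 'Medicare Part D'[Tot_Drug_Cst] ),
    "Claims", SUM ( 'Medicare Part D'[Tot_Clms] )
)
"""

fabric.evaluate_dax(dataset=dataset, dax_string=warm_up_query, workspace=workspace)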
Conclusion
Direct Lake is truly a groundbreaking feature introduced with Microsoft Fabric. However, since it is a brand-new solution, it relies on a whole new world of concepts. In this article, we covered some of the ones I consider the most important.
To wrap up, since I’m a visual person, I prepared an illustration of all the concepts we covered:

Thanks for reading!