What Is a Query Folding in Power BI and Why should You Care?

-

a question folding?” “Does your query fold?”… Perhaps someone asked you those questions, but you were like: “Query…Whaaaat?!”

Or, possibly you’ve heard about query folding in Power BI, but didn’t know the right way to make the most of it in real-life scenarios.

In case you recognized yourself in (at the least) considered one of the 2 situations specified above, then please proceed reading this text.

High-quality, you’re curious to search out out what a Query folding is. But, first things first…Before you proceed, we’d like to determine some theoretical foundations, which is able to put the Query folding feature in the right context.

Data Shaping

 and why it’s considered one of the important thing concepts in the information preparation phase. Now, I would love to expand on that in a (possibly) unusual way:

I suppose you all know in regards to the book written by Thomas More, called “Utopia”.

In that story, all the things is ideal and everyone seems to be satisfied. In a super world, let’s call it “Data Utopia”, we’ve clean, high-quality data that just flies into our reports “as-is”, with no need to perform any type of face-lifting or transformations along the way in which. Unfortunately, “Data Utopia” can exist only in books — the fact is crueler — as we’ve to take care of quite a few challenges while nurturing our data.

That being said, considered one of the important thing concepts that we’ve to soak up is Data Shaping. Data shaping is the method you need to perform when you get accustomed to your data, and turn out to be aware of possible pitfalls inside the data you’re planning to make use of in your  solution.

I’ve intentionally used the term “Business Intelligence” as an alternative of “Power BI”, as this can be a general concept that needs to be used outside of Power BI solutions too.

In easy words, data shaping is the technique of data consolidation, BEFORE it becomes a part of your data model. The important thing thing to take into accout is the word: BEFORE! So, one would perform data shaping before the information goes into the report itself. Data shaping might be done at different places, and, depending on where you apply data shaping techniques, at different time limits in the course of the data preparation process.

Source Database — That is essentially the most obvious selection and typically essentially the most desirable scenario. It relies on traditional data warehousing principles of Extracting-Transforming-Loading (ETL) data. On this scenario, you define what data you desire to extract (not all data from the database is required, and it’s normally not an excellent idea to import all the information). Then, you select in case your data must be transformed along the way in which, to fit your reporting needs higher — for instance, do you desire to perform currency conversion, or do you must conform country and city names?

Do you recognize the town in the next image?

Image by Lukas Kloeppel on Pexels

Yes, it’s Latest York. Or, is it NYC? Or, is it Latest York City? Which considered one of these three names is correct? Yes, all of them are correct — but should you import the information into your data model like this, you’ll get incorrect results — as each Latest York, NYC, and Latest York City shall be treated as a separate entity. This, and plenty of more potential caveats, should be solved in the course of the Data Shaping phase, and that’s why it’s essential to spend a while massaging your data.

Power Query

In case you don’t perform data transformations on the source side, the following station is Power Query — it’s the built-in tool inside Power BI, that lets you perform every kind of transformations to your data. In keeping with Microsoft’s official documentation, you’ll be able to apply greater than 300 different transformations!

The important thing advantage of Power Query is that you may perform complex data transformations with little or no coding skills! Moreover, all steps you’ve applied in the course of the data transformation process are being saved, so each time you refresh your dataset, those steps shall be routinely applied to shape your data and prepare it for consumption via reports.

Under the hood of Power Query is a Mashup engine, that permits your data shaping to run easily. Power Query uses a really powerful M language for data manipulation. And, now you’re probably asking yourselves, what does all this story about data shaping, Power Query, Mashup engine, M language, etc. need to do with Query folding? I don’t blame you, it’s a good query, but we’ll come back soon to reply it.

What’s a Query folding?

For some data sources, corresponding to relational databases, but in addition non-relational data sources, for instance, OData, AD, or Exchange, the Mashup engine is in a position to “translate” M language to a language that the underlying data source will “understand” — typically, it’s SQL.

Photo by Josh Sorenson on Pexels

By pushing complex calculations and transformations on to a source, Power Query leverages the capabilities of the robust relational database engines, which are built to address large volumes of knowledge in essentially the most efficient way.

That ability of Power Query’s Mashup engine to create a single SQL statement combining all M statements behind your transformations is what we call Query folding.

Or, let`s make it easy: .

Data sources that support Query folding

As already mentioned, essentially the most obvious beneficiary of query folding is relational database sources, corresponding to SQL Server, Oracle, or MySQL. Nonetheless, it`s not only that SQL databases make the most of the query folding concept. Essentially, any data source that supports some type of querying language can possibly make the most of query folding. Those other data sources are OData, SSAS, SharePoint lists, Exchange, and Entra ID.

Alternatively, if you use data sources corresponding to Excel files, BLOB storage files, flat files, etc. in your Power BI datasets, the query can’t fold.

Data Transformations that support Query folding

Nonetheless, on the subject of data sources that support query folding typically, it’s essential to take into accout that not all transformations might be folded and pushed to a knowledge source. So, simply to be clear, the incontrovertible fact that a SQL database supports query folding doesn’t necessarily mean that your query will fold! There are some Power Query transformations that simply can`t be pushed to a SQL database engine.

Fairly often, some subtle differences within the Power Query transformations might be decisive in the ultimate consequence, and whether your query will fold or not. I’ll show you a number of of those subtle differences in the next sections.

Generally speaking, the next transformations, when applied in Power Query, might be “translated” to a single SQL statement:

  • Removing columns
  • Renaming columns
  • Filtering rows, with static values or Power Query parameters, as they’re treated as WHERE clause predicates in SQL
  • Grouping and summarizing, that are such as SQL’s Group by clause
  • Merging of foldable queries based on the identical source, as this operation might be translated to JOIN in SQL. After I said, merging of foldable queries — which means it’s going to work should you are joining two SQL server tables, but it’s going to not work should you try to hitch a SQL table and an Excel file
  • Appending foldable queries based on the identical source — this transformation pertains to the UNION ALL operator in SQL
  • Adding custom columns with easy logic. What does easy logic mean? Using M functions which have equivalents in SQL language, for instance, mathematical functions, or text manipulation functions
  • Pivot and Unpivot transformations

Alternatively, some transformations that can prevent the query from folding are:

  • Merging queries based on different sources, as explained previously
  • Appending (union-ing) queries based on different sources — similar logic as within the previous case
  • Adding custom columns with complex logic or using some M functions that don’t have a counterpart in SQL
  • Adding index columns
  • Changing a column data type. This one is a typical “it depends” case. I’ll show you soon what it relies on, but just take into accout that changing a column data type might be each a foldable and a non-foldable transformation

Now, let’s examine why it is crucial to realize this behavior — or, possibly it’s higher to say, why must you care if the query folds or not?

Why must you care about Query folding?

Once you’re using Import mode in Power BI, the information refresh process will work more efficiently when the query folds, each by way of refresh speed and resource consumption.

In case you are working with DirectQuery or Dual storage mode, as you’re targeting the SQL database directly, all of your transformations MUST fold — or your solution won’t work.

Finally, query folding can be of key importance for Incremental refresh — it’s so essential that Power BI will warn you once it determines that question folding can’t be achieved. It can not break your incremental refresh “per-se”, but without query folding in place, an incremental refresh wouldn’t serve its major purpose — to scale back the quantity of knowledge that should be refreshed in your data model — as without query folding, Mashup engine must retrieve all data from the source after which apply subsequent steps to filter the information.

With all these in mind, you need to are likely to achieve query folding each time possible.

Slow report — don’t blame Query folding!

One essential disclaimer here, and that is considered one of the important thing takeaways from this series of blog posts: in case your report is slow, or your visuals need quite a lot of time to render, or your data model size is large, query folding has nothing to do with it!

Provided that your data refresh or incremental refresh is slow and inefficient, you need to investigate your Power Query steps in additional depth.

All or nothing?

Just a few more things to take into accout regarding query folding. It’s not an all-or-nothing process. Meaning if you might have, let’s say, 10 transformation steps inside Power Query, and your query folds until the sixth step, you’ll still get some profit from partial query folding. Nonetheless, once the query folding is broken, it might probably’t be achieved anymore.

Image by writer

To simplify, if you might have 10 transformation steps, and your query folding is broken within the fifth step, all previous steps will fold, but once the folding is broken, it might probably’t be achieved again, even when you might have transformations that support query folding by default in steps 6 to 10 — like in our example where filtering needs to be a foldable step, these steps won’t fold. Keep that in mind, and check out to push all non-foldable steps down the pipeline as much as possible.

How do you realize if the query folds?

Okay, now we aren’t rookies anymore. We all know what query folding is, why we should always strive to realize it, and a few subtle tricks that could make an enormous difference.

Now, it’s time to learn the right way to check if the precise query folds or not. The primary and most blatant way is to right-click on the step and check what the  option looks like.

If it’s greyed out, this step probably doesn’t fold. Alternatively, should you are in a position to click on this selection, that implies that your query will fold. I suppose you’re possibly confused with the word: PROBABLY!

Image by writer

But, that’s the right word, as you’ll be able to’t be 100% sure that if the  option is disabled, your query doesn’t fold. I’ll show you later how this selection can trick us into pondering that the query folding was broken, although, in point of fact, folding actually occurs.

As an alternative, when you desire to make sure in case your query folds or not, you should utilize the  feature inside Power Query Editor, or , like an excellent old and reliable option to check the queries sent to a database by the Power BI engine.

Moreover, there may be a cool feature in Power Query Online, where each step is marked with the icon that shows if that step folds, doesn’t fold, or is unknown. As I said, this feature is on the market only in Power Query Online at this moment, so let’s hope that the Power BI team will implement it within the Desktop version soon.

Image by writer

The devil is in the small print…

High-quality…You’ve probably heard in regards to the saying that the devil is in the small print. Now, it’s time to know how little nuances could make an enormous difference in our data transformation process.

Let’s start with some of the curious cases in Power Query editor…

Devil #1 — Merge Join

This one could be very interesting, as you’ll hardly assume what is occurring within the background. Let’s say that I would like to mix two of my queries into one. I’ll use the  sample database, and I want to merge the FactInternet Sales and DimCustomer tables.

I’ll remove a few of the columns from my fact table, and keep only the CustomerKey column, as this can be a foreign key to a DimCustomer table, and the Sales Amount column. I’ll join the DimCustomer table because it is, with none additional steps before merging.

Image by writer

Merging tables is such as JOIN operation in SQL. Essentially, we elect the column on which we wish to perform MERGE operation, and the kind of join (left, outer, or inner).

Image by writer

The issue is that by default, if you’re merging two queries, Power Query will generate a nested join statement, which may’t be properly translated in SQL.

Image by writer

If I am going to the Tools tab and click on on Diagnose Step, I can see that the Mashup engine fired two separate queries to my underlying SQL Server database — in other words, those two queries couldn’t be executed as a single SQL statement, and that implies that query didn’t fold!

Image by writer

How will we solve this? Let’s just select a blank query and write our M code by hand to realize the exact same result.

Image by writer

The important thing thing is that we’ll use the same, but still different M function: .

We at the moment are using Table.Join function – Image by writer

All function arguments are the exact same as previously, and let’s now check the consequence.

You remember once I told you that when the  is greyed out, your query probably doesn’t fold, however it’s not 100% correct. And, that is an excellent example. In case you take a have a look at , it still shows that our query doesn’t fold…

Image by writer

…but let’s go to Diagnostics and check if that’s true.

Image by writer

Oh, boy, we were tricked — this step indeed folded! As you’ll be able to see within the illustration above, we’ve a single SQL query generated and sent to a SQL Server source database to be executed.

So, we found two devils in this instance — the primary one was a join type, which we were in a position to solve by tweaking the routinely generated M code. And, the opposite one was the wrong behavior of the  option. I’ll show you in the following a part of the series yet another example when  lies.

Query folding in Power BI — tricks, lies & ultimate performance test

I assume you at the moment are accustomed to the concept of query folding in Power BI, and particularly with its importance for data refresh and incremental refresh processes. We’ve also began to scratch some interesting behaviors of Power Query transformations, and on this final a part of the article, I’ll show you a number of more interesting findings.

Finally, we’ll wrap it up with the last word performance test — I’ll show you the precise numbers behind two similar queries — one folds, and the opposite doesn’t!

Changing Data types

One of the crucial common transformations in Power Query is changing data type. It’s a widely known best practice to make use of proper data types in your data model — for instance, should you don’t need hours, minutes, and seconds level of granularity in your reports, you have to be higher off eliminating them and changing the information kind of that column from Date/Time to Date only.

Nonetheless, the road to hell is paved with good intentions:)…So, let me show you one subtle difference that may cause your query to turn out to be rattling slow, although you’ve stuck with the advice to make use of a correct data type!

Image by writer

As you’ll be able to spot within the illustration above, my OrderDate column is of Date/Time data type. And, I would like to modify it to Date only. There are (at the least) two possible options to do that — the primary one is to right-click on the column, expand the drop-down for the Change Type option (like I did within the illustration), and choose Date type (slightly below the Date/Time):

Image by writer

Just a few essential things happened here, so let me explain each of those:

  1. Within the Applied Steps pane, you’ll be able to notice that our transformation step had been recorded
  2. Within the column itself, you’ll be able to see that the time portion disappeared
  3. After I’ve opened the View Native Query dialog box, you’ll be able to see that the Mashup engine nicely translated our transformation to a T-SQL  function
  4. The M formula applied to this transformation step is: 

Let’s now examine the opposite option to vary data kind of our column:

Image by writer

Slightly below our previous Change Type option, there may be a Transform option. When you expand the drop-down, you’ll be able to see the Date Only transformation. Let’s click on it and check what happens:

Image by writer

Looks quite similar, does it? But, let’s walk through all of the things that happened now:

  1. As an alternative of the Modified Type step, we now have a step called Extracted Date
  2. The column itself looks the exact same as within the previous example — no time part in there
  3. Ooops, the query doesn’t fold anymore! As you’ll be able to see, the View Native Query option is greyed out!
  4. This time, M formula applied is: 

So, one single different word within the M formula ( vs ) affected our query so hard that it couldn’t be translated to SQL!

Takeover from this story: watch out, and be careful if you’re selecting options for changing data types!

Liar, Liar…

I’ve promised within the previous a part of the article that I’ll show you yet another example when the View Native Query option can idiot you into pondering that question folding was broken, even when in point of fact it’s not true…

Let’s say that we wish to maintain only the highest X rows from our table. In my case, I would like to preserve the highest 2000 rows from my fact table:

Image by writer

Once I’ve applied this step and checked the View Native Query, I can realize that my query folds, as my transformation was translated to a TOP clause in SQL:

Image by writer

Now, let’s say that I would like to use Absolute value transformation on my Sales Amount column. Normally, this transformation easily folds, as there may be an ABS function in T-SQL:

Image by writer

Nonetheless, if I right-click on this step, I’ll see that the View Native Query option is greyed out, so I might assume that this step broke my query folding!

Image by writer

Let’s check this in our Query Diagnostics tool:

Image by writer

Oh, my God! This step folded indeed! So, we were tricked by the View Native Query option again!

The important thing takeover here is: each time you’re assuming that a selected transformation step might be folded (like in this instance, after we knew that SQL has an ABS function to support our transformation), double-check what really happens under the hood!

The last word performance test

Okay, if I didn’t manage to persuade you up to now, why you need to strive to realize query folding, let me now pull my last ace up my sleeve!

I would like to point out you the difference in data refresh performance between the queries that return the exact same results — considered one of them folds, and the opposite doesn’t!

Test #1 Query folding ON

For this testing, I’ll use the FactOnlineSales table from the Contoso sample database. This table has around 12.6 million rows, and it’s good to exhibit the magnitude of importance of the query folding concept.

In the primary example, I’ve applied 9 different transformation steps, and all of them are foldable, as you’ll be able to see in the next illustration:

Image by writer

Don’t listen to the SQL code that the Mashup engine generated: should you are a SQL skilled, in fact, you may write rather more optimal SQL code — nevertheless, take into accout that with auto-generated scripts by the Mashup engine, you aren’t getting the !

I’ll hit Close & Apply and switch on my stopwatch to measure how much time my data refresh lasts.

Image by writer

This question took 32 seconds to load 2.8 million records in my Power BI report. Data was loaded in batches of 100.000–150.000 records, which is an excellent indicator that the query folding is in place.

Test #2 Query folding OFF

Now, I’ll return to Power Query Editor, and intentionally break query folding on the third step (remember the instance above with changing Date/Time type to Date), using the transformation for which I do know that just isn’t foldable:

Image by writer

Truth to be said, I’ll achieve a partial folding here, as first two steps will fold, but all subsequent steps after the Extracted Date transformation won’t fold!

Let’s activate the stopwatch again and check what happens:

Image by writer

The very first thing to note: this question took 4 minutes and 41 seconds to load into our Power BI report, which is roughly  than in our previous case when the query folded. This time, batches of loaded data were between 10.000 and 20.000 records.

But, what’s much more concerning — you’ll be able to see that the full variety of records loaded was almost 11 million!!! As an alternative of two.8 million within the previous example! Why is it happening? Well, within the previous sections, I explained that when the Mashup engine can’t translate M language to SQL, it needs to tug ALL the information (from the moment when the query folding was broken), and THEN apply transformations on the entire chunk of imported data!

The outcome is strictly the identical — we’ve 2.830.017 records in our Power BI report — but, with query folding in place, all obligatory transformations were performed on the SQL database side, and the Mashup engine got an already prepared data set. While within the second scenario, after we broke the query folding, the Mashup engine pulled the entire remaining rows (approx. 11 million), and only after that was it in a position to apply other transformation steps.

And, this was only a basic example, with one single table, and never so big by way of data volume! Simply imagine the magnitude of implications on a bigger dataset, with multiple tables in it.

Conclusion

Well, we covered quite a bit in this text. We learned in regards to the data shaping concept, we introduced Power Query fundamentals, and we also learned what query folding is and why we should always do our greatest to realize it.

I’ve also shared with you some basic examples and neat tricks on the right way to achieve query folding in some common use cases.

In the long run, please remember that the query folding is a piece in progress, and people from the Power BI team are continually improving this feature. So, it might probably occur that a few of the issues with query folding I’ve shown you listed below are resolved within the meantime. Due to this fact, make sure to not sleep to this point with the newest improvements.

Thanks for reading!

ASK ANA

What are your thoughts on this topic?
Let us know in the comments below.

0 0 votes
Article Rating
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Share this article

Recent posts

0
Would love your thoughts, please comment.x
()
x