Why You Should Not Replace Blanks with 0 in Power BI

-

watching Jeffrey Wang as a live stream guest with Reid Havens, and one in every of the dozen wonderful things that Jeffrey shared with the audience was the list of optimizations that the DAX engine performs when creating an optimal query plan for our measures.

And, the one which caught my attention was regarding the so-called “Sparse measures”:

Screenshot from the live stream on YouTube

To make it easy, when you define the measure, Formula Engine in VertiPaq will add an implicit NonEmpty filter to the query, which should enable the optimizer to avoid full cross-join of dimension tables and scan only those rows where records for the mixture of your dimension attributes really exist. For folk coming from the MDX world, the NonEmpty function may look familiar, but let’s see how it really works in DAX.

The thing that almost all resonated with me was when Jeffrey advised against replacing BLANKs with zeroes (or whatever explicit values) in Power BI calculations. I’ve already written how you may handle BLANKs and replace them with zeroes, but in this text, I need to give attention to the possible performance implications of this decision.

Setting the stage

Before we start, one vital disclaimer: the suggestion not to switch BLANK with 0 is just that — a suggestion. If the business request is to display 0 as a substitute of BLANK, it doesn’t necessarily mean that it’s best to refuse to do it. In most scenarios, you’ll likely not even notice a performance decrease, but it would rely upon multiple various factors…

Let’s start by writing our easy DAX measure:

Sales Amt 364 Products =
CALCULATE (
    [Sales Amt],
    FILTER ( ALL ( 'Product'[ProductKey] ), 'Product'[ProductKey] = 364 )
)

Using this measure, I need to calculate the overall sales amount for the product with ProductKey = 364. And, if I put the worth of this measure within the Card visual, and switch on Performance Analyzer to ascertain the times for handling this question, I get the next results:

Image by creator

DAX query took only 11ms to execute, and once I switched to DAX Studio, the xmSQL generated by the Formula Engine was quite easy:

Image by creator

And, if I take a take a look at the Query plan (physical), I can see that the Storage Engine found just one existing combination of values to return our data:

Image by creator

Adding more ingredients…

Nonetheless, let’s say that the business request is to investigate data for Product Key 364 on a every day level. Let’s go and add dates to our report:

Image by creator

This was again very fast! I’ll now check the metrics inside the DAX Studio:

Image by creator

This time, the query was expanded to incorporate a Dates table, which affected the work Storage Engine needed to do, as as a substitute of finding just one row, this time, the number is different:

Image by creator

In fact, you won’t notice any difference in performance between these two scenarios, because the difference is simply a number of milliseconds.

But that is only the start; we are only warming up our DAX engine. In each of those cases, as you might see, we see only “filled” values — that combination of rows where each of our requirements are satisfied — product key’s 364 and only those dates where we had sales for this product — if you happen to look thoroughly within the illustration above, dates aren’t contiguous and a few are missing, comparable to January twelfth, January 14th to January twenty first and so forth.

It is because Formula Engine was smart enough to eliminate the dates where product 364 had no sales using the NonEmpty filter, and that’s why the variety of records is 58: we’ve got 58 distinct dates where sales of product 364 weren’t blank:

Image by creator

Now, let’s say that business users also need to see those dates in-between, where product 364 hadn’t made any sales. So, the concept is to display 0$ amount for all those dates. As already described within the previous article, there are multiple alternative ways to switch the BLANKs with zeroes, and I’ll use the  function:

Sales Amt 364 Products with 0 = COALESCE([Sales Amt 364 Products],0)

Principally, the COALESCE function will check all of the arguments provided (in my case, there is simply one argument) and replace the primary BLANK value with the worth you specified. Simply said, it would check if the worth of the Sales Amt 364 Products is BLANK. If not, it would display the calculated value; otherwise, it would replace BLANK with 0.

Image by creator

Wait, what?! Why am I seeing all of the products, after I filtered every part out, except product 364? Let alone that, my table now took greater than 2 seconds to render! Let’s check what happened within the background.

Image by creator

As a substitute of generating one single query, now we’ve got 3 of them. The primary one is precisely the identical as within the previous case (58 rows). Nonetheless, the remaining queries goal the Product and Dates tables, pulling all of the rows from each tables (The product table accommodates 2517 rows, while the Dates table has 1826). Not only that, take a take a look at the query plan:

Image by creator

4.6 million records?! Why on Earth does it occur?! Let me do the mathematics for you: 2.517 * 1.826 = 4.596.042…So, here we had a full cross-join between Product and Dates tables, forcing each tuple (combination of date-product) to be checked! That happened because we forced the engine to return 0 for each single tuple that may otherwise return blank (and consequentially be excluded from scanning)!

This can be a simplistic overview of what happened:

Image by creator

Consider it or not, there’s a chic solution to point out blank values out-of-the-box (but, not with 0 as a substitute of BLANK). You may just simply click on the Date field and decide to :

Image by creator

This can display the blank cells too, but without performing a full cross-join between the Product and Dates tables:

Image by creator

We will now see all of the cells (even blanks) and this question took half the time of the previous one! Let’s check the query plan generated by the Formula Engine:

Image by creator

Not all scenarios are catastrophic!

Truth to be said, we could’ve rewritten our measure to exclude some unwanted records, however it would still not be an optimal way for the engine to eliminate empty records.

Moreover, there are specific scenarios wherein replacing BLANKs with zero won’t cause a major performance decrease.

Let’s examine the next situation: we’re displaying data about the overall sales amount for each single brand. And I’ll add my sales amount measure for product 364:

Image by creator

As you may expect, that was quite fast. But, what is going to occur after I add my measure that replaces BLANKs with 0, which caused havoc within the previous scenario:

Image by creator

Hm, looks like we didn’t should pay any penalty by way of performance. Let’s check the query plan for this DAX query:

Image by creator

Conclusion

As Jeffrey Wang suggested, it’s best to steer clear of replacing blanks with zeroes (or with some other explicit values), as it will significantly affect the query optimizer’s ability to eliminate unnecessary data scanning. Nonetheless, if for any reason you could substitute a blank with some meaningful value, watch out when and easy methods to do it.

As usual, it is dependent upon many alternative features — for columns with low cardinality, or while you’re not displaying data from multiple different tables (like in our example, when we wanted to mix data from Product and Dates tables), or visual types that don’t have to display numerous distinct values (i.e. card visual) — you may get away without paying the performance price. Alternatively, if you happen to use tables/matrices/bar charts that show a variety of distinct values, ensure to ascertain the metrics and query plans before you deploy that report back to a production environment.

Thanks for reading!

ASK ANA

What are your thoughts on this topic?
Let us know in the comments below.

0 0 votes
Article Rating
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Share this article

Recent posts

0
Would love your thoughts, please comment.x
()
x