Take a Deep Dive into Filtering in DAX

-

We all the time use filters when developing DAX expressions, reminiscent of DAX measures, or when writing DAX queries.

But what happens exactly after we apply filters?

This piece is strictly about this query.

I’ll start with easy queries and add variants to explore what happens under the hood.

I take advantage of DAX Studio and the choice to point out server timings for every query.

In case you would like to learn more about this feature and easy methods to interpret the outcomes, read the primary article within the References section at the tip of this piece.

Let’s start with the bottom query:

EVALUATE
	CALCULATETABLE(
				SUMMARIZECOLUMNS('Product'[BrandName]
						,"Online Sales", [Sum Online Sales]
						)
					)
Figure 1 – Executing the query and the outcomes. The outcomes themselves should not essential. However the execution statistics and the SE queries are (Figure by the Creator)

Once we activate the server timings and execute the query, we get the execution statistics and the Storage Engines (SE) query/queries needed to get the info:

Figure 2 – The execution statistics for the query. We see the Storage Engine query to retrieve the outcomes (Figure by the Creator)

As you possibly can see, we want just one Storage Engine (SE) query to retrieve the outcomes.

The query completes in just 47 ms and is served almost entirely by the SE (95.7%).

The more time the SE can spend on a question, the higher, since it is the component that retrieves data from the info stores and tables.
Furthermore, the SE can use multiple CPU cores, whereas the Formula Engine (FE) can use just one. We cannot examine exactly what happens within the FE as easily as we will with SE queries.

You may learn more concerning the difference between these two engines within the article mentioned above.

A brief note:

A couple of months ago, I wrote an article here with a really similar title. But, while that one was only about date filters with Time Intelligence functions, this one goes one step deeper into the rabbit hole.

That is far more generic than that one.

For those who missed it, I added the article link and extra resources on the present topic to the References section below.

Add easy filters

Next, we add a straightforward filter for the product color red to the query:

EVALUATE
	CALCULATETABLE(
				SUMMARIZECOLUMNS('Product'[BrandName]
						,"Online Sales", [Sum Online Sales]
						)
				,'Product'[ColorName] = "Red"
				)

Here is the query and the outcomes restricted to the product color red:

Figure 3 – The query and the outcomes for the info restricted to the product color red (Figure by the Creator)

Once we take a look at the query statistics, we see this:

Figure 4 – The query statistics and the SE query with the filter for red products (Figure by the Creator)

As you possibly can see, the whole query is executed in a single SE query.

The filter is within the query’s WHERE clause. Subsequently, only the restricted data is retrieved.

That is visible within the “Rows” column, as only 14 rows are returned from this question.

But what happens after we use the FILTER() function to filter the products:

EVALUATE
	CALCULATETABLE(
				SUMMARIZECOLUMNS('Product'[BrandName]
						,"Online Sales", [Sum Online Sales]
						)
				,FILTER('Product'
						,'Product'[ColorName]  = "Red")
				)

As you would possibly know, using the FILTER() function just isn’t advisable attributable to how it really works.

You may learn more about this topic within the second article linked within the References section below.

The result doesn’t change:

Figure 5 – The query and the outcomes for the info restricted by the FILTER() function to the product color red (Figure by the Creator)

But how does it affect the execution plan and the SE queries?

Figure 6 – The query statistics and the SE query with the filter for red products, but with the FILTER() function (Figure by the Creator)

As you possibly can see, on this case, the SE optimizes the query, yielding the identical execution plan as before.

But, as we alter our code, we are going to see that using FILTER() isn’t all the time an excellent idea.

Add multiple filters

Now, what happens after we add multiple filters to a question?

EVALUATE
    CALCULATETABLE(
                SUMMARIZECOLUMNS('Product'[BrandName]
                        ,"Online Sales", [Sum Online Sales]
                        )
                ,'Product'[ColorName]  = "Red"
                ,'Geography'[ContinentName] = "Europe"
                )

While the result just isn’t that interesting to us, let’s take a look at the query statistics:

Figure 7 – Query statistics when applying multiple filters to an expression (Figure by the Creator)

Again, the query will be served by a single SE query that accommodates each filters.

The query executes so quickly that the FE time percentage is comparatively high, yet it still only takes 6ms.

When changing the query to make use of the FILTER() function, the SE query doesn’t change either:

Figure 8 – Query statistics when applying multiple filters with FILTER (Figure by the Creator)

This shows that, with this sort of query, the engine can optimize execution to search out probably the most efficient solution to fulfill the DAX query.

Anyway, the result doesn’t change. It’s similar in each cases, appropriately, because we don’t change the filter per se. But please be patient with me; I’m getting back to the FILTER() function and why it’s essential to grasp its effects in a moment.

Moving filters into measures

Next, let’s see what happens when the filter is moved into the measure.

Until now, the query was built in order that the measure [Sum Online Sales] received its filter from outside.

Let’s do that:

DEFINE 
MEASURE 'All Measures'[Online Sales A. Datum] =
		CALCULATE(
			SUMX('Online Sales', ( 'Online Sales'[UnitPrice] * 'Online Sales'[SalesQuantity]) - 'Online Sales'[DiscountAmount] )
			,'Product'[BrandName] = "A. Datum"
			)


EVALUATE
	CALCULATETABLE(
				SUMMARIZECOLUMNS('Product'[BrandName]
						,"Online Sales A. Datum", [Online Sales A. Datum]
						)
				)

As you possibly can see, the filter is applied contained in the measure [Online Sales A. Datum].

After all, the resulting number is similar in each row of the result, because the Brand is about as “A. Datum”:

Figure 9 – The identical result for every brand with the measure containing a filter (Figure by the Creator)

However the execution is barely different:

Figure 10 – Query statistics for the measure containing the filter (Figure by the Creator)

This time, we’ve got two SE queries.

  1. The query to get the sales for the Brand “A. Datum”. This question accommodates the filter for that brand.
  2. The second query is used to get the list for all brands within the result set.

The primary query is most vital to us, since it still shows the filter for the brand set inside the measure.

This question will be fully served by the SE with a straightforward filter in a really efficient way.

But, typically, we wish so as to add multiple measures to a question (or a visible in a report).

What happens after we add the [Sum Online Sales] measure to the query?

The result just isn’t particularly essential, because it shows one column with sales for every brand and one other with sales for the filtered brand.

However the query statistics are interesting:

Figure 11 – Query statistics for the query with one measure containing a filter and one other without. Where is the filter set within the measure? (Figure by the Creator)

As you possibly can see within the red-marked line within the SE query, the Brand filter isn’t any longer present.

Since the engine recognizes that the filter within the measure is applied to the identical column because the one within the query, it moves the filter to the FE and returns the result.

Now, what happens after we filter one other column within the measure, for instance, the colour:

DEFINE 
MEASURE 'All Measures'[Online Sales Red] =
        CALCULATE(
            SUMX('Online Sales', ( 'Online Sales'[UnitPrice] * 'Online Sales'[SalesQuantity]) - 'Online Sales'[DiscountAmount] )
            ,'Product'[ColorName] = "Red"
            )


EVALUATE
    CALCULATETABLE(
                SUMMARIZECOLUMNS('Product'[BrandName]
                        ,"Online Sales", [Sum Online Sales]
                        ,"Online Sales Red", [Online Sales Red]
                        )
                )

Again, the result just isn’t particularly interesting. We’re excited about the query statistics:

Figure 12 – Query statistics for 2 measures with one filtering a column not contained within the result set (Figure by the Creator)

As you possibly can see, this time we’ve got two queries by BrandName. One without and one with the filter for the colour.

Each queries return the identical variety of rows (14) – one for every Brand.

The FE handles combining the 2 results right into a single table.

The whole query continues to be served mainly by the SE, which is superb.

But now, let’s add the FILTER() function to the Filter:

For this instance, I alter the measure to filter for 2 values with the IN operator:

,'Product'[BrandName] IN { "A. Datum", "Adventure Works" }

On this variant, the SE query is just like the ones before.

The filter is passed directly into the query’s WHERE clause.

But what happens when I alter it to this:

,FILTER('Product'

       ,'Product'[BrandName] IN { "A. Datum", "Adventure Works" }

       )

Initially, the result changes:

Figure 13 – Results of the measures with and without FILTER(). In red, the outcomes with the easy filter, and in blue, the outcomes for the measure with FILTER() (Figure by the Creator)

The rationale is that FILTER() works completely in another way.

It retains the prevailing filter context and adds a brand new one.

I explained this behavior in one other article that I added because the second link within the References section below.

Furthermore, the SE cannot handle this in a single query anymore:

Figure 14 – The query statistics for using the FILTER() function within the measure. Now, multiple queries are needed. (Figure by the Creator)

The primary two queries retrieve the values for the brand to filter (See the queries marked in pink).

Notice the big variety of rows (324 and a couple of’560) returned by the primary two queries. That is the materialization of intermediate results needed to perform the calculation.

The third query uses these intermediate results to filter the info (marked in red).

The results of the third query is just two rows—the 2 rows we see in the general result.

As described in my other article, FILTER() should be used with care.

Not only is it considerably slower, however it also works completely in another way from a straightforward filter.

Anyway, I can restore the previous behavior by adding an ALL() within the FILTER() call:

Figure 15 – Adding an ALL() inside the FILTER() call restores the semantics of the easy filter. But why would you do that? (Figure by the Creator)

I don’t wish to hide the undeniable fact that this instance is special, because the filter applied affects the identical column as utilized in the query.

When changing the query to filter the country, the engine can optimize the execution and use the easy form again:

Figure 16 – Here we see that when filtering columns different from the columns utilized in the DAX query, the engine can optimize the execution and fall back to use a straightforward filter. Within the blue inset, you see the outcomes (Figure by the Creator)

As you possibly can see, the engine optimizes the execution of the query and falls back to a straightforward filter when filtering columns that differ from those utilized in the DAX query. Within the blue inset, you see the outcomes.

I see this type of filtering fairly often when developers who should not as proficient write DAX measures.

Using the FILTER() function looks intuitive, but it may possibly yield incorrect or confusing results and is slower than a straightforward filter. I strongly recommend reading my article linked below about this function, in addition to the dax.guide documentation and the articles linked on SQLBI.com.

Moreover, I actually have to type far more than when using a straightforward filter.

As a lazy guy, that is a very important reason not to make use of FILTER() when it’s unnecessary.

Add a fancy filter

Finally, I would like to point out what happens when applying a filter using a DAX function, reminiscent of CONTAINSSTRING().

EVALUATE
	CALCULATETABLE(
				SUMMARIZECOLUMNS('Product'[BrandName]
						,"Online Sales", [Sum Online Sales]
						)
				,CONTAINSSTRING('Online Sales'[SalesOrderNumber], "202402252C")
				)

Such a question is executed while you use a slicer in your report back to filter for a particular order and retrieve the brands of the purchased products.

Because the result just isn’t essential at this point, let’s directly take a look at the query statistics:

Figure 17 – Query statistics for the query using a DAX function to filter the result. You may see that nearly the whole lot of the execution is performed within the FE (Figure by the Creator)

While the query took greater than 6 seconds to finish, 99.6% of the time was spent by the FE executing the CONTAINSSTRING() function to search out matching rows in the info. This operation could be very CPU-intensive, because the FE can use just one core. After I execute this question on my laptop, it takes greater than 2 seconds longer.

I deliberately selected a slow function to exhibit its effects.

However the SE was still in a position to execute the query with a single query. Nevertheless, the positive effect of this fact is negligible on this case.

Conclusion

While it just isn’t my intention to present you advice on what to do and what to not do, I wanted to point out you the implications of the various ways to jot down DAX code and apply filters in your measures or queries.

The DAX engine(s) are very efficient in optimizing the queries, but they’ve limitations.

Subsequently, we must all the time take care when writing our DAX code.

If the performance is poor or the code written by another person looks strange, we must always analyze it to find out easy methods to improve it.

I wanted to point out you easy methods to do it and what to search for when analyzing your DAX code.

Remember:

  • The Storage engine (SE) can use multiple CPU cores.
  • The more work is completed by the SE, the higher.
  • The SE can execute only easy aggregations and basic math functions (like +, -, x, and /)
  • Try to scale back the workload on the Formula Engine (FE)
  • The FE can use just one CPU core.
  • Try to scale back the materialization of information (The Rows column within the query statistics).
  • Try to scale back the variety of SE queries.

I do know that the necessities will force us to jot down DAX code, which just isn’t optimal.

Even worse, the Report designers might add logic to the report that causes a poor performance.

In such cases, eliminate that logic and check the response time again. It could be value exploring making a dedicated measure for such cases. Do not forget that it is feasible to create local measures in a report that’s connected to a Semantic model via a life connection.

But most significantly: Take your time when writing DAX code. You may save time by avoiding the necessity to optimize your DAX code, which was written in a rush. I speak from experience. It is a very bad feeling.

I hope you learned something recent.

References

To learn the main points about easy methods to interpret the outcomes of the Server Timings in DAX Studio, read this piece:

Are you interested by easy methods to use the FILTER() function appropriately? Read this:

One other DAX function that may harm performance is KEEPFILTERS(). To learn more concerning the KEEPFILTERS() function, read this piece:

Here, the mentioned piece about date filters:

An interesting blog post by Data Mozart concerning the Storage engine:

Like in my previous articles, I take advantage of the Contoso sample dataset. You may download the ContosoRetailDW Dataset at no cost from Microsoft here.

The Contoso Data will be used freely under the MIT License, as described on this document. I modified the dataset to shift the info to contemporary dates.

ASK ANA

What are your thoughts on this topic?
Let us know in the comments below.

0 0 votes
Article Rating
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Share this article

Recent posts

0
Would love your thoughts, please comment.x
()
x