used the composite model feature in Power BI, you may have already heard about another extremely important and powerful concept – aggregations! That's because in many scenarios, especially with enterprise-scale models, aggregations are a natural "ingredient" of the composite model.
However, since the composite models feature can also be leveraged without aggregations involved, I thought it would make sense to explain the aggregations concept in a separate article.
Before we explain how aggregations work in Power BI and walk through some specific use cases, let's first answer the following questions:
Before we clarify these two points, it's important to keep in mind that there are two different types of aggregations in Power BI.
- User-defined aggregations were, until a few years ago, the only aggregation type in Power BI. Here, you're in charge of defining and managing aggregations, even though Power BI later automatically identifies the matching aggregation when executing a query.
- Automatic aggregations are one of the fresher features in Power BI. With the automatic aggregations feature enabled, you can grab a coffee, sit back and relax, as machine learning algorithms collect information about the most frequently running queries in your reports and automatically build aggregations to support those queries.
The important distinction between these two types – besides the fact that with automatic aggregations you don't have to do anything except turn the feature on in your tenant – is licensing: while user-defined aggregations work with both Premium and Pro, automatic aggregations currently require a Premium license.
From now on, we'll discuss user-defined aggregations only, so keep that in mind.
Okay, here's a short explanation of aggregations and the way they work in Power BI. Here is the scenario: you have a large, very large fact table, which may contain hundreds of millions, or even billions, of rows. So, how do you handle analytical requests over such an enormous amount of data?
You simply create aggregated tables! In reality, it's a very rare situation – more of an exception than a rule – that the analytic requirement is to see the individual transaction, or individual record, as the lowest level of detail. In most scenarios, you want to perform analysis over summarized data: for example, how much revenue did we have on a particular day? What was the total sales amount for product X? How much did customer X spend in total?
Moreover, you can aggregate the data over multiple attributes, which is usually the case, and summarize the figures for a particular date, customer, and product.

If you're wondering what the point of aggregating the data is… Well, the ultimate goal is to reduce the number of rows and, consequently, the overall data model size, by preparing the data in advance.
So, if I want to see the total sales amount spent by customer X on product Y in the first quarter of the year, I can take advantage of having this data already summarized in advance.
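The row-reduction idea is easy to see on a toy dataset. Here is a minimal sketch using an in-memory SQLite database (the table and values are made up for illustration; the GROUP BY is the same pattern the article uses later):

```python
import sqlite3

# A tiny, hypothetical stand-in for a huge fact table: individual sales rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FactSales (DateKey INT, ProductKey INT, SalesAmount REAL)")
rows = [
    (20240101, 1, 10.0), (20240101, 1, 15.0), (20240101, 2, 7.5),
    (20240102, 1, 20.0), (20240102, 2, 5.0), (20240102, 2, 2.5),
]
conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?)", rows)

# Pre-aggregate in advance: one row per Date + Product combination.
agg = conn.execute("""
    SELECT DateKey, ProductKey, SUM(SalesAmount) AS SalesAmount
    FROM FactSales
    GROUP BY DateKey, ProductKey
    ORDER BY DateKey, ProductKey
""").fetchall()

print(len(rows), "detail rows ->", len(agg), "aggregated rows")
# A summarized question is now a lookup in the small table, not a scan of the big one.
print(agg[0])  # -> (20240101, 1, 25.0)
```

On a real model the ratio is far more dramatic – millions of detail rows can collapse into a few thousand aggregated ones.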
Key “Ingredient” – Make Power BI “aware” of the aggregations!
Okay, that's one side of the story. Now comes the more interesting part. Creating aggregations per se is not enough to speed up your Power BI reports – you need to make Power BI aware of the aggregations!
Just one remark before we proceed: aggregation awareness works if, and only if, the original fact table uses DirectQuery storage mode. We'll soon explain how to design and manage aggregations and how to set the proper storage mode for your tables. For the moment, just keep in mind that the original fact table should be in DirectQuery mode.
Let's start building our aggregations!

As you can see in the illustration above, our model is fairly simple – consisting of one fact table (FactOnlineSales) and three dimensions (DimDate, DimStore, and DimProduct). All tables are currently using DirectQuery storage mode.
Let's create two additional tables that we will use as aggregated tables: the first one will group the data by date and product, while the other will use date and store for grouping:
/* Table 1: Agg Data per Date & Product */
SELECT DateKey
      ,ProductKey
      ,SUM(SalesAmount) AS SalesAmount
      ,SUM(SalesQuantity) AS SalesQuantity
FROM FactOnlineSales
GROUP BY DateKey
        ,ProductKey;

/* Table 2: Agg Data per Date & Store */
SELECT DateKey
      ,StoreKey
      ,SUM(SalesAmount) AS SalesAmount
      ,SUM(SalesQuantity) AS SalesQuantity
FROM FactOnlineSales
GROUP BY DateKey
        ,StoreKey;

I've renamed these queries to Sales Product Agg and Sales Store Agg respectively and closed the Power Query Editor.
As we want the best possible performance for the vast majority of our queries (those queries that retrieve data summarized by date and/or product/store), I'll switch the storage mode of the newly created aggregated tables from DirectQuery to Import:

Now, these tables are loaded into cache memory, but they're still not connected to our existing dimension tables. Let's create relationships between the dimensions and the aggregated tables:

Before we proceed, let me stop for a moment and explain what happened when we created the relationships. If you recall our previous article, I mentioned that there are two types of relationships in Power BI: regular and limited. This is important: whenever there's a relationship between tables from different source groups (Import mode is one source group, DirectQuery is another), you'll get a limited relationship – with all its limitations and constraints.
But I have good news for you! If I switch the storage mode of my dimension tables to Dual, that means they will also be loaded into cache memory, and depending on which fact table provides the data at query time, a dimension table will behave either as Import mode (if the query targets the Import mode fact tables) or as DirectQuery (if the query retrieves data from the original fact table in DirectQuery):

As you may notice, there are no more limited relationships, which is great!
So, to wrap up, our model is configured as follows:
- The original FactOnlineSales table (with all the detailed data) – DirectQuery
- Dimension tables (DimDate, DimProduct, DimStore) – Dual
- Aggregated tables (Sales Product Agg and Sales Store Agg) – Import
Awesome! Now we have our aggregated tables, so queries should run faster, right? Beep! Wrong!

The table visual contains exactly the columns that we pre-aggregated in our Sales Product Agg table – so why on Earth does Power BI run a DirectQuery instead of getting the data from the imported table? That's a fair question!
Remember when I told you at the beginning that we need to make Power BI aware of the aggregated table, so it can be used in queries?
Let's go back to Power BI Desktop and do exactly that:

Right-click on the Sales Product Agg table and select the Manage aggregations option:

A few important remarks here: in order for aggregations to work, the data type of each aggregation column must match the data type of the detail column it maps to! In my case, I had to change the data type of the SalesAmount column in my aggregated table from "Decimal number" to "Fixed decimal number".
Moreover, you see the message written in red: it means that once you create an aggregation, the table will be hidden from the end user! I've applied the exact same steps to my second aggregated table (Store), and now both tables are hidden:
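Conceptually, the Manage aggregations dialog records, for each aggregation column, a detail column and a summarization function, and it rejects mappings whose data types don't match. Here is a rough sketch of that bookkeeping – the structure and names are mine for illustration, not Power BI's internal metadata model:

```python
# Hypothetical sketch of the mappings behind "Manage aggregations".
# Each entry: (agg column, summarization, detail column, agg type, detail type)
mappings = [
    ("DateKey",     "GroupBy", "FactOnlineSales.DateKey",     "Whole number",   "Whole number"),
    ("ProductKey",  "GroupBy", "FactOnlineSales.ProductKey",  "Whole number",   "Whole number"),
    ("SalesAmount", "Sum",     "FactOnlineSales.SalesAmount", "Decimal number", "Fixed decimal number"),
]

def invalid_mappings(mappings):
    """Return aggregation columns whose data type differs from the detail column's."""
    return [m[0] for m in mappings if m[3] != m[4]]

# SalesAmount is rejected until its type is changed to match the detail column.
print(invalid_mappings(mappings))  # -> ['SalesAmount']
```

This is exactly the situation I hit above: the mapping only validated once both sides used "Fixed decimal number".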

Let's go back now and refresh our report page to see if anything changed:

Nice! No DirectQuery this time, and instead of the almost 2 seconds needed to render this visual, it took only 58 milliseconds! Furthermore, if I grab the query and go to DAX Studio to check what's happening…

As you can see, the original query was mapped to target the aggregated table in Import mode, and the message "Match found" clearly shows that the data for the visual came from the Sales Product Agg table – even though our user doesn't have a clue that this table even exists in the model!
The difference in performance, even on this relatively small dataset, is huge!
Multiple aggregated tables
Now, you're probably wondering why I created two different aggregated tables. Well, let's say I have a query that displays the data for different stores, also grouped by the date dimension. Instead of having to scan 12.6 million rows in DirectQuery mode, the engine can easily serve the numbers from the cache, from a table that has just a few thousand rows!
Essentially, you can create multiple aggregated tables in the data model – not only combining two grouping attributes (as we did here with Date+Product or Date+Store), but also including additional attributes (for example, Date plus both Product and Store in one aggregated table). This way, you'll increase the granularity of the table, but if your visual needs to display the numbers for both product and store, you'll still be able to retrieve results from the cache only!
In our example, as I don't have pre-aggregated data at the level that includes both product and store, if I include a store in the table, I lose the benefit of having aggregated tables:

So, in order to leverage aggregations, you need to have them defined at the very same level of grain that the visual requires – or at a more detailed grain that still contains every attribute the visual groups by!
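The "hit or miss" rule above can be sketched in a few lines: a cached aggregation can answer a query only if every attribute the visual groups by is present in that table's grain. This is an illustrative model of the matching behavior, not Power BI's actual implementation:

```python
# Hypothetical sketch of aggregation-awareness matching.
# Grains mirror the two aggregated tables built in this article.
AGG_TABLES = {
    "Sales Product Agg": {"Date", "Product"},
    "Sales Store Agg":   {"Date", "Store"},
}

def find_matches(query_grain, agg_tables=AGG_TABLES):
    """Return the cached tables whose grain covers every attribute in the query."""
    return [name for name, grain in agg_tables.items() if query_grain <= grain]

print(find_matches({"Date", "Product"}))           # -> ['Sales Product Agg']  (served from cache)
print(find_matches({"Date", "Product", "Store"}))  # -> []  (falls back to DirectQuery)
```

The second call shows exactly what happened when I added Store to the visual: no cached table covers Date+Product+Store, so the engine has to go back to the DirectQuery source.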
Aggregation Precedence
There's one more important property to understand when working with aggregations – precedence! When the Manage aggregations dialog box opens, there's an option to set the aggregation precedence:

This value "instructs" Power BI which aggregated table to use in case the query can be satisfied from multiple different aggregations! By default, it's set to 0, but you can change the value. The higher the number, the higher the precedence of that aggregation.
Why is this important? Well, consider a scenario where you have a main fact table with a billion rows, and you create multiple aggregated tables at different grains:
- Aggregated table 1: groups data at the Date level – has ~ 2.000 rows (5 years of dates)
- Aggregated table 2: groups data at the Date & Product level – has ~ 100.000 rows (5 years of dates x 50 products)
- Aggregated table 3: groups data at the Date, Product & Store level – has ~ 5.000.000 rows (100.000 from the previous grain x 50 stores)
Now, let's say the report visual displays aggregated data at the date level only. What do you think: is it better to scan Table 1 (2.000 rows) or Table 3 (5 million rows)? I believe you know the answer :) In theory, the query can be satisfied from any of these tables, so why rely on an arbitrary choice by Power BI?!
Instead, when you're creating multiple aggregated tables with different levels of granularity, make sure to set the precedence values so that the coarser-grained tables (those with fewer rows) get priority!
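Putting matching and precedence together, the selection logic can be sketched like this (an illustrative model under my own naming, not Power BI internals): among the tables that cover the query's grain, the one with the highest precedence wins.

```python
from collections import namedtuple

# Hypothetical model of the three aggregated tables from the scenario above.
Agg = namedtuple("Agg", "name grain precedence rows")
aggs = [
    Agg("Date Agg",               {"Date"},                     30, 2_000),
    Agg("Date-Product Agg",       {"Date", "Product"},          20, 100_000),
    Agg("Date-Product-Store Agg", {"Date", "Product", "Store"}, 10, 5_000_000),
]

def pick_aggregation(query_grain, aggs):
    """Among tables covering the query grain, pick the highest-precedence one."""
    candidates = [a for a in aggs if query_grain <= a.grain]
    if not candidates:
        return None  # no match: Power BI falls back to DirectQuery
    return max(candidates, key=lambda a: a.precedence)

# A date-level query matches all three tables; precedence steers it
# to the smallest (2.000-row) one instead of the 5-million-row one.
print(pick_aggregation({"Date"}, aggs).name)  # -> Date Agg
```

With precedence set this way, the arbitrary choice is gone: coarse queries always land on the cheapest table that can answer them.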
Conclusion
Aggregations are one of the most powerful features in Power BI, especially in scenarios with large datasets! Even though the composite models feature and aggregations can be used independently of each other, the two are frequently used in synergy, to provide the optimal balance between performance and having all the details of the data available!
Thanks for reading!
