Efficiency, readability, and scalability
Even though it’s called a question language, SQL is able to not only querying databases but additionally performing efficient data evaluation and manipulation. It isn’t a surprise that SQL is embraced by the information science community.
In this text, we are going to find out about a really handy SQL feature, which allows for writing cleaner and more efficient queries. This I-wish-I-knew-this-earlier feature is the GROUPING SETS
, which might be regarded as an extension of the GROUP BY
function.
We’ll learn the difference between them in addition to the advantage of using GROUPING SETS
over the GROUP BY
function but first, we’d like a dataset to work on.
I created a SQL table from the Melbourne housing dataset available on Kaggle with a public domain license. The primary 5 rows of the table looks as follows:
The GROUP BY function
We are able to use the function to calculate aggregate values per group or distinct values in a column or multiple columns. As an example, the next query returns the typical price for every listing type.
SELECT
type,
AVG(price) AS avg_price
FROM melb
GROUP BY type
The output of this question is:
Multiple groupings
Let’s say you desire to see the typical price for every region within the northern area, which might be achieved by utilizing the GROUP BY
function as follows:
SELECT
regionname,
AVG(price) AS avg_price
FROM melb
WHERE regionname LIKE 'Northern%'
GROUP BY regionname
The output:
Consider a case where you desire to see the typical price of various house types in these two regions in the identical table. You may achieve this by writing two groupings and mixing the outcomes with UNION ALL.
SELECT
regionname,
'all' AS type,
AVG(price) AS average_price
FROM melb
WHERE regionname LIKE 'Eastern%'
GROUP BY regionname
UNION ALL
SELECT
regionname,
type,
AVG(price) AS average_price
FROM melb
WHERE regionname LIKE 'Eastern%'
GROUP BY regionname, type
ORDER BY regionname, type
What the query does is to calculate the typical price for every region first. Then, in a separate query, it groups the rows by each region name and kind and calculates the typical price for every group. The union combines the output of those two queries.
For the reason that first query doesn’t have the kind column, we create it manually with a worth of “all”. Finally, the combined results are ordered by the region name and the kind.
The output of this question:
The primary row for every region shows the region average and the next rows show the typical price for various house types.
We had to put in writing two separate queries because we cannot have different queries in a GROUP BY
statement unless we use GROUPING SETS.
GROUPING SETS
Let’s rewrite the previous query using GROUPING SETS.
SELECT
regionname,
type,
AVG(price) as average_price
FROM melb
WHERE regionname LIKE 'Eastern%'
GROUP BY
GROUPING SETS (
(regionname),
(regionname, type)
)
ORDER BY regionname, type
The output:
The output is similar apart from the null values in the kind column which might easily get replaced with “all”.
Using the GROUPING SETS
has two major benefits:
- It’s shorter and more intuitive which makes the code easier to debug and manage
- It’s more efficient and performant than writing separate queries and mixing the outcomes because SQL scans the tables for every query.
Final thoughts
We regularly disregard query readability and efficiency. We’re joyful if the query returns the specified data.
Efficiency is something we at all times have to have in mind. The impact of writing bad queries could also be tolerated when querying a small database. Nonetheless, when the information size becomes large, bad queries may result in serious performance issues. In an effort to make ETL processes scalable and easy-to-manage, we’d like to adapt best practices. The GROUPING SETS
is considered one of these best practices.
You may turn out to be a Medium member to unlock full access to my writing, plus the remainder of Medium. For those who already are, don’t forget to subscribe if you happen to’d prefer to get an email at any time when I publish a latest article.
Thanks for reading. Please let me know if you will have any feedback.