Home Artificial Intelligence An Easy Guide to Master Moving Average and Running Total in SQL

An Easy Guide to Master Moving Average and Running Total in SQL

8
An Easy Guide to Master Moving Average and Running Total in SQL

Scenario

Say, We now have a table named “ARTICLES” that captures day by day view counts for various articles from 1st June 2023 to tenth June 2023. Each entry within the table represents a date, article title, and the variety of views recorded on that day. You’ll find the source data and code file in my GitHub Repository,

Here is the sample data,

--sample data from table ARTICLES
SELECT
*
FROM
ARTICLES
LIMIT 10;
Image by writer

Probably the most convenient and easy approach to compute each moving average and running total in SQL is by utilising Window Functions. To revisit the concepts of Window Functions and Aggregate Functions, you may read the detailed explanation available here:

Calculating Running Total using Window Function

Continuing to our demo, imagine you have to discover a cumulative sum of the whole views on article “Aggregate Functions in SQL” by the top of every day,

--total number for views for "Aggregate Functions in SQL" by end of every day
SELECT
`DATE`,
ARTICLE_TITLE,
NO_OF_VIEWS,
SUM(NO_OF_VIEWS) OVER (ORDER BY `DATE`
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RUNNING_TOTAL
FROM
ARTICLES
WHERE
ARTICLE_TITLE = "Aggregate Functions in SQL";
Image by writer

Here, we’ve used OVER() clause within the above query, it is important because it identifies a function as a Window Function and its purpose is to define a selected group of rows (a window) on which the Window Function will perform its calculations. But wait, that’s not all.

Within the code above we’ve also used a FRAME clause as,

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

What’s it? Essentially, the Window Function relies on either ROW or RANGE to find out which values must be considered for the calculation throughout the partition, by specifying the starting and ending points of the chosen subset.

So here, the FRAME clause specifies the dimensions of the frame – the worth of the present row and the values of all of the rows above the present row – on which the SUM(NO_OF_VIEWS) must be performed. It keeps adding the worth of “NO_OF_VIEWS” because it goes along, giving us a running total for every row ordered by DATE.

In the instance above, if we omit the RANGE clause, the result will remain unchanged. Are you able to guess why? At any time when we use ORDER BY clause in a Window Function, the default frame is, ‘RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW’. Nonetheless, personally, I find it helpful to incorporate the RANGE or ROW clause because it enhances clarity and understanding. Inevitably, there can be a degree in the longer term when another person inherits your code. It’s all the time advisable to put in writing your code in a way that makes it easier for others to know and work with it.

You may read more about FRAME clause here,

Now let’s do the identical evaluation for all of the articles – discover a cumulative sum of the whole views of every article by the top of every day,

--running total
SELECT
`DATE`,
ARTICLE_TITLE,
NO_OF_VIEWS,
SUM(NO_OF_VIEWS) OVER (PARTITION BY ARTICLE_TITLE
ORDER BY `DATE`
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RUNNING_TOTAL
FROM
ARTICLES;
GIF by writer

Here, we divided the information into partitions based on the ARTICLE_TITLE. Then, we performed a SUM(NO_OF_VIEWS) calculation for every partition as defined by the FRAME clause.

For a greater understanding, please seek advice from the image provided below. It illustrates the calculations performed for a single partition. The identical logic and calculations are applicable to all other partitions as well.

8 COMMENTS

LEAVE A REPLY

Please enter your comment!
Please enter your name here