PostgreSQL: Query Optimization for Mere Humans


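We will use a straightforward query as an example: we want to count the number of users that have a Twitter handle, i.e. whose twitter column is not an empty string (rows where it is NULL are not counted either, because NULL != '' does not evaluate to true).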

EXPLAIN ANALYZE
SELECT COUNT(*) FROM users WHERE twitter != '';
Figure: the execution plan returned by EXPLAIN ANALYZE

It looks cryptic at first, and it is even longer than our query. And this is a small example: real-world execution plans can be overwhelming if you don’t focus 😭.

Nevertheless, it does provide useful information. We can see that query execution took 1.27 seconds, while query planning took only 0.4 milliseconds (negligible).
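If you prefer to read these numbers programmatically, EXPLAIN (ANALYZE, FORMAT JSON) returns the same plan as JSON, with top-level "Planning Time" and "Execution Time" keys in milliseconds. The minimal sketch below parses such output. The sample document is hypothetical and trimmed: its timings only approximate the figures quoted above and are not copied from a real run.

```python
import json

# Hypothetical, trimmed EXPLAIN (ANALYZE, FORMAT JSON) output.
# The timings approximate the figures quoted in the text; they are not from a real run.
SAMPLE_OUTPUT = """
[
  {
    "Plan": {"Node Type": "Aggregate"},
    "Planning Time": 0.4,
    "Execution Time": 1271.2
  }
]
"""


def timing_summary(explain_json: str) -> dict:
    """Return planning/execution times (ms) and the planning share of the total."""
    # FORMAT JSON wraps the result in a one-element array.
    result = json.loads(explain_json)[0]
    planning = result["Planning Time"]
    execution = result["Execution Time"]
    return {
        "planning_ms": planning,
        "execution_ms": execution,
        "planning_share": planning / (planning + execution),
    }


summary = timing_summary(SAMPLE_OUTPUT)
print(f"planning {summary['planning_ms']} ms, execution {summary['execution_ms']} ms")
print(f"planning is {summary['planning_share']:.4%} of the total")
```

With numbers like these, planning is a tiny fraction of the total, which is why the rest of this article focuses on execution.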

Figure: the time taken by query planning and execution

The execution plan is structured as an inverted tree. In the following figure, you can see that the plan is split into nodes, each of which represents a distinct operation, such as an Aggregate or a Scan.
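In the JSON output the tree shape is explicit: each node lists its children under a "Plans" key, and rows flow from the leaves up to the root. The small recursive sketch below prints a plan as an indented tree. Only the root Aggregate node comes from the text; the Seq Scan child is an assumed example of what might sit underneath it.

```python
from __future__ import annotations

# Hypothetical plan fragment: the root Aggregate comes from the text,
# the Seq Scan child is an assumption for illustration.
SAMPLE_PLAN = {
    "Node Type": "Aggregate",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "users"},
    ],
}


def render_tree(node: dict, depth: int = 0) -> list[str]:
    """Return one indented line per node, root first (depth-first, pre-order)."""
    label = node["Node Type"]
    if "Relation Name" in node:
        label += f" on {node['Relation Name']}"
    # Children are indented under their parent, like in the text output of EXPLAIN.
    lines = [("  " * depth + "-> " + label) if depth else label]
    for child in node.get("Plans", []):
        lines.extend(render_tree(child, depth + 1))
    return lines


print("\n".join(render_tree(SAMPLE_PLAN)))
```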


There are many kinds of node operations: scan-related (‘Seq Scan’, ‘Index Only Scan’, etc.), join-related (‘Hash Join’, ‘Nested Loop’, etc.), aggregation-related (‘GroupAggregate’, ‘Aggregate’, etc.) and others (‘Limit’, ‘Sort’, ‘Materialize’, etc.). Fortunately, you don’t need to remember any of this.

Pro Tip #3 💃: Focus is key: look only at the nodes that are problematic.

Pro Tip #4 💃: Cheat! For the problematic nodes, look up what they mean in the EXPLAIN glossary.

Now, let’s look at how we know which node is the problematic one.

There is a lot of information we can see on each node.

Let’s drill down into what those metrics actually mean.

  • Actual Loops: the number of times the node was executed, here 1. To get the total time and rows, multiply the actual time and rows by the loops value.
  • Actual Rows: the actual number of rows produced by the Aggregate node is 1 (a per-loop average; here loops is 1).
  • Plan Rows: the estimated number of rows produced by the Aggregate node is 1. The estimate can be off, depending on the table statistics.
  • Actual Startup Time: the time, in milliseconds, the Aggregate node took to return the first row is 1271.157 (cumulative, including the operations below it).
  • Startup Cost: the estimated cost, in arbitrary units, of returning the first row of the Aggregate node is 845110 (cumulative, including the operations below it).
  • Actual Total Time: the time, in milliseconds, the Aggregate node took to return all rows is 1271.158 (a per-loop average, here loops is 1; cumulative, including the operations below it). Because a plain Aggregate can return its single row only after consuming all of its input, its startup and total times are almost identical.
  • Total Cost: the estimated cost, in arbitrary units, of returning all rows of the Aggregate node is 845110 (cumulative).
  • Plan Width: the estimated average width of the rows produced by the Aggregate node is 8 bytes.
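The metrics above map one-to-one onto keys in the JSON output. As a minimal sketch, the helper below (a hypothetical name, not part of any PostgreSQL tooling) takes the Aggregate node's values quoted in the list, scales the per-loop averages by the loop count, and compares the row estimate with reality. The estimate ratio is only a rough heuristic for spotting misestimates.

```python
# Metrics of the Aggregate node as quoted in the list above
# (key names as they appear in EXPLAIN (ANALYZE, FORMAT JSON) output).
AGGREGATE_NODE = {
    "Node Type": "Aggregate",
    "Actual Loops": 1,
    "Actual Rows": 1,
    "Plan Rows": 1,
    "Actual Startup Time": 1271.157,
    "Startup Cost": 845110,
    "Actual Total Time": 1271.158,
    "Total Cost": 845110,
    "Plan Width": 8,
}


def node_totals(node: dict) -> dict:
    """Derive totals across all loops plus a simple row-estimate ratio."""
    loops = node["Actual Loops"]
    # Actual Rows and Actual Total Time are per-loop averages, so scale by loops.
    total_rows = node["Actual Rows"] * loops
    total_time_ms = node["Actual Total Time"] * loops
    # Plan Rows is also per loop; a ratio far from 1 hints at poor statistics.
    # max(..., 1) guards against division by zero when a node returns no rows.
    estimate_ratio = node["Plan Rows"] / max(node["Actual Rows"], 1)
    return {
        "total_rows": total_rows,
        "total_time_ms": total_time_ms,
        "estimate_ratio": estimate_ratio,
    }


print(node_totals(AGGREGATE_NODE))
```

For this node everything is 1 loop and the estimate is exact, so the totals equal the raw values.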

Pro Tip #5 💃: Beware of loops: remember to multiply by the loops value whenever you care about Actual Rows and Actual Total Time.
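To see why loops matter, consider a hypothetical plan (not from this article) where a Nested Loop runs its inner Index Scan once per outer row. Read naively, the Index Scan seems to take 0.05 ms; multiplied by its 1000 loops, it dominates the query. The sketch below computes each node's exclusive ("self") time as its total time over all loops minus its children's totals, one common way to spot the problematic node. It is a simplification that ignores cases such as parallel workers, where children's times can overlap.

```python
from __future__ import annotations

# Hypothetical plan: a Nested Loop whose inner Index Scan runs once per outer row.
# Times are per-loop averages in ms, as EXPLAIN ANALYZE reports them.
PLAN = {
    "Node Type": "Nested Loop", "Actual Loops": 1,
    "Actual Rows": 1000, "Actual Total Time": 60.0,
    "Plans": [
        {"Node Type": "Seq Scan", "Actual Loops": 1,
         "Actual Rows": 1000, "Actual Total Time": 5.0},
        {"Node Type": "Index Scan", "Actual Loops": 1000,
         "Actual Rows": 1, "Actual Total Time": 0.05},
    ],
}


def total_time(node: dict) -> float:
    """Time spent in the node and its children, summed over all loops."""
    return node["Actual Total Time"] * node["Actual Loops"]


def self_times(node: dict) -> list[tuple[str, float]]:
    """List (node type, exclusive ms) pairs: a node's total minus its children's totals."""
    children = node.get("Plans", [])
    own = total_time(node) - sum(total_time(c) for c in children)
    result = [(node["Node Type"], own)]
    for child in children:
        result.extend(self_times(child))
    return result


times = self_times(PLAN)
for node_type, ms in times:
    print(f"{node_type:<12} {ms:8.2f} ms")
slowest = max(times, key=lambda pair: pair[1])
print("most expensive node:", slowest[0])
```

Returning a list of pairs rather than a dict keeps plans with repeated node types (for example, two Seq Scans) from overwriting each other.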

In the following section, we will drill into a practical example.
