Add One Line of SQL to Optimise Your BigQuery Tables

Clustering: a simple technique to group similar rows and prevent unnecessary data processing

In my previous article, I explained how to optimise SQL queries using partitioning.

Now, I’m writing the sequel! (Dad joke, anyone?)

This article looks at clustering: another powerful optimisation technique you can use in BigQuery. Like partitioning, clustering can help you write more performant queries that are quicker and cheaper to run. If you want to expand your SQL toolkit and build those higher-level Data Science skills, this is a great place to start.

In BigQuery, a clustered table is a table that keeps similar rows grouped together in physical “blocks”.

For example, picture a table called user_signups that keeps track of everyone registering an account on a fictitious website. It has four columns:

  • registration_date: the date on which the user created an account
  • country: the country where the user is based
  • tier: the user’s plan (“Free” or “Paid”)
  • username: the user’s username

If we wanted, we could cluster the table by country so that users from the same country are stored near one another in the table.
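
As a rough sketch of what that could look like, the statement below creates user_signups with a CLUSTER BY clause on country. The dataset name my_dataset, the column types and the example country value are assumptions made for illustration; the description above does not specify them.

```sql
-- Hypothetical dataset name (my_dataset) and assumed column types.
CREATE TABLE my_dataset.user_signups
(
  registration_date DATE,    -- the date on which the user created an account
  country           STRING,  -- the country where the user is based
  tier              STRING,  -- the user's plan ('Free' or 'Paid')
  username          STRING   -- the user's username
)
-- The one extra line: keep rows with the same country grouped together.
CLUSTER BY country;

-- A query that filters on the clustering column.
SELECT username, tier
FROM my_dataset.user_signups
WHERE country = 'Lebanon';
```

A query that filters on country, like the SELECT above, is the kind that should benefit: BigQuery can skip blocks that hold no matching rows, so less data needs to be scanned.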
