Scaling has perhaps been the most important word in the world of Large Language Models (LLMs) since the release of ChatGPT. ChatGPT became so successful largely thanks to the scaled pre-training OpenAI did, which made it a powerful language model.
Following that, frontier LLM labs began scaling post-training with supervised fine-tuning and RLHF, and models got steadily better at following instructions and performing complex tasks.
And just when we thought LLMs were about to plateau, we began doing inference-time scaling with the release of reasoning models, where spending more compute at inference time gave huge improvements in output quality.
I now argue we should continue this trend with a new scaling paradigm: usage-based scaling, where you scale how much you are using LLMs:
- Run more coding agents in parallel
- Always have a deep research run going on a topic of interest
- Run information-fetching workflows
If you're not firing off an agent before going to lunch, or before going to sleep, you're wasting time.
In this article, I'll discuss why scaling LLM usage can lead to increased productivity, especially when working as a programmer. In addition, I'll cover specific techniques you can use to scale your LLM usage, both personally and at the companies you work for. I'll keep the article high-level, aiming to inspire you to use AI maximally to your advantage.
Why you need to scale LLM usage
We've already seen scaling be incredibly powerful in the past with:
- pre-training
- post-training
- inference time scaling
The reason for this is that, it seems, the more computing power you spend on something, the higher the output quality you achieve. This, of course, assumes you're able to spend the compute effectively. For example, in pre-training, being able to scale compute depends on:
- A large enough model (enough weights to train)
- Enough data to train on
If you scale compute without these two components, you won't see improvements. However, if you do scale all three, you get amazing results, like the frontier LLMs we're seeing now, for example with the release of Gemini 3.
I thus argue you should look to scale your own LLM usage as much as possible. This could, for example, mean firing off several agents to code in parallel, or starting a Gemini deep research run on a topic you're interested in.
Of course, the usage still has to be valuable. There's no point in starting a coding agent on some obscure task you have no need for. Rather, you should start a coding agent on:
- A Linear issue you never felt you had the time to sit down and do yourself
- A quick feature that was requested in the last sales call
- Some UI improvements, which today's coding agents handle easily

In a world with an abundance of resources, we should look to maximize our use of them.
My main point here is that the threshold for performing tasks has decreased significantly since the release of LLMs. Previously, if you got a bug report, you had to sit down for two hours in deep concentration, thinking about how to solve that bug.
Today, however, that's no longer the case. Instead, you can go into Cursor, paste in the bug report, and ask Claude Sonnet 4.5 to try to fix it. You can then come back 10 minutes later, test whether the issue is fixed, and create the pull request.
How many tokens can you spend while still doing something useful with them?
Methods to scale LLM usage
I've talked about why you should scale LLM usage by running more coding agents, deep research agents, and other AI agents. However, it can be hard to think of exactly which LLM tasks to fire off. Thus, in this section, I'll discuss specific agents you can start to scale your LLM usage.
Parallel coding agents
Parallel coding agents are one of the simplest ways for any programmer to scale LLM usage. Instead of working on only one problem at a time, you start two or more agents at the same time, using Cursor agents, Claude Code, or any other agentic coding tool. Git worktrees typically make this very easy, since each agent gets its own checkout.
For example, I typically have one main task or project I'm working on, where I'm sitting in Cursor and programming. However, sometimes a bug report comes in, and I automatically route it to Claude Code, asking it to search for why the issue is happening and fix it if possible. Sometimes this works out of the box; sometimes I have to help it a bit.
Either way, the cost of starting this bug-fixing agent is super low (I can literally just copy the Linear issue into Cursor, which can read the issue using the Linear MCP). Similarly, I have a script running in the background that automatically researches relevant prospects.
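The parallel-agent setup above can be sketched in a few lines of Python. This is a minimal, hypothetical helper, not any tool's official API: it creates a dedicated git worktree for a branch and launches an agent command of your choosing inside it. The example `agent_cmd` using Claude Code's `-p` (print-mode prompt) flag is an assumption; substitute whatever CLI your agent uses.

```python
import subprocess
from pathlib import Path

def start_agent_worktree(repo: Path, branch: str,
                         agent_cmd: list[str]) -> subprocess.Popen:
    """Create a throwaway worktree for `branch` and launch an agent in it.

    `agent_cmd` is whatever CLI invocation your agent uses, e.g.
    ["claude", "-p", "<issue text>"] (an assumption; swap in your own tool).
    """
    worktree = repo.parent / f"{repo.name}-{branch}"
    # Each agent gets its own checkout on its own branch...
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree)],
        check=True,
    )
    # ...so a background agent can't clobber the files you're editing
    # in your main tree. Returns immediately; poll or wait() later.
    return subprocess.Popen(agent_cmd, cwd=worktree)
```

You could fire off `start_agent_worktree(Path("my-repo"), "fix-login-bug", ["claude", "-p", "Fix the login bug described in LIN-123"])` before lunch and review the diff when you're back.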
Deep research
Deep research is a feature available from any of the frontier model providers, including Google Gemini, OpenAI ChatGPT, and Anthropic's Claude. I prefer Gemini 3 deep research, though there are many other solid deep research tools out there.
Whenever I'm interested in learning more about a topic, finding information, or anything similar, I fire off a deep research agent with Gemini.
For example, I was recently interested in finding prospects matching a specific ICP (ideal customer profile). I quickly pasted the ICP information into Gemini, gave it some contextual information, and had it start researching, so it could run while I worked on my main programming project.
After 20 minutes, I had a brief report from Gemini, which turned out to contain a lot of useful information.
Creating workflows with n8n
Another way to scale LLM usage is to create workflows with n8n or any similar workflow-building tool. With n8n, you can build specific workflows that, for example, read Slack messages and perform some action based on those messages.
You could, for instance, have a workflow that reads a bug-report channel on Slack and automatically starts a Claude Code agent for each bug report. Or you could create another workflow that aggregates information from a number of different sources and presents it to you in an easily readable format. The opportunities with workflow-building tools are essentially limitless.
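In n8n you would build the Slack-to-agent workflow visually out of nodes, but the core triage step is simple enough to sketch in plain Python. The message shape and the "BUG:" prefix convention below are my own assumptions for illustration, not anything Slack or n8n prescribes.

```python
def triage_bug_reports(messages: list[dict]) -> list[str]:
    """Turn a dump of channel messages into agent-ready prompts.

    Assumes (hypothetically) that reporters prefix bug reports with "BUG:";
    everything else in the channel is ignored.
    """
    prompts = []
    for msg in messages:
        text = msg.get("text", "")
        if text.startswith("BUG:"):
            report = text[len("BUG:"):].strip()
            # One prompt per report, ready to hand to a coding agent.
            prompts.append(
                f"Investigate and, if possible, fix this bug reported by "
                f"{msg.get('user', 'unknown')}: {report}"
            )
    return prompts
```

Each resulting prompt could then be passed to a coding agent (for example as Claude Code's `-p` argument), with the aggregation-and-digest workflow following the same filter-and-format pattern.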
More
There are many other techniques you can use to scale your LLM usage. I've only listed the first few that came to mind from my own work with LLMs. I recommend always keeping in mind what you can automate using AI, and how you can leverage it to become more effective. How best to scale LLM usage will vary widely across companies, job titles, and many other factors.
Conclusion
In this article, I've discussed how to scale your LLM usage to become a more effective engineer. We've seen scaling work incredibly well in the past, and it's highly likely we'll see increasingly powerful results by scaling our own usage of LLMs. This could mean firing off more coding agents in parallel, or running deep research agents while eating lunch. In general, I believe that by increasing our LLM usage, we can become increasingly productive.
