demonstrates that it’s perfectly possible to insert 2M records per second into Postgres. Instead of chasing micro-benchmarks, in this article we’ll step back and ask a more important question: which abstraction actually fits our workload?
We’ll look at five ways to insert data into Postgres using Python. The goal is not just to compare insert speeds and crown a winner, but to understand the trade-offs between abstraction, safety, convenience and performance.
By the end you’ll understand:
- the strengths and weaknesses of ORM, Core and driver-level inserts
- when performance actually matters
- how to choose the right tool without over-engineering
Why fast inserts matter
High-volume insert workloads show up everywhere:
- loading tens of millions of records
- syncing data from external APIs
- backfilling analytics tables
- ingesting events or logs into warehouses
Small inefficiencies compound quickly. Turning a 3-minute insert job into a 10-second one can reduce system load, free up workers and improve overall throughput.
That said, faster doesn’t automatically mean better. When workloads are small, sacrificing clarity and safety for marginal gains rarely pays off.
Understanding when performance matters is the real goal.
Which tools do we insert with?
To talk to our Postgres database we need a database driver. In our case that is psycopg3, with SQLAlchemy layered on top. Here’s a quick distinction:
Psycopg3 (the driver)
psycopg3 is a low-level PostgreSQL driver for Python. It is a very thin abstraction with minimal overhead that talks to Postgres directly.
The trade-off is responsibility: you write SQL yourself, manage batching and handle correctness explicitly.
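To make that concrete, here is a minimal sketch of a driver-level insert with psycopg3; the table, columns and connection string are assumptions for illustration, not the exact benchmark code:

```python
import psycopg  # psycopg3

rows = [("alice", "alice@example.com"), ("bob", "bob@example.com")]

# The connection context manager commits on success and rolls back on error.
with psycopg.connect("dbname=demo user=postgres") as conn:
    with conn.cursor() as cur:
        # You write the SQL and the parameter order yourself; the driver only
        # binds the parameters and sends the statements to Postgres.
        cur.executemany(
            "INSERT INTO users (name, email) VALUES (%s, %s)",
            rows,
        )
```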
SQLAlchemy
SQLAlchemy sits on top of database drivers like psycopg3 and provides two layers:
1) SQLAlchemy Core
This is the SQL abstraction and execution layer. It’s database-agnostic, which means you write Python expressions and Core translates them into SQL in the right database dialect (PostgreSQL / SQL Server / SQLite) and safely binds the parameters.
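As a small illustration, a Core insert from a list of dictionaries could look like this; the users table definition and connection URL are assumptions, and the table is presumed to already exist:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, insert

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("email", String),
)

engine = create_engine("postgresql+psycopg://user:pass@localhost/demo")

rows = [
    {"name": "alice", "email": "alice@example.com"},
    {"name": "bob", "email": "bob@example.com"},
]

# engine.begin() opens a transaction that commits on success; passing a list
# of dicts to execute() lets Core emit one batched, parameterized INSERT.
with engine.begin() as conn:
    conn.execute(insert(users), rows)
```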
2) SQLAlchemy ORM
The ORM is built on top of Core and abstracts even more. It maps Python classes to tables, tracks object state and handles relationships. The ORM is very productive and safe, but all that bookkeeping introduces overhead, especially for bulk operations.
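The same insert through the ORM, in SQLAlchemy 2.0 style, might look like this (the model and connection URL are again illustrative):

```python
from sqlalchemy import String, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String)
    email: Mapped[str] = mapped_column(String)

engine = create_engine("postgresql+psycopg://user:pass@localhost/demo")

with Session(engine) as session:
    session.add_all([
        User(name="alice", email="alice@example.com"),
        User(name="bob", email="bob@example.com"),
    ])
    # The unit of work tracks each object and flushes them as INSERTs here.
    session.commit()
```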
Briefly:
All three exist on a spectrum. On one side there’s the ORM, which takes a lot of work out of your hands and provides a lot of safety at the cost of some overhead. On the other side there’s the Driver, which is very bare-bones but provides maximum throughput. Core sits right in the middle and gives you a nice balance of safety, performance and control.
Simply put:
- the ORM helps you use Core more easily
- Core helps you use the Driver more safely and in a database-agnostic way
The benchmark
To keep the benchmark fair:
- each method receives data in the shape it’s designed for (ORM objects for the ORM, dictionaries for Core, tuples for the Driver)
- only the time spent moving data from Python into Postgres is measured
- no method is penalized for conversion work
- the database runs in the same environment as our Python script; this prevents our benchmark from being bottlenecked by, for example, network upload speed
The goal is not to “find the fastest insert” but to understand what each method does well.
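As a rough sketch of how that isolation can be done, the rows are converted into each method’s native shape before the clock starts, so only the Python-to-Postgres transfer is timed (the names below are hypothetical, not the exact benchmark harness):

```python
import time

def measure(insert_fn, prepared_rows):
    """Time only the insert itself; prepared_rows is already in the
    shape the method expects (ORM objects, dicts or tuples)."""
    start = time.perf_counter()
    insert_fn(prepared_rows)  # e.g. ORM add_all, Core execute, driver COPY
    return time.perf_counter() - start
```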
1) Is faster always better?
Which is better: a Ferrari or a Jeep?
That depends on the terrain.
If you’re traversing a forest, go with the Jeep. If you want to be first across the finish line, the Ferrari is the better choice.
The same applies to inserting. Shaving 300 milliseconds off a 10-second insert may not justify extra complexity and risk. In other cases, that gain is absolutely worth it.
In some cases, the fastest method on paper is not the best choice once you account for:
- maintenance cost
- correctness guarantees
- cognitive load
2) What’s your starting point?
The right insertion strategy depends less on row count and more on what your data already looks like.
The ORM, Core and the Driver are not competing tools. They’re optimized for different purposes:
| Method | Purpose |
| --- | --- |
| ORM (add_all) | Business logic, correctness, small batches |
| ORM (bulk_save_objects) | ORM objects at scale |
| Core (execute) | Structured data, light abstraction |
| Driver (executemany) | Raw rows, high throughput |
| Driver (COPY) | Bulk ingestion, ETL, firehose workloads |
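One row in this table we haven’t shown yet is the ORM’s bulk path. A brief sketch, reusing the hypothetical User model and engine from the ORM example above:

```python
from sqlalchemy.orm import Session

objects = [User(name=f"user{i}", email=f"user{i}@example.com") for i in range(10_000)]

with Session(engine) as session:
    # bulk_save_objects skips most unit-of-work bookkeeping (no relationship
    # handling, no refreshed attributes) in exchange for faster inserts.
    session.bulk_save_objects(objects)
    session.commit()
```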
The ORM excels in CRUD-heavy applications where clarity and safety are most important. Think of websites and APIs. Performance is usually “good enough” and clarity matters more.
Core shines in situations where you want control without writing raw SQL. Think data ingestion, batch jobs, analytics pipelines and performance-sensitive services like ETL jobs.
You know exactly what SQL you want, but you don’t want to manage connections or dialect differences yourself.
The Driver is optimized for maximum throughput: extremely large writes such as tens of millions of rows for ML training sets, bulk loads, database maintenance or migrations, or low-latency ingestion services.
The Driver minimizes abstraction and Python overhead and gives you the highest throughput. The downside is that you have to write the SQL manually, which makes it easier to make mistakes.
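For the firehose-style workloads in the table, psycopg3 exposes Postgres’s COPY protocol directly; a minimal sketch (table and connection string again assumed):

```python
import psycopg  # psycopg3

rows = [("alice", "alice@example.com"), ("bob", "bob@example.com")]

with psycopg.connect("dbname=demo user=postgres") as conn:
    with conn.cursor() as cur:
        # COPY streams rows over a single protocol stream instead of issuing
        # batched INSERT statements, which is why it suits bulk loads so well.
        with cur.copy("COPY users (name, email) FROM STDIN") as copy:
            for row in rows:
                copy.write_row(row)
```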
3) Don’t mismatch abstractions
The ORM isn’t slow. COPY isn’t magic.
Performance problems appear when we force data through an abstraction it’s not designed for:
- Using Core with SQLAlchemy ORM objects → slow because of conversion overhead
- Using the ORM with tuples → awkward and brittle
- ORM bulk inserts in an ETL process → wasted overhead
Sometimes dropping to a lower level can actually improve performance.
When to choose which?
Rule of thumb:
| Layer | Use it when… |
| --- | --- |
| ORM | You’re building an application (correctness and productivity) |
| Core | You’re moving or transforming data (balance between safety and speed) |
| Driver | You’re pushing performance limits (raw power and full responsibility) |
Conclusion
In data and AI systems, performance isn’t limited by the database. It is limited by how well our code aligns with the shape of the data and the abstractions we choose.
ORM, Core and Driver-level APIs form a spectrum from high-level safety to low-level power. All are excellent tools when used in the context they’re designed for.
The real challenge isn’t knowing which is fastest; it’s choosing the right tool for your situation.
I hope this article was as clear as I intended it to be, but if not, please let me know what I can do to clarify further. In the meantime, check out my other articles on all kinds of programming-related topics.
Happy coding!
— Mike
P.S.: Like what I’m doing? Follow me!
