Python 3.14, one of the most eagerly awaited releases in recent times, is finally here. The reason for this is that several exciting enhancements have been implemented in this release, including:
Sub-interpreters. These have been available in Python for 20 years, but to use them, you had to drop down to coding in C. Now they can be used straight from Python itself.
T-Strings. Template strings are a new mechanism for custom string processing. They use the familiar syntax of f-strings but, unlike f-strings, they return an object representing both the static and interpolated parts of the string, instead of a plain string (a quick illustration follows this list).
A just-in-time compiler. This is still an experimental feature and shouldn't be used in production systems; however, it promises a performance boost for specific use cases.
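To make the T-string point concrete, here is a minimal sketch based on PEP 750 as shipped in 3.14. The Template type and its attributes come from the string.templatelib module; treat the exact attribute names as something to verify against the 3.14 docs.

# Minimal t-string sketch (PEP 750, Python 3.14).
# Assumes the string.templatelib API: Template.strings, Template.values,
# and iteration yielding static strings and Interpolation parts in order.
from string.templatelib import Template

name = "world"
tmpl = t"Hello {name}!"      # a Template object, not a str

print(type(tmpl))            # <class 'string.templatelib.Template'>
print(tmpl.strings)          # ('Hello ', '!')  - the static parts
print(tmpl.values)           # ('world',)       - the interpolated values

# Custom processing: render the template yourself.
rendered = "".join(
    part if isinstance(part, str) else str(part.value)
    for part in tmpl
)
print(rendered)              # Hello world!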
There are many more enhancements in Python 3.14, but this article isn't about those, or the ones mentioned above.
Instead, we'll be discussing what is probably the most anticipated feature of this release: free-threaded Python, also known as GIL-free Python. Note that regular Python 3.14 will still run with the GIL enabled, but you can download (or build) a separate, free-threaded version. I'll show you how to download and install it and, through several coding examples, compare run times between regular and GIL-free Python 3.14.
What’s the GIL?
Many of you will be aware of the Global Interpreter Lock (GIL) in Python. The GIL is a mutex (a locking mechanism) used to synchronise access to resources; in Python, it ensures that only one thread executes bytecode at a time.
On the one hand, this has several benefits, including making it easier to perform thread and memory management, avoiding race conditions, and integrating Python with C/C++ libraries.
On the other hand, the GIL can stifle parallelism. With the GIL in place, true parallelism for CPU-bound tasks across multiple CPU cores within a single Python process is not possible.
Why this matters
In a word, “performance”.
Because free-threaded execution can use all the available cores on your system concurrently, code will often run faster. For data scientists and ML or data engineers, this applies not only to your own code but also to the code behind the systems, frameworks, and libraries you depend on.
Many machine learning and data science tasks are CPU-intensive, particularly during model training and data preprocessing. The removal of the GIL could lead to significant performance improvements for these CPU-bound tasks.
A number of popular Python libraries face constraints because they have had to work around the GIL. Its removal could lead to:
- Simplified and potentially more efficient implementations of those libraries
- New optimisation opportunities in existing libraries
- Development of new libraries that can take full advantage of parallel processing
Installing the free-threaded Python version
If you're a Linux user, the only way to obtain free-threaded Python is to build it yourself. If, like me, you're on Windows (or macOS), you can install it using the official installers from the Python website. During the process, you'll have the option to customise your installation. Look for a checkbox to include the free-threaded binaries. This will install a separate interpreter that you can use to run your code without the GIL. I'll demonstrate how the installation works on a 64-bit Windows system.
To start, click the following URL:
https://www.python.org/downloads/release/python-3140
and scroll down until you see a table that looks like this.
Now, click on the Windows Installer (64-bit) link. Once the executable has been downloaded, open it and, on the first installation screen displayed, click the Customize Installation link. Note that I also checked the Add python.exe to PATH checkbox.
On the next screen, select any optional extras you'd like to add to the installation, then click Next again. At this point, you should see a screen like this:

Make sure the checkbox next to Download free-threaded binaries is selected. I also checked the Install Python 3.14 for all users option.
Click the Install button.
Once the download has finished, look in the install folder for a Python application file with a 't' at the end of its name. That is the GIL-free version of Python. The application file called python is the regular Python executable. In my case, the GIL-free Python was called python3.14t. You can check that it has been installed correctly by typing this into a command line.
C:\Users\thoma>python3.14t
Python 3.14.0 free-threading build (tags/v3.14.0:ebf955d, Oct 7 2025, 10:13:09) [MSC v.1944 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
If you see this, you're all set. Otherwise, check that the installation location has been added to your PATH environment variable and/or double-check your installation steps.
As we'll be comparing GIL-free Python runtimes with regular Python runtimes, we should also confirm that the regular version is installed correctly.
C:\Users\thoma>python
Python 3.14.0 (tags/v3.14.0:ebf955d, Oct 7 2025, 10:15:03) [MSC v.1944 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
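You can also check from within Python which build you are running. Here is a quick sketch, assuming sys._is_gil_enabled() (added in Python 3.13) is available on your build:

import sys

# Reports False on the free-threaded build (python3.14t), provided the GIL
# hasn't been re-enabled, e.g. by an incompatible extension module or the
# PYTHON_GIL environment variable. On the regular build it reports True.
print("GIL enabled:", sys._is_gil_enabled())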
GIL vs GIL-free Python
Example 1 — Finding prime numbers
Type the following into a Python code file, e.g. example1.py
#
# example1.py
#
import threading
import time
import multiprocessing

def is_prime(n):
    """Check if a number is prime."""
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def find_primes(start, end):
    """Find all prime numbers within the given range."""
    primes = []
    for num in range(start, end + 1):
        if is_prime(num):
            primes.append(num)
    return primes

def worker(worker_id, start, end):
    """Worker function to find primes in a specific range."""
    print(f"Worker {worker_id} starting")
    primes = find_primes(start, end)
    print(f"Worker {worker_id} found {len(primes)} primes")

def main():
    """Main function to coordinate the multi-threaded prime search."""
    start_time = time.time()

    # Get the number of CPU cores
    num_cores = multiprocessing.cpu_count()
    print(f"Number of CPU cores: {num_cores}")

    # Define the range for the prime search
    total_range = 2_000_000
    chunk_size = total_range // num_cores
    threads = []

    # Create and start threads equal to the number of cores
    for i in range(num_cores):
        start = i * chunk_size + 1
        end = (i + 1) * chunk_size if i < num_cores - 1 else total_range
        thread = threading.Thread(target=worker, args=(i, start, end))
        threads.append(thread)
        thread.start()

    # Wait for all threads to finish
    for thread in threads:
        thread.join()

    # Calculate and print the total execution time
    end_time = time.time()
    total_time = end_time - start_time
    print(f"All workers completed in {total_time:.2f} seconds")

if __name__ == "__main__":
    main()
The is_prime function checks if a given number is prime.
The find_primes function finds all prime numbers within a given range.
The worker function is the target for each thread, finding primes in a specific range.
The main function coordinates the multi-threaded prime search:
- It divides the total range into a number of chunks matching the number of cores the system has (32 in my case).
- Creates and starts 32 threads, each searching a small part of the range.
- Waits for all threads to finish.
- Calculates and prints the total execution time.
Timing results
Let’s see how long it takes to run using regular Python.
C:\Users\thoma\projects\python-gil>python example1.py
Number of CPU cores: 32
Worker 0 starting
Worker 1 starting
Worker 0 found 6275 primes
Worker 2 starting
Worker 3 starting
Worker 1 found 5459 primes
Worker 4 starting
Worker 2 found 5230 primes
Worker 3 found 5080 primes
...
...
Worker 27 found 4346 primes
Worker 15 starting
Worker 22 found 4439 primes
Worker 30 found 4338 primes
Worker 28 found 4338 primes
Worker 31 found 4304 primes
Worker 11 found 4612 primes
Worker 15 found 4492 primes
Worker 25 found 4346 primes
Worker 26 found 4377 primes
All workers completed in 3.70 seconds
Now, with the GIL-free version:
C:\Users\thoma\projects\python-gil>python3.14t example1.py
Number of CPU cores: 32
Worker 0 starting
Worker 1 starting
Worker 2 starting
Worker 3 starting
...
...
Worker 19 found 4430 primes
Worker 29 found 4345 primes
Worker 30 found 4338 primes
Worker 18 found 4520 primes
Worker 26 found 4377 primes
Worker 27 found 4346 primes
Worker 22 found 4439 primes
Worker 23 found 4403 primes
Worker 31 found 4304 primes
Worker 28 found 4338 primes
All workers completed in 0.35 seconds
That's an impressive start: a 10x improvement in runtime.
Example 2 — Reading multiple files concurrently.
In this example, we'll use the concurrent.futures module to read multiple text files concurrently, counting and displaying the number of lines and words in each.
Before we do that, we need some data files to process. You can use the following Python code to create them. It generates 1,000,000 random, nonsensical sentences per file and writes them to 20 separate text files: sentences_01.txt, sentences_02.txt, and so on.
import os
import random
import time

# --- Configuration ---
NUM_FILES = 20
SENTENCES_PER_FILE = 1_000_000
WORDS_PER_SENTENCE_MIN = 8
WORDS_PER_SENTENCE_MAX = 20
OUTPUT_DIR = "fake_sentences"  # Directory to save the files

# --- 1. Generate a pool of words ---
# Using a small list of common words for variety.
# In a real scenario, you might load a much larger dictionary.
word_pool = [
    "the", "be", "to", "of", "and", "a", "in", "that", "have", "i",
    "it", "for", "not", "on", "with", "he", "as", "you", "do", "at",
    "this", "but", "his", "by", "from", "they", "we", "say", "her", "she",
    "or", "an", "will", "my", "one", "all", "would", "there", "their", "what",
    "so", "up", "out", "if", "about", "who", "get", "which", "go", "me",
    "when", "make", "can", "like", "time", "no", "just", "him", "know", "take",
    "people", "into", "year", "your", "good", "some", "could", "them", "see", "other",
    "than", "then", "now", "look", "only", "come", "its", "over", "think", "also",
    "back", "after", "use", "two", "how", "our", "work", "first", "well", "way",
    "even", "new", "want", "because", "any", "these", "give", "day", "most", "us",
    "apple", "banana", "car", "house", "computer", "phone", "coffee", "water", "sky", "tree",
    "happy", "sad", "big", "small", "fast", "slow", "red", "blue", "green", "yellow"
]

# Ensure the output directory exists
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"Starting to generate {NUM_FILES} files, each with {SENTENCES_PER_FILE:,} sentences.")
print(f"Total sentences to generate: {NUM_FILES * SENTENCES_PER_FILE:,}")

start_time = time.time()

for file_idx in range(NUM_FILES):
    file_name = os.path.join(OUTPUT_DIR, f"sentences_{file_idx + 1:02d}.txt")
    print(f"\nGenerating and writing to {file_name}...")
    file_start_time = time.time()
    with open(file_name, 'w', encoding='utf-8') as f:
        for sentence_idx in range(SENTENCES_PER_FILE):
            # 2. Build a fake sentence
            num_words = random.randint(WORDS_PER_SENTENCE_MIN, WORDS_PER_SENTENCE_MAX)
            # Randomly pick the words
            sentence_words = random.choices(word_pool, k=num_words)
            # Join the words, capitalise the first, add a full stop
            sentence = " ".join(sentence_words).capitalize() + ".\n"
            # 3. Write to the file
            f.write(sentence)
            # Optional: print progress for large files
            if (sentence_idx + 1) % 100_000 == 0:
                print(f"  {sentence_idx + 1:,} sentences written to {file_name}...")
    file_end_time = time.time()
    print(f"Finished {file_name} in {file_end_time - file_start_time:.2f} seconds.")

total_end_time = time.time()
print(f"\nAll files generated! Total time: {total_end_time - start_time:.2f} seconds.")
print(f"Files saved in the '{OUTPUT_DIR}' directory.")
Here's what the beginning of sentences_01.txt looks like:
New then coffee have who banana his their how year also there i take.
Phone go or with over who one at phone there on will.
With or how my us him our sad as do be take well way with green small these.
Not from the two that so good slow new.
See look water me do new work new into on which be tree how an would out sad.
By be into then work into we they sky slow that all who also.
Come use would have back from as after in back he give there red also first see.
Only come so well big into some my into time its banana for come or what work.
How only coffee out way to just tree when by there for computer work people sky by this into.
Than say out on it how she apple computer us well then sky sky day by other after not.
You happy know a slow for for happy then also with apple think look go when.
As who for than two we up any can banana at.
Coffee a up of up these green small this us give we.
These we do because how know me computer banana back phone way time in what.
OK, now we can time how long it takes to read those files. Here is the code we'll be testing. It simply reads each file, counts the lines and words, and outputs the results. Note that the code looks for the files in a ./data directory, so either move the generated fake_sentences files there or adjust the path to suit.
import concurrent.futures
import time

def process_file(filename):
    """
    Process a single file, returning its line count and word count.
    """
    try:
        with open(filename, 'r') as file:
            content = file.read()
        lines = content.split('\n')
        words = content.split()
        return filename, len(lines), len(words)
    except Exception:
        return filename, -1, -1  # Return -1 for both counts if there's an error

def main():
    start_time = time.time()  # Start the timer

    # List of files to process
    # Assumes 20 files named sentences_01.txt to sentences_20.txt
    files = [f"./data/sentences_{i:02d}.txt" for i in range(1, 21)]

    # Use a ThreadPoolExecutor to process the files in parallel
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        # Submit all the file-processing tasks
        future_to_file = {executor.submit(process_file, file): file for file in files}
        # Process the results as they complete
        for future in concurrent.futures.as_completed(future_to_file):
            file = future_to_file[future]
            try:
                filename, line_count, word_count = future.result()
                if line_count == -1:
                    print(f"Error processing {filename}")
                else:
                    print(f"{filename}: {line_count} lines, {word_count} words")
            except Exception as exc:
                print(f'{file} generated an exception: {exc}')

    end_time = time.time()  # End the timer
    print(f"Total execution time: {end_time - start_time:.2f} seconds")

if __name__ == "__main__":
    main()
Timing results
Regular Python first.
C:\Users\thoma\projects\python-gil>python example2.py
./data/sentences_09.txt: 1000001 lines, 14003319 words
./data/sentences_01.txt: 1000001 lines, 13999989 words
./data/sentences_05.txt: 1000001 lines, 13998447 words
./data/sentences_07.txt: 1000001 lines, 14004961 words
./data/sentences_02.txt: 1000001 lines, 14009745 words
./data/sentences_10.txt: 1000001 lines, 14000166 words
./data/sentences_06.txt: 1000001 lines, 13995223 words
./data/sentences_04.txt: 1000001 lines, 14005683 words
./data/sentences_03.txt: 1000001 lines, 14004290 words
./data/sentences_12.txt: 1000001 lines, 13997193 words
./data/sentences_08.txt: 1000001 lines, 13995506 words
./data/sentences_15.txt: 1000001 lines, 13998555 words
./data/sentences_11.txt: 1000001 lines, 14001299 words
./data/sentences_14.txt: 1000001 lines, 13998347 words
./data/sentences_13.txt: 1000001 lines, 13998035 words
./data/sentences_19.txt: 1000001 lines, 13999642 words
./data/sentences_20.txt: 1000001 lines, 14001696 words
./data/sentences_17.txt: 1000001 lines, 14000184 words
./data/sentences_18.txt: 1000001 lines, 13999968 words
./data/sentences_16.txt: 1000001 lines, 14000771 words
Total execution time: 18.77 seconds
Now for the GIL-free version
C:\Users\thoma\projects\python-gil>python3.14t example2.py
./data/sentences_02.txt: 1000001 lines, 14009745 words
./data/sentences_03.txt: 1000001 lines, 14004290 words
./data/sentences_08.txt: 1000001 lines, 13995506 words
./data/sentences_07.txt: 1000001 lines, 14004961 words
./data/sentences_04.txt: 1000001 lines, 14005683 words
./data/sentences_05.txt: 1000001 lines, 13998447 words
./data/sentences_01.txt: 1000001 lines, 13999989 words
./data/sentences_10.txt: 1000001 lines, 14000166 words
./data/sentences_06.txt: 1000001 lines, 13995223 words
./data/sentences_09.txt: 1000001 lines, 14003319 words
./data/sentences_12.txt: 1000001 lines, 13997193 words
./data/sentences_11.txt: 1000001 lines, 14001299 words
./data/sentences_18.txt: 1000001 lines, 13999968 words
./data/sentences_14.txt: 1000001 lines, 13998347 words
./data/sentences_13.txt: 1000001 lines, 13998035 words
./data/sentences_16.txt: 1000001 lines, 14000771 words
./data/sentences_19.txt: 1000001 lines, 13999642 words
./data/sentences_15.txt: 1000001 lines, 13998555 words
./data/sentences_17.txt: 1000001 lines, 14000184 words
./data/sentences_20.txt: 1000001 lines, 14001696 words
Total execution time: 5.13 seconds
Not quite as impressive as our first example, but still excellent, showing a more than 3x improvement.
Example 3 — matrix multiplication
We’ll use the threading module for this. Here is the code we’ll be running.
import threading
import time
import os

def multiply_matrices(A, B, result, start_row, end_row):
    """Multiply a block of rows of A by B and store the result in the corresponding rows of result."""
    for i in range(start_row, end_row):
        for j in range(len(B[0])):
            sum_val = 0
            for k in range(len(B)):
                sum_val += A[i][k] * B[k][j]
            result[i][j] = sum_val

def main():
    """Main function to coordinate the multi-threaded matrix multiplication."""
    start_time = time.time()

    # Define the size of the matrices
    size = 1000
    A = [[1 for _ in range(size)] for _ in range(size)]
    B = [[1 for _ in range(size)] for _ in range(size)]
    result = [[0 for _ in range(size)] for _ in range(size)]

    # Get the number of CPU cores to decide on the number of threads
    num_threads = os.cpu_count()
    print(f"Number of CPU cores: {num_threads}")
    chunk_size = size // num_threads
    threads = []

    # Create and start the threads
    for i in range(num_threads):
        start_row = i * chunk_size
        end_row = size if i == num_threads - 1 else (i + 1) * chunk_size
        thread = threading.Thread(target=multiply_matrices, args=(A, B, result, start_row, end_row))
        threads.append(thread)
        thread.start()

    # Wait for all threads to finish
    for thread in threads:
        thread.join()

    end_time = time.time()

    # Just print a small corner of the result to verify it
    print("Top-left 5x5 corner of the result matrix:")
    for r_idx in range(5):
        print(result[r_idx][:5])

    print(f"Total execution time (matrix multiplication): {end_time - start_time:.2f} seconds")

if __name__ == "__main__":
    main()
The code performs a matrix multiplication of two 1000×1000 matrices in parallel using multiple CPU cores. It divides the result matrix into chunks of rows, assigns each chunk to a separate thread (one per CPU core), and each thread calculates its assigned portion of the multiplication independently. Finally, it waits for all the threads to finish and reports the total execution time, demonstrating how to leverage multithreading to speed up CPU-bound tasks.
Timing results
Regular Python:
C:\Users\thoma\projects\python-gil>python example3.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 43.95 seconds
GIL-free Python:
C:\Users\thoma\projects\python-gil>python3.14t example3.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 4.56 seconds
Once more, we get almost a 10x improvement using GIL-free Python. Not too shabby.
GIL-free is not always better.
An interesting point to note is that for this last test, I also tried a multiprocessing version of the code. It turned out that regular Python was significantly faster (by 28%) than GIL-free Python. I won't present the full code, just the results, though a rough sketch of the approach follows.
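For context, here is a minimal multiprocessing sketch. This is my reconstruction, with hypothetical helper names like multiply_rows; it is not the exact example4.py that produced the timings below.

# Hypothetical sketch of a multiprocessing matrix multiply (not the exact
# example4.py behind the timings below).
import multiprocessing
import os
import time

SIZE = 1000

def multiply_rows(args):
    """Compute one horizontal slice of the result matrix."""
    A, B, start_row, end_row = args
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(start_row, end_row)
    ]

def main():
    start_time = time.time()
    A = [[1] * SIZE for _ in range(SIZE)]
    B = [[1] * SIZE for _ in range(SIZE)]
    num_procs = os.cpu_count()
    chunk = SIZE // num_procs
    # One (A, B, start, end) task per process; the last slice absorbs the
    # remainder. Note that A and B are pickled and copied to every worker
    # process - that copying overhead is one reason multiprocessing can
    # lose to free-threaded threads on this workload.
    tasks = [
        (A, B, i * chunk, SIZE if i == num_procs - 1 else (i + 1) * chunk)
        for i in range(num_procs)
    ]
    with multiprocessing.Pool(num_procs) as pool:
        slices = pool.map(multiply_rows, tasks)
    result = [row for part in slices for row in part]
    print("Top-left 5x5 corner of the result matrix:")
    for row in result[:5]:
        print(row[:5])
    print(f"Total execution time (matrix multiplication): {time.time() - start_time:.2f} seconds")

if __name__ == "__main__":
    main()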
Timings
Regular Python first (multiprocessing).
C:\Users\thoma\projects\python-gil>python example4.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 4.49 seconds
GIL-free version (multiprocessing)
C:\Users\thoma\projects\python-gil>python3.14t example4.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 6.29 seconds
As always in these situations, it's essential to test thoroughly.
Keep in mind that these last examples are tests designed to showcase the difference between GIL and GIL-free Python. Using an external library, such as NumPy, to perform the matrix multiplication would be at least an order of magnitude faster than either, as the short comparison below suggests.
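For reference, here is a minimal NumPy equivalent. Timings will vary by machine and BLAS backend, but on most systems this finishes in well under a second.

import time
import numpy as np

# BLAS-backed multiplication of two 1000x1000 all-ones matrices,
# equivalent to the pure-Python examples above.
A = np.ones((1000, 1000))
B = np.ones((1000, 1000))

start = time.perf_counter()
C = A @ B
print(f"NumPy matmul: {time.perf_counter() - start:.4f} seconds")
print(C[:5, :5])   # Top-left 5x5 corner, all 1000.0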
Another point to note, if you decide to use free-threaded Python for your workloads, is that not all the third-party libraries you need will be compatible with it. The list of incompatible libraries is small and shrinking with each release, but it's something to consider. To view a list of these, please click the link below.
Summary
In this article, we discussed a potentially groundbreaking feature of the latest Python 3.14 release: the introduction of an optional "free-threaded" version, which removes the Global Interpreter Lock (GIL). The GIL is a mechanism in standard Python that simplifies memory management by ensuring only one thread executes Python bytecode at a time. While this is useful in some cases, it prevents true parallel processing on multi-core CPUs for CPU-intensive tasks.
The removal of the GIL in the free-threaded build is primarily aimed at improving performance. This can be especially useful for data scientists and machine learning engineers, whose work often involves CPU-bound operations such as model training and data preprocessing. The change allows Python code to utilise all available CPU cores concurrently within a single process, potentially leading to significant speed improvements.
To demonstrate the impact, the article presented several performance comparisons:
- Finding prime numbers: A multi-threaded script saw a dramatic 10x performance increase, with execution time dropping from 3.70 seconds in standard Python to just 0.35 seconds in the GIL-free version.
- Reading multiple files concurrently: An I/O-bound task using a thread pool to process 20 large text files was more than three times faster, completing in 5.13 seconds compared to 18.77 seconds with the standard interpreter.
- Matrix multiplication: A custom multi-threaded matrix multiplication also saw a nearly 10x speedup, with the GIL-free version finishing in 4.56 seconds, compared to 43.95 seconds for the standard version.
However, I also explained that the GIL-free version is not a panacea for Python code development. In a surprising turn, a multiprocessing version of the matrix multiplication code ran faster with standard Python (4.49 seconds) than with the GIL-free build (6.29 seconds). This highlights the importance of testing and benchmarking specific applications, as the overheads of process management in the GIL-free version can sometimes negate its benefits.
I also mentioned the caveat that not all third-party Python libraries are compatible with GIL-free Python, and pointed to a URL where you can view a list of incompatible libraries.