Why are Multiprocessing Queues Slow when Sharing Large Objects in Python?

Reading time: 3 min
Published on: Apr 16, 2025

Python’s multiprocessing.Queue lets separate processes pass data around safely, typically between producer and consumer tasks. Pretty handy whenever you’re spreading work across multiple processes.

But if you've ever tried pushing large objects through one of these queues and noticed everything slowing to a crawl, you’re not imagining things.

Let’s look at why that happens and what you can do about it.

What’s Causing the Slowdown?

There are a few reasons why large objects make multiprocessing.Queue perform poorly:

📦 Pickling and Unpickling

Every time you put something on a queue, Python pickles (serializes) it, sends it to the other process, and unpickles it on the other end. That’s fine for small stuff, but with large objects all that serializing and deserializing gets expensive. The same cost hits any worker function that receives large arguments, so the overall processing time ends up slower than expected.
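To get a feel for the serialization cost on its own, you can time a pickle round trip of an array about the size we’ll use later in this post. A quick sketch; the exact numbers depend on your machine:

import pickle
from time import time

import numpy as np

arr = np.zeros((1500, 1500, 3))  # ~54 MB of float64

t0 = time()
data = pickle.dumps(arr)       # roughly what Queue.put() does behind the scenes
restored = pickle.loads(data)  # roughly what Queue.get() does on the other side
print(f"{len(data) / 1e6:.0f} MB round-tripped in {time() - t0:.2f} s")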

🔒 Locks (and the GIL confusion)

Even though we’re using multiprocessing (not threading), queues still rely on locks and a single pipe under the hood to keep reads and writes from interleaving. The GIL isn’t shared between processes, but the code that feeds the queue is still ordinary Python, and everything funnels through that one synchronized channel. So if the queue is getting hammered with large items, the locking around it becomes a bottleneck.

🧠 Memory Copying

When you pass an object through a queue, it gets copied. That’s just how it works. Copying a big NumPy array? That’s going to take time and memory. Lots of both, actually.
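You can make the copy visible with a tiny check. This sketch puts and gets on the same queue in a single process, purely for illustration:

import multiprocessing as mp

import numpy as np

if __name__ == "__main__":
    q = mp.Queue()
    a = np.zeros(5)
    q.put(a)     # 'a' gets pickled into the queue
    b = q.get()  # 'b' is a fresh copy rebuilt from the pickled bytes
    b[:] = 1.0
    print(a)                       # still all zeros: the original never changed
    print(np.shares_memory(a, b))  # False: no memory is shared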

A Quick Experiment to Show the Problem

Let’s make this real with a little code.

We’ll run a heavy computation in a bunch of processes and use a queue to report progress back to the main process.

Step 1 — Baseline: Small object in the queue

Let’s define a simple function that performs a compute-heavy task using NumPy arrays. We’ll spawn several worker processes that each run the task and put a small progress message on the queue after every iteration.

from tqdm import tqdm
import multiprocessing as mp
from time import time
import numpy as np

def heavy_function(n, q):
    for _ in range(n):
        # simulate expensive computation
        a = np.random.random((500, 500))
        b = np.random.random((500, 500))
        _ = a.dot(b)
        q.put(1)  # tiny progress signal, one per iteration

if __name__ == "__main__":
    num_workers = 16
    n = 100
    q = mp.Queue(maxsize=100)

    t0 = time()
    processes = []

    for _ in range(num_workers):
        p = mp.Process(target=heavy_function, args=(n, q))
        p.start()
        processes.append(p)

    # drain one progress message per iteration, across all workers
    for _ in tqdm(range(num_workers * n)):
        q.get(timeout=10)

    for p in processes:
        p.join()

    print(f"{time() - t0:.1f} seconds.")

Each child process runs independently and puts a message on the queue after every iteration, which is what drives the tqdm progress bar in the main process. We print the elapsed time after joining all processes.

On my computer, this finishes in about 7 seconds. All subprocesses stay active (R state in htop) and happily crunch through their work.

[htop screenshot: all worker processes busy in the R (running) state]

Step 2 — Large object in the queue

Now change this line:

q.put(1)

To this:

q.put(np.zeros((1500, 1500, 3)))

That’s it: just swap the signal from a tiny integer to a big array.

Now it takes ~145 seconds, more than 20x slower. And if you check htop, you’ll see a bunch of processes in the S (sleeping) state. They aren’t crunching numbers anymore; they’re blocked, stuck waiting to push data into the queue while each array is slowly serialized, copied, and handed over.

[htop screenshot: worker processes stuck in the S (sleeping) state]

What Can You Do Instead?

If you're hitting this problem, here are some options:

1. Avoid Sharing Big Stuff Over Queues

This is the simplest fix. If you can keep the queue payload small—just send signals or references instead of the full object—you’ll avoid the overhead completely.
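In the experiment above, that could mean sending only a small per-iteration result (a mean, a status code, an index into data the parent already holds) instead of a raw matrix. Here’s a minimal sketch; the helper name is just for illustration:

import multiprocessing as mp

import numpy as np

def heavy_function_small_payload(n, q):
    for _ in range(n):
        a = np.random.random((500, 500))
        b = np.random.random((500, 500))
        result = a.dot(b)
        q.put(float(result.mean()))  # a single float instead of a 250k-element matrix

if __name__ == "__main__":
    q = mp.Queue(maxsize=100)
    p = mp.Process(target=heavy_function_small_payload, args=(5, q))
    p.start()
    means = [q.get(timeout=10) for _ in range(5)]
    p.join()
    print(means)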

2. Use Shared Memory

Python’s multiprocessing.sharedctypes (or shared_memory if you're on Python 3.8+) lets processes work with the same chunk of memory without pickling. Here’s a basic example:

from multiprocessing import Process, Value
from ctypes import c_double

def add_one(shared_val):
    # += is a read-modify-write, so guard it with the Value's built-in lock
    with shared_val.get_lock():
        shared_val.value += 1

if __name__ == "__main__":
    val = Value(c_double, 0.0)  # a single double that lives in shared memory
    processes = [Process(target=add_one, args=(val,)) for _ in range(10)]

    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print("Final value:", val.value)

This works best for small-ish, fixed-size data. For larger or more complex stuff (like big NumPy arrays) you’ll need to get a bit fancier, but the performance win is real. And if several processes write to the same shared memory, guard those writes with a lock or you’ll risk race conditions.
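For big NumPy arrays, the usual next step is multiprocessing.shared_memory (Python 3.8+): allocate one block and let each process build an array view on top of it. A minimal sketch, with an illustrative function name and array size:

import numpy as np
from multiprocessing import Process, shared_memory  # shared_memory requires Python 3.8+

def fill_with_ones(shm_name, shape, dtype):
    # attach to the existing block and view it as an array: no pickling, no copy
    shm = shared_memory.SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr[:] = 1.0
    shm.close()

if __name__ == "__main__":
    shape, dtype = (1500, 1500, 3), np.float64
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr[:] = 0.0

    p = Process(target=fill_with_ones, args=(shm.name, shape, dtype))
    p.start()
    p.join()

    print(arr[0, 0, 0])  # 1.0: the parent sees the child's writes directly
    shm.close()
    shm.unlink()  # release the block once everyone is done with it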

For more details on implementing shared memory in Python, refer to the official documentation.

3. Write to Disk (But Only If It Makes Sense)

Another option: write the large object to disk and just pass a filepath through the queue. That way you skip pickling the big object altogether. It’s not always faster (especially on spinning disks), and you’ll need to handle cleanup and locking yourself. But it can work well, especially if you’re working with images, videos, or other large files anyway.

This method works particularly well when you’re dealing with large artifacts like arrays, video frames, or high-resolution image files.
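Here’s a rough sketch of that pattern using a temporary .npy file; the function name is illustrative, and cleanup stays your responsibility:

import os
import tempfile
import multiprocessing as mp

import numpy as np

def produce(q):
    arr = np.zeros((1500, 1500, 3))
    fd, path = tempfile.mkstemp(suffix=".npy")
    os.close(fd)
    np.save(path, arr)  # the big object goes to disk...
    q.put(path)         # ...and only the short path travels through the queue

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=produce, args=(q,))
    p.start()
    path = q.get(timeout=30)
    p.join()

    arr = np.load(path)
    print(arr.shape)
    os.remove(path)     # don't forget cleanup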

4. Use a Smarter Queue + Timeout Design

Using a queue with maxsize and a timeout on get() or put() can help prevent deadlocks. Make sure your code handles exceptions properly and tests for cases where the queue might be full or empty.

You can also limit how much data sits in the queue at once (that’s what maxsize does) and chunk the work inside each task so fewer, smaller messages go through the queue.
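A small sketch of that defensive style, using a bounded queue, timeouts, and the queue.Empty/queue.Full exceptions (the None sentinel for shutdown is just one common convention):

import multiprocessing as mp
import queue  # provides the Empty and Full exceptions that mp.Queue raises

def consume(q):
    while True:
        try:
            item = q.get(timeout=5)  # never block forever on a dead producer
        except queue.Empty:
            break
        if item is None:             # sentinel value: producer is finished
            break
        print("got", item)

if __name__ == "__main__":
    q = mp.Queue(maxsize=10)         # cap how much data can pile up in flight
    p = mp.Process(target=consume, args=(q,))
    p.start()
    for i in range(20):
        q.put(i, timeout=5)          # raises queue.Full instead of hanging forever
    q.put(None, timeout=5)
    p.join()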

Wrapping Up

Here’s what we’ve seen:

  • Queues get really slow when passing large objects.
  • The culprits? Pickling, copying, and inter-process locking.
  • Avoiding large objects in queues, using shared memory, or falling back to disk are all valid workarounds.

So yeah, multiprocessing.Queue is great… until you try stuffing giant objects through it. If you stick to small signals or switch to shared memory, you’ll keep your app snappy—and your CPU a lot happier. By designing your multiprocessing system carefully—with clear functions, efficient task distribution, and proper use of start() and join() methods—you can reduce the performance cost significantly.

And if you're interested in exploring more about Python's multiprocessing capabilities, check out this Stack Overflow discussion on handling large data with multiprocessing.Queue.

FAQ

Why does putting large objects into a multiprocessing.Queue slow down my program?

When you place large objects into a multiprocessing.Queue, Python serializes (pickles) the object, transfers it between processes, and then deserializes (unpickles) it on the receiving end. This process is resource-intensive for large objects, leading to significant overhead. Additionally, the queue mechanism involves copying the object in memory, which further slows down performance. As a result, processes may spend more time handling data transfer than performing their intended computations.

How can I share large data between processes without using a queue?

To efficiently share large data between processes, consider using shared memory. Python's multiprocessing.shared_memory module (introduced in Python 3.8) allows multiple processes to access the same memory block without the need for serialization. This approach reduces overhead and improves performance. However, it's important to note that shared memory is best suited for data types like NumPy arrays and may not be ideal for complex Python objects. Proper synchronization mechanisms, such as locks, should also be employed to prevent race conditions when multiple processes access shared data.

Is writing data to disk a viable alternative to using queues for inter-process communication?

Writing data to disk can be an alternative to using queues, especially when dealing with large objects. By saving data to a file and passing the file path between processes, you avoid the overhead of serialization and memory copying. This method can be effective if your application already involves disk I/O operations. However, it introduces additional complexity, such as managing file creation, ensuring proper synchronization to avoid conflicts, and handling potential I/O errors. Therefore, while feasible, this approach should be used judiciously and typically only when other methods like shared memory are unsuitable.