Python’s multiprocessing.Queue lets separate processes pass data around safely between producer and consumer tasks. Pretty handy when you're splitting work across multiple processes.
But if you've ever tried pushing large objects through one of these queues and noticed everything slowing to a crawl, you’re not imagining things.
Let’s look at why that happens and what you can do about it.
What’s Causing the Slowdown?
There are a few reasons why large objects make multiprocessing.Queue perform poorly:
📦 Pickling and Unpickling
Every time you put something on a queue, Python pickles (serializes) it, sends it to the other process, and unpickles it on the other end. That’s fine for small stuff, but with large objects all that serializing and deserializing becomes expensive. The same cost hits anything you pass as task arguments, too: large data objects or custom objects from your own modules all have to be pickled on the way in, so the overall processing time ends up slower than expected.
🔒 Locks (and the GIL confusion)
Even though we’re using multiprocessing (not threading), queues still use locks under the hood to keep things safe. And while the GIL isn’t directly involved between processes, the feeder thread that pickles your items and the locks around the underlying pipe are still plain Python-level machinery. So if the queue’s getting hammered with large items, that machinery becomes a bottleneck.
🧠 Memory Copying
When you pass an object through a queue, it gets copied. That’s just how it works. Copying a big NumPy array? That’s going to take time and memory. Lots of both, actually.
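To get a feel for how much of this is serialization alone, here’s a quick sketch that just times a pickle round trip on an array of the same size we’ll use later (the size is arbitrary, and pickle here only approximates what the queue does internally):

    import pickle
    from time import time

    import numpy as np

    big = np.zeros((1500, 1500, 3))   # ~54 MB of float64 data

    t0 = time()
    blob = pickle.dumps(big)          # roughly what put() does behind the scenes
    arr = pickle.loads(blob)          # roughly what get() does on the other side
    print(f"pickle round trip: {time() - t0:.2f} s, payload {len(blob) / 1e6:.0f} MB")

That round trip (plus the raw memory copy) happens for every single item you push through the queue.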
A Quick Experiment to Show the Problem
Let’s make this real with a little code.
We’ll run a heavy computation in a bunch of processes and use a queue to signal when each process is done.
Step 1 — Baseline: Small object in the queue
Let’s define a simple function that performs a compute-heavy task using NumPy arrays. We’ll spin up multiple processes that each execute the task and send a result to the queue.
from tqdm import tqdm
import multiprocessing as mp
from time import time
import numpy as np


def heavy_function(n, q):
    for _ in range(n):
        # simulate expensive computation
        a = np.random.random((500, 500))
        b = np.random.random((500, 500))
        _ = a.dot(b)
    q.put(1)  # signal completion with a tiny payload


if __name__ == "__main__":
    num_workers = 16
    n = 100
    q = mp.Queue(maxsize=100)

    t0 = time()
    processes = []
    for _ in range(num_workers):
        p = mp.Process(target=heavy_function, args=(n, q))
        p.start()
        processes.append(p)

    # wait for one "done" signal per worker
    for _ in tqdm(range(num_workers)):
        q.get(timeout=10)

    for p in processes:
        p.join()

    print(f"{time() - t0:.1f} seconds.")
Each child process runs independently and puts a message into the queue once its task is complete. We print the total elapsed time after joining all processes.
On my computer, this finishes in about 7 seconds. All subprocesses stay active (R state in htop) and happily crunch through their work.

Step 2 — Large object in the queue
Now change this line:
q.put(1)
To this:
q.put(np.zeros((1500, 1500, 3)))
That’s it: just swap the signal from a tiny integer to a big array.
Now it takes ~145 seconds. More than 20x slower. And if you check htop, you’ll see a bunch of processes in the S (sleeping) state. They may appear to be running, but they’re actually blocked on memory transfer or waiting for the queue to clear, basically stuck while each big array gets slowly serialized, copied, and handed over.

What Can You Do Instead?
If you're hitting this problem, here are some options:
1. Avoid Sharing Big Stuff Over Queues
This is the simplest fix. If you can keep the queue payload small—just send signals or references instead of the full object—you’ll avoid the overhead completely.
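For example, instead of putting the array itself on the queue, a worker can send back just a small summary or an identifier for wherever the full result lives. A minimal sketch (the shape-plus-mean payload here is only an illustrative stand-in for whatever lightweight reference fits your pipeline):

    import multiprocessing as mp

    import numpy as np


    def worker(q):
        result = np.random.random((1500, 1500, 3))  # big intermediate result
        # keep (or persist) the full array inside this process;
        # put only a tiny summary on the queue instead of the array itself
        q.put({"shape": result.shape, "mean": float(result.mean())})


    if __name__ == "__main__":
        q = mp.Queue()
        p = mp.Process(target=worker, args=(q,))
        p.start()
        print(q.get(timeout=30))  # a small dict: cheap to pickle and copy
        p.join()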
2. Use Shared Memory
Python’s multiprocessing.sharedctypes (or shared_memory if you're on Python 3.8+) lets processes work with the same chunk of memory without pickling. Here’s a basic example:
from multiprocessing import Process, Value
from ctypes import c_double


def add_one(shared_val):
    # the increment isn't atomic, so guard it with the Value's built-in lock
    with shared_val.get_lock():
        shared_val.value += 1


if __name__ == "__main__":
    val = Value(c_double, 0.0)  # one double living in shared memory
    processes = [Process(target=add_one, args=(val,)) for _ in range(10)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print("Final value:", val.value)
This works best for small-ish, fixed-size data. For larger or more complex stuff (like big NumPy arrays), you’ll need to get a bit more fancy (see the sketch below), but the performance win is real. If several processes write to the shared memory, guard those writes with a lock, or you'll risk race conditions.
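Here’s a minimal sketch of that fancier route using multiprocessing.shared_memory (Python 3.8+); the array contents and the doubling step are just placeholders for whatever processing you actually do:

    from multiprocessing import Process, shared_memory

    import numpy as np


    def double_in_place(shm_name, shape, dtype):
        # attach to the existing block by name: no pickling, no copy of the data
        shm = shared_memory.SharedMemory(name=shm_name)
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        arr *= 2
        del arr      # drop the view before closing the block
        shm.close()


    if __name__ == "__main__":
        data = np.ones((1500, 1500, 3))
        shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
        shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
        shared[:] = data  # one copy in; after this everyone works on the same buffer

        p = Process(target=double_in_place, args=(shm.name, data.shape, data.dtype))
        p.start()
        p.join()

        print(shared[0, 0, 0])  # 2.0: the child's change is visible to the parent
        del shared
        shm.close()
        shm.unlink()  # free the block once nobody needs it

The only things that cross the process boundary here are the block’s name, shape, and dtype; the big buffer itself is never pickled or copied through a queue.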
For more details on implementing shared memory in Python, refer to the official documentation.
3. Write to Disk (But Only If It Makes Sense)
Another option: write the large object to disk and just pass a filepath through the queue, so only a short string gets pickled. It’s not always faster (especially on spinning disks), and you’ll need to handle cleanup and locking yourself. But it can work well, especially if you're already dealing with large files like arrays, videos, or high-resolution images anyway.
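As a rough sketch, the workers below save their results with np.save into a temporary directory and put only the file path on the queue; the directory handling and file naming are illustrative choices, not requirements:

    import multiprocessing as mp
    import os
    import tempfile

    import numpy as np


    def worker(q, out_dir, worker_id):
        result = np.random.random((1500, 1500, 3))   # big result stays local
        path = os.path.join(out_dir, f"result_{worker_id}.npy")
        np.save(path, result)    # heavy data goes to disk...
        q.put(path)              # ...only a short string goes through the queue


    if __name__ == "__main__":
        out_dir = tempfile.mkdtemp()
        q = mp.Queue()
        procs = [mp.Process(target=worker, args=(q, out_dir, i)) for i in range(4)]
        for p in procs:
            p.start()
        for _ in procs:
            path = q.get(timeout=60)
            arr = np.load(path)  # load only when (and where) you actually need it
            print(path, arr.shape)
            os.remove(path)      # cleanup is your responsibility
        for p in procs:
            p.join()
        os.rmdir(out_dir)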
4. Use a Smarter Queue + Timeout Design
Using a queue with maxsize and a timeout on get() or put() can help prevent deadlocks. Make sure your code handles the resulting queue.Full and queue.Empty exceptions, and test the cases where the queue really is full or empty. Bounding the queue depth also gives producers backpressure, and chunking the work inside each task keeps any single put() from carrying a huge payload. A short sketch of the timeout handling follows.
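Here’s a minimal sketch of that pattern, with a deliberately tiny maxsize so the backpressure is visible; the timeout values and the drop-on-full policy are placeholder choices:

    import multiprocessing as mp
    import queue  # only for the Full/Empty exception classes


    def producer(q):
        for i in range(10):
            try:
                q.put(i, timeout=5)   # block at most 5 s if the queue is full
            except queue.Full:
                print(f"dropping item {i}: consumer can't keep up")


    if __name__ == "__main__":
        q = mp.Queue(maxsize=2)       # small bounded queue gives you backpressure
        p = mp.Process(target=producer, args=(q,))
        p.start()
        while True:
            try:
                item = q.get(timeout=5)   # don't wait forever on a dead producer
            except queue.Empty:
                break                     # assume the producer is done (or stuck)
            print("got", item)
        p.join()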
Wrapping Up
Here’s what we’ve seen:
- Queues get really slow when passing large objects.
- The culprits? Pickling, copying, and inter-process locking.
- Avoiding large objects in queues, using shared memory, or falling back to disk are all valid workarounds.
So yeah, multiprocessing.Queue is great… until you try stuffing giant objects through it. If you stick to small signals or switch to shared memory, you’ll keep your app snappy and your CPU a lot happier. A bit of care in how you design the system (small queue payloads, bounded queues, and proper start()/join() handling) goes a long way toward keeping that cost down.
And if you're interested in exploring more about Python's multiprocessing capabilities, check out this Stack Overflow discussion on handling large data with multiprocessing.Queue.