2 Matching Annotations
  1. Dec 2023
    1. When you’re writing Python, though, you want to share Python objects between processes.

      To enable this, when you pass Python objects between processes using Python’s multiprocessing library:

      • On the sender side, the arguments get serialized to bytes with the pickle module.
      • On the receiver side, the bytes are deserialized back into Python objects using pickle.

      This serialization and deserialization involves extra computation, which can be slow for large or complex objects.
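
      As a rough sketch of that round trip (the payload below is made up for illustration), this is essentially what happens to every argument you pass to a worker:

      ```python
      # A minimal sketch of the round trip multiprocessing performs on arguments;
      # the payload and its size are invented for illustration.
      import pickle
      import time

      payload = {"rows": [list(range(100)) for _ in range(10_000)]}

      start = time.perf_counter()
      data = pickle.dumps(payload)    # sender side: Python object -> bytes
      restored = pickle.loads(data)   # receiver side: bytes -> Python object
      elapsed = time.perf_counter() - start

      print(f"round-tripped {len(data):,} bytes in {elapsed:.4f}s")
      assert restored == payload      # an equal copy, not the same object
      assert restored is not payload
      ```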

    1. Inter-Worker communication

      Whether you use sub interpreters or multiprocessing, you cannot simply send existing Python objects to worker processes.
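
      For example (a sketch; the worker function and list are ours), a multiprocessing worker only ever receives a serialized copy of what you pass it, so mutations made in the worker never reach the parent process:

      ```python
      from multiprocessing import Process

      def worker(items):
          items.append("added in worker")       # mutates the worker's own copy
          print("worker sees:", items)

      if __name__ == "__main__":
          items = ["original"]
          p = Process(target=worker, args=(items,))
          p.start()
          p.join()
          print("parent still sees:", items)    # ['original'] -- unchanged
      ```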

      Multiprocessing uses pickle by default. When you start a process or use a process pool, you can use pipes, queues and shared memory as mechanisms for sending data between the workers and the main process. These mechanisms all revolve around pickling. Pickle is Python's built-in serialization module; it can convert most Python objects into a byte string and back into a Python object.
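
      A small sketch of the queue mechanism (the task format and names are ours); each item put on a multiprocessing queue is pickled before it crosses the underlying pipe and unpickled on the other side:

      ```python
      from multiprocessing import Process, Queue

      def worker(inbox, outbox):
          task = inbox.get()                        # unpickled back into a dict here
          outbox.put({"result": sum(task["numbers"])})

      if __name__ == "__main__":
          inbox, outbox = Queue(), Queue()
          p = Process(target=worker, args=(inbox, outbox))
          p.start()
          inbox.put({"numbers": [1, 2, 3]})         # pickled before crossing the pipe
          print(outbox.get())                       # {'result': 6}
          p.join()
      ```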

      Pickle is very flexible. You can serialize many different types of Python objects (though not all), and objects can even define how they should be serialized. It also handles nested objects and attributes. However, that flexibility comes with a performance hit: pickle is slow. So if you have a worker model that relies upon continuous inter-worker communication of complex pickled data, you’ll likely see a bottleneck.
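
      To illustrate that flexibility (the Config class below is purely illustrative): nested objects round-trip automatically, a class can customise its own serialization via __getstate__/__setstate__, and some objects, such as lambdas, cannot be pickled at all:

      ```python
      import pickle

      class Config:
          def __init__(self, path):
              self.path = path
              self._cache = {}             # transient state we don't want to ship

          def __getstate__(self):          # called by pickle when serializing
              state = self.__dict__.copy()
              state.pop("_cache")
              return state

          def __setstate__(self, state):   # called by pickle when deserializing
              self.__dict__.update(state)
              self._cache = {}

      nested = {"configs": [Config("/tmp/a"), Config("/tmp/b")], "retries": (1, 2, 3)}
      copy = pickle.loads(pickle.dumps(nested))
      print(copy["configs"][0].path)       # '/tmp/a'

      try:
          pickle.dumps(lambda x: x)        # inline functions are not picklable
      except (pickle.PicklingError, AttributeError, TypeError) as exc:
          print("cannot pickle:", exc)
      ```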

      Sub interpreters can accept pickled data. They also have a second mechanism called shared data. Shared data is a high-speed shared memory space that interpreters can write to in order to share data with other interpreters. It supports only immutable types, namely:

      • Strings
      • Byte Strings
      • Integers and Floats
      • Boolean and None
      • Tuples (and tuples of tuples)

      To share data with an interpreter, you can either set it as initialization data or send it through a channel.
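
      At the time of writing, the Python-level API for this was still experimental. The sketch below assumes the interface later standardised in PEP 734 (concurrent.interpreters in CPython 3.14), where the channel-like mechanism is exposed as a cross-interpreter queue; older versions only expose similar functionality through private modules, so names may differ. The values being shared are ours:

      ```python
      from concurrent import interpreters

      interp = interpreters.create()

      # 1) Initialization data: bind shareable (immutable) values into the
      #    sub interpreter's __main__ namespace before running code in it.
      interp.prepare_main(greeting="hello", attempts=3, limits=(1.5, 2.5))
      interp.exec('print(greeting, attempts, limits)')

      # 2) Channel-like queue: shareable values passed between interpreters
      #    at runtime.
      queue = interpreters.create_queue()
      interp.prepare_main(queue=queue)
      interp.exec('queue.put(attempts * 2)')
      print(queue.get())                   # 6

      interp.close()
      ```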