2 Matching Annotations
  1. Dec 2023
    1. When you’re writing Python, though, you want to share Python objects between processes.

      To enable this, when you pass Python objects between processes using Python’s multiprocessing library:

      • On the sender side, the arguments get serialized to bytes with the pickle module.
      • On the receiver side, the bytes are deserialized back into Python objects using pickle.

      This serialization and deserialization involves extra computation, which can be slow for large or complex objects.
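
      As a rough sketch of that round trip (the payload below is made up for illustration), this is essentially what happens to every argument you pass to a worker:

      ```python
      # A minimal sketch of the round trip multiprocessing performs on arguments;
      # the payload and its size are invented for illustration.
      import pickle
      import time

      payload = {"rows": [list(range(100)) for _ in range(10_000)]}

      start = time.perf_counter()
      data = pickle.dumps(payload)    # sender side: Python object -> bytes
      restored = pickle.loads(data)   # receiver side: bytes -> Python object
      elapsed = time.perf_counter() - start

      print(f"round-tripped {len(data):,} bytes in {elapsed:.4f}s")
      assert restored == payload      # an equal copy, not the same object
      assert restored is not payload
      ```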

    1. Inter-Worker communication

      Whether you use sub interpreters or multiprocessing, you cannot simply send existing Python objects to worker processes.
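
      For example (a sketch; the worker function and list are ours), a multiprocessing worker only ever receives a serialized copy of what you pass it, so mutations made in the worker never reach the parent process:

      ```python
      from multiprocessing import Process

      def worker(items):
          items.append("added in worker")       # mutates the worker's own copy
          print("worker sees:", items)

      if __name__ == "__main__":
          items = ["original"]
          p = Process(target=worker, args=(items,))
          p.start()
          p.join()
          print("parent still sees:", items)    # ['original'] -- unchanged
      ```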

      Multiprocessing uses pickle by default. When you start a process or use a process pool, you can use pipes, queues and shared memory as mechanisms for sending data between the workers and the main process. These mechanisms all revolve around pickling. Pickle is Python's built-in serialization module; it can convert most Python objects into a byte string and back into a Python object.
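
      A small sketch of the queue mechanism (the task format and names are ours); each item put on a multiprocessing queue is pickled before it crosses the underlying pipe and unpickled on the other side:

      ```python
      from multiprocessing import Process, Queue

      def worker(inbox, outbox):
          task = inbox.get()                        # unpickled back into a dict here
          outbox.put({"result": sum(task["numbers"])})

      if __name__ == "__main__":
          inbox, outbox = Queue(), Queue()
          p = Process(target=worker, args=(inbox, outbox))
          p.start()
          inbox.put({"numbers": [1, 2, 3]})         # pickled before crossing the pipe
          print(outbox.get())                       # {'result': 6}
          p.join()
      ```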

      Pickle is very flexible. You can serialize many different types of Python objects (though not all), and objects can even define how they should be serialized. It also handles nested objects and attributes. However, that flexibility comes with a performance hit: pickle is slow. So if you have a worker model that relies upon continuous inter-worker communication of complex pickled data, you’ll likely see a bottleneck.
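
      To illustrate that flexibility (the Config class below is purely illustrative): nested objects round-trip automatically, a class can customise its own serialization via __getstate__/__setstate__, and some objects, such as lambdas, cannot be pickled at all:

      ```python
      import pickle

      class Config:
          def __init__(self, path):
              self.path = path
              self._cache = {}             # transient state we don't want to ship

          def __getstate__(self):          # called by pickle when serializing
              state = self.__dict__.copy()
              state.pop("_cache")
              return state

          def __setstate__(self, state):   # called by pickle when deserializing
              self.__dict__.update(state)
              self._cache = {}

      nested = {"configs": [Config("/tmp/a"), Config("/tmp/b")], "retries": (1, 2, 3)}
      copy = pickle.loads(pickle.dumps(nested))
      print(copy["configs"][0].path)       # '/tmp/a'

      try:
          pickle.dumps(lambda x: x)        # inline functions are not picklable
      except (pickle.PicklingError, AttributeError, TypeError) as exc:
          print("cannot pickle:", exc)
      ```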

      Sub interpreters can accept pickled data. They also have a second mechanism called shared data. Shared data is a high-speed shared memory space that interpreters can write to in order to share data with other interpreters. It supports only immutable types, namely:

      • Strings
      • Byte Strings
      • Integers and Floats
      • Boolean and None
      • Tuples (and tuples of tuples)

      To share data with an interpreter, you can either set it as initialization data or send it through a channel.
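
      At the time of writing, the Python-level API for this was still experimental. The sketch below assumes the interface later standardised in PEP 734 (concurrent.interpreters in CPython 3.14), where the channel-like mechanism is exposed as a cross-interpreter queue; older versions only expose similar functionality through private modules, so names may differ. The values being shared are ours:

      ```python
      from concurrent import interpreters

      interp = interpreters.create()

      # 1) Initialization data: bind shareable (immutable) values into the
      #    sub interpreter's __main__ namespace before running code in it.
      interp.prepare_main(greeting="hello", attempts=3, limits=(1.5, 2.5))
      interp.exec('print(greeting, attempts, limits)')

      # 2) Channel-like queue: shareable values passed between interpreters
      #    at runtime.
      queue = interpreters.create_queue()
      interp.prepare_main(queue=queue)
      interp.exec('queue.put(attempts * 2)')
      print(queue.get())                   # 6

      interp.close()
      ```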