3 Matching Annotations
  1. Dec 2023
    1. When you’re writing Python, though, you want to share Python objects between processes.

      To enable this, when you pass Python objects between processes using Python’s multiprocessing library:

      • On the sender side, the arguments get serialized to bytes with the pickle module.
      • On the receiver side, the bytes are deserialized back into objects using pickle.

      This serialization and deserialization takes CPU time, and for large or complex objects it can become a real cost.
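
      As a rough sketch of what multiprocessing does under the hood, the round trip amounts to a pickle.dumps() on the sender's side and a pickle.loads() on the receiver's side (the payload below is just illustrative):

      ```python
      import pickle

      # Illustrative payload; multiprocessing would do this for you implicitly.
      payload = {"rows": list(range(1_000)), "label": "example"}

      data = pickle.dumps(payload)   # sender side: object -> bytes
      restored = pickle.loads(data)  # receiver side: bytes -> object

      assert restored == payload     # equal in value, but a brand-new copy
      print(f"serialized size: {len(data)} bytes")
      ```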

    2. Threads vs. processes

      Multiple threads let you run code in parallel, potentially on multiple CPUs. In Python, however, the global interpreter lock (GIL) makes this parallelism harder to achieve.

      Multiple processes also let you run code in parallel—so what’s the difference between threads and processes?

      All the threads inside a single process share the same memory address space. If thread 1 in a process stores some memory at address 0x7f0cd1a88810, thread 2 can access the same memory at the same address. That means passing objects between threads is cheap: you just need to get the pointer to the memory address from one thread to the other. A memory address is 8 bytes on a 64-bit system: that is not a lot of data to move around.

      In contrast, processes do not share the same memory space. Operating systems typically provide some shared-memory facilities, and we'll get to those later. But by default, no memory is shared. That means you can't just share the address of your data across processes: you have to copy the data.
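
      A minimal sketch of the difference (the counter dict here is made up for illustration): a thread mutates the parent's memory directly, while a child process only ever mutates its own copy.

      ```python
      import multiprocessing
      import threading

      counter = {"value": 0}

      def bump():
          counter["value"] += 1

      if __name__ == "__main__":
          # A thread shares the parent's address space: the mutation is visible.
          t = threading.Thread(target=bump)
          t.start()
          t.join()
          print(counter["value"])  # 1

          # A child process gets its own memory: the parent's dict is untouched.
          p = multiprocessing.Process(target=bump)
          p.start()
          p.join()
          print(counter["value"])  # still 1 in the parent
      ```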

    3. Inter-Worker communication

      Whether you're using sub interpreters or multiprocessing, you cannot simply send existing Python objects to worker processes.

      Multiprocessing uses pickle by default. When you start a process or use a process pool, you can use pipes, queues, and shared memory as mechanisms for sending data to/from the workers and the main process. These mechanisms revolve around pickling. Pickle is Python's built-in serialization module; it can convert most Python objects into a byte string and back into a Python object.
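
      For example, a multiprocessing.Queue pickles whatever you put on it and unpickles it in the receiving process; the task payload below is invented for the sketch:

      ```python
      import multiprocessing

      def worker(q):
          item = q.get()  # the bytes are unpickled back into an object here
          print("worker received:", item)

      if __name__ == "__main__":
          q = multiprocessing.Queue()
          q.put({"task": "resize", "size": (800, 600)})  # pickled on the way in
          p = multiprocessing.Process(target=worker, args=(q,))
          p.start()
          p.join()
      ```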

      Pickle is very flexible. You can serialize a lot of different types of Python objects (but not all), and objects can even define a method for how they should be serialized. It also handles nested objects and properties. However, with that flexibility comes a performance hit: pickle is slow. So if you have a worker model that relies upon continuous inter-worker communication of complex pickled data, you'll likely see a bottleneck.
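
      Both points are easy to demonstrate: the (hypothetical) Point class below customizes its own serialization via pickle's __reduce__ hook, and timing the round trip makes the cost visible:

      ```python
      import pickle
      import timeit

      class Point:
          def __init__(self, x, y):
              self.x, self.y = x, y

          # Objects can control their own serialization via __reduce__:
          # return a callable plus the arguments needed to rebuild the object.
          def __reduce__(self):
              return (Point, (self.x, self.y))

      points = [Point(i, i * 2) for i in range(10_000)]

      # Time ten full serialize/deserialize round trips of the list.
      cost = timeit.timeit(lambda: pickle.loads(pickle.dumps(points)), number=10)
      print(f"10 round trips: {cost:.3f}s")
      ```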

      Sub interpreters can accept pickled data. They also have a second mechanism called shared data. Shared data is a high-speed shared memory space that interpreters can write to in order to share data with other interpreters. It supports only immutable types, namely:

      • Strings
      • Byte Strings
      • Integers and Floats
      • Boolean and None
      • Tuples (and tuples of tuples)

      To share data with an interpreter, you can either set it as initialization data or send it through a channel.
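
      As a sketch of the initialization-data path, CPython 3.12 exposes sub interpreters through the experimental, private _xxsubinterpreters module; the API is provisional, so the names below may change in later releases:

      ```python
      # Experimental, private API in CPython 3.12; subject to change.
      import _xxsubinterpreters as interpreters

      code = """
      # Runs inside the sub interpreter: greeting and limit arrive through
      # the shared-data mechanism as initialization data, not via pickle.
      print(greeting, limit)
      """

      interp_id = interpreters.create()
      # Only shareable (immutable) types are allowed in the shared dict.
      interpreters.run_string(interp_id, code, shared={"greeting": "hello", "limit": 10})
      interpreters.destroy(interp_id)
      ```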