Hypothesis

9 Matching Annotations

Dec 2023
tonybaloney.github.io

Running Python Parallel Applications with Sub Interpreters

1. GadjiMurad 08 Dec 2023
  
  in Public
  
  Once an interpreter is running (remember that it is preferable to leave them running), you can share data using a channel. The channels module is also part of PEP 554 and is available through a secret import:
  
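```
import _xxsubinterpreters as interpreters
import _xxinterpchannels as channels

# `site` is undefined in the original snippet, so create() is called without it.
interp_id = interpreters.create()
channel_id = channels.create()

# Keys in `shared` become globals in the sub interpreter; send() needs the channel ID.
interpreters.run_string(interp_id, """
import _xxinterpchannels as channels
channels.send(channel_id, 'hello!')
""", shared={"channel_id": channel_id})

# Receive the message in the main interpreter.
print(channels.recv(channel_id))
```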

python subinterpreter
2. GadjiMurad 08 Dec 2023
  
  To share data, you can use the shared argument and pass it a dictionary of shareable values (int, float, bool, bytes, str, None, tuple):
  
```
import _xxsubinterpreters as interpreters

# `site` is undefined in the original snippet, so create() is called without it.
interp_id = interpreters.create()

# Each key in `shared` becomes a global name inside the sub interpreter.
interpreters.run_string(interp_id, "print(message)", shared={"message": "hello world!"})

interpreters.run_string(interp_id, """
for message in messages:
    print(message)
""", shared={"messages": ("hello world!", "this", "is", "me")})

interpreters.destroy(interp_id)
```

python subinterpreter
3. GadjiMurad 08 Dec 2023
  
  To start an interpreter that sticks around, you can use interpreters.create(), which returns the interpreter ID. Pass this ID to subsequent .run_string() calls:
  
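```
import _xxsubinterpreters as interpreters

# `site` is undefined in the original snippet, so create() is called without it.
interp_id = interpreters.create()

# The same interpreter (and its state) is reused across calls.
interpreters.run_string(interp_id, "print('hello world')")
interpreters.run_string(interp_id, "print('hello universe')")

interpreters.destroy(interp_id)
```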

python subinterpreter
4. GadjiMurad 08 Dec 2023
  
  Running code in a sub interpreter is a blocking operation for the calling thread, so most of the time you want to start one inside a thread.
  
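```
from threading import Thread
import _xxsubinterpreters as interpreters

# Run the sub interpreter in a worker thread so the caller is not blocked.
t = Thread(target=interpreters.run, args=("print('hello world')",))
t.start()
t.join()  # wait for the sub interpreter to finish
```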

python subinterpreter
5. GadjiMurad 08 Dec 2023
  
  You can create, run, and stop a sub interpreter in one step with the .run() function, which takes a string or a simple function:
  
```
import _xxsubinterpreters as interpreters

# run() creates a sub interpreter, executes the code and then stops it.
interpreters.run('''
print("Hello World")
''')
```

subinterpreter python
6. GadjiMurad 07 Dec 2023
  
  Inter-Worker communication
  
  Whether you use sub interpreters or multiprocessing, you cannot simply send existing Python objects to the workers.
  
  Multiprocessing uses pickle by default. When you start a process or use a process pool, you can use pipes, queues and shared memory as mechanisms for sending data between the workers and the main process. These mechanisms revolve around pickling. Pickle is Python's built-in serialization library, which can convert most Python objects into a byte string and back into a Python object.
  
  Pickle is very flexible. You can serialize a lot of different types of Python objects (but not all) and Python objects can even define a method for how they can be serialized. It also handles nested objects and properties. However, with that flexibility comes a performance hit. Pickle is slow. So if you have a worker model that relies upon continuous inter-worker communication of complex pickled data you’ll likely see a bottleneck.
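  Both the flexibility and the limits of pickle show up in a few lines of standard-library code. This is a minimal sketch, not code from the original post: the Point class and its fields are hypothetical, __getstate__/__setstate__ are the standard hooks an object can define to control its own serialization, and the lambda stands in for any object pickle cannot handle.

```python
import pickle


class Point:
    """Hypothetical class that customizes how it is serialized."""

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.cache = {}  # transient state we choose not to serialize

    def __getstate__(self):
        # Only x and y are written into the byte string; the cache is dropped.
        return {"x": self.x, "y": self.y}

    def __setstate__(self, state):
        self.x = state["x"]
        self.y = state["y"]
        self.cache = {}


# Nested objects round-trip through a byte string and back.
payload = {"points": [Point(1, 2), Point(3, 4)], "label": "demo"}
data = pickle.dumps(payload)
restored = pickle.loads(data)
print(type(data).__name__, [(p.x, p.y) for p in restored["points"]])

# Not everything can be pickled: a lambda cannot be looked up by name.
try:
    pickle.dumps(lambda x: x + 1)
except (pickle.PicklingError, AttributeError) as exc:
    print("cannot pickle:", type(exc).__name__)
```

  Every pickle.dumps/pickle.loads pair is a full serialize and deserialize pass, which is the per-message cost that adds up when workers exchange complex data continuously.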
  
  Sub interpreters can accept pickled data. They also have a second mechanism called shared data. Shared data is a high-speed shared memory space that interpreters can write to and use to share data with other interpreters. It supports only immutable types, namely:
  
  Strings
  
  Byte Strings
  
  Integers and Floats
  
  Boolean and None
  
  Tuples (and tuples of tuples)
  
  To share data with an interpreter, you can either set it as initialization data or you can send it through a channel.
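  The shareable-type rule above can be written as a small recursive check. This is a hypothetical helper (is_shareable_value and check_shared are not part of any CPython module) that only mirrors the list in the text; CPython enforces the real rules internally and they differ between versions, so treat it as an illustration.

```python
# Types listed above as shareable between interpreters.
SHAREABLE_SCALARS = (str, bytes, int, float, bool, type(None))


def is_shareable_value(value):
    """Hypothetical helper: True if value uses only the listed types.

    Tuples are allowed as long as every element is (recursively) shareable.
    Note: isinstance also accepts subclasses; CPython's own check may be stricter.
    """
    if isinstance(value, SHAREABLE_SCALARS):
        return True
    if isinstance(value, tuple):
        return all(is_shareable_value(item) for item in value)
    return False


def check_shared(shared):
    """Hypothetical pre-flight check for a dict of initialization data."""
    bad = [name for name, value in shared.items() if not is_shareable_value(value)]
    if bad:
        raise TypeError(f"not shareable: {', '.join(sorted(bad))}")
    return shared


print(is_shareable_value(("hello world!", ("nested", 1, 2.5), None)))
print(is_shareable_value(["a", "list"]))
```

  Mutable containers such as lists and dicts fail the check, which is why the examples above pass a tuple of messages rather than a list.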
  
  python multiprocessing subinterpreter comparing pickle shared_data
7. GadjiMurad 07 Dec 2023
  
  The next point when using a parallel execution model like multiprocessing or sub interpreters is how you share data.
  
  Once you get over the hurdle of starting one, this quickly becomes the most important point. You have two questions to answer:
  
  How do we communicate between workers?
  
  How do we manage the state of workers?
  
  multiprocessing subinterpreter python performance
8. GadjiMurad 07 Dec 2023
  
  Half of the time taken to start an interpreter is spent running the “site import”. This is a special module called site.py that lives within the Python installation. Interpreters have their own caches and their own builtins; they are effectively mini-Python processes. Starting a thread or a coroutine is so fast because it doesn’t have to do any of that work (it shares that state with the owning interpreter), but it’s bound by the GIL and isn’t parallel.
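  A rough way to see these startup costs from the standard library is to compare starting a thread with launching a fresh Python process, with and without the site import. This sketch uses whole processes as a stand-in for interpreters (it does not touch the sub interpreter API) and relies on the -S command-line flag, which disables the automatic import of site; absolute timings will vary by machine.

```python
import subprocess
import sys
import threading
import time


def time_thread_start(runs=20):
    """Average seconds to start and join a thread that does nothing."""
    start = time.perf_counter()
    for _ in range(runs):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / runs


def time_process_start(extra_args=(), runs=5):
    """Average seconds to launch `python <extra_args> -c pass` and wait for it."""
    cmd = [sys.executable, *extra_args, "-c", "pass"]
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, check=True)
    return (time.perf_counter() - start) / runs


thread_s = time_thread_start()
proc_s = time_process_start()
proc_no_site_s = time_process_start(["-S"])  # -S: skip the site import

print(f"thread start:            {thread_s * 1e6:8.1f} us")
print(f"process start:           {proc_s * 1e3:8.1f} ms")
print(f"process start (no site): {proc_no_site_s * 1e3:8.1f} ms")
```

  A thread shares the already-initialized interpreter, so it typically starts orders of magnitude faster than a new process; the gap between the last two lines gives a rough idea of what the site import costs on a given installation.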
  
  python subinterpreter comparing threading couroutine
9. GadjiMurad 07 Dec 2023
  
  What is the difference between threading, multiprocessing, and sub interpreters?
  
  The Python standard library has a few options for concurrent programming. Which one fits depends on factors such as:
  
  Is the task you’re completing IO-bound (e.g. reading from a network, writing to disk)?
  
  Does the task require CPU-heavy work, e.g. computation?
  
  Can the tasks be broken into small chunks or are they large pieces of work?
  
  Here are the models:
  
  Threads are fast to create, can share any Python object between them and have a small overhead. Their drawback is that Python threads are bound to the GIL of the process, so if the workload is CPU-intensive you won’t see any performance gains. Threading is very useful for background polling tasks, like a function that waits and listens for a message on a queue.
  
  Coroutines are extremely fast to create, can share any Python object between them and have a minuscule overhead. Coroutines are ideal for IO-based activity that has an underlying API supporting async/await.
  
  Multiprocessing is a Python wrapper that creates Python processes and links them together. These processes are slow to start, so the workload you give them needs to be large enough to see the benefit of parallelising it. However, they are truly parallel, since each one has its own GIL.
  
  Sub interpreters have the parallelism of multiprocessing, but with a much faster startup time.
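  The thread and coroutine models can be sketched with the standard library alone. This is an illustration, not code from the post: a background thread waits on a queue.Queue (the polling use case mentioned for threads), and two coroutines run concurrently with asyncio.gather, where asyncio.sleep stands in for real async I/O. Both models append to the same ordinary list without any serialization.

```python
import asyncio
import queue
import threading

results = []  # an ordinary list shared by the thread and the coroutines


def listener(inbox):
    """Background thread: block on the queue until a None sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:
            break
        results.append(f"thread got {message}")


inbox = queue.Queue()
worker = threading.Thread(target=listener, args=(inbox,), daemon=True)
worker.start()
for message in ("a", "b"):
    inbox.put(message)
inbox.put(None)  # tell the listener to stop
worker.join()


async def fetch(name, delay):
    """Stand-in for an I/O-bound call; asyncio.sleep yields to other coroutines."""
    await asyncio.sleep(delay)
    results.append(f"coroutine {name} done")
    return name


async def main():
    # Both coroutines wait concurrently, so this takes about 0.02 s, not 0.03 s.
    return await asyncio.gather(fetch("slow", 0.02), fetch("fast", 0.01))


names = asyncio.run(main())
print(names)
print(results)
```

  Neither model gives CPU parallelism here: the thread and the event loop both run under the same GIL, which is the gap multiprocessing and sub interpreters are meant to fill.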
  
  threading multiprocessing subinterpreter python comparing

Tags

shared_data

comparing

pickle

subinterpreter

performance

threading

couroutine

python

multiprocessing

Annotators

GadjiMurad

URL

tonybaloney.github.io/posts/sub-interpreter-web-workers.html
