5 Matching Annotations
  1. Dec 2023
    1. Tips
      • if name == "main" is important for multiprocessing because it will spawn a new Python, that will import the module. You don't want this module to spawn a new Python that imports the module that will spawn a new Python...
      • If the function to submit to the executor has complicated arguments to be passed to it, use a lambda or functools.partial.
      • max_worker = 1 is a very nice way to get a poor man’s task queue.
    2. Thread pools are good for:
      • Tasks (network, file, etc.) that needs less than 10_000 I/O interactions per second. The number is higher than you would expect, because threads are surprisingly cheap nowadays, and you can spawn a lot of them without bloating memory too much. The limit is more the price of context switching. This is not a scientific number, it's a general direction that you should challenge by measuring your own particular case.
      • When you need to share data between the tasks.
      • When you are not CPU bound.
      • When you are OK to execute tasks a bit slower to you ensure you are not blocking any of them (E.G: user UI and a long calculation).
      • When you are CPU bound, but the CPU calculations are delegating to a C extension that releases the GIL, such as numpy. Free parallelism on the cheap, yeah!

      E.G: a web scraper, a GUI to zip files, a development server, sending emails without blocking web page rendering, etc.