  1. Dec 2023
    1. Because of the way gevent works, you can take a blocking script and, with very few modifications, make it async. Let's take the original stdlib version and convert it to gevent:

      ```python
      import re
      import time

      import gevent
      from gevent import monkey

      monkey.patch_all()  # THIS MUST BE DONE BEFORE IMPORTING URLLIB

      from urllib.request import Request, urlopen

      urls = [
          "https://www.bitecode.dev/p/relieving-your-python-packaging-pain",
          "https://www.bitecode.dev/p/hype-cycles",
          "https://www.bitecode.dev/p/why-not-tell-people-to-simply-use",
          "https://www.bitecode.dev/p/nobody-ever-paid-me-for-code",
          "https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager",
          "https://www.bitecode.dev/p/the-costly-mistake-so-many-makes",
          "https://www.bitecode.dev/p/the-weirdest-python-keyword",
      ]

      # Match the <title> tag (with optional attributes) and capture its text lazily
      title_pattern = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE)

      user_agent = (
          "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"
      )


      # We move the fetching into a function so we can isolate it into a green thread
      def fetch_url(url):
          start_time = time.time()

          headers = {"User-Agent": user_agent}

          with urlopen(Request(url, headers=headers)) as response:
              html_content = response.read().decode("utf-8")
              match = title_pattern.search(html_content)
              title = match.group(1) if match else "Unknown"

              print(f"URL: {url}\nTitle: {title}")

          end_time = time.time()
          elapsed_time = end_time - start_time

          print(f"Time taken: {elapsed_time:.4f} seconds\n")


      def main():
          global_start_time = time.time()

          # Here is where we convert synchronous calls into async ones
          greenlets = [gevent.spawn(fetch_url, url) for url in urls]
          gevent.joinall(greenlets)

          global_end_time = time.time()
          global_elapsed_time = global_end_time - global_start_time

          print(f"Total time taken: {global_elapsed_time:.4f} seconds")


      main()
      ```

      No async, no await. No special lib except for gevent. In fact, it would work just as well with the requests lib. Very few modifications are needed, for a net performance gain.

      The only danger is calling gevent.monkey.patch_all() too late: you get a cryptic error that crashes your program.
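      Why does the order matter? One common reason (this is a toy illustration of the general pitfall using only the stdlib, not a reproduction of gevent's internals) is that patching replaces attributes on a module, while any code that grabbed a direct reference before the patch keeps using the original, blocking function. The sketch below patches time.sleep with a hypothetical fake_sleep that only records its calls:

      ```python
      import time

      # A reference captured BEFORE patching, like a library that did
      # "from time import sleep" at import time.
      from time import sleep as early_sleep

      calls = []


      def fake_sleep(seconds):
          """Hypothetical stand-in for a cooperative sleep: it only records the call."""
          calls.append(seconds)


      def uses_module_attribute():
          # Looks up time.sleep at call time, so it sees the patch.
          time.sleep(0.01)


      def uses_early_reference():
          # Uses the reference grabbed before patching, so it still really blocks.
          early_sleep(0.01)


      original_sleep = time.sleep
      time.sleep = fake_sleep  # the "monkey patch"
      try:
          uses_module_attribute()  # goes through fake_sleep
          uses_early_reference()   # bypasses the patch entirely
      finally:
          time.sleep = original_sleep  # undo the patch

      print(calls)  # [0.01]
      ```

      Only the call that looks up time.sleep at call time goes through the patch; the early reference silently bypasses it. Patching as early as possible, before other imports grab such references, avoids this class of problem.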

    2. So what's the deal with asyncio, Twisted, gevent, Trio and all that stuff?

      asyncio

      asyncio is the modern module for asynchronous network programming provided with the Python stdlib since 3.4. In other words, it's the default stuff at your disposal if you want to code something without blocking while waiting on the network.

      asyncio replaces the old, deprecated asyncore module. It is quite low level, so while you can manually code most network-related things with it, you are still at the level of TCP or UDP. If you want higher-level protocols, like FTP, HTTP or SSH, you have to either code them yourself or install a third-party library or module.
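      To give an idea of what "the level of TCP" means, here is a minimal sketch using asyncio's stream API (asyncio.start_server and asyncio.open_connection) on localhost: the server echoes back one line of raw bytes. There is no HTTP here; any protocol on top would be yours to write. The handler name and the message are made up for the example.

      ```python
      import asyncio


      async def handle_echo(reader, writer):
          # Raw TCP: read one line of bytes and send the same bytes back.
          data = await reader.readline()
          writer.write(data)
          await writer.drain()
          writer.close()
          await writer.wait_closed()


      async def main():
          # Port 0 asks the OS for any free port on localhost.
          server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
          port = server.sockets[0].getsockname()[1]

          async with server:
              reader, writer = await asyncio.open_connection("127.0.0.1", port)
              writer.write(b"hello over plain TCP\n")
              await writer.drain()
              reply = await reader.readline()
              writer.close()
              await writer.wait_closed()
          return reply


      reply = asyncio.run(main())
      print(reply)  # b'hello over plain TCP\n'
      ```

      Everything above is bytes in, bytes out: framing, parsing and error handling for a real protocol are exactly what third-party libraries add on top.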

      Because asyncio is the default solution, it has the biggest ecosystem of third-party libs, and pretty much everything async strives to be compatible with it, either directly or through compatibility layers like anyio.

      Twisted

      20 years ago, there was no asyncio, there was no async/await, Node.js didn't exist and Python 3 was half a decade away. Yet it was the dot-com bubble, and everything needed to be connected, now. And so Twisted was born, the grandfather of all the asynchronous frameworks we have today. The Twisted ecosystem grew to include everything, from mail to SSH.

      To this day, Twisted is still a robust and versatile tool. But you do pay the price of its age: it doesn't follow PEP 8 very well, and the design leans on the heavy side.

      Tornado

      Tornado was developed after Twisted, by FriendFeed, during that weird 2005-2015 web dev period when everything needed to be social and web scale. It was like Twisted, but touted as faster, and it was higher level. Out of the box, the HTTP story is way nicer.

      Today, you are unlikely to use Tornado unless you work at Facebook or contribute to Jupyter. After all, if you want to make async web things in 2023, the default tool is FastAPI.

      gevent

      Gevent came about in 2009, the same year as Tornado, but with a fundamentally different design. Instead of attempting to provide an asynchronous API, it decided to do black magic. When you use gevent, you call from gevent import monkey; monkey.patch_all() and it changes the underlying mechanism of Python networking: the blocking parts of the standard library (sockets, time.sleep(), threads and so on) are swapped for cooperative versions, making everything non-blocking.

    3. asyncio, Twisted, Tornado and gevent have one trick up their sleeve: they can send a message over the network and, while waiting for the response, wake up another part of the program to do some other work. And they can do that with many messages in a row. While waiting for the network, they let other parts of the program use the CPU core.
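      A minimal sketch of that trick with asyncio, assuming asyncio.sleep() as a stand-in for waiting on a real network response (fake_request is a hypothetical helper): three waits of 0.2 seconds overlap, so the whole batch should take roughly 0.2 seconds instead of 0.6.

      ```python
      import asyncio
      import time


      async def fake_request(name, delay):
          # asyncio.sleep() stands in for waiting on a network response;
          # while this coroutine waits, the event loop runs the other ones.
          await asyncio.sleep(delay)
          return name


      async def main():
          start = time.perf_counter()
          results = await asyncio.gather(
              fake_request("a", 0.2),
              fake_request("b", 0.2),
              fake_request("c", 0.2),
          )
          return results, time.perf_counter() - start


      results, elapsed = asyncio.run(main())
      print(results)  # ['a', 'b', 'c']
      print(f"took {elapsed:.2f}s")  # roughly 0.2s, not 0.6s
      ```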

      Note that they can only speed up waiting on the network. They will not run two calculations at the same time (they can't use several CPU cores like multiprocessing does), and they can't speed up waiting on other types of I/O (like when you use threads to avoid blocking on user input or disk writes).
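      The flip side, sketched with asyncio: only waits that yield to the event loop overlap. Here time.sleep() stands in for any blocking call (a disk write, input(), a long calculation) and freezes the whole loop, while asyncio.sleep() does not. asyncio.to_thread() (Python 3.9+) is one way to push a blocking call onto a thread instead. The helper names are hypothetical.

      ```python
      import asyncio
      import time


      async def blocking_wait():
          # time.sleep() never yields to the event loop: every coroutine is frozen.
          time.sleep(0.2)


      async def cooperative_wait():
          # asyncio.sleep() yields, so the other coroutines run during the wait.
          await asyncio.sleep(0.2)


      async def offloaded_wait():
          # Python 3.9+: run the blocking call in a worker thread instead.
          await asyncio.to_thread(time.sleep, 0.2)


      async def timed(factory):
          # Run three copies concurrently and measure the wall-clock time.
          start = time.perf_counter()
          await asyncio.gather(*(factory() for _ in range(3)))
          return time.perf_counter() - start


      async def main():
          return {
              "blocking": await timed(blocking_wait),
              "cooperative": await timed(cooperative_wait),
              "offloaded": await timed(offloaded_wait),
          }


      timings = asyncio.run(main())
      for name, seconds in timings.items():
          print(f"{name}: {seconds:.2f}s")
      ```

      You should see roughly 0.6s for the blocking version (the three waits run one after the other) and roughly 0.2s for the other two.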

      All in all, they are good for writing things like bots (web crawlers, chat bots, network sniffers, etc.) and servers (web servers, proxies, ...). For maximum benefit, you can use them inside other concurrency tools, such as multiprocessing or multithreading. You can perfectly well have 4 processes, each containing 4 threads (so 16 threads in total), with each thread running its own asyncio loop.
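      A minimal sketch of the thread part of that layout, assuming Python 3.7+ for asyncio.run(): each worker thread calls asyncio.run(), which creates a separate event loop for that thread and closes it when done. crawl and fake_request are hypothetical names, and asyncio.sleep() stands in for network calls. For the full 4 × 4 setup, each process (for example one started with multiprocessing) would run this same thread pool.

      ```python
      import asyncio
      from concurrent.futures import ThreadPoolExecutor


      async def crawl(worker_id):
          # Hypothetical workload: several concurrent "requests" on this thread's own loop.
          async def fake_request(n):
              await asyncio.sleep(0.05)  # stands in for a network wait
              return n

          results = await asyncio.gather(*(fake_request(n) for n in range(4)))
          return worker_id, asyncio.get_running_loop(), results


      def run_loop_in_thread(worker_id):
          # asyncio.run() creates a new event loop for the calling thread and closes it afterwards.
          return asyncio.run(crawl(worker_id))


      with ThreadPoolExecutor(max_workers=4) as pool:
          outcomes = list(pool.map(run_loop_in_thread, range(4)))

      for worker_id, loop, results in outcomes:
          print(f"worker {worker_id}: results={results}, loop closed={loop.is_closed()}")
      ```

      Each thread gets its own loop rather than sharing one, which matches the rule that an asyncio event loop belongs to a single thread.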