- Nov 2024
-
python.plainenglish.io python.plainenglish.io
-
Deploying Machine Learning Models with Flask and AWS Lambda: A Complete Guide
In essence, this article is about:
1) Training a sample model and uploading it to an S3 bucket:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import joblib

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the logistic regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Save the trained model to a file
joblib.dump(model, 'model.pkl')
```
- Creating a sample Zappa config. AWS Lambda doesn’t natively support Flask, so we need Zappa, a tool that helps deploy WSGI applications (like Flask) to AWS Lambda:
```json
{
  "dev": {
    "app_function": "app.app",
    "exclude": [
      "boto3",
      "dateutil",
      "botocore",
      "s3transfer",
      "concurrent"
    ],
    "profile_name": null,
    "project_name": "flask-test-app",
    "runtime": "python3.10",
    "s3_bucket": "zappa-31096o41b"
  },
  "production": {
    "app_function": "app.app",
    "exclude": [
      "boto3",
      "dateutil",
      "botocore",
      "s3transfer",
      "concurrent"
    ],
    "profile_name": null,
    "project_name": "flask-test-app",
    "runtime": "python3.10",
    "s3_bucket": "zappa-31096o41b"
  }
}
```
- Writing a sample Flask app:
```python
import boto3
import joblib
import numpy as np
from flask import Flask, jsonify, request

# Initialize the Flask app
app = Flask(__name__)

# S3 client to download the model
s3 = boto3.client('s3')

# Download the model from S3 when the app starts
s3.download_file('your-s3-bucket-name', 'model.pkl', '/tmp/model.pkl')
model = joblib.load('/tmp/model.pkl')


@app.route('/predict', methods=['POST'])
def predict():
    # Get the data from the POST request
    data = request.get_json(force=True)

    # Convert the data into a numpy array
    input_data = np.array(data['input']).reshape(1, -1)

    # Make a prediction using the model
    prediction = model.predict(input_data)

    # Return the prediction as a JSON response
    return jsonify({'prediction': int(prediction[0])})


if __name__ == '__main__':
    app.run(debug=True)
```
- Deploying this app to production (to AWS):
zappa deploy production
and later updating it:
zappa update production
- We should get a URL like this:
https://xyz123.execute-api.us-east-1.amazonaws.com/production
which we can query:
curl -X POST -H "Content-Type: application/json" -d '{"input": [5.1, 3.5, 1.4, 0.2]}' https://xyz123.execute-api.us-east-1.amazonaws.com/production/predict
-
-
pythonspeed.com pythonspeed.com
-
I’m writing this on October 15th, 2024. Last week I would’ve said you probably shouldn’t be using uv’s Python in production, because you wouldn’t be getting security updates to OpenSSL. This week, I would tentatively say that it’s fine. This makes me a little uncomfortable, because there may well be other issues I haven’t thought of, and uv is still very new.
You may use uv in production, but there may still be some undiscovered quirks.
-
The uv-provided Python executable is slower than the one shipped by Ubuntu 24.04 LTS, but it’s faster than the “official” Docker image.
-
The ability to install Python with uv adds interesting possibilities for production packaging. For example, you can use an Ubuntu 24.04 base Docker image, download uv, and rely on uv to trivially install any Python version. Which is to say, you won’t be limited to the versions Ubuntu packages for you.
-
Unlike most Python packaging tools, uv doesn’t require Python to be installed to use it.
About uv Python packaging tool
-
- Sep 2024
-
en.wikipedia.org en.wikipedia.org
-
In comparison, Perl/Python/Javascript, which also have the latter property, have other false-like values (0 and empty string), which make || differ from a null-coalescing operator in many more cases (numbers and strings being two of the most frequently used data types). This is what led Perl and Javascript to add a separate operator (Python has only a deferred proposal, PEP 505), while Ruby hasn't.
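To make the difference concrete, here is a small Python sketch contrasting `or` (Python's spelling of `||`) with an explicit `None` check. The helper name `coalesce` is hypothetical and only used for illustration; it is not a built-in.

```python
def coalesce(value, default):
    """Return default only when value is None (null-coalescing behaviour)."""
    return default if value is None else value


timeout = 0  # a legitimate setting that happens to be falsy
name = ""    # likewise, an empty string is a real value

print(timeout or 30)          # 30: `or` throws away the falsy 0
print(coalesce(timeout, 30))  # 0: only None triggers the default
print(coalesce(None, 30))     # 30
print(repr(coalesce(name, "anonymous")))  # '': the empty string survives
```

The two approaches only agree when the sole "missing" value that can occur is `None`.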
-
- Aug 2024
-
hellogithub.com hellogithub.com
-
austin
-
- Jul 2024
-
pythononline.net pythononline.net
-
Say goodbye to the headaches of setting up Python locally. No more installations or configurations, you can execute Python code right in your web browser. Just input your code, hit RUN, and watch the magic happen! Compile, run, and share Python code online with our powerful integrated Python development environment (IDE). Want to show off your work? Use the SHARE option to make your code accessible to anyone, anywhere.
One of the best Python compilers if you're a new dev and don't want to install Python locally.
-
- Jun 2024
-
www.bilibili.com www.bilibili.com
-
CPython
-
-
mp.weixin.qq.com mp.weixin.qq.com
-
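CPython is currently the most popular Python runtime. It already has 60k stars on GitHub, which shows that a great many people follow Python's internal implementation, and everyone is curious how such a concise and elegant language was created.
What is CPython?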
-
-
-
- openai use LiveKit to deliver realtime voice
- playground: https://cloud.livekit.io/projects/
-
-
www.pythonmorsels.com www.pythonmorsels.com
-
Note that the Python documentation refers to these as special methods and notes the synonym "magic method" but very rarely uses the term "dunder method". However, "dunder method" is a fairly common Python colloquialism, as noted in my unofficial Python glossary.
special methods = magic methods = dunder methods
-
-
www.pythonmorsels.com www.pythonmorsels.com
-
python -m webbrowser https://pym.dev/p
Opening URL using Python's webbrowser module
-
- Apr 2024
-
lucasoshiro.github.io lucasoshiro.github.io
-
What are the tools that come to your mind when someone says “debug”? Let me guess: a memory leak detector (e.g. Valgrind); a profiler (e.g. GNU gprof); a function that stops your program and gives you a REPL (e.g. Python’s breakpoint and Ruby’s byebug); something that we call a “debugger” (like GDB, or something similar embedded in the IDEs); or even our old friend, the print function. So, in this text I’ll try to convince you to add Git to your debug toolbelt.
6 different debugging tools
-
- Mar 2024
-
news.hada.io news.hada.io
-
- Feb 2024
-
-
The result? Our runtime image just got 6x smaller! Six times! From > 1.1 GB to 170 MB.
See (above this annotation) the most optimized & CI friendly Python Docker build with Poetry (until this issue gets resolved)
-
-
www.konstantinfo.com www.konstantinfo.com
-
Perl vs Python: What are the Key Differences?
Check this blog to find the key difference between Perl vs Python
-
-
www.w3schools.com www.w3schools.com
-
Python List append() Method
This web page offers a brief overview of the append() function for lists in python.
-
-
medium.com medium.com
-
Tuples can include different data types.
This is an interesting aspect. Tuples are not limited to a single data type.
-
-
marvelousmlops.substack.com marvelousmlops.substack.com
-
We’ve (painstakingly) manually reviewed 310 live MLOps positions, advertised across various platforms in Q4 this year
They went through 310 role descriptions and, even though role descriptions may vary significantly, they found 3 core skills that a large percentage of MLOps roles required:
📦 Docker and Kubernetes 🐍 Python 🌥 Cloud
-
-
github.com github.com
-
Emacs Application Framework
very interesting
-
- Jan 2024
-
chriswarrick.com chriswarrick.com
-
setuptools is the most popular (at 50k packages), Poetry is second at 41k, Hatchling is third at 8.1k. Other tools to cross 500 users include Flit (4.4k), PDM (1.3k), Maturin (1.3k, build backend for Rust-based packages).
Popularity of Python package managers in 2024
-
-
www.youtube.com www.youtube.com
-
Python Development in Spacemacs
-
-
qutebrowser.org qutebrowser.org
-
Installing qutebrowser with virtualenv
-
-
github.com github.com
-
jalajthanaki / NLPython
-
-
python.bakyeono.net python.bakyeono.net
-
연오의 파이썬 (Yeono's Python)
연오
-
- Dec 2023
-
docs.python-guide.org docs.python-guide.org
-
ambiguous src
Why is using a "src" folder to contain module files ambiguous? In this case "sample" seems more ambiguous because it could be referring to a folder containing sample data. Also, how do you know the module is not called docs?
-
-
www.youtube.com www.youtube.com
-
My Python Emacs Workflow
-
-
superfastpython.com superfastpython.com
-
Measure Execution Time With time.thread_time()
The time.thread_time() function reports the time that the current thread has been executing. The time begins or is zero when the current thread is first created.
Return the value (in fractional seconds) of the sum of the system and user CPU time of the current thread.
It is an equivalent value to time.process_time(), except calculated at the scope of the current thread, not the current process. This value is calculated as the sum of the system time and the user time.
thread time = user time + system time
The reported time does not include sleep time.
This means if the thread is blocked by a call to time.sleep() or perhaps is suspended by the operating system, then this time is not included in the reported time. This is called a “thread-wide” or “thread-specific” time.
-
Measure Execution Time With time.process_time()
The time.process_time() function reports the time that the current process has been executing. The time begins or is zero when the current process is first created.
Calculated as the sum of the system time and the user time:
process time = user time + system time
System time is the time the CPU spends executing system calls for the kernel (e.g. the operating system).
User time is time spent by the CPU executing calls in the program (e.g. your code).
When a program loops through an array, it is accumulating user CPU time. Conversely, when a program executes a system call such as exec or fork, it is accumulating system CPU time.
The reported time does not include sleep time.
This means if the process is blocked by a call to time.sleep() or perhaps is suspended by the operating system, then this time is not included in the reported time. This is called a “process-wide” time.
As such, it only reports the time that the current process was executed since it was created by the operating system.
-
Measure Execution Time With time.monotonic()
The time.monotonic() function returns time stamps from a clock that cannot go backwards, as its name suggests.
In mathematics, monotonic, e.g. a monotonic function, means a function whose output never decreases (or never increases).
This means that the result from the time.monotonic() function will never be before the result from a prior call.
Return the value (in fractional seconds) of a monotonic clock, i.e. a clock that cannot go backwards.
It is a high-resolution time stamp, although it is not relative to the epoch like time.time(). Instead, like time.perf_counter(), it uses a timer separate from the system clock.
The time.monotonic() function has a lower resolution than the time.perf_counter() function.
This means that values from the time.monotonic() function can be compared to each other, relatively, but not to the system clock.
Like the time.perf_counter() function, the time.monotonic() function is “system-wide”, meaning that it is not affected by changes to the system clock, such as updates or clock adjustments due to time synchronization.
Like the time.perf_counter() function, the time.monotonic() function was introduced in Python version 3.3 with the intent of addressing the limitations of the time.time() function tied to the system clock, such as use in short-duration benchmarking.
Monotonic clock (cannot go backward), not affected by system clock updates.
-
Measure Execution Time With time.perf_counter()
The time.perf_counter() function reports the value of a performance counter on the system.
It does not report the time since epoch like time.time().
Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide.
The returned value in seconds with fractional components (e.g. milliseconds and nanoseconds), provides a high-resolution timestamp.
Calculating the difference between two timestamps from the time.perf_counter() allows high-resolution execution time benchmarking, e.g. in the millisecond and nanosecond range.
The timestamp from the time.perf_counter() function is consistent, meaning that two durations can be compared relative to each other in a meaningful way.
The time.perf_counter() function was introduced in Python version 3.3 with the intended use for short-duration benchmarking.
The perf_counter() function was specifically designed to overcome the limitations of other time functions to ensure that the result is consistent across platforms and monotonic (always increasing).
For accuracy, the timeit module uses time.perf_counter() internally.
-
Measure Execution Time With time.time()
The time.time() function reports the number of seconds since the epoch (epoch is January 1st 1970, which is used on Unix systems and beyond as an arbitrary fixed time in the past) as a floating point number.
The result is a floating point value, potentially offering fractions of a seconds (e.g. milliseconds), if the platforms support it.
The time.time() function is not perfect.
It is possible for a subsequent call to time.time() to return a value in seconds less than the previous value, for example after the system clock has been adjusted.
Note: even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second. While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls.
-
there are automatic ways to measure execution time, such as via the timeit module.
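As a minimal sketch of the automatic route, the standard-library timeit module can run a callable many times and report the total time per trial. The function `build_list` and the repeat counts below are illustrative choices, not taken from the source.

```python
import timeit


def build_list():
    # Hypothetical workload to benchmark.
    return [i * i for i in range(1_000)]


# 3 trials of 200 calls each; each entry is the total seconds for one trial.
trials = timeit.repeat(build_list, repeat=3, number=200)

# The fastest trial is usually the least disturbed by other system activity.
best_per_call = min(trials) / 200
print(f"best: {best_per_call * 1e6:.1f} microseconds per call")
```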
-
There are 5 ways to measure execution time manually in Python using the time module:
- Use time.time()
- Use time.perf_counter()
- Use time.monotonic()
- Use time.process_time()
- Use time.thread_time()
Note, each function returns a time in seconds and has an equivalent function that returns the time in nanoseconds, e.g. time.time_ns(), time.perf_counter_ns(), time.monotonic_ns(), time.process_time_ns() and time.thread_time_ns().
Recall that there are 1,000 nanoseconds in one microsecond, 1,000 microseconds in 1 millisecond, and 1,000 milliseconds in one second. This highlights that the nanosecond versions of the function are for measuring very short time scales indeed.
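The practical difference between these clocks is easiest to see by timing one sleeping function and one CPU-bound function. The sketch below is only an illustration; the helper `measure` and the two workloads are hypothetical names introduced here.

```python
import time


def measure(func):
    """Return (wall, process CPU, thread CPU) seconds spent running func once."""
    wall_start = time.perf_counter()
    process_start = time.process_time()
    thread_start = time.thread_time()
    func()
    return (
        time.perf_counter() - wall_start,
        time.process_time() - process_start,
        time.thread_time() - thread_start,
    )


def sleepy():
    time.sleep(0.2)  # blocked, not using the CPU


def busy():
    sum(i * i for i in range(300_000))  # pure CPU work


for label, func in (("sleepy", sleepy), ("busy", busy)):
    wall, process, thread = measure(func)
    print(f"{label}: wall={wall:.3f}s process={process:.3f}s thread={thread:.3f}s")
```

For `sleepy`, the wall-clock figure should be about 0.2 seconds while both CPU figures stay close to zero. For `busy`, all three should be roughly similar.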
-
-
superfastpython.com superfastpython.com
-
There are common errors experienced by beginners when getting started with asyncio in Python.
They are:
- Trying to run coroutines by calling them.
- Not letting coroutines run in the event loop.
- Using the asyncio low-level API.
- Exiting the main coroutine too early.
- Assuming race conditions and deadlocks are impossible.
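The first two mistakes in this list can be sketched in a few lines. Calling a coroutine function only builds a coroutine object, and nothing runs until it is awaited or wrapped in a task inside a running event loop. The coroutine `greet` is a made-up example.

```python
import asyncio


async def greet(name):
    await asyncio.sleep(0.01)
    return f"hello {name}"


async def main():
    coro = greet("a")  # nothing has run yet: this is just a coroutine object
    is_coroutine = asyncio.iscoroutine(coro)

    first = await coro  # awaiting actually runs it

    task = asyncio.create_task(greet("b"))  # scheduled on the event loop
    second = await task  # waiting here keeps main() from exiting too early

    return is_coroutine, first, second


result = asyncio.run(main())
print(result)
```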
-
-
-
And this is where the asynchronicity comes in: The "results" list does not actually contain the results from running our functions. Instead, it contains "futures" which are similar to the JavaScript idea of "promises." In order to allow our program to continue running, we get back these futures that represent a placeholder for a value. If we try to print the future, depending on whether it's finished running or not, we'll either get back a state of "pending" or "finished." Once it's finished we can get the return value (assuming there is one) using var.result().
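The passage does not name the library it uses, so the following sketch assumes `concurrent.futures.ThreadPoolExecutor` from the standard library, whose futures have the pending and finished states described. The function `square` is a placeholder workload.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def square(n):
    time.sleep(0.05)  # pretend to wait on something slow
    return n * n


with ThreadPoolExecutor(max_workers=3) as pool:
    # submit() returns immediately with a Future, not with the result.
    futures = [pool.submit(square, n) for n in range(3)]
    print(futures[0])  # the repr includes the current state of the future

    # result() blocks until that particular future has finished.
    values = [future.result() for future in futures]

all_done = all(future.done() for future in futures)
print(values, all_done)
```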
-
The difference between asyncio.sleep() and time.sleep() is that asyncio.sleep() is non-blocking.
-
The calls don't actually get made until we schedule them with await asyncio.gather(*tasks). This runs all of the tasks in our list and waits for them to finish before continuing with the rest of our program.
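A minimal sketch of this behaviour, using a made-up coroutine `nap`: five coroutines that each sleep for 0.1 seconds finish in roughly 0.1 seconds in total when gathered, because `asyncio.sleep()` yields control instead of blocking.

```python
import asyncio
import time


async def nap(seconds):
    await asyncio.sleep(seconds)  # non-blocking: other coroutines run meanwhile
    return seconds


async def main():
    tasks = [nap(0.1) for _ in range(5)]  # nothing is running yet
    start = time.perf_counter()
    results = await asyncio.gather(*tasks)  # run them all, wait for all
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

Swapping `asyncio.sleep` for `time.sleep` would block the event loop and make the naps run one after another.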
-
programming with asyncio pretty much enforces* using some sort of "main" function.
This is because you need to use the "async" keyword in order to use the "await" syntax, and the "await" syntax is the only way to actually run other async functions.
-
async for (not used here) iterates over an asynchronous stream.
-
async with allows awaiting async responses and file operations.
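Both constructs can be tried without any network or file access. The sketch below uses an async generator for `async for` and `contextlib.asynccontextmanager` for `async with`; the names `connection` and `ticker` are hypothetical stand-ins for a real resource and a real stream.

```python
import asyncio
from contextlib import asynccontextmanager

events = []


@asynccontextmanager
async def connection():
    events.append("open")  # async setup could await here
    try:
        yield "conn"
    finally:
        await asyncio.sleep(0)  # async teardown
        events.append("close")


async def ticker(count):
    for i in range(count):
        await asyncio.sleep(0)  # each item may involve waiting
        yield i


async def main():
    async with connection() as conn:
        async for tick in ticker(3):
            events.append((conn, tick))


asyncio.run(main())
print(events)
```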
-
What is a thread?
A thread is a way of allowing your computer to break up a single process/program into many lightweight pieces that execute in parallel. Somewhat confusingly, Python's standard implementation of threading limits threads to only being able to execute one at a time due to something called the Global Interpreter Lock (GIL). The GIL is necessary because CPython's (Python's default implementation) memory management is not thread-safe. Because of this limitation, threading in Python is concurrent, but not parallel. To get around this, Python has a separate multiprocessing module not limited by the GIL that spins up separate processes, enabling parallel execution of your code. Using the multiprocessing module is nearly identical to using the threading module.
Asynchronous nature of threading: as one function waits, another one begins, and so on.
-
when we join threads with thread.join(), all we're doing is ensuring the thread has finished before continuing on with our code.
-
Creating a thread is not the same as starting a thread, however. To start your thread, use {the name of your thread}.start(). Starting a thread means "starting its execution."
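The create, start, and join steps can be seen separately in a short sketch. The `worker` function and the shared `results` list are illustrative only.

```python
import threading

results = []


def worker(n):
    results.append(n * n)


# Creating the threads does not run anything yet.
threads = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
alive_before_start = [t.is_alive() for t in threads]

for t in threads:
    t.start()  # execution begins here

for t in threads:
    t.join()  # block until this thread has finished

print(sorted(results))
```

After the `join()` loop, every worker is guaranteed to have finished, so `results` is complete.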
-
-
pythonspeed.com pythonspeed.com
-
Running the code in a subprocess is much slower than running a thread, not because the computation is slower, but because of the overhead of copying and (de)serializing the data. So how do you avoid this overhead?
Reducing the performance hit of copying data between processes:
Option #1: Just use threads
Processes have overhead, threads do not. And while it’s true that generic Python code won’t parallelize well when using multiple threads, that’s not necessarily true for your Python code. For example, NumPy releases the GIL for many of its operations, which means you can use multiple CPU cores even with threads.
```
# numpy_gil.py
import numpy as np
from time import time
from multiprocessing.pool import ThreadPool

arr = np.ones((1024, 1024, 1024))

start = time()
for i in range(10):
    arr.sum()
print("Sequential:", time() - start)

expected = arr.sum()

start = time()
with ThreadPool(4) as pool:
    result = pool.map(np.sum, [arr] * 10)
    assert result == [expected] * 10
print("4 threads:", time() - start)
```
When run, we see that NumPy uses multiple cores just fine when using threads, at least for this operation:
$ python numpy_gil.py
Sequential: 4.253053188323975
4 threads: 1.3854241371154785
Pandas is built on NumPy, so many numeric operations will likely release the GIL as well. However, anything involving strings, or Python objects in general, will not. So another approach is to use a library like Polars which is designed from the ground-up for parallelism, to the point where you don’t have to think about it at all, it has an internal thread pool.
Option #2: Live with it
If you’re stuck with using processes, you might just decide to live with the overhead of pickling. In particular, if you minimize how much data gets passed back and forth between processes, and the computation in each process is significant enough, the cost of copying and serializing data might not significantly impact your program’s runtime. Spending a few seconds on pickling doesn’t really matter if your subsequent computation takes 10 minutes.
Option #3: Write the data to disk
Instead of passing data directly, you can write the data to disk, and then pass the path to this file:
- to the subprocess (as an argument)
- to the parent process (as the return value of the function running in the worker process).
The recipient process can then parse the file.
```
import pandas as pd
import multiprocessing as mp
from pathlib import Path
from tempfile import mkdtemp
from time import time


def noop(df: pd.DataFrame):
    # real code would process the dataframe here
    pass


def noop_from_path(path: Path):
    df = pd.read_parquet(path, engine="fastparquet")
    # real code would process the dataframe here
    pass


def main():
    df = pd.DataFrame({"column": list(range(10_000_000))})

    with mp.get_context("spawn").Pool(1) as pool:
        # Pass the DataFrame to the worker process
        # directly, via pickling:
        start = time()
        pool.apply(noop, (df,))
        print("Pickling-based:", time() - start)

        # Write the DataFrame to a file, pass the path to
        # the file to the worker process:
        start = time()
        path = Path(mkdtemp()) / "temp.parquet"
        df.to_parquet(
            path,
            engine="fastparquet",
            # Run faster by skipping compression:
            compression="uncompressed",
        )
        pool.apply(noop_from_path, (path,))
        print("Parquet-based:", time() - start)


if __name__ == "__main__":
    main()
```
Option #4: multiprocessing.shared_memory
Because processes sometimes do want to share memory, operating systems typically provide facilities for explicitly creating shared memory between processes. Python wraps these facilities in the multiprocessing.shared_memory module.
However, unlike threads, where the same memory address space allows trivially sharing Python objects, in this case you’re mostly limited to sharing arrays. And as we’ve seen, NumPy releases the GIL for expensive operations, which means you can just use threads, which is much simpler. Still, in case you ever need it, it’s worth knowing this module exists.
Note: The module also includes ShareableList, which is a bit like a Python list but limited to int, float, bool, small str and bytes, and None. But this doesn’t help you cheaply share an arbitrary Python object.
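For a feel of the API, here is a minimal sketch using only the standard library. To stay self-contained it attaches the second handle from the same process; in real use the block's name would be passed to another process, which would attach to it in the same way.

```python
from multiprocessing import shared_memory

# Create a named block of raw bytes.
block = shared_memory.SharedMemory(create=True, size=16)
try:
    block.buf[:5] = b"hello"

    # Attach to the same block by name, as a worker process would.
    other = shared_memory.SharedMemory(name=block.name)
    received = bytes(other.buf[:5])
    other.close()
finally:
    block.close()   # release this handle
    block.unlink()  # free the block itself, exactly once

print(received)
```

Only raw bytes cross the boundary here, which matches the point above: this shares buffers and arrays, not arbitrary Python objects.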
A bad option for Linux: the "fork" context
You may have noticed we did multiprocessing.get_context("spawn").Pool() to create a process pool. This is because Python has multiple implementations of multiprocessing on some OSes. "spawn" is the only option on Windows, the only non-broken option on macOS, and available on Linux. When using "spawn", a completely new process is created, so you always have to copy data across.
On Linux, the default is "fork": the new child process has a complete copy of the memory of the parent process at the time of the child process’ creation. This means any objects in the parent (arrays, giant dicts, whatever) that were created before the child process was created, and were stored somewhere helpful like a module, are accessible to the child. Which means you don’t need to pickle/unpickle to access them.
Sounds useful, right? There’s only one problem: the "fork" context is super-broken, which is why it will stop being the default in Python 3.14.
Consider the following program:
```
import threading
import sys
from multiprocessing import Process


def thread1():
    for i in range(1000):
        print("hello", file=sys.stderr)


threading.Thread(target=thread1).start()


def foo():
    pass


Process(target=foo).start()
```
On my computer, this program consistently deadlocks: it freezes and never exits. Any time you have threads in the parent process, the "fork" context can result in potential deadlocks, or even corrupted memory, in the child process.
You might think that you’re fine because you don’t start any threads. But many Python libraries start a thread pool on import, for example NumPy. If you’re using NumPy, Pandas, or any other library that depends on NumPy, you are running a threaded program, and therefore at risk of deadlocks, segfaults, or data corruption when using the "fork" multiprocessing context. For more details see this article on why multiprocessing’s default is broken on Linux.
You’re just shooting yourself in the foot if you take this approach.
-
When you’re writing Python, though, you want to share Python objects between processes.
To enable this, when you pass Python objects between processes using Python’s multiprocessing library:
- On the sender side, the arguments get serialized to bytes with the pickle module.
- On the receiver side, the bytes are unserialized using pickle.
This serialization and deserialization process involves computation, which can potentially be slow.
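The round trip that multiprocessing performs can be reproduced by hand with the pickle module. This is a sketch with a made-up `record` object; it shows that the receiver ends up with an equal but separate copy.

```python
import pickle

record = {"id": 7, "values": [1.5, 2.5], "tags": ("a", "b")}

payload = pickle.dumps(record)    # sender side: object -> bytes
restored = pickle.loads(payload)  # receiver side: bytes -> new object

print(len(payload), "bytes")
print(restored == record, restored is record)
```

The larger the object, the more bytes have to be produced, copied, and parsed, which is where the overhead comes from.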
-
Threads vs. processes
Multiple threads let you run code in parallel, potentially on multiple CPUs. On Python, however, the global interpreter lock makes this parallelism harder to achieve.
Multiple processes also let you run code in parallel—so what’s the difference between threads and processes?
All the threads inside a single process share the same memory address space. If thread 1 in a process stores some memory at address 0x7f0cd1a88810, thread 2 can access the same memory at the same address. That means passing objects between threads is cheap: you just need to get the pointer to the memory address from one thread to the other. A memory address is 8 bytes: this is not a lot of data to move around.
In contrast, processes do not share the same memory space. There are some shared memory facilities provided by the operating system, typically, and we’ll get to that later. But by default, no memory is shared. That means you can’t just share the address of your data across processes: you have to copy the data.
-
-
pythonspeed.com pythonspeed.com
-
Technique #2: Sampling
How do you load only a subset of the rows?
When you load your data, you can specify a skiprows function that will randomly decide whether to load that row or not:
```
>>> import pandas as pd
>>> from random import random
>>> def sample(row_number):
...     if row_number == 0:
...         # Never drop the row with column names:
...         return False
...     # random() returns uniform numbers between 0 and 1:
...     return random() > 0.001
...
>>> sampled = pd.read_csv("/tmp/voting.csv", skiprows=sample)
>>> len(sampled)
973
```
-
lossy compression: drop some of your data in a way that doesn’t impact your final results too much.
If parts of your data don’t impact your analysis, no need to waste memory keeping extraneous details around.
-
-
death.andgravity.com death.andgravity.com
-
except StopAsyncIteration if is_async else StopIteration:
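This works because the expression after `except` is ordinary Python that is evaluated when an exception reaches the clause. A small synchronous sketch of the same trick, with a hypothetical `lookup` helper:

```python
def lookup(container, key, is_mapping):
    """Return container[key], or None if the expected lookup error occurs."""
    try:
        return container[key]
    except KeyError if is_mapping else IndexError:
        # The exception class to catch is chosen at runtime.
        return None


print(lookup({"a": 1}, "b", is_mapping=True))   # None
print(lookup([10, 20], 5, is_mapping=False))    # None
print(lookup([10, 20], 1, is_mapping=False))    # 20
```

If the flag does not match the container, the error is not caught and propagates as usual.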
Interesting: using a ternary operator in an except clause
-
In sync code, you might use a thread pool and imap_unordered():
```
pool = multiprocessing.dummy.Pool(2)

for result in pool.imap_unordered(do_stuff, things_to_do):
    print(result)
```
Here, concurrency is limited by the fixed number of threads.
-
-
horaceguy.pages.dev horaceguy.pages.dev
-
Gunicorn and multiprocessing
Gunicorn forks a base process into n worker processes, and each worker is managed by Uvicorn (with the asynchronous uvloop). Which means:
- Each worker is concurrent
- The worker pool implements parallelism
This way, we can have the best of both worlds: concurrency (multithreading) and parallelism (multiprocessing).
-
There is another way to declare a route with FastAPI
Using asyncio:
```
import asyncio

from fastapi import FastAPI

app = FastAPI()


@app.get("/asyncwait")
async def asyncwait():
    duration = 0.05
    await asyncio.sleep(duration)
    return {"duration": duration}
```
-
-
guicommits.com guicommits.com
-
Use Python asyncio.as_completed
There will be moments when you don't have to await for every single task to be processed right away.
We do this by using asyncio.as_completed, which returns an iterator that yields awaitables in the order the tasks complete.
-
When to use Python Async
Async only makes sense if you're doing IO.
There's ZERO benefit in using async for stuff like this that is CPU-bound:
```
import asyncio


async def sum_two_numbers_async(n1: int, n2: int) -> int:
    return n1 + n2


async def main():
    await sum_two_numbers_async(2, 2)
    await sum_two_numbers_async(4, 4)


asyncio.run(main())
```
Your code might even get slower by doing that due to the Event Loop.
That's because Python async only optimizes IDLE time!
-
If you want 2 or more functions to run concurrently, you need asyncio.create_task.
Creating a task triggers the async operation, and it needs to be awaited at some point.
For example:
task = create_task(my_async_function('arg1'))
result = await task
As we're creating many tasks, we need asyncio.gather, which awaits all tasks to be done.
-
they think async is parallel which is not true
-
-
www.bitecode.dev www.bitecode.dev
-
Fast API
Fast API is a high-level web framework like flask, but that happens to be async, unlike flask. With the added benefit of using type hints and pydantic to generate schemas.
It's not a building block like twisted, gevent, trio or asyncio. In fact, it's built on top of asyncio. It's in the same group as flask, bottle, django, pyramid, etc. Although it's a micro-framework, so it's focused on routing, data validation and API delivery.
-
The code isn't that different from your typical asyncio script:
```
import re
import time

import httpx
import trio

urls = [
    "https://www.bitecode.dev/p/relieving-your-python-packaging-pain",
    "https://www.bitecode.dev/p/hype-cycles",
    "https://www.bitecode.dev/p/why-not-tell-people-to-simply-use",
    "https://www.bitecode.dev/p/nobody-ever-paid-me-for-code",
    "https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager",
    "https://www.bitecode.dev/p/the-costly-mistake-so-many-makes",
    "https://www.bitecode.dev/p/the-weirdest-python-keyword",
]

title_pattern = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE)

user_agent = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"
)


async def fetch_url(url):
    start_time = time.time()

    async with httpx.AsyncClient() as client:
        headers = {"User-Agent": user_agent}
        response = await client.get(url, headers=headers)
        match = title_pattern.search(response.text)
        title = match.group(1) if match else "Unknown"

        print(f"URL: {url}\nTitle: {title}")

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Time taken for {url}: {elapsed_time:.4f} seconds\n")


async def main():
    global_start_time = time.time()

    # That's the biggest API difference
    async with trio.open_nursery() as nursery:
        for url in urls:
            nursery.start_soon(fetch_url, url)

    global_end_time = time.time()
    global_elapsed_time = global_end_time - global_start_time
    print(f"Total time taken for all URLs: {global_elapsed_time:.4f} seconds")


if __name__ == "__main__":
    trio.run(main)
```
Because it doesn't create nor schedule coroutines immediately (notice that nursery.start_soon(fetch_url, url) is not nursery.start_soon(fetch_url(url))), it will also consume less memory. But the most important part is the nursery:
# That's the biggest API difference
async with trio.open_nursery() as nursery:
    for url in urls:
        nursery.start_soon(fetch_url, url)
The with block scopes all the tasks, meaning everything that is started inside that context manager is guaranteed to be finished (or terminated) when it exits. First, the API is better than expecting the user to wait manually like with asyncio.gather: you cannot start concurrent coroutines without a clear scope in trio, it doesn't rely on the coder's discipline. But under the hood, the design is also different. The whole bunch of coroutines you group and start can be canceled easily, because trio always knows where things begin and end.
As soon as things get complicated, code with a curio-like design becomes radically simpler than code with an asyncio-like design.
-
trio
For many years, the very talented dev and speaker David Beazley has been showing unease with asyncio's design, and made more and more experiments and public talks about what could an alternative look like. It culminated with the excellent Die Threads presentation, live coding the sum of the experience of all those ideas, that eventually would become the curio library. Watch it. It’s so good.
Trio is not compatible with asyncio, nor gevent or twisted by default. This means it's also its little own async island.
But in exchange for that, it provides a very different internal take on how to deal with this kind of concurrency, where every coroutine is tied to an explicit scope, everything can be awaited easily, or canceled.
-
Because of the way gevent works, you can take a blocking script, and with very few modifications, make it async. Let's take the original stdlib one, and convert it to gevent:
```
import re
import time

import gevent
from gevent import monkey

monkey.patch_all()  # THIS MUST BE DONE BEFORE IMPORTING URLLIB

from urllib.request import Request, urlopen

urls = [
    "https://www.bitecode.dev/p/relieving-your-python-packaging-pain",
    "https://www.bitecode.dev/p/hype-cycles",
    "https://www.bitecode.dev/p/why-not-tell-people-to-simply-use",
    "https://www.bitecode.dev/p/nobody-ever-paid-me-for-code",
    "https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager",
    "https://www.bitecode.dev/p/the-costly-mistake-so-many-makes",
    "https://www.bitecode.dev/p/the-weirdest-python-keyword",
]

title_pattern = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE)

user_agent = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"
)


# We move the fetching into a function so we can isolate it into a green thread
def fetch_url(url):
    start_time = time.time()

    headers = {"User-Agent": user_agent}

    with urlopen(Request(url, headers=headers)) as response:
        html_content = response.read().decode("utf-8")
        match = title_pattern.search(html_content)
        title = match.group(1) if match else "Unknown"
        print(f"URL: {url}\nTitle: {title}")

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Time taken: {elapsed_time:.4f} seconds\n")


def main():
    global_start_time = time.time()

    # Here is where we convert synchronous calls into async ones
    greenlets = [gevent.spawn(fetch_url, url) for url in urls]
    gevent.joinall(greenlets)

    global_end_time = time.time()
    global_elapsed_time = global_end_time - global_start_time
    print(f"Total time taken: {global_elapsed_time:.4f} seconds")


main()
```
No async, no await. No special lib except for gevent. In fact it would work with the requests lib just as well. Very few modifications are needed, for a net perf gain.
The only danger is if you call `gevent.monkey.patch_all()` too late. You get a cryptic error that crashes your program.
-
-
www.bitecode.dev www.bitecode.dev
-
Tips
- `if __name__ == "__main__"` is important for multiprocessing because it will spawn a new Python, that will import the module. You don't want this module to spawn a new Python that imports the module that will spawn a new Python...
- If the function to submit to the executor has complicated arguments to be passed to it, use a `lambda` or `functools.partial`.
- `max_workers=1` is a very nice way to get a poor man’s task queue.
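A minimal sketch of the last two tips, assuming a hypothetical `resize` function that stands in for the real work. `functools.partial` binds the arguments up front, and a single-worker executor runs submissions one at a time, in submission order:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def resize(path, width, height, keep_ratio=True):
    # Hypothetical task: it only formats a string so the sketch stays runnable.
    return f"{path} -> {width}x{height} (keep_ratio={keep_ratio})"


# Bind the complicated arguments once, then submit a zero-argument callable.
job = partial(resize, "cat.png", width=640, height=480, keep_ratio=False)

# A single worker processes submissions sequentially: a poor man's task queue.
with ThreadPoolExecutor(max_workers=1) as executor:
    futures = [executor.submit(job), executor.submit(resize, "dog.png", 320, 240)]
    results = [future.result() for future in futures]

print(results)
```

`executor.submit` also forwards positional and keyword arguments itself, as the second submission shows, so `partial` mostly pays off when the same pre-configured callable is reused or handed to `executor.map`.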
-
Both are bad if you need to cancel tasks, collaborate a lot between tasks, deal precisely with the task lifecycle, need a huge number of workers or want to milk out every single bit of perf. You won’t get anywhere near Rust-level speed.
-
Process pools are good for:
- When you don't need to share data between tasks.
- When you are CPU bound.
- When you don't have too many tasks to run at the same time.
- When you need true parallelism and want to exercise your juicy cores.
-
Thread pools are good for:
- Tasks (network, file, etc.) that need less than 10_000 I/O interactions per second. The number is higher than you would expect, because threads are surprisingly cheap nowadays, and you can spawn a lot of them without bloating memory too much. The limit is more the price of context switching. This is not a scientific number, it's a general direction that you should challenge by measuring your own particular case.
- When you need to share data between the tasks.
- When you are not CPU bound.
- When you are OK with executing tasks a bit slower to ensure you are not blocking any of them (e.g. a user UI and a long calculation).
- When you are CPU bound, but the CPU calculations are delegated to a C extension that releases the GIL, such as numpy. Free parallelism on the cheap, yeah!
E.G: a web scraper, a GUI to zip files, a development server, sending emails without blocking web page rendering, etc.
-
What would a version with multiprocessing look like?
Pretty much the same, but we use ProcessPoolExecutor instead.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

...

with ProcessPoolExecutor(max_workers=5) as executor:
    ...
```
Note that here the number of workers maps to the number of CPU cores I want to dedicate to the program. Processes are way more expensive than threads, as each starts a new Python instance.
-
Python standard library comes with a beautiful abstraction for them I see too few people use: the pool executors.
-
ThreadPoolExecutor.
```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def main():
    with ThreadPoolExecutor(max_workers=len(URLs)) as executor:
        tasks = {}
        for url in URLs:
            future = executor.submit(fetch_url, url)
            tasks[future] = url

        for future in as_completed(tasks):
            title = future.result()
            url = tasks[future]
            print(f"URL: {url}\nTitle: {title}")
```
-
You can distribute work to a bunch of process workers or thread workers with a few lines of code:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=5) as executor:
    executor.submit(do_something_blocking)
```
-
-
eddieantonio.ca eddieantonio.ca
-
When you run your Python program using [CPython], the code is parsed and converted to an internal bytecode format, which is then executed inside the VM. From the user’s perspective, this is clearly an interpreter—they run their program from source. But if you look under CPython’s scaly skin, you’ll see that there is definitely some compiling going on. The answer is that it is both. CPython is an interpreter, and it has a compiler.
-
You can actually compile all of your Python code beforehand using the compileall module on the command line:
$ python3 -m compileall .
This will place the compiled bytecode of all Python files in the current directory in `__pycache__/` and show you any compiler errors.
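The same step can be driven from Python. This is a small sketch that assumes a throwaway directory and a hypothetical one-line module `hello.py`, and uses `compileall.compile_dir`, the function behind the command-line tool:

```python
import compileall
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical one-line module written just for this demo.
    source = Path(tmp) / "hello.py"
    source.write_text("print('hello')\n")

    # Programmatic counterpart of `python3 -m compileall <dir>`; quiet=1 hides the listing.
    ok = bool(compileall.compile_dir(tmp, quiet=1))

    # The bytecode lands in a __pycache__ directory next to the source file.
    compiled = sorted(p.name for p in (Path(tmp) / "__pycache__").glob("*.pyc"))

print(ok, compiled)
```

The `.pyc` file name includes an interpreter tag, so the exact name varies between Python versions.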
-
Python is both a compiled and interpreted language
The CPython interpreter really is an interpreter. But it also is a compiler. Python must go through a few stages before ever running the first line of code:
- scanning
- parsing
Older versions of Python added an additional stage:
- scanning
- parsing
- checking for valid assignment targets
Let’s compare this to the stages of compiling a C program:
- ~~preprocessing~~
- lexical analysis (another term for “scanning”)
- syntactic analysis (another term for “parsing”)
- ~~semantic analysis~~
- ~~linking~~
-
next stage is parsing (also known as syntactic analysis) and the parser reports the first error in the source code. Parsing the whole file happens before running the first line of code which means that Python does not even see the error on line 1 and reports the syntax error on line 2.
-
I haven’t done a deep dive into the source code of the CPython interpreter to verify this, but I think the reason that this is the first error detected is because one of the first steps that Python 3.12 does is scanning (also known as lexical analysis). The scanner converts the ENTIRE file into a series of tokens before continuing to the next stage. A missing quotation mark at the end of a string literal is an error that is detected by the scanner—the scanner wants to turn the ENTIRE string into one big token, but it can’t do that until it finds the closing quotation mark. The scanner runs first, before anything else in Python 3.12, hence why this is the first error message.
-
Python reports only one error message at a time—so the game is which error message will be reported first?
Here is the buggy program:
```python
1 / 0
print() = None
if False
ñ = "hello
```
Each line of code generates a different error message:
- 1 / 0 will generate ZeroDivisionError: division by zero.
- print() = None will generate SyntaxError: cannot assign to function call.
- if False will generate SyntaxError: expected ':'.
- ñ = "hello will generate SyntaxError: EOL while scanning string literal.
The question is… which will be reported first?
Spoilers: the specific version of Python matters (more than I thought it would) so keep that in mind if you see different results.
The first error message detected is on the last line of source code. What this tells us is that Python must read the entire source code file before running the first line of code. If you have a definition in your head of an “interpreted language” that includes “interpreted languages run the code one line at a time”, then I want you to cross that out!
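One way to see this without crashing your own script is to hand the buggy program to the builtin `compile()` as a string. The sketch below only checks that a `SyntaxError` is raised before anything runs. Which line gets blamed depends on the Python version, so it is printed rather than assumed:

```python
# The four buggy lines from the example, kept in a string so this file itself stays valid.
buggy_source = '1 / 0\nprint() = None\nif False\nñ = "hello\n'

try:
    # compile() scans and parses the whole source without executing any of it.
    compile(buggy_source, "<buggy>", "exec")
    outcome = "compiled"
except SyntaxError as exc:
    # The reported line number varies between Python versions.
    outcome = f"SyntaxError on line {exc.lineno}"
except ZeroDivisionError:
    # Never reached: line 1 is not run during compilation.
    outcome = "ZeroDivisionError"

print(outcome)
```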
-
-
mathspp.com mathspp.com
-
The tokenizer takes your source code and chunks it into “tokens”. Tokens are just small pieces of source code that you can identify in isolation. As examples, there will be tokens for numbers, mathematical operators, variable names, and keywords (like if or for). The parser will take that linear sequence of tokens and essentially reshape them into a tree structure (that's what the T in AST stands for: Tree). This tree is what gives meaning to your tokens, providing a nice structure that is easier to reason about and work on. As soon as we have that tree structure, our compiler can go over the tree and figure out what bytecode instructions represent the code in the tree. For example, if part of the tree represents a function, we may need a bytecode for the return statement of that function. Finally, the interpreter takes those bytecode instructions and executes them, producing the results of our original program.
-
Recap
In this article you started implementing your own version of Python. To do so, you needed to create four main components:
A tokenizer:

- accepts strings as input (supposedly, source code);
- chunks the input into atomic pieces called tokens;
- produces tokens regardless of their sequence making sense or not.

A parser:

- accepts tokens as input;
- consumes the tokens one at a time, while making sure they come in an order that makes sense;
- produces a tree that represents the syntax of the original code.

A compiler:

- accepts a tree as input;
- traverses the tree to produce bytecode operations.

An interpreter:

- accepts bytecode as input;
- traverses the bytecode and performs the operation that each one represents;
- uses a stack to help with the computations.
-
Each bytecode is defined by two things: the type of bytecode operation we're dealing with (e.g., pushing things on the stack or doing an operation); and the data associated with that bytecode operation, which not all bytecode operations need.
-
The interpreter accepts a list of bytecode operations and its method interpret will go through the list of bytecodes, interpreting one at a time.
```
from .compiler import Bytecode, BytecodeType

...

class Interpreter:
    def __init__(self, bytecode: list[Bytecode]) -> None:
        self.stack = Stack()
        self.bytecode = bytecode
        self.ptr: int = 0

    def interpret(self) -> None:
        for bc in self.bytecode:
            # Interpret this bytecode operator.
            if bc.type == BytecodeType.PUSH:
                self.stack.push(bc.value)
            elif bc.type == BytecodeType.BINOP:
                right = self.stack.pop()
                left = self.stack.pop()
                if bc.value == "+":
                    result = left + right
                elif bc.value == "-":
                    result = left - right
                else:
                    raise RuntimeError(f"Unknown operator {bc.value}.")
                self.stack.push(result)

        print("Done!")
        print(self.stack)
```
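The excerpt depends on `Bytecode`, `BytecodeType` and `Stack` from the rest of the article's project. A self-contained sketch of the same idea could look like this, assuming a dataclass for `Bytecode`, an enum for `BytecodeType`, and a plain list standing in for `Stack` (these definitions are guesses for illustration, not the article's code):

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any


class BytecodeType(Enum):
    PUSH = auto()
    BINOP = auto()


@dataclass
class Bytecode:
    type: BytecodeType
    value: Any = None  # the data attached to the operation, if it needs any


def interpret(bytecode: list[Bytecode]) -> Any:
    stack: list[Any] = []  # a plain list stands in for the article's Stack class
    for bc in bytecode:
        if bc.type == BytecodeType.PUSH:
            stack.append(bc.value)
        elif bc.type == BytecodeType.BINOP:
            # The right operand was pushed last, so it is popped first.
            right = stack.pop()
            left = stack.pop()
            if bc.value == "+":
                stack.append(left + right)
            elif bc.value == "-":
                stack.append(left - right)
            else:
                raise RuntimeError(f"Unknown operator {bc.value}.")
    return stack.pop()


# Hand-written bytecode for the expression 3 + 5 - 2.
program = [
    Bytecode(BytecodeType.PUSH, 3),
    Bytecode(BytecodeType.PUSH, 5),
    Bytecode(BytecodeType.BINOP, "+"),
    Bytecode(BytecodeType.PUSH, 2),
    Bytecode(BytecodeType.BINOP, "-"),
]
result = interpret(program)
print(result)
```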
-
The interpreter is the part of the program that is responsible for taking bytecode operations as input and using those to actually run the source code you started off with.
-
To write our compiler, we'll just create a class with a method compile. The method compile will mimic the method parse in its structure. However, the method parse produces tree nodes and the method compile will produce bytecode operations.
-
The compiler is the part of our program that will take a tree (an AST, to be more precise) and it will produce a sequence of instructions that are simple and easy to follow.
-
Instead of interpreting the tree directly, we'll use a compiler to create an intermediate layer.
-
After we have our sequence of operations (bytecodes), we will “interpret” it. To interpret the bytecode means that we go over the bytecode, sequence by sequence, and at each point we perform the simple operation that the bytecode tells us to perform.
-
Bytecodes are just simple, atomic instructions that do one thing, and one thing only.
-
Abstract syntax tree
It's an abstract syntax tree because it is a tree representation that doesn't care about the original syntax we used to write the operation. It only cares about the operations we are going to perform.
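CPython's own `ast` module illustrates the point. In this small sketch, three different spellings of the same addition produce identical trees, because whitespace and redundant parentheses are not recorded in the structure:

```python
import ast

# Three spellings of the same addition: compact, spaced out, and parenthesized.
variants = ["1+2", "1   +   2", "(1 + 2)"]

# ast.dump() leaves out line and column attributes by default,
# so only the structure of each tree is compared.
dumps = [ast.dump(ast.parse(src, mode="eval")) for src in variants]
same_tree = len(set(dumps)) == 1

print(same_tree)
print(dumps[0])
```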
-
The parser is the part of our program that accepts a stream of tokens and makes sure they make sense.
-
The tokenizer
The tokenizer is the part of your program that accepts the source code and produces a linear sequence of tokens – bits of source code that you identify as being relevant.
-
The four parts of our program
- Tokenizer takes source code as input and produces tokens;
- Parser takes tokens as input and produces an AST;
- Compiler takes an AST as input and produces bytecode;
- Interpreter takes bytecode as input and produces program results.
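For comparison, the same four stages can be observed with CPython's standard library. This is only an illustrative sketch: the `tokenize` module is a pure-Python view of tokenization rather than the code path CPython uses internally, and `ast.parse` tokenizes the source again on its own.

```python
import ast
import dis
import io
import tokenize

source = "result = 3 + 5 - 2\n"

# 1. Tokenizer: source code in, tokens out.
tokens = [tok.string for tok in tokenize.generate_tokens(io.StringIO(source).readline)]

# 2. Parser: source in, AST out.
tree = ast.parse(source)

# 3. Compiler: AST in, bytecode (wrapped in a code object) out.
code = compile(tree, "<demo>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]

# 4. Interpreter: executes the bytecode and produces the program's result.
namespace = {}
exec(code, namespace)

print(tokens[:7])
print(opnames)
print(namespace["result"])
```

The instruction names printed for stage 3 differ between Python versions, so they are shown rather than relied upon.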
-
-
tonybaloney.github.io tonybaloney.github.io
-
Once an interpreter is running (remembering what I said that it is preferable to leave them running) you can share data using a channel. The channels module is also part of PEP554 and available using a secret-import:
```
import _xxsubinterpreters as interpreters
import _xxinterpchannels as channels

interp_id = interpreters.create(site=site)
channel_id = channels.create()

interpreters.run_string(
    interp_id,
    """
import _xxinterpchannels as channels
channels.send(channel_id, 'hello!')
""",
    shared={
        "channel_id": channel_id
    }
)

print(channels.recv(channel_id))
```
-
To share data, you can use the shared argument and provide a dictionary with shareable (int, float, bool, bytes, str, None, tuple) values:
```
import _xxsubinterpreters as interpreters

interp_id = interpreters.create(site=site)

interpreters.run_string(
    interp_id,
    "print(message)",
    shared={
        "message": "hello world!"
    }
)

interpreters.run_string(
    interp_id,
    """
for message in messages:
    print(message)
""",
    shared={
        "messages": ("hello world!", "this", "is", "me")
    }
)

interpreters.destroy(interp_id)
```
-
To start an interpreter that sticks around, you can use interpreters.create() which returns the interpreter ID. This ID can be used for subsequent .run_string calls:
```
import _xxsubinterpreters as interpreters

interp_id = interpreters.create(site=site)

interpreters.run_string(interp_id, "print('hello world')")
interpreters.run_string(interp_id, "print('hello universe')")

interpreters.destroy(interp_id)
```
-
Starting a sub interpreter is a blocking operation, so most of the time you want to start one inside a thread.
```
from threading import Thread
import _xxsubinterpreters as interpreters

t = Thread(target=interpreters.run, args=("print('hello world')",))
t.start()
```
-
You can create, run and stop a sub interpreter with the .run() function which takes a string or a simple function
```
import _xxsubinterpreters as interpreters

interpreters.run('''
print("Hello World")
''')
```
-
Inter-Worker communication
Whether using sub interpreters or multiprocessing you cannot simply send existing Python objects to worker processes.
Multiprocessing uses `pickle` by default. When you start a process or use a process pool, you can use pipes, queues and shared memory as mechanisms for sending data to/from the workers and the main process. These mechanisms revolve around pickling. Pickling is the builtin serialization library for Python that can convert most Python objects into a byte string and back into a Python object.

Pickle is very flexible. You can serialize a lot of different types of Python objects (but not all) and Python objects can even define a method for how they can be serialized. It also handles nested objects and properties. However, with that flexibility comes a performance hit. Pickle is slow. So if you have a worker model that relies upon continuous inter-worker communication of complex pickled data you’ll likely see a bottleneck.

Sub interpreters can accept pickled data. They also have a second mechanism called shared data. Shared data is a high-speed shared memory space that interpreters can write to and share data with other interpreters. It supports only immutable types, those are:
- Strings
- Byte Strings
- Integers and Floats
- Boolean and None
- Tuples (and tuples of tuples)
To share data with an interpreter, you can either set it as initialization data or you can send it through a channel.
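A short sketch of both halves of this note, using only the standard library. The first part shows pickle round-tripping a nested object and refusing a lambda. The second part is a hypothetical `is_shareable` helper, written purely for illustration (it is not part of any interpreter API), that mirrors the list of shareable types above:

```python
import pickle

# Pickle round-trips nested containers of ordinary Python objects...
payload = {"id": 7, "tags": ["a", "b"], "point": (1.5, 2.5)}
restored = pickle.loads(pickle.dumps(payload))

# ...but not everything: a lambda, for example, cannot be pickled.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

# The shareable types listed above, as Python types.
SHAREABLE_TYPES = (str, bytes, int, float, bool, type(None))


def is_shareable(value):
    """Rough check, assuming a tuple is shareable only if all its items are."""
    if isinstance(value, tuple):
        return all(is_shareable(item) for item in value)
    return isinstance(value, SHAREABLE_TYPES)


print(restored == payload, lambda_pickled)
print(is_shareable(("hello", 1, (2.0, None))), is_shareable([1, 2]))
```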
-
The next point when using a parallel execution model like multiprocessing or sub interpreters is how you share data.
Once you get over the hurdle of starting one, this quickly becomes the most important point. You have two questions to answer:
- How do we communicate between workers?
- How do we manage the state of workers?
-
Half of the time taken to start an interpreter is taken up running “site import”. This is a special module called site.py that lives within the Python installation. Interpreters have their own caches, their own builtins, they are effectively mini-Python processes. Starting a thread or a coroutine is so fast because it doesn’t have to do any of that work (it shares that state with the owning interpreter), but it’s bound by the lock and isn’t parallel.
-
Both multiprocessing processes and interpreters have their own import state. This is drastically different to threads and coroutines. When you await an async function, you don’t need to worry about whether that coroutine has imported the required modules. The same applies for threads.
For example, you can import something in your module and reference it from inside the thread function:
```python
from threading import Thread
from super.duper.module import cool_function


def worker(info):
    # This already exists in the interpreter state
    cool_function()


info = {'a': 1}
thread = Thread(target=worker, args=(info, ))
```
-
Another important point is that multiprocessing is often used in a model where the processes are long-running and handed lots of tasks instead of being spawned and destroyed for a single workload. One great example is Gunicorn, the popular Python web server. Gunicorn will spawn “workers” using multiprocessing and those workers will live for the lifetime of the main process. The time to start a process or a sub interpreter then becomes irrelevant (at 89 ms or 1 second) when the web worker can be running for weeks, months or years. The ideal way to use these parallel workers for small tasks (like handling a single web request) is to keep them running and use a main process to coordinate and distribute the workload.
-
What is the difference between threading, multiprocessing, and sub interpreters?
The Python standard library has a few options for concurrent programming, depending on some factors:
- Is the task you’re completing IO-bound (e.g. reading from a network, writing to disk)
- Does the task require CPU-heavy work, e.g. computation
- Can the tasks be broken into small chunks or are they large pieces of work?
Here are the models:
- Threads are fast to create, you can share any Python objects between them and have a small overhead. Their drawback is that Python threads are bound to the GIL of the process, so if the workload is CPU-intensive then you won’t see any performance gains. Threading is very useful for background, polling tasks like a function that waits and listens for a message on a queue.
- Coroutines are extremely fast to create, you can share any Python objects between them and have a minuscule overhead. Coroutines are ideal for IO-based activity that has an underlying API that supports async/await.
- Multiprocessing is a Python wrapper that creates Python processes and links them together. These processes are slow to start, so the workload that you give them needs to be large enough to see the benefit of parallelising the workload. However, they are truly parallel since each one has its own GIL.
- Sub interpreters have the parallelism of multiprocessing, but with a much faster startup time.
-
- Nov 2023
-
docs.docker.com docs.docker.com
-
Rosetta is now Generally Available for all users on macOS 13 or later. It provides faster emulation of Intel-based images on Apple Silicon. To use Rosetta, see Settings. Rosetta is enabled by default on macOS 14.1 and later.
Tested it on my side, and `poetry install` of one Python project took 44 seconds instead of 2 minutes 53 seconds, so it's nearly a 4x speed increase!
-
- Oct 2023
-
www.pythonpool.com www.pythonpool.com
-
Method 1: numpy.any() to check if the NumPy array is empty in Python

numpy.any() method is used to test whether any array element along a given axis evaluates to True.

Syntax: numpy.any(a, axis = None, out = None, keepdims = <no value>)

Parameters:

- array: Input array whose elements need to be checked.
- axis: Axis along which array elements are evaluated.
- out: Output array having the same dimensions as Input array
- keepdims: If this is set to True, the axes which are reduced are left in the result.

Return Value: A new Boolean array (depending on the ‘out’ parameter)

```python
import numpy as np
arr = np.array([])
flag = not np.any(arr)
if flag:
    print('Array is empty')
else:
    print('Array is not empty')
```

Output: Array is empty

In this example, we have used numpy.any() method to check whether the array is empty or not. As the array is empty, the value of the flag variable becomes True, and so the output ‘Array is empty’ is displayed. The limitation to this function is that it does not work if the array contains the value 0 in it.
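This is WRONG.
`numpy.any()` checks whether at least one element of the array is non-zero (truthy), so a non-empty array that contains only zeros is also reported as "empty". To test for emptiness, check `arr.size == 0` instead.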
-
-
epydoc.sourceforge.net epydoc.sourceforge.net
-
www.cl.cam.ac.uk www.cl.cam.ac.uk
-
1. Inserting a new chicken into a ring at some specified location in it, usually first or last.
2. Removing a chicken from a ring.
3. Putting all the chickens of one ring, in order, into another at some specified location in it, usually first or last.
4. Performing some auxiliary operation on each member of a ring in either forward or reverse order
In simpler terms, Sutherland's thesis is discussing the basic operations of a data structure known as a ring. A ring is a type of list where the elements are connected in a circular manner. The operations he mentions are:
- Inserting a new element into the ring at a specified location.
- Removing an element from the ring.
- Moving all elements of one ring, in order, into another ring at a specified location.
- Performing an operation on each member of a ring in either forward or reverse order.
These operations are implemented using macro instructions in the compiler language. A macro instruction is a directive to the compiler which specifies how an input pattern should be mapped to an output pattern.
The thesis also discusses the generation of new elements. Subroutines are used to set up new elements in free spaces in the storage structure. When parts of the drawing are deleted, the registers representing them become free and are placed in a 'FREES' ring. New components are set up at the end of the storage area, while free blocks are allowed to accumulate. A process called 'garbage collection' periodically compacts the storage structure by removing the free blocks and relocating the information above them.
In Python, the ring data structure can be implemented using a doubly linked list. Here's a simple example:
```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
        self.prev = None


class Ring:
    def __init__(self):
        self.head = None

    def append(self, data):
        if not self.head:
            # The first node links to itself in both directions
            self.head = Node(data)
            self.head.next = self.head
            self.head.prev = self.head
        else:
            # Insert the new node between the current last node and the head
            new_node = Node(data)
            new_node.prev = self.head.prev
            new_node.next = self.head
            self.head.prev.next = new_node
            self.head.prev = new_node

    def display(self):
        # Nothing to print for an empty ring
        if not self.head:
            return
        temp = self.head
        while True:
            print(temp.data, end=" ")
            temp = temp.next
            if temp == self.head:
                break
```
In this example, the `Node` class represents an element in the ring, and the `Ring` class represents the ring itself. The `append` method is used to add a new element to the ring, and the `display` method is used to print all elements in the ring.
-
RING STRUCTURE
Ivan writes about a concept called "Ring Structure". This is a way of organizing and linking different elements or components in a system.
In simpler terms, imagine you have a bunch of different objects (like points and lines in a drawing program like Sketchpad). You want to keep track of how these objects are related to each other. For example, you might want to know all the lines that end at a particular point.
To do this, Ivan uses a "ring structure". Each object has a "string of pointers" - basically a list of references to other objects. This list is circular - the last item in the list points back to the first item. This makes it easy to move forwards and backwards through the list.
Each object has two "registers" or slots for keeping track of these relationships. One slot is for the object itself, and the other is for the list of related objects.
Ivan uses the terms "hen" and "chicken" to describe these slots. The "hen" is the object itself, and the "chicken" is the list of related objects.
Here's a simple Python code example to illustrate this concept:
```python
class Point:
    def __init__(self):
        self.hen = self
        self.chickens = []


class Line:
    def __init__(self, point1, point2):
        self.hen = self
        self.chickens = [point1, point2]
        point1.chickens.append(self)
        point2.chickens.append(self)
```
In this example, a `Point` object has a `hen` that refers to itself and a list of `chickens` that will contain any `Line` objects that end at this point. When a `Line` is created, it adds itself to the `chickens` list of its end points.

The "ring structure" is a way to organize and link different elements in a system, making it easier to find and update related elements.
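As a usage sketch of why these back-references help, here is a trimmed variant of the two classes with a hypothetical `name` attribute added purely so the result is readable. Finding every line that ends at a point becomes a direct lookup instead of a scan over the whole drawing:

```python
class Point:
    def __init__(self, name):
        self.name = name      # hypothetical label, added only for this demo
        self.chickens = []    # the lines that end at this point


class Line:
    def __init__(self, point1, point2):
        self.chickens = [point1, point2]  # the line's two end points
        point1.chickens.append(self)
        point2.chickens.append(self)


a, b, c = Point("a"), Point("b"), Point("c")
ab, bc = Line(a, b), Line(b, c)

# All lines that end at point b, found without scanning every line.
lines_at_b = b.chickens

# Moving b would mean visiting exactly those lines and their other end points.
neighbours_of_b = sorted(
    p.name for line in lines_at_b for p in line.chickens if p is not b
)

print(len(lines_at_b), neighbours_of_b)
```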
-
MNEMONICS AND CONVENTIONS
Mnemonics for Registers: Instead of remembering numerical indices, we use human-readable keys.

```python
point = {'TYPE': 'Point', 'PVAL_X': 5, 'PVAL_Y': 10}
```

Flexibility: If we want to change the internal structure, we can easily do so by changing the keys.

```python
# Changing 'PVAL_X' to 'X_COORD'
point = {'TYPE': 'Point', 'X_COORD': 5, 'PVAL_Y': 10}
```

Conventions:

- The first component ('TYPE') indicates the type of element.
- Numerical information like coordinates is stored at the end.

```python
line = {'TYPE': 'Line', 'START': 'point1', 'END': 'point2', 'LENGTH': 7.2}
```

Pointers and Topology: We can use pointers (references) to other elements to establish relationships.

```python
point1 = {'TYPE': 'Point', 'PVAL_X': 1, 'PVAL_Y': 1}
point2 = {'TYPE': 'Point', 'PVAL_X': 4, 'PVAL_Y': 5}
line = {'TYPE': 'Line', 'START': point1, 'END': point2}
```

Relocation: If we move point1 to a new variable, we update the pointer in line.

```python
new_point1 = point1  # Relocating point1 to new_point1
line['START'] = new_point1  # Updating the pointer
```

Segregation of Data: Numerical data is at the end, so if we need to move elements, the numerical data remains untouched.

```python
# Even if we relocate, the numerical data ('PVAL_X' and 'PVAL_Y') remains the same.
```
-
- Sep 2023
-
docutils.sourceforge.io docutils.sourceforge.io
-
thescimus.com thescimus.com
-
Configuring PyCharm:

- Open PyCharm with ‘Pytest Web Framework’
- Press Ctrl+Alt+S > Project
- Click ‘Project Interpreter’
- Select Python 3.6
- Click ‘OK’
- Go to write over 100500 automated tests!!!
This section provides a step-by-step guide on setting up PyCharm for automated testing using the 'Pytest Web Framework'.
-
- Aug 2023
- Jul 2023
-
danielms.site danielms.site
-
cat requirements.txt | grep -E '^[^# ]' | cut -d= -f1 | xargs -n 1 poetry add
Use `poetry init` to create a sample `pyproject.toml`, and then run this line to export `requirements.txt` into the `pyproject.toml`.
-
-
wesmckinney.com wesmckinney.com
-
As you can see, it has sliced along axis 0, the first axis. A slice, therefore, selects a range of elements along an axis. It can be helpful to read the expression arr2d[:2] as "select the first two rows of arr2d."
Slices follow a logic similar to indexing in NumPy arrays. `array[:2]` selects a range of elements along a single axis, but `array[:2, 1:]` does it along two axes.
-
-
wesmckinney.com wesmckinney.com
-
You might want to suppress only ValueError, since a TypeError (the input was not a string or numeric value) might indicate a legitimate bug in your program. To do that, write the exception type after except: def attempt_float(x): try: return float(x) except ValueError: return x
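Laid out as runnable code, the function from the passage behaves as follows. The three sample inputs are illustrative choices: a numeric string, a non-numeric string, and a tuple that triggers the `TypeError` the passage mentions.

```python
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        # Only malformed strings are swallowed; any other error propagates.
        return x


converted = attempt_float("1.2345")
unchanged = attempt_float("something")

try:
    attempt_float((1, 2))
    raised_type_error = False
except TypeError:
    # float() rejects a tuple with TypeError, which is deliberately not caught above.
    raised_type_error = True

print(converted, unchanged, raised_type_error)
```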
-
Since generators produce output one element at a time versus an entire list all at once, it can help your program use less memory.
-
It is not until you request elements from the generator that it begins executing its code:
A generator is a function-like iterator object.
-
A generator is a convenient way, similar to writing a normal function, to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators can return a sequence of multiple values by pausing and resuming execution each time the generator is used. To create a generator, use the yield keyword instead of return in a function:
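A minimal sketch of both points, creation with `yield` and lazy execution. The `log` list is an illustrative device for recording when the function body actually starts running:

```python
log = []  # records when the generator body actually runs


def squares(n):
    log.append("body started")
    for i in range(1, n + 1):
        yield i ** 2


gen = squares(3)
ran_before_request = bool(log)  # the body has not started yet
first = next(gen)               # runs the body up to the first yield
rest = list(gen)                # resumes after each yield until the loop ends

print(ran_before_request, first, rest)
```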
-
In this case, return_value would be a 3-tuple with the three returned variables. A potentially attractive alternative to returning multiple values like before might be to return a dictionary instead:
Returning multiple values in Python is expressed as a tuple by default and each value is correspondingly assigned. Optionally, you can return a dictionary if specified.
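A small sketch of the two styles, using hypothetical `stats_tuple` and `stats_dict` helpers written for illustration:

```python
def stats_tuple(values):
    # Comma-separated return values are packed into a single tuple.
    return min(values), max(values), sum(values) / len(values)


def stats_dict(values):
    # Alternative: give each returned value an explicit name.
    return {"low": min(values), "high": max(values), "mean": sum(values) / len(values)}


data = [2, 4, 9]
return_value = stats_tuple(data)      # one 3-tuple
low, high, mean = stats_tuple(data)   # or unpack it right away
named = stats_dict(data)

print(return_value, named["mean"])
```

The tuple form is compact but relies on the caller remembering the order. The dictionary form costs a few more characters and documents itself at the call site.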
-
The for parts of the list comprehension are arranged according to the order of nesting, and any filter condition is put at the end as before.
Nested list comprehensions follow the same logic as nested for loops. The difference is that in list comprehensions the filtered variable is mentioned twice.
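A side-by-side sketch with made-up sample data: the `for` parts of the comprehension appear in the same order as the nested loops, the filter comes last, and the loop variable `name` shows up both as the output expression and in the inner `for` clause.

```python
# Hypothetical sample data: two lists of names.
all_data = [["John", "Emily", "Michael"], ["Maria", "Juan", "Natalia"]]

# Nested for loops with the filter in the innermost position.
loop_result = []
for names in all_data:
    for name in names:
        if name.count("a") >= 2:
            loop_result.append(name)

# Same order of `for` parts, filter condition at the end.
comp_result = [name for names in all_data for name in names if name.count("a") >= 2]

print(comp_result)
```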
-
-
til.simonwillison.net til.simonwillison.net
-
python -m calendar
So surprised that you can output a calendar view using Python
-
python -m site, which outputs useful information about your installation
python -m site
<--- see useful information about your Python installation
-
-
github.com github.com
-
```python
from flask import Flask, request
from collections import defaultdict
import re
import random

GREEN = "🟩"
YELLOW = "🟨"
WHITE = "⬜"


def get_answers():
    with open("allowed_answers.txt") as f:
        answers = set(l for l in f.read().splitlines() if l)
    return answers


def get_guesses():
    guesses = get_answers()
    with open("allowed_guesses.txt") as f:
        for l in f.read().splitlines():
            if l:
                guesses.add(l)
    return guesses


app = Flask(__name__, static_folder="static")
app.answers = get_answers()
app.guesses = get_guesses()
word = random.choice(list(app.answers))
print(f"The word is {word}")


def with_header(content):
    return f"""
<html>
<head>
<link rel="search" type="application/opensearchdescription+xml" title="searchGame" href="http://searchgame:5000/static/opensearch.xml" />
</head>
<body>
{content}
</body></html>"""


@app.route("/")
def home():
    return with_header("Right click on the address bar to install the search engine.")


@app.route("/search")
def search():
    return with_header(f"Content: {request.args.get('q')}")


def to_result(guess, answer):
    chars = [WHITE] * 5
    count = defaultdict(int)
    # First pass: mark exact matches and count the unmatched answer letters
    for idx, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            chars[idx] = GREEN
        else:
            count[a] += 1

    # Second pass: mark misplaced letters while unmatched copies remain
    for idx, g in enumerate(guess):
        if g in count and count[g] > 0 and chars[idx] == WHITE:
            chars[idx] = YELLOW
            count[g] -= 1
    return "".join(chars)


def maybe_error(guess):
    if len(guess) < 5:
        return f"less than 5 characters"
    if len(guess) > 5:
        return f"greater than 5 characters"
    if guess not in app.guesses:
        return f"not in wordlist"
    return None


@app.route("/game")
def game():
    query = request.args.get("q")
    guesses = [x for x in re.split("[. ]", query) if x]
    response = []
    if not guesses:
        response.append("Enter 5-letter guesses separated by spaces")
    else:
        most_recent = guesses[-1]
        # Don't show "too short" error for most recent guess
        if len(most_recent) < 5:
            guesses = guesses[:-1]
        if not guesses:
            response.append("Enter a wordle guess")
    for guess in guesses[::-1]:
        error = maybe_error(guess)
        if error is None:
            result = to_result(guess, word)
            s = f"{guess} | {result}"
            if result == GREEN * 5:
                s = f"{s} | CORRECT!"
            response.append(s)
        else:
            response.append(f"{guess} | ERROR: {error}")

    return [query, response]
```
-
-
testdriven.io testdriven.io
-
o con
Now he is going to connect it using Axios.
-
e
Now he moves on to Vue.
-
requests
In effect, it specifies which protocol, which IP or domain name, and which port requests may come from for the server to answer them. It is a very good idea to restrict it to only the IP and port you configured in the front end.
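The note above treats an origin as the combination of protocol, host, and port. A minimal, framework-free sketch of that comparison follows. It is not how Flask-CORS is implemented; `ALLOWED_ORIGINS`, `origin_of`, and `is_allowed` are hypothetical names, and `http://localhost:5173` is an assumed front-end address.

```python
from urllib.parse import urlsplit

# Assumed front-end address (hypothetical, not taken from the tutorial).
ALLOWED_ORIGINS = {("http", "localhost", 5173)}

# Ports implied when a URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def is_allowed(origin_header):
    """True only when protocol, host, and port all match an allowed origin."""
    return origin_of(origin_header) in ALLOWED_ORIGINS
```

Changing any one of the three parts (for example `https` instead of `http`, or port 8080 instead of 5173) yields a different origin, so the request would be refused under this sketch.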
-
Flask-CORS
Now he has just moved on to Flask-origin. How interesting.
-
pyth
First, he creates a virtual environment.
-
-
pythonspeed.com pythonspeed.com
-
For a new project, I’d just immediately start with Ruff; for existing projects, I would strongly recommend trying it as soon as you start getting annoyed about how long linting is taking in CI (or even worse, on your computer).
Recommendation for when to use Ruff over PyLint or Flake8
-
- Jun 2023
-
www.datacamp.com www.datacamp.com
-
-
stackoverflow.com stackoverflow.com
-
Python essentially doesn't have private methods, let alone protected ones, and it doesn't turn out to be that big a deal in practice.
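A short sketch of what the quote means in practice, assuming a hypothetical `Account` class: a leading underscore is only a convention, and a double underscore triggers name mangling rather than real access control.

```python
class Account:
    """Hypothetical class used only to illustrate the naming conventions."""

    def _internal(self):
        # Single underscore: "internal" by convention, not enforced.
        return "internal by convention"

    def __mangled(self):
        # Double underscore: stored on the class as _Account__mangled.
        return "name-mangled"


acct = Account()

# Nothing prevents outside code from calling the "internal" method.
by_convention = acct._internal()

# The mangled method is still reachable under its rewritten name.
via_mangling = acct._Account__mangled()
```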
-
-
-
```python
import re


def split_user(userid):
    """
    Return the user and domain parts from the given user id as a dict.

    For example if userid is u'acct:seanh@hypothes.is' then return
    {'username': u'seanh', 'domain': u'hypothes.is'}

    :raises InvalidUserId: if the given userid isn't a valid userid
    """
    match = re.match(r"^acct:([^@]+)@(.*)$", userid)
    if match:
        return {"username": match.groups()[0], "domain": match.groups()[1]}
    # InvalidUserId is defined elsewhere in the original codebase.
    raise InvalidUserId(userid)
```
-
-
realpython.com realpython.com
-
-
-
```html
<body>
  <div>
    {% for chat in chats %}
      <div>{{ chat.contents }}</div>
      <button id={{chat.id}} ✅ onClick=getId(id) ✅ >
        print this chat_id out
      </button>
    {% endfor %}
  </div>
  ...
  <script>
    function getId(id){
      console.log(id)
    }
  </script>
</body>
```
-
-
jinja.palletsprojects.com jinja.palletsprojects.com
-
drivendata.co drivendata.co
-
Examples of frontends include: pip, build, poetry, hatch, pdm, flit
Examples of backends include: setuptools (>=61), poetry-core, hatchling, pdm-backend, flit-core
Examples of Python build frontends and build backends
-
pyproject.toml-based builds are the future, and they promote better practices for reliable package builds and installs. You should prefer to use them!
setup.py is considered "legacy" functionality these days
-
Did you say setuptools? Yes! You may be familiar with setuptools as the thing that used your setup.py files to build packages. Setuptools now also fully supports pyproject.toml-based builds since version 61.0.0. You can do everything in pyproject.toml and no longer need setup.py or setup.cfg.
setuptools can now utilize pyproject.toml
-
-
www.educative.io www.educative.io
-
Cropping pages.
PyPDF4 is compared with 6 other python libraries to manipulate, create and annotate pdf files via python
-
- May 2023
-
kobzol.github.io kobzol.github.io
-
With this dataclass, I have an explicit description of what the function returns.
Dataclasses give you a lot more clarity of what the function returns, in comparison to returning tuples or dictionaries
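To make the comparison concrete, here is a minimal sketch with a hypothetical `ParseResult` type and two hypothetical functions that return the same data, once as a bare tuple and once as a dataclass. The names and fields are illustrative and do not come from the article.

```python
from dataclasses import dataclass


@dataclass
class ParseResult:
    """Hypothetical return type; the fields are illustrative."""
    username: str
    domain: str


def parse_tuple(address):
    # The caller has to remember that index 0 is the username
    # and index 1 is the domain.
    name, _, domain = address.partition("@")
    return name, domain


def parse_dataclass(address):
    # The return type documents itself through named, typed fields.
    name, _, domain = address.partition("@")
    return ParseResult(username=name, domain=domain)


result = parse_dataclass("alice@example.com")
```

With the dataclass, call sites read `result.domain` instead of `result[1]`, and editors or type checkers can flag a misspelled field.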
-
-
stackoverflow.com stackoverflow.com
-
Host machine: docker run -it -p 8888:8888 image:version
Inside the Container : jupyter notebook --ip 0.0.0.0 --no-browser --allow-root
Host machine access this url : localhost:8888/tree
3 ways of running jupyter notebook in a container
-
-
www.runoob.com www.runoob.com
-
Python built-in function next()
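A brief sketch of how the built-in `next()` behaves, including the optional default argument that is returned once the iterator is exhausted:

```python
numbers = iter([1, 2])

first = next(numbers)
second = next(numbers)

# The iterator is now exhausted. With a default, next() returns it
# instead of raising StopIteration.
fallback = next(numbers, None)

# Without a default, an exhausted iterator raises StopIteration.
try:
    next(numbers)
    exhausted = False
except StopIteration:
    exhausted = True
```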
-
-
nedbatchelder.com nedbatchelder.com
-
we are all beginners
-
-
stackoverflow.com stackoverflow.com
-
How can I add, subtract, and compare binary numbers in Python without converting to decimal?
I think the requirements of this were not spelled out well. After reading this over a couple of times, I think the problem should be…
"Add, subtract, and compare binary numbers in Python as strings, without converting them to decimal."
I'll take on that problem sometime when I get free time!
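One possible take on the restated problem, as a minimal sketch rather than a definitive answer: schoolbook arithmetic over strings of `0` and `1`, never calling `int()` on the whole number. The function names are hypothetical, subtraction assumes the first operand is at least as large as the second, and single bits are still handled as small integers for the carry and borrow.

```python
def normalize(bits):
    """Strip leading zeros, keeping at least one digit."""
    return bits.lstrip("0") or "0"


def compare_binary(a, b):
    """Return -1, 0, or 1: compare by length first, then lexicographically."""
    a, b = normalize(a), normalize(b)
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    return (a > b) - (a < b)


def add_binary(a, b):
    """Add right to left, carrying one bit at a time."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    out = []
    for x, y in zip(reversed(a), reversed(b)):
        total = (x == "1") + (y == "1") + carry
        out.append("1" if total % 2 else "0")
        carry = total // 2
    if carry:
        out.append("1")
    return normalize("".join(reversed(out)))


def subtract_binary(a, b):
    """Subtract right to left with a borrow bit; assumes a >= b."""
    if compare_binary(a, b) < 0:
        raise ValueError("this sketch only handles a >= b")
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    borrow = 0
    out = []
    for x, y in zip(reversed(a), reversed(b)):
        diff = (x == "1") - (y == "1") - borrow
        borrow = 1 if diff < 0 else 0
        out.append("1" if diff % 2 else "0")
    return normalize("".join(reversed(out)))
```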
-
-
-
'handlers': { 'console': { 'level': 'INFO', 'class': 'logging.StreamHandler', 'stream': sys.stdout, 'formatter': 'verbose' }, },
It's as simple as adding "sys.stdout" to the "stream" attribute.
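For context, a minimal, self-contained sketch of a full logging configuration built around that handler, using the standard library's `logging.config.dictConfig`. The `verbose` format string, the root logger settings, and the `demo` logger name are assumptions added so the fragment can run on its own.

```python
import logging
import logging.config
import sys

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        # Assumed format; the quoted snippet only names the formatter.
        "verbose": {"format": "%(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "console": {
            "level": "INFO",
            "class": "logging.StreamHandler",
            "stream": sys.stdout,  # StreamHandler defaults to stderr otherwise
            "formatter": "verbose",
        },
    },
    # Assumed: attach the handler to the root logger.
    "root": {"handlers": ["console"], "level": "INFO"},
}

logging.config.dictConfig(LOGGING)
logging.getLogger("demo").info("this line goes to stdout")
```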
-
-
codeinthehole.com codeinthehole.com
-
16 August 2011
This is a pretty old article, from 2011. Note that it refers to Python 2.6 and 2.7. Today, Python is up to version 3.12.
-
- Apr 2023
-
sinoroc.gitlab.io sinoroc.gitlab.io
-
NICE tables comparing build frameworks, front-ends & backends & others, discovered from this SO.
-
-
www.zhihu.com www.zhihu.com
-
Introduction to Python: classes (class), the basics
class starter
-
-
colab.research.google.com colab.research.google.com
-
-
Experienced talk on python project configurations
-
-
pythonspeed.com pythonspeed.com
-
If you install a package with pip’s --user option, all its files will be installed in the .local directory of the current user’s home directory.
One of the recommendations for Python multi-stage Docker builds. Thanks to pip install --user, the packages won't be spread across 3 different paths.
-
- Mar 2023
-
www.seanh.cc www.seanh.cc
-
snarky.ca snarky.ca
-
Honestly, all the activation scripts do are:
See the 4 steps below to understand what activating an environment in Python really does
-
-
-
Using pex in combination with S3 for storing the pex files, we built a system where the fast path avoids the overhead of building and launching Docker images.
Our system works like this: when you commit code to GitHub, the GitHub action either does a full build or a fast build depending on if your dependencies have changed since the previous deploy. We keep track of the set of dependencies specified in setup.py and requirements.txt.
For a full build, we build your project dependencies into a deps.pex file and your code into a source.pex file. Both are uploaded to Dagster cloud. For a fast build we only build and upload the source.pex file.
In Dagster Cloud, we may reuse an existing container or provision a new container as the code server. We download the deps.pex and source.pex files onto this code server and use them to run your code in an isolated environment.
Fast vs full deployments
-
-
stackoverflow.com stackoverflow.com
-
Well, in short, with iterators, the flow of information is one-way only. When you have an iterator, all you can really do is call the __next__ method to get the very next value to be yielded. In contrast, the flow of information with generators is bidirectional: you can send information back into the generator via the send method.
- Iterator ← one-way communication (can only yield stuff)
- Generator ← bidirectional communication (can also accept information via the send method)
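The difference can be sketched with a hypothetical `running_total` generator, which receives values through `send()` and yields the running sum, next to a plain list iterator, which only hands values out.

```python
def running_total():
    """Generator that receives numbers via send() and yields the running sum."""
    total = 0
    while True:
        # The value passed to send() becomes the result of this yield.
        received = yield total
        if received is not None:
            total += received


gen = running_total()
initial = next(gen)          # prime the generator: run to the first yield
after_five = gen.send(5)     # information flows into the generator
after_ten = gen.send(10)

plain = iter([1, 2, 3])
first = next(plain)          # a list iterator only produces values
```

A list iterator has no `send` method at all, so the only way to interact with it is `next()`.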
-
-
docs.databricks.com docs.databricks.com