29 Matching Annotations
  1. Nov 2024
    1. Deploying Machine Learning Models with Flask and AWS Lambda: A Complete Guide

      In essence, this article is about:

      1) Training a sample model and uploading it to an S3 bucket:

      ```python from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.linear_model import LogisticRegression import joblib

      Load the Iris dataset

      iris = load_iris() X, y = iris.data, iris.target

      Split the data into training and testing sets

      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

      Train the logistic regression model

      model = LogisticRegression(max_iter=200) model.fit(X_train, y_train)

      Save the trained model to a file

      joblib.dump(model, 'model.pkl') ```

      1. Creating a sample Zappa config, because AWS Lambda doesn’t natively support Flask, we need to use Zappa, a tool that helps deploy WSGI applications (like Flask) to AWS Lambda:

      ```json { "dev": { "app_function": "app.app", "exclude": [ "boto3", "dateutil", "botocore", "s3transfer", "concurrent" ], "profile_name": null, "project_name": "flask-test-app", "runtime": "python3.10", "s3_bucket": "zappa-31096o41b" },

      "production": {
          "app_function": "app.app",
          "exclude": [
              "boto3",
              "dateutil",
              "botocore",
              "s3transfer",
              "concurrent"
          ],
          "profile_name": null,
          "project_name": "flask-test-app",
          "runtime": "python3.10",
          "s3_bucket": "zappa-31096o41b"
      }
      

      } ```

      1. Writing a sample Flask app:

      ```python import boto3 import joblib import os

      Initialize the Flask app

      app = Flask(name)

      S3 client to download the model

      s3 = boto3.client('s3')

      Download the model from S3 when the app starts

      s3.download_file('your-s3-bucket-name', 'model.pkl', '/tmp/model.pkl') model = joblib.load('/tmp/model.pkl')

      @app.route('/predict', methods=['POST']) def predict(): # Get the data from the POST request data = request.get_json(force=True)

      # Convert the data into a numpy array
      input_data = np.array(data['input']).reshape(1, -1)
      
      # Make a prediction using the model
      prediction = model.predict(input_data)
      
      # Return the prediction as a JSON response
      return jsonify({'prediction': int(prediction[0])})
      

      if name == 'main': app.run(debug=True) ```

      1. Deploying this app to production (to AWS):

      bash zappa deploy production

      and later eventually updating it:

      bash zappa update production

      1. We should get a URL like this:

      https://xyz123.execute-api.us-east-1.amazonaws.com/production

      which we can query:

      curl -X POST -H "Content-Type: application/json" -d '{"input": [5.1, 3.5, 1.4, 0.2]}' https://xyz123.execute-api.us-east-1.amazonaws.com/production/predict

  2. Apr 2024
    1. Lesson 3: When executing a lot of requests to S3, make sure to explicitly specify the AWS region.
    2. Lesson 2: Adding a random suffix to your bucket names can enhance security.
    3. Lesson 1: Anyone who knows the name of any of your S3 buckets can ramp up your AWS bill as they like.

      The author was charged over $1300 after two days of using an S3 bucket, because some OS tool stored a default bucket name in the config, which was the same as his bucket name.

      Luckily, after everything AWS made an exception and he did not have to pay the bill.

  3. Nov 2023
  4. Nov 2021
    1. Saving Your Wallet With Lifecycle Rules Of course, storing multiple copies of objects uses way more space, especially if you’re frequently overwriting data. You probably don’t need to store these old versions for the rest of eternity, so you can do your wallet a favor by setting up a Lifecycle rule that will remove the old versions after some time. Under Management > Life Cycle Configuration, add a new rule. The two options available are moving old objects to an infrequent access tier, or deleting them permanently after
    1. S3 object versioning Many of the strategies to be discussed for data durability require S3 object versioning to be enabled for the bucket (this includes S3 object locks and replication policies). With object versioning, anytime an object is modified, it results in a new version, and when the object is deleted, it only results in the object being given a delete marker. This allows an object to be recovered if it has been overwritten or marked for deletion. However, it is still possible for someone with sufficient privileges to permanently delete all objects and their versions, so this alone is not sufficient. When using object versioning, deleting old versions permanently is done with the call s3:DeleteObjectVersion, as opposed to the usual s3:DeleteObject, which means that you can apply least privilege restrictions to deny someone from deleting the old versions. This can help mitigate some issues, but you should still do more to ensure data durability. Life cycle policies Old versions of objects will stick around forever, and each version is an entire object, not a diff of the previous version. So if you have a 100MB file that you change frequently, you’ll have many copies of this entire file. AWS acknowledges in the documentation “you might have one or more objects in the bucket for which there are millions of versions”. In order to reduce the number of old versions, you use lifecycle policies. Audit tip: It should be a considered a misconfiguration if you have object versioning enabled and no lifecycle policy on the bucket. Every versioned S3 bucket should have a `NoncurrentVersionExpiration` lifecycle policy to eventually remove objects that are no longer the latest version. For data durability, you may wish to set this to 30 days. If this data is being backed up, you may wish to set this to as little as one day on the primary data and 30 days on the backup. If you are constantly updating the same objects multiple times per day, you may need a different solution to avoid unwanted costs. Audit tip: In 2019, I audited the AWS IAM managed policies and found some issues, including what I called Resource policy privilege escalation. In a handful of cases AWS had attempted to create limited policies that did not allow `s3:Delete*`, but still allowed some form of `s3:Put*`. The danger here is the ability to call `s3:PutBucketPolicy` in order to grant an external account full access to an S3 bucket to delete the objects and versions within it, or `s3:PutLifecycleConfiguration` with an expiration of 1 day for all objects which will delete all objects and their versions in the bucket. Storage classes With lifecycle policies, you have the ability to transition objects to less expensive storage classes. Be aware that there are many constraints, specifically around the size of the object and how long you have to keep it before transitioning or deleting it. Objects in the S3 Standard storage class must be kept there for at least 30 days until they can be transitioned. Further, once an object is in the S3 Intelligent-Tiering, S3 Standard-IA, and S3 One Zone-IA, those objects must be kept there for 30 days before deletion. Objects in Glacier must be kept for 90 days before deleting, and objects in Glacier Deep Archive must be kept for 180 days. So if you had plans of immediately transitioning all non-current object versions to Glacier Deep Archive to save money, and then deleting them after 30 days, you will not be able to.
  5. Oct 2021
    1. So, while DELETE operations are free, LIST operations (to get a list of objects) are not free (~$.005 per 1000 requests, varying a bit by region).

      Deleting buckets on S3 is not free. If you use either Web Console or AWS CLI, it will execute the LIST call per 1000 objects

  6. Sep 2020
  7. May 2020
    1. Your Amazon Athena query performance improves if you convert your data into open source columnar formats, such as Apache Parquet

      s3 perfomance use columnar formats

    1. Expedited retrieval allows you to quickly access your data when you need to have almost immediate access to your information. This retrieval type can be used for archives up to 250MB. Expedited retrieval usually completes within 1 and 5 minutes.

      https://aws.amazon.com/glacier/faqs/

      3 types of retrieval

      expecited 1~5minutes

    1. In general, bucket owners pay for all Amazon S3 storage and data transfer costs associated with their bucket. A bucket owner, however, can configure a bucket to be a Requester Pays bucket. With Requester Pays buckets, the requester instead of the bucket owner pays the cost of the request and the data download from the bucket. The bucket owner always pays the cost of storing data.

      Request Pays

    1. Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.

      event notification of s3 might take minutes

      BTW,

      cloud watch does not support s3, but cloud trail does

  8. Aug 2019
  9. Aug 2018
  10. Feb 2018
  11. Dec 2017
  12. Oct 2017
  13. Jun 2017
    1. Kafka consumer offset management protocol to keep track of what’s been uploaded to S3

      consumers keep track of what's written and where it left off by looking at kafka consumer offsets rather than checking S3 since S3 is an eventually consistent system.

    2. Data lost or corrupted at this stage isn’t recoverable so the greatest design objective for Secor is data integrity.

      data loss in S3 is being mitigated.

  14. May 2017
  15. Mar 2017
  16. May 2016
    1. . Is art about making up new things or about transforming the raw material that's out there? Cutting, pasting, sampling, remixing and mashing up have become mainstream modes of cultural expression, and fan fiction is part of that. It challenges just about everything we thought we knew about art and creativity.

      Not really. Art has always been about reacting (in some part) to the works that have come before it.