20 Matching Annotations
  1. Mar 2024
  2. Mar 2022
    1. Claim-Check pattern

      In a way, the Valet Key pattern is a claim check (you do not send the data, you send a token that can be used to get/process the data). The same consideration applies to event streaming - e.g. a stream of all actions that happen to a blob in storage can also kind-of-sort-of be considered a claim check.
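
      A minimal sketch of the claim-check idea, with an in-memory dict standing in for the blob store and a list standing in for the message bus (all names are made up):

      ```python
      import uuid

      # Stand-ins for a real blob store and message bus (e.g. Blob Storage + Service Bus).
      blob_store: dict[str, bytes] = {}
      message_bus: list[dict] = []

      def send_large_payload(payload: bytes) -> None:
          # Store the heavy payload out of band; keep only a reference - the "claim check".
          claim_check = str(uuid.uuid4())
          blob_store[claim_check] = payload
          # The message carries the token, not the data itself.
          message_bus.append({"type": "payload-available", "claim_check": claim_check})

      def receive() -> bytes:
          message = message_bus.pop(0)
          # The consumer redeems the claim check to get/process the actual data.
          return blob_store[message["claim_check"]]

      send_large_payload(b"x" * 10_000_000)
      print(len(receive()))  # 10000000 - retrieved via the token, not the message body
      ```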

    1. autoincremented fields can't be coordinated across shards, possibly resulting in items in different shards having the same shard key

      It's a completely separate problem, not related to hashing. Any key that is used for hashing can potentially be duplicated. Even UUIDs are not 100% unique - it's just that the likelihood of a collision is insanely low, not impossible. Furthermore, it seems like uniqueness works against the purpose of logical partitioning: the more unique the key, the less logical meaning it carries. E.g. in the triplet merchandise id as code - merchandise id as int - merchandise id as UUID, the code is the most meaningful but the least unique, while the UUID is the opposite - unique, but carrying no significant meaning.
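
      A small sketch of hash-based shard selection, just to illustrate that hashing does nothing to enforce key uniqueness - duplicate keys simply hash to the same shard (shard count and example values are made up):

      ```python
      import hashlib

      NUM_SHARDS = 4

      def shard_for(key: str) -> int:
          # Any key can be hashed; uniqueness is a property of the key, not of the hash.
          digest = hashlib.sha256(key.encode("utf-8")).digest()
          return int.from_bytes(digest[:8], "big") % NUM_SHARDS

      # Three representations of the "same" merchandise id, from most meaningful to most unique:
      print(shard_for("WINTER-JACKET"))                         # human-readable code
      print(shard_for("10042"))                                 # autoincremented int
      print(shard_for("9b2f0c0e-4a1d-4f7e-8a33-5d6c1e2b7f90"))  # UUID
      ```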

    1. When defining a materialized view, maximize its value by adding data items or columns to it based on computation or transformation of existing data items, on values passed in the query, or on combinations of these values when appropriate.

      Denormalization?
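
      A sketch of what such a denormalized view could look like - customer data pre-joined into the order rows and a total column precomputed when the view is built (schema and values are made up):

      ```python
      # Source (normalized) data: orders and customers stored separately.
      orders = [
          {"order_id": 1, "customer_id": 10, "qty": 3, "unit_price": 9.50},
          {"order_id": 2, "customer_id": 10, "qty": 1, "unit_price": 120.00},
      ]
      customers = {10: {"name": "Contoso Ltd", "country": "NL"}}

      # Materialized view shaped for the query "order totals per customer name";
      # the total is a computed column added at view-build time, not at query time.
      order_totals_view = [
          {
              "order_id": o["order_id"],
              "customer_name": customers[o["customer_id"]]["name"],
              "total": o["qty"] * o["unit_price"],
          }
          for o in orders
      ]

      print(order_totals_view)
      ```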

    2. A materialized view is never updated directly by an application, and so it's a specialized cache.

      It would be nice to have some examples of systems that perform the data storage -> materialized view sync.
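
      Typical mechanisms are a change feed or CDC stream feeding a handler that updates the view (e.g. the Cosmos DB change feed, or CDC tools like Debezium), or the database refreshing the view itself. A rough in-process sketch of that loop, with all names made up:

      ```python
      # Source store plus a derived, read-optimized view.
      source_store: dict[int, dict] = {}
      totals_by_customer: dict[int, float] = {}  # the materialized view

      def apply_change(change: dict) -> None:
          # In a real system this handler is driven by a change feed / CDC stream,
          # not called inline by the writer.
          customer_id = change["customer_id"]
          totals_by_customer[customer_id] = totals_by_customer.get(customer_id, 0.0) + change["amount"]

      def write_order(order_id: int, customer_id: int, amount: float) -> None:
          source_store[order_id] = {"customer_id": customer_id, "amount": amount}
          apply_change({"customer_id": customer_id, "amount": amount})

      write_order(1, 10, 28.50)
      write_order(2, 10, 120.00)
      print(totals_by_customer)  # {10: 148.5}
      ```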

    3. To support efficient querying, a common solution is to generate, in advance, a view that materializes the data in a format suited to the required results set.

      'Write-heavy vs read-heavy' balance

    1. Use this pattern to improve query performance when an application frequently needs to retrieve data by using a key other than the primary (or shard) key.

      ... or consider an indexing engine like Elasticsearch or Solr

    2. The first operation searches the index table to retrieve the primary key, and the second uses the primary key to fetch the data.

      ... especially if the data store doesn't support any form of join
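
      A tiny sketch of the two-step lookup (store layout and field names are hypothetical):

      ```python
      # Primary store, keyed by the primary key (order_id).
      orders_by_id = {
          1: {"order_id": 1, "customer_email": "alice@example.com", "total": 28.50},
          2: {"order_id": 2, "customer_email": "bob@example.com", "total": 120.00},
      }

      # Index table: secondary key -> primary key(s), maintained alongside writes.
      order_ids_by_email = {
          "alice@example.com": [1],
          "bob@example.com": [2],
      }

      def find_orders_by_email(email: str) -> list[dict]:
          # Step 1: search the index table to get the primary keys.
          primary_keys = order_ids_by_email.get(email, [])
          # Step 2: fetch the items by primary key - no join support required.
          return [orders_by_id[pk] for pk in primary_keys]

      print(find_orders_by_email("alice@example.com"))
      ```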

    1. Adding a timestamp to every event

      Only if clocks are monotonic and guaranteed to be in sync across event-sourcing service instances. And even then a clash is possible, depending on timestamp precision.
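
      A quick illustration of the precision point: two events produced back-to-back very often end up with the same millisecond timestamp, so the timestamp alone cannot order them.

      ```python
      import time

      # Timestamps truncated to millisecond precision for two consecutive events.
      t1 = int(time.time() * 1000)
      t2 = int(time.time() * 1000)
      print(t1 == t2)  # frequently True - the two events are indistinguishable by timestamp
      ```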

    2. data update conflicts are more likely because the update operations take place on a single item of data

      But doesn't the conflict problem still exist in a distributed system? Unless there's partitioning that guarantees that each item is always processed by the same event-sourcing service instance - and even then, multiple clients can concurrently apply different changes to the same data.
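
      One common mitigation (not something the article prescribes) is optimistic concurrency on append: each writer states the stream version it last read, and conflicting concurrent appends are rejected and must be retried. A minimal sketch, with hypothetical names:

      ```python
      class ConcurrencyError(Exception):
          pass

      # Event streams keyed by item/aggregate id.
      streams: dict[str, list[dict]] = {}

      def append_event(stream_id: str, event: dict, expected_version: int) -> None:
          stream = streams.setdefault(stream_id, [])
          if len(stream) != expected_version:
              # Someone else appended since this writer read the stream.
              raise ConcurrencyError(f"expected version {expected_version}, found {len(stream)}")
          stream.append(event)

      append_event("cart-42", {"type": "ItemAdded", "sku": "A1"}, expected_version=0)
      append_event("cart-42", {"type": "ItemAdded", "sku": "B2"}, expected_version=1)
      try:
          # A stale writer that also read version 1 loses the race.
          append_event("cart-42", {"type": "ItemRemoved", "sku": "A1"}, expected_version=1)
      except ConcurrencyError as e:
          print(e)  # expected version 1, found 2
      ```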

    3. directly against a data store, which can slow down performance

      How is this related? Besides, isn't the part responsible for keeping the materialized view of the data up to date going to have similar performance issues?