Kafka MirrorMaker
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
reduce coupling between services
But in return it demands that each service be aware of the initial request's business logic.
It acknowledges all incoming requests and delegates operations to the respective services.
E.g. an aggregator
Claim-Check pattern
In a way, the Valet Key pattern is a claim check: you don't send the data, you send a token that can be used to get/process the data. The same consideration applies to event streaming - e.g. a stream of all actions that happen to a blob in storage can also be kind-of-sort-of considered a claim check.
message size exceeds the data limit of the message bus
Though it will make the code a bit more complex
client with a key or token that the data store can validate
E.g. pre-signed URLs for S3
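A minimal sketch of that with boto3 (bucket and key names are made up): the producer uploads the payload once and hands out only a short-lived URL, which is both the valet key and the claim check.

```python
import boto3

s3 = boto3.client("s3")

# Generate a pre-signed URL for a hypothetical object. Whoever holds the
# URL can fetch the data directly from S3 until it expires.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "payloads/large-message.bin"},
    ExpiresIn=3600,  # token is only valid for an hour
)

# Only `url` (the claim check) goes onto the message bus, not the payload,
# so the message stays under the bus's size limit.
```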
autoincremented fields can't be coordinated across shards, possibly resulting in items in different shards having the same shard key
It's a completely separate problem, not related to hashing. Any key used for hashing can potentially be duplicated. Even UUIDs are not 100% unique - the likelihood of a collision is just insanely low, not zero. Furthermore, uniqueness seems to oppose the purpose of logical partitioning: the more unique the key, the less logical meaning it carries. E.g. in the triplet merchandise-id-as-code / merchandise-id-as-int / merchandise-id-as-UUID, the code is the most meaningful but the least unique; the UUID is the opposite - unique, but carrying no significant meaning.
similar volume of I/O
This. Not data volume, but I/O
because the partition keys are hashes of the shard keys or data identifiers.
Consistent hashing?
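A minimal consistent-hashing sketch (my assumption of what's behind this - the source only says the partition keys are hashes of the shard keys). Virtual nodes smooth the key distribution across physical shards:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        points = []
        for shard in shards:
            for i in range(vnodes):
                points.append((self._hash(f"{shard}#{i}"), shard))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._shards = [s for _, s in points]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash; wrap at the end.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._shards[idx]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("merchandise-42"))  # stable while the ring is stable
```

Adding or removing a shard only remaps the keys adjacent to its ring points, instead of rehashing everything.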
The next figure illustrates storing sequential sets (ranges) of data in shards.
So a lookup is still required
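A sketch of what that lookup can look like (ranges and shard names are hypothetical): a shard map of upper bounds that every request has to consult before it can be routed.

```python
import bisect

# Each entry is (upper bound of the key range, shard holding that range).
SHARD_RANGES = [(10_000, "shard-a"), (20_000, "shard-b"), (30_000, "shard-c")]

def shard_for(key: int) -> str:
    bounds = [upper for upper, _ in SHARD_RANGES]
    idx = bisect.bisect_left(bounds, key)
    if idx == len(SHARD_RANGES):
        raise KeyError(f"no shard covers key {key}")
    return SHARD_RANGES[idx][1]

print(shard_for(12_345))  # -> shard-b
```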
When defining a materialized view, maximize its value by adding data items or columns to it based on computation or transformation of existing data items, on values passed in the query, or on combinations of these values when appropriate.
Denormalization?
A materialized view is never updated directly by an application, and so it's a specialized cache.
Would be nice to have some examples of systems that perform the data store -> materialized view sync
To support efficient querying, a common solution is to generate, in advance, a view that materializes the data in a format suited to the required results set.
'Write-heavy vs read-heavy' balance
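A tiny sketch of that sync (names and event shape are hypothetical): a consumer tails the change feed of the write store and folds each change into views precomputed the way reads need them, including a computed value as the quote suggests.

```python
from collections import defaultdict

# The materialized views: data keyed and pre-aggregated for the read side.
orders_by_customer: dict[str, list[str]] = defaultdict(list)
total_by_customer: dict[str, float] = defaultdict(float)

def apply_change(event: dict) -> None:
    # Invoked for every change event from the write store. The application
    # never writes these views directly - they are rebuilt purely from changes.
    if event["type"] == "OrderPlaced":
        orders_by_customer[event["customer_id"]].append(event["order_id"])
        total_by_customer[event["customer_id"]] += event["amount"]

apply_change({"type": "OrderPlaced", "customer_id": "c1",
              "order_id": "o1", "amount": 9.99})
print(total_by_customer["c1"])  # reads hit the view, never the write model
```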
Use this pattern to improve query performance when an application frequently needs to retrieve data by using a key other than the primary (or shard) key.
... or consider an indexing engine like Elasticsearch or Solr
The first operation searches the index table to retrieve the primary key, and the second uses the primary key to fetch the data.
... especially if data storage doesn't support any form of join
The overhead of maintaining secondary indexes can be significant
This
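A minimal sketch of the two-step lookup and of where that overhead lives (all names hypothetical):

```python
# Primary store, keyed by primary key.
data_by_pk = {
    "o1": {"customer": "c1", "city": "Oslo"},
    "o2": {"customer": "c2", "city": "Oslo"},
}
# Index table, keyed by the secondary key; must be maintained on every write.
index_city_to_pk = {"Oslo": ["o1", "o2"]}

def find_by_city(city: str) -> list[dict]:
    pks = index_city_to_pk.get(city, [])   # operation 1: index lookup
    return [data_by_pk[pk] for pk in pks]  # operation 2: fetch by primary key

print(find_by_city("Oslo"))
# The write overhead: every insert/update/delete of a row must also touch
# index_city_to_pk (and any other index tables), ideally atomically.
```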
Adding a timestamp to every event
Only if clocks are monotonic and guaranteed to be in sync between event source service instances. And even then a clash is possible - depends on timestamp precision.
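One common mitigation (my assumption, not from the source): pair the timestamp with a per-instance sequence number, so two events in the same clock tick still get a total order. This only orders events from a single instance; cross-instance ordering still needs synchronized clocks or a logical clock.

```python
import itertools
import time

_seq = itertools.count()

def event_version() -> tuple[int, int]:
    # (timestamp, sequence) tuples compare lexicographically, so the
    # sequence breaks ties when timestamp precision is too coarse.
    return (time.time_ns(), next(_seq))

v1 = event_version()
v2 = event_version()
print(v1 < v2)  # True unless the wall clock stepped backwards between calls
```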
by avoiding the need to synchronize the data model
How does it avoid the data model (schema) change sync problems?
data update conflicts are more likely because the update operations take place on a single item of data
But in a distributed system the conflict problem still exists? Unless there's partitioning that guarantees each item is always processed by the same event-sourcing service instance. And even then, multiple clients can concurrently apply different changes to the same data?
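One common answer to this (my assumption - the source doesn't prescribe it) is optimistic concurrency on append: the writer states which stream version its change was based on, and the store rejects the append if the stream has moved on. A minimal sketch with an in-memory store:

```python
class ConcurrencyError(Exception):
    pass

streams: dict[str, list[dict]] = {}  # hypothetical event store, per stream

def append(stream_id: str, expected_version: int, event: dict) -> None:
    events = streams.setdefault(stream_id, [])
    if len(events) != expected_version:
        # A concurrent writer appended after we read: re-read, rebase, retry.
        raise ConcurrencyError(
            f"{stream_id}: expected v{expected_version}, at v{len(events)}"
        )
    events.append(event)

append("order-1", 0, {"type": "OrderPlaced"})
append("order-1", 1, {"type": "OrderPaid"})
# A second client that had also read version 1 would now get ConcurrencyError.
```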
directly against a data store, which can slow down performance
How is it related? Besides, wouldn't the part responsible for maintaining the materialized view of the data have similar performance issues?