  1. Apr 2024
    1. But it does drag in a lot of problems, even outside of the document convergence—peers have to store the document along with, depending on the technique used, its entire history.

      This may not be the case. When peers know they're in sync - they can snapshot and dispose of the ops.

    1. The Awareness CRDT must only apply updates if the received clock is newer / larger than the currently known clock for that client

      Clock, meaning Lamport one? Otherwise, with subjective clocks, I don't see how that works.
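
      A minimal sketch of the rule as I read it, assuming the clock is just a per-client monotonic counter, so comparisons only ever happen between values from the same client (no cross-client Lamport ordering needed):

        use std::collections::HashMap;

        type ClientId = u64;

        /// Per-client awareness state plus the last clock seen for that client.
        #[derive(Default)]
        struct Awareness {
            clocks: HashMap<ClientId, u64>,
            states: HashMap<ClientId, String>,
        }

        impl Awareness {
            /// Apply an update only if its clock is newer than the known one.
            fn apply(&mut self, client: ClientId, clock: u64, state: String) {
                let known = self.clocks.get(&client).copied().unwrap_or(0);
                if clock > known {
                    self.clocks.insert(client, clock);
                    self.states.insert(client, state);
                } // else: stale or duplicate update, drop it
            }
        }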

    1. This collaborator can then deserialize it back and generate an update which will contain all new changes performed since provided state vector

      Does receiver need to trust the sender that the provided update does indeed contain all the missing ops?

    1. Moreover, the ordering is effectively determined by the effects of the operations when they are generated

      I don't see how that makes ops "admissible". Simply by reordering we won't get preservation of intent.

      E.g.,

      "abbcd"

      peer1 and peer2 in parallel delete the second duplicate b.

      No matter how we order them, we have:

      [delete(3), delete(3)]

      Intent of peers: "abcd"

      Playback of log: 1. "abcd" 2. "abd"

      Doesn't look like we preserved the intent - delete(position) is not admissible in the general case. It's context-dependent. No matter the order.
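
      A quick sketch of that playback (1-based positions, ASCII only):

        /// Naive position-based delete: remove the char at 1-based `pos`.
        fn delete(doc: &mut String, pos: usize) {
            doc.remove(pos - 1); // byte index == char index for ASCII
        }

        fn main() {
            let mut doc = String::from("abbcd");
            // Both peers independently generated delete(3) against "abbcd".
            for pos in [3, 3] {
                delete(&mut doc, pos);
            }
            assert_eq!(doc, "abd"); // both peers intended "abcd"
        }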

    2. Consequently, the total order becomes application specific

      Well, it's not 'total order' anymore then, is it?

      More like a subjective operations log. See Chronofold.

    3. The invocation of every operation is admissible in its execution state

      Then we can't have a delete(position) operation, as it's context-dependent.

      I.e., ops need to be self-contained. delete(position) becomes delete(char-id) - whoa, now we have CRDTs.
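
      A sketch of the self-contained variant, with a hypothetical (author, seq) pair standing in for a char-id:

        /// A character with a stable identity, so ops can name it context-free.
        struct Char {
            id: (u64, u64), // hypothetical (author, seq) id
            ch: char,
            deleted: bool, // tombstone instead of physical removal
        }

        /// delete(char-id): commutes with concurrent deletes of other ids,
        /// and re-applying it is a no-op - no positions involved.
        fn delete(doc: &mut Vec<Char>, id: (u64, u64)) {
            if let Some(c) = doc.iter_mut().find(|c| c.id == id) {
                c.deleted = true;
            }
        }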

    4. The above CSM model requires that a total order of all objects in the system be specified

      I don't see that.

      What CSM does, as it seems to me, is to specify/extend the "Intention preservation" of CCI.

      Moreover, total order has nothing to do with "intention preservation". Ops that have been issued in parallel can be put in a total order - yet it does not give convergence of intent.

      E.g.,

      "abbcd"

      And peer1 wants to change c to C.

      Whereas peer2 wants to remove the second b.

      [delete(3), delete(4), add(4, char: "C")]

      Playing the ops: 1. "abcd" 2. "abc" 3. "abcC"

      Intention of c -> C is not preserved, despite having a total order.

    5. Second, it was defined as operation context-based pre- and post- transformation conditions for generic OT functions

      This solution seems to aim to preserve intention (same result of an op in a different context) by capturing some host context and trying to adapt op+some_context to the target context.

      Which seems rather unnecessary: if you transfer op+host_context as is, the receiving party will be able to adapt it.

    1. It should be obvious by now that causal histories work but are not very compact

      Also, while a causal history includes every event, it's by a non-hash-based id. So no actual content is available.

      Using hashes to refer to content would allow it to be context-free. Any replica would know the stuff they're talking about.

      Not compact, yes. Just two hashes are enough, in fact. ;)

    1. to work with other linearizations we have to build their respective chronofolds.

      Is this possible? I thought no.

      It doesn't seem that RCT captures what's in subjective logs.

      Version vectors would seem to allow it. But I think we don't have them in RCT.

    2. That costs O(N) and from that perspective we might be tempted to store that mapping in a hash map

      Alternatively, have a vec for each author (yourself included).

      So there would be a vec for author y, where under index k is the index in a's log (see the sketch below).

      Although having no RCT to start with would be better.
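
      A sketch of that alternative, assuming each author's ops reach us in their own index order (so the vec can simply be pushed to):

        use std::collections::HashMap;

        type Author = u32;

        /// For each author y, a vec whose slot k holds the position, in our own
        /// log, of y's k-th op; looking up a timestamp ⟨k, y⟩ is two indexings.
        #[derive(Default)]
        struct IndexMap {
            by_author: HashMap<Author, Vec<usize>>,
        }

        impl IndexMap {
            fn record(&mut self, author: Author, pos_in_own_log: usize) {
                self.by_author.entry(author).or_default().push(pos_in_own_log);
            }

            fn lookup(&self, author: Author, andx: usize) -> Option<usize> {
                self.by_author.get(&author)?.get(andx).copied()
            }
        }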

    3. Next, it has to find the op's position in the weave and relink the linked list to include the new op at that position

      Seems this is not on a hot path. It can be done in parallel, without blocking the user's input. Once links are established, it can be atomically applied. This will block.

    4. CRDT overheads and complexity are rooted in the use of per-character ids and the need to map them to positions in the weave and the text

      Isn't it already a delta weave?

    5. The top issue with a weave is that it needs to be spliced on every edit (i.e. rewritten in full), very much like a plain string

      Here weave refers to interleaved deltas (SCCS weave).

      And splice refers to insert.

      Meaning that in order to add a delta, we'd need to insert it at some position of a file, which would lead to moving characters around. Hence the comparison with a plain string.

      How this works is described here, in RCS, which seems to be a successor of the interleaved-delta (SCCS) approach.

    6. A timestamp is an ordered pair t = ⟨n, α⟩ where α is the author, also denoted by auth(t), and n is the author's index, also denoted by andx(t).

      Resembles pointstamp of timely dataflow(?).

    1. For example, we can use it to inform observers of this model that its state has changed. Let's add that to our example, by calling cx.notify().

      Huh. Would be nicer if that model implemented some sort of Watchable trait, akin to Clojure's atoms, so others can subscribe to it and get notified on change automatically. That way we can't shoot ourselves in the foot by forgetting to do it manually.
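
      Something like this hypothetical trait (all names invented), in the spirit of Clojure's add-watch:

        /// Hypothetical Watchable: observers subscribe once, and every mutation
        /// notifies them automatically - no manual notify() to forget.
        trait Watchable<T> {
            fn subscribe(&mut self, watcher: Box<dyn FnMut(&T)>);
        }

        struct Watched<T> {
            value: T,
            watchers: Vec<Box<dyn FnMut(&T)>>,
        }

        impl<T> Watchable<T> for Watched<T> {
            fn subscribe(&mut self, watcher: Box<dyn FnMut(&T)>) {
                self.watchers.push(watcher);
            }
        }

        impl<T> Watched<T> {
            /// The only way to mutate is update(), which always notifies.
            fn update(&mut self, f: impl FnOnce(&mut T)) {
                f(&mut self.value);
                for w in &mut self.watchers {
                    w(&self.value);
                }
            }
        }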

    1. If the subsequent frame contains the same text-font pair, the shaped glyphs get reused.

      Perhaps we could reuse even larger repeating patterns, such as words, since in programming we tend to operate with such (programming language reserved words, such as if, when, match, and defs a user creates, such as myCoolVar).

      Though they could be split across multiple lines, as myCool- Var, although perhaps that is not a desired feature for an editor to do.

    2. A modern display's refresh rate ranges from 60 to 120 frames per second

      As of now, there are displays with 240Hz (4.1(6) ms/frame).

      But aiming for smooth 120fps at 4K is a good minimum.

      However, for a text editor, being able to run at 240Hz seems a reasonable demand.

    1. but we may eventually introduce the ability to undo operations of collaborators

      Or, rather, a client could 'reject' some contributions. I.e., have the power to accept contributions of others.

    2. A shared global stack of edit operations isn't enough

      Having these ops contain the author, one could undo one's own ops. And I bet authorship is preserved.

      The thing is that the shared global stack is not useful to be shared; rather, a stack can be derived on each client in order to play.

      But then some things could be played in parallel. So instead we're better off with some sort of AST. It could be derived out of the global stack, but then we could construct an AST in the first place, having ops come with their deps, as we have with CRDTs.

      That is better, given that the cost of an AST is less than the cost of playing sequentially. I.e., the cost of an AST (increased size and complexity) is worth the gains we get from parallel play.

  2. Mar 2024
    1. Tasks are the top-level futures that have been submitted to an executor.

      From what I understand, Tasks are not the top-level futures, but rather any Future that's being executed.

      So there may be: async { async { async { }.await }.await }.await, where each async block is a Future that's been wrapped by a Task on .await, and will be driven to completion, be it returning right away, or hanging and calling .wake() on its parent when it's done.

    1. OS threads don't require any changes to the programming model, which makes it very easy to express concurrency. However, synchronizing between threads can be difficult, and the performance overhead is large. Thread pools can mitigate some of these costs, but not enough to support massive IO-bound workloads.

      ^

    1. .iter().map

      Why do we need iter before map?

      Couldn't map do the iter, if needed?

      That could even be done at compile time, to reduce run-time overhead, giving users a more comfy/familiar map interface.
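
      As far as I can tell, the answer is ownership: map is defined on the Iterator trait, not on Vec, and the caller has to say whether the iterator borrows or consumes the collection. There's no run-time cost either way; iterators compile down to plain loops:

        fn main() {
            let v = vec![1, 2, 3];

            // .iter() borrows: items are &i32, v remains usable afterwards.
            let doubled: Vec<i32> = v.iter().map(|x| x * 2).collect();

            // .into_iter() consumes: items are i32, v is moved.
            let tripled: Vec<i32> = v.into_iter().map(|x| x * 3).collect();

            println!("{doubled:?} {tripled:?}");
        }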

  3. Feb 2024
  4. Jan 2024
    1. If a pure function mutates some local data in order to produce an immutable return value, is that ok? It’s an interesting question. Clojure data structures use mutation every time you call, e.g. assoc, creating one or more arrays and mutating them, before returning them for immutable use thereafter. The reason is performance - you simply can’t get as fast using only pure functions and immutable data. Once constructed and shared however, being immutable and persistent is essential to robust programs. The things Clojure mutates internally are small, newly allocated arrays that constitute the internal nodes of its data structures. No one ever sees the arrays.

      Functional interfaces are enough for FP. Inside you can mutate, no one will know; it does not affect the in=>out result. And that's what FP cares about.

    1. Once round r is settled, the future rounds will reshuffle, and the calculations for round r + 1 famous witnesses will be done using the new stake record.

      A change to the stake record that happened by concluding round r will not affect round r events; they will use the previous stake record. Events that are r+1 will use the new one.

  5. Dec 2023
    1. When wave 𝑤 completes (Line 34), we use the global perfect coin to retrospectively elect some process and consider its vertex in the wave's first round as the leader of wave 𝑤 (Line 35).

      Can we use random selection of a concluding vertex based on hashes of possible concluding witnesses?

      We can sprinkle it with more state known to all, e.g., round number.
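
      A sketch of such a deterministic coin, assuming all replicas see the same candidate set for the wave. One caveat: anyone who can influence the candidate hashes can grind on them to bias the coin, which is presumably why the paper wants a perfect coin:

        use std::collections::hash_map::DefaultHasher;
        use std::hash::{Hash, Hasher};

        /// Every replica hashes the same shared state (round number plus the
        /// candidate hashes), so all of them elect the same leader.
        fn elect_leader(round: u64, candidates: &[[u8; 32]]) -> usize {
            let mut h = DefaultHasher::new();
            round.hash(&mut h);
            for c in candidates {
                c.hash(&mut h);
            }
            (h.finish() % candidates.len() as u64) as usize
        }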

    2. In a nutshell, the idea is to interpret the DAG as a wave-by-wave protocol and try to commit a randomly chosen single leader vertex in every wave.

      How many txes will not be received in a round?

      This is an interesting approach. It seems that in each round more txes will be received than in a round of Hashgraph, by the difference of receivedTxesByEachMember and intersection(receivedTxesByEachMember).

    3. Our validity property requires that all messages broadcast by correct processes are eventually ordered (with probability 1), whereas most Byzantine SMR protocols (e.g., [17, 37, 43]) require that in an infinite run, an infinite number of decisions are made, but some proposals by correct processes can be ignored.

      BAB that forms a DAG retains conflicting txes by ordering them.

    4. optimal quadratic communication

      That is not the lowest there is for a single YES/NO answer.

      The fast path of ConsensusOnDemand requires O(n); it's only used when many peers approve a tx as not conflicting with others. And this is often the case for cryptocurrency, where a conflict occurs only when txes on the same account are issued in parallel, which is rare.

      Hashgraph's YES/NO on famous witnesses, which totally orders many txes, is O(n * log n).

    1. set of each event z such that z is a self-ancestor of a round r unique famous witness, and x is an ancestor of z but not of the self-parent of z

      Many self-ancestors may be ancestors of z. We'd need to take the earliest one.

    2. (d), one event (orange) is an ancestor of each of 4 intermediate events by different creators (red)

      The red event of the rightmost member is not an ancestor of the orange event. Rather, the orange event is an ancestor of itself, and so can see itself. So the orange event should also be a red event.

    3. A much faster approach is to pick just a few events (vertices in the hashgraph), to be called witnesses, and define a witness to be famous if the hashgraph shows that most members received it fairly soon after it was created

      Why is it faster? What's the definition of "faster"?

    4. If Alice and Bob both have the same hashgraph, then they can calculate a total order on the events according to any deterministic function of that hashgraph, and they will both get the same answer. Therefore, consensus is achieved, even without sending vote messages

      ^

    5. When Bob sends an event to Carol, that sequence number can be calculated by Carol, so there is no need for Bob to actually send it over the internet

      I.e., akin to logical clock.

      Also, if timestamps are required, they can be compressed as delta milliseconds from the previous timestamp.
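
      The delta idea in a couple of lines (assuming a non-empty, non-decreasing series; the small deltas then shrink further under a varint encoding):

        /// Send the first timestamp in full, the rest as deltas in ms.
        fn encode(ts: &[u64]) -> (u64, Vec<u64>) {
            (ts[0], ts.windows(2).map(|w| w[1] - w[0]).collect())
        }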

    6. Only the state itself needs to be kept. It might be possible to do this every few minutes, so there will never be a huge number of transactions and events stored. Only the consensus state itself. Of course, a member is free to preserve the old events, transactions, and states, perhaps for archive or audit purposes. But the system is still immutable and secure, even if everyone discards that old information.

      These old events do not contribute towards processing of the new ones. So they are not required to be kept in "RAM" so to say, across all the participants. This history can be kept safe somewhere in cold storage.

    7. To defend against attackers who can control the internet, there are periodic coin rounds where witnesses can vote pseudorandomly. This means that even if an attacker can control all the messages going over the internet to keep the votes carefully split, there is still a chance that the community will randomly cross the 2n/3 threshold. And so agreement is eventually reached, with probability one.

      Pseudo-random voting occurs periodically to mess with adversaries, given they have control over communication layer.

    8. The protocol might require that Alice send those events in topological order, so Bob will always have an event's parents before receiving the event

      Or send them all at once.

    1. Issuing a Transaction: The client c1 composes and signs a transaction. Client c1 sends the transaction to the validators. If the inputs of the transaction are not part of genesis, the client also sends the confirmations of the inputs to the validators.

      Due to validators not talking to each other, may it be that they end up in different states? As a client may share with only a part of them, some 2/3.

      Clients do send inputs though. So I guess that's the way the history of txes gets communicated.

      Perhaps that's the whole point. Validators approve based on the inputs, given they weren't spent.

    1. Finally, the high communication complexity of the underlying protocols, i. e., consensus of shards, handling cross-shard transactions, and bootstrapping of participants to shards, can make the bandwidth become the bottleneck.

      Bootstrapping new participants may be a bandwidth bottleneck, as they need to download previous history.

      Perhaps compaction/snapshots to the rescue.

    2. Furthermore, we show that against a somewhat adaptive adversary, sharding protocols must periodically compact the state updates (e.g., via checkpoints [26], cryptographic accumulators [7], zero-knowledge proofs [6, 31] or other techniques [12, 19–21]); otherwise, the necessary bandwidth resources of each participant increase infinitely in time, eventually exceeding the resources needed in a non-sharded blockchain

      Compacting history is necessary, as it's ever-growing, and given it grows faster than the storage of peers, it'll eventually exceed the available storage.

    3. Additionally, we show that scalable sharding is impossible if participants maintain (and verify) information on all other shards with which they execute cross-shard transactions. This happens because the storage requirements become proportional to that of a non-sharded protocol.

      If, in order to process a cross-shard tx, a shard needs to verify other shard's txes, then storage does not scale, as it needs to download txes. (and perhaps compute does not scale, as it needs to verify them)

    1. To this end, Bob's representative replica collects and aggregates CREDIT messages for the same incoming payment into a dependency certificate for Bob's xlog.

      Akin to aggregating delta of a tx as commits in corresponding subject repos.

    2. Astro guarantees total order within—but not across—xlogs

      xlog is an index of txes of a sender by tx idx.

      There can be another index by beneficiary.

      And the total of an account is the sum across the two.

      I guess the representative would need to aggregate the two.

      This resembles an AST of txes. Where tx's parents are other txes it depends on. E.g., sending money would depend on all current heads: last spent money tx and all novel receive money txes.

    3. Essentially, preventing Alice from double-spending means preventing her from reusing sequence numbers. To do so, the broadcast layer in Astro provides Byzantine resilience. This ensures that a malicious client cannot broadcast two different payments with the same sequence number.

      Wouldn't the broadcast layer become a peer that keeps all txes passed through it, filtering the ones with a duplicate idx?

      And all communication is required to go through that peer.

      Getting us back to a centralized architecture.

      There can be a set of such peers, but then they'd need to have consensus on what txes they've seen.

    1. While that prohibits applications that need a unique total ordering on conflicting transactions, like cryptocurrencies where double spending must be prevented

      Vegvisir's consensus is suitable for CRDTs, however it's not suitable for conflicting txes.

    1. The gossip-about-gossip method allows for the history of all the communications within an infrastructure to be reconstructed

      More like acknowledgements about acknowledgements. Sent/received isn't captured.

    2. None of these methods, though, includes dynamic membership, and this is a key differentiator in this paper

      That's not true. The Hashgraph paper describes that there can be dynamic membership, by the exact same technique described in this paper - nodes vote for member addition.

  6. Nov 2023
    1. is proved in section 5 that if x and y are on different branches of an illegal fork, then w can strongly see at most one of x and y, but not both.

      But a solution of ignoring events issued by peers that created a fork means that w can strongly see x, but having learned of the fork y, it would ignore x. So, in the presence of this solution, we can't be sure of the finality of "strongly seen". Or, I guess, the solution can be applied only to seen events, but not strongly seen ones. So strongly seen events stay final.

    2. If there are n members (n > 1), then an event w can strongly see an event x, if w can see more than 2n/3 events by different members, each of which can see x.

      I.e., x is "strongly seen" when it's seen by super-majority.

    3. In other words, w can see x if x is known to it, and no forks by that creator are known to it

      I.e., forks indicate that a peer is misbehaving. A strategy is to ignore all his events.

      Also peers that communicate with a malicious peer are endangered, so "exercise caution".

    4. If Alice is honest, she will calculate virtual votes for the virtual Bob that are honest. Even if the real Bob is a cheater, he cannot attack Alice by making the virtual Bob vote incorrectly.

      I.e., peers can't attack by sending incorrect vote(localState) results.

    5. The other ancestor events (gray) are not stored in the red event, but they are determined by all the hashes. The other self-ancestors (dark gray) are those reachable by sequences of self-parent links, and the others (light gray) are not.

      So they are a part of DAG, addressed by hashes, right?

    6. Most Byzantine fault tolerance protocols without a leader depend on members sending each other votes. So for n members to agree on a single YES/NO question might require O(n²) voting messages to be sent over the network, as every member tells every other member their vote. Some of these protocols require receipts on votes sent to everyone, making them O(n³). And they may require multiple rounds of voting, which further increases the number of voting messages sent.

      Naive leaderless BFT is damn costly on communication.

    7. It is sufficient to tell Carol that this event is the next one by Alice, and that its other-parent is the third one by Bob. With appropriate compression, this can be sent in very few bytes, adding only a few percent to the size of the message being sent.

      I.e., events can be uniquely identified by peer+event index. So they can be compressed to [int, int].

    8. Alice will not send Carol the red event until Carol already has all its earlier ancestors

      Nice. Interesting how communication history can be used to learn what a peer knows. It is a stale view, and the peer may have learned some more on top. But at least it spares the need to re-transmit already known values.

    9. Instead of Alice signing a transaction she creates, she will sign the event she creates to contain that transaction. Either way, she is only sending one signature

      Having reified communication that is signed spares the need for signing the stuff that's being communicated.

    10. Virtual voting - every member has a copy of the hashgraph, so Alice can calculate what vote Bob would have sent her, if they had been running a traditional Byzantine agreement protocol that involved sending votes. So Bob doesn't need to actually send her the vote.

      A vote is a function of local state. So instead of computing it, you can send the local state and let others compute it for themselves. This makes it resilient to malicious vote(localState) execution, where an adversary could send an intentionally wrong result.
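
      A toy sketch of the point, with an invented decision rule (the real rules are the hashgraph ones; what matters is only that vote() is deterministic):

        struct LocalState {
            seen_yes: u32,
            seen_no: u32,
        }

        /// Deterministic: anyone holding the same state computes the same vote.
        fn vote(state: &LocalState) -> bool {
            state.seen_yes > state.seen_no
        }

        /// Alice doesn't trust a vote Bob claims; she derives Bob's state from
        /// the hashgraph and recomputes it, so a lying Bob gains nothing.
        fn virtual_vote_of_bob(bobs_state_from_dag: &LocalState) -> bool {
            vote(bobs_state_from_dag)
        }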

    11. Fairness - it should be difficult for a small group of attackers to unfairly influence the order of transactions that is chosen as the consensus.

      Fairness is stronger than Censorship Resistance. It's more like "Manipulation Resistance".

    12. An algorithm for a single YES/NO decision can then be extended to deciding a total order on a set of transactions, which may further increase the vote traffic.

      Umm, if we settled on partial order of txes - we have total order, no?

    13. A proof-of-expired-time system [9] can avoid the wasted electricity (though perhaps not the cost of mining rigs) by using trusted hardware chips that delay for long periods, as if they were doing proof-of-work computations.

      Wow, that's a thing. Hindering performance at the HARDWARE level. This just takes it up a notch.

      It's like limiting max typing speed on a typewriter.

    14. It also means that it is necessary to slow down how fast blocks are mined, so that the community can jointly prune branches faster than new branches sprout. That is the purpose of the proof-of-work. By requiring that the miners solve difficult computation problems to mine a block, it can ensure that the entire network will have sufficiently long delays between mining events, on average

      I.e., PoW as a performance reduction to cater for the upstream problem.

      Reminds me of the reasoning behind the design of the QWERTY layout.

      Instead of solving the upstream problem, e.g., by having convergence of branches, we bolt a patch on top, hindering performance so branches can be dropped.

    1. In the permissioned setting there is the following interesting line of research. The aim is to construct an atomic broadcast protocol based on a combined encoding of the data transmission history and voting on "leader blocks". Such protocols allow the network participants to reach a consensus on a total ordering of the received transactions, and this linearised output forms the ledger.

      Atomic broadcast is the technique to broadcast all observed communication and voting in order to reach consensus.

    2. The main difference of our proposal to all the aforementioned protocols is that consensus is found on the heaviest DAG without the need for a "linearisation" using any leader selection. This reduces the purposes of blocks to data transmission and voting

      Dope.

    3. These protocols remove the bottleneck of data dissemination of the classical Nakamoto consensus by decoupling the data dissemination from the consensus finding

      By having a DAG of txes and a DAG of votes we have reduced data transmission, as the DAG of txes is transmitted once, and so only the DAG of votes needs to be extensively communicated with peers. And a technique such as Bloom filters will make it pretty efficient.

    4. Blockchain-based protocols rely on a chain or "linearisation" of blocks that create a total order of transactions. These blocks have three purposes: leader election, data transmission and voting for ancestor blocks through the chain structure, see [34]. Each of these aspects can be, individually or combined, addressed through a DAG-based component. The various proposals differ in how and which of these components are replaced by a DAG.

      Usually blockchain blocks take on the responsibility of:
      1. leader election
      2. data transmission
      3. voting

      These can be decoupled and catered for in DAG-based fashion.

    5. Upon receipt of the missing parent block, the past cone is now complete (unless their parents are missing - in which case the node has to repeat this procedure recursively).

      Or use Bloom filter for fetching all missing blocks at once.

    6. Improvements are proposed, for example, in Hashgraph [41] and Aleph [42] and more recently in Narwhal [43] based on the encoding of the "communication history" in the form of a DAG

      Have acks as first-class citizen of a DAG to reach consensus.

    1. Probabilistic. The units appended to the system are always accompanied by the risk of being reversed. The probability is inversely proportional to its cumulative confidence (including the forms of depth/score/weight etc). The confidence is obtained from the contributions of subsequent units. - Deterministic. The units, once attached to the system at some point in time, will instantly and permanently get confirmed so that they can never be removed from the system.

      Consensus strategies can be classified into probabilistic and deterministic.

    2. Secondly, building smart contracts or state transition applications rely on linear order

      Perhaps order can be achieved ad hoc, as required.

      E.g., there are SIM-commuting txes addToBalance(Alice, 1B) and addToBalance(Bob, 1B). They are accepted without ordering. Later on there appears a smart contract "give a billion to the first billionaire", which is not SIM-commuting with the previous txes. So the order of the 1st and 2nd tx needs to be decided upon. So we can delay ordering for when such a tx3 occurs. And order deterministically, based on rules baked into the blockchain. E.g., select one pseudo-randomly based on previous hashes.

    3. Each participated node maintains a separate chain, and nodes mutually interact via the gossip protocol. The node locally creates an event to record the history of received information

      Each node in Hashgraph maintains a log of observed txes.

    4. Graphchain, instead, introduces an incentive mechanism to maintain the graph. Transactions are required to post transaction fees as the offering for collection. That means each transaction must refer to a group of ancestor transactions to deplete their deposit (fees) when verifying the validity. Meanwhile, ancestor transactions must have enough fees for collection. Fees are depleted by incoming transactions from the oldest ancestors in paths to reach a prescribed total amount. High-fee transactions attract powerful miners working in parallel for rapid confirmation, while low-fee transactions will finally be picked up by a small miner. Based on this design, Graphchain makes the current transactions quickly become enshrined in the ancestry of all future transactions.

      Graphchain requires a tx to come with fees, which are depleted by acks that refer to it.

    5. Specifically, the extension of the tangle follows the rule where one tip (pending/unapproved transaction) is required to approve two ancestor transactions. Thus, users who issue a transaction will contribute to the security of the system.

      Acks on a tx make it contribute to security.

    6. several systems, both tie-breaking and conflict solving are integrated into the extension rule. The consensus mechanism, by this way, reaches the agreement in one-step

      Priority and tie-breaker as guidance for consensus.


    1. The most telling example from Section 3.2 is the use of tombstoning for low-latency deletion. In practice, memory for tombstoned items must be reclaimed, so eventually all machines need to agree to delete some items.

      Embracing temporality of data as opposed to deletion, we eliminate the need for deletion.

      Marking an entry with a "valid to" timestamp would make it valuable for future use, as in the future one can run queries as of some time and get the valid results of that time.

      Not all info is practically useful to keep around forever though, e.g., removal of a cart item. Although usefulness is use-case-dependent, and there may be future use-cases for that info. Even more, eventually there will be a use-case. E.g., a shop can use removal as insight into shopping patterns. So if it's cheap to store all the data there is - let's store it.

    2. A related pattern in storage systems is the use of tombstones: special data values that mark a data item as deleted. Instead of explicitly allowing deletion (a non-monotonic construct), tombstones mask immutable values with corresponding immutable tombstone values.

      Logical deletion is a way to make non-monotonic deletion monotonic.

    3. CALM provides the formal framework for the widespread intuition that we can indeed "work around CAP" in many cases, even if we violate traditional systems-level notions of storage consistency.

      CAP is about storage consistency.

      CALM is about result eventual consistency.

    4. This is the conclusion of the Dynamo design. In later work [18], we go further to make checkout monotonic in this setting as well: the checkout operation is enhanced with a manifest from the client of all its update message IDs that preceded the checkout message: replicas can delay processing of the checkout message until they have processed all updates in the manifest

      I.e., by baking the exact cart into the checkout we can assure monotonicity. This exact cart will be checked out, with no regard to any outside adds and removes of items.

    1. A key invariant of these coherence protocols is that either 1) a cache line is not present in any cache, 2) a mutable copy is present in a single cache, or 3) the line is present in any number of caches but is immutable

      That looks like the design of Rust.
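
      The analogy in Rust terms (the borrow rules enforce the same three-way invariant, just at compile time):

        fn main() {
            let mut line = vec![0u8; 64];

            // Any number of shared, read-only borrows may coexist...
            let r1 = &line;
            let r2 = &line;
            println!("{} {}", r1.len(), r2.len());

            // ...or exactly one mutable borrow - never both at once.
            let w = &mut line;
            w[0] = 1;
            // println!("{}", r1.len()); // error: shared + mutable overlap
        }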

    2. At the core of this dissertation's approach is this scalable commutativity rule: In any situation where several operations commute—meaning there's no way to distinguish their execution order using the interface—they have an implementation that is conflict-free during those operations—meaning no core writes a cache line that was read or written by another core. Empirically, conflict-free operations scale, so this implementation scales. Or, more concisely, whenever interface operations commute, they can be implemented in a way that scales.

      I.e., Operations commute if there's no way to distinguish order between them using the interface.

    1. Importantly, the external transaction determines valid modifiers of each state.

      Managing users and what they can do, as well as effects of their actions, invariants that should hold, can be achieved by specifying it in yet another system-level tx.

    2. It is inflexible for a node to join or leave a workflow at any time since the configuration of channels is fixed.

      This isn't true. Channel configuration, which includes the MSP (Membership Service Provider), a definition of members, can be updated via a special tx. The result will be a new block, solely containing the configuration change tx.

      So it's possible to change the members of a channel, but it seems to be costly, and only authorities can do that.

      So rather, it's impractical for a node to join and leave a channel, since it would require coordination with the authorities of that channel.

      UPD: in this paper they refer to the 2016 version, whereas I referred to the 2018 version.

    3. Moreover, interactions between channels rely on a trusted channel [3] or an atomic commit protocol [2], which either breaks the decentralization or still treat the system as an entirety.

      Some solutions to cross-shard tx include:
      - trusted channel, breaking decentralization
      - atomic commit, involving all the participants of the cross-shard tx


    1. The configuration of a channel may be updated using a channel configuration update transaction. This transaction contains a representation of the changes to be made to the configuration, as well as a set of signatures.

      In Fabric it's possible to update configuration of a channel by issuing a special tx.

    1. The relative ordering of aborted transactions is kept the same. As a result, every transaction can commit eventually, since the first transaction in a batch always commits and the position of an aborted transaction always moves forward across batches

      Aria eventually commits txes.

    1. However, as the ordering between transactions is determined by consensus and fixed across all nodes, we leverage this to permit both transactions to write to different copies of the object, but subsequently commit only the first as determined by the ordering.

      Wasted effort in ordering and execution of a to-fail tx. Moreover, all the info needed to execute it properly atop the other tx is there!

    1. Hot Key Theorem. Let l be the average time between a transaction's execution and its state transition commitment. Then the average effective throughput for all transactions operating on the same key is at most 1/l.

      .

    2. Compatibility with external oracles: To allow the use of external oracles in the deterministic second execution phase, we gather and save oracle inputs in the pre-order execution step.

      I.e., external oracles are read deps as of pre-order execution step.

    3. A dependency exists when two transactions overlap in some keys of their RW sets (read-only transactions do not even enter the orderer). In that case, we cannot process them independently. Therefore, we need to keep track of dependencies between transactions so we know which subsets of transactions can be processed concurrently

      Note: RW of dependee transaction overlaps with W of dependent transactions. As clarified later in the paper.

    4. This requires the use of a deterministic contract language, such as Ethereum's Solidity, which must be learned by the application developer community

      Or any other deterministic compute. Such as Docker, or Nix, Guix, IPVM.

    1. With this optimization, TAPIR can now accept: (1) reads of past versions, as long as the read timestamp precedes the write timestamp of the next version, and (2) writes in the past (per the Thomas Write Rule [45]), as long as the write timestamp follows the read timestamp of the previous version and precedes the write timestamp of the next version.

      I.e., write in the past, preserving serializability.
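
      The two acceptance checks, as I read them, sketched over a version list (write_ts is the timestamp that created a version, read_ts the highest timestamp that has read it):

        struct Version {
            write_ts: u64,
            read_ts: u64,
        }

        /// (1) Read at `ts`: must land before the next version was written.
        fn accept_read(ts: u64, next: Option<&Version>) -> bool {
            next.map_or(true, |n| ts < n.write_ts)
        }

        /// (2) Write at `ts` (Thomas Write Rule): must follow every read of the
        /// previous version and precede the next version's write.
        fn accept_write(ts: u64, prev: &Version, next: Option<&Version>) -> bool {
            ts > prev.read_ts && next.map_or(true, |n| ts < n.write_ts)
        }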

    1. A peer checks if a sufficient number of valid endorsing peer signatures, based on the endorsement policy, have been collected (Validation System Chaincode (VSCC) validation)

      Checks that are tx-based could be performed by the endorsing peers in order to fail-fast. They would not even need to execute tx if it's syntactically invalid.

    2. Table 1: Notations for sets

      Very nice to give legend. Spares the need to search text for definition of an abbreviation.

      Perhaps even nicer would be to have wikipedia-like modals on hover.

    3. Optionally, the client may check the validity of the endorsing peers signatures and the consistency between the read/write set received from different peers.

      I guess this check is required in order to fail fast. And a failure may be due to: 1) read-set inconsistency between endorsement peers (some peer has a stale db) 2) the tx is not deterministic (if it's a smart contract - it's a bad smart contract, if it's a plain tx issued by a peer - who cares, as long as the issuer is happy to tx one version of the results)

      In case 1), the client has been let down by that stale endorser. He could pick endorsements that have the latest read-set and send them over, since including a stale endorsement would result in failure of his tx. Alternatively, what about not failing a tx with stale endorsements, but picking the ones that are latest (optionally checking that the latest endorsements count > some threshold, to be sure of faithful compute of the results)?

      It makes sense, no? Those are valid results as of some state of the db. Pick the latest one. And if you can't commit them due to a conflicting tx that happened prior - re-run atop it.

      The main idea is - don't fail, it's bad UX. Let the tx eventually commit. Given the tx is well-formed (deterministic).

      The issuer may not be happy for a tx to be re-run. In this case let him state that explicitly. If he's fine with it - eventually commit.

      The issuer may wish for the tx to commit within a certain timeframe, say 1 hour or until he explicitly aborts. Again, he can be empowered to express that.

    4. The client collects the endorsing peers responses and sends them to the ordering service nodes

      Delegating execution to endorsing peers allows one to: 1) spare the need for clients to be compute-capable; 2) gain trust that the tx has been executed faithfully (if the client is entrusted with execution - it can provide false results. Execution by trusted peers would be required to ensure a trustworthy result of a tx. These peers would not necessarily be system-owned peers; they can be the stakeholders of the blockchain. But here we're back to having endorsement peers, just of a different kind).

      Also, is there harm in committing "wrong" results provided by the issuer? This tx won't be a smart contract, but it kinda isn't one already, as the tx is being issued by some peer, instead of being globally registered.

    1. First, we deliberately do not send transaction read-write sets to the orderers, only transaction IDs. Second, we chose to keep to Fabric's design goal of allocating different tasks to different types of nodes, so our orderers do not examine the contents of the read and write sets

      Good call. Let it do its thing.

      Conflicts can be identified by the Fast Peer. And instead of dropping the rest of the conflicting txes, it could delay their (re-)execution to the next block, in FIFO manner, one tx per block.

    2. Arguably, the use of BFT protocols in permissioned blockchains is not as important as in permissionless systems because all participants are known and incentivized to keep the system running in an honest manner.

      I.e., stakeholders are incentivized to maintain consensus, and BFT is less valuable, as they are more inclined to behave well.

    3. (a) to achieve consensus on the transaction order

      Why do we need tx order if txes that are conflicting will be rejected by the committer? (except for XOX and FabricCRDT) Couldn't we drop them now?

      Or rather, couldn't orderers order the conflicting txes and process them one-by-one, FIFO? So users don't need to resubmit; their tx will get processed at some time. Let the user supply the processing time he's fine to wait for.

    4. Transactions follow an execute-order-commit flow pattern instead of the common order-execute-commit pattern. Client transactions are first executed in a sandbox to determine their read-write sets, i.e., the set of keys read by and written to by the transaction. Transactions are then ordered by an ordering service, and finally validated and committed to the blockchain

      In Fabric, Execution gives insight into read-write sets.

    1. Furthermore, it doesn't say that every reordering has indistinguishable results on a given implementation; it requires instead that every reordering is allowed by the specification to have indistinguishable results.

      I.e., an implementation may accidentally compromise SIM-commutativity of its interface.

    1. But as noted above, a partial order like happens-before is isomorphic to a join semi-lattice! That means that we can capture the CmRDT log as a CvRDT. One simple way to do this is to hold the DAG corresponding to the partial order: this is simply a "grow-only" set—whose only operation is add—which holds edges of a DAG. Each directed edge connects two operations, and indicates that the operations happened in the order of the edge. (Another solution would be to use CvRDTs for vector clocks.) A natural query over such a CvRDT is to "play the log": i.e. produce a topological sort of the log and apply the operations in the resulting order.

      A CvRDT can be used to capture a causal DAG of ops (a CmRDT), as used by ipfs-log.
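
      A minimal sketch of that construction (merge is set union; "playing" is a topological sort - note that concurrent ops come out in an arbitrary order here, so a real implementation would tie-break deterministically):

        use std::collections::{HashMap, HashSet};

        type OpId = u64;

        /// CvRDT: a grow-only set of DAG edges (a happened before b).
        /// Union is commutative, associative and idempotent, so merge is a join.
        #[derive(Default)]
        struct EdgeSet {
            edges: HashSet<(OpId, OpId)>,
        }

        impl EdgeSet {
            fn add(&mut self, before: OpId, after: OpId) {
                self.edges.insert((before, after));
            }

            fn merge(&mut self, other: &EdgeSet) {
                self.edges.extend(other.edges.iter().copied());
            }

            /// "Play the log": Kahn's algorithm over the edge set.
            fn play(&self) -> Vec<OpId> {
                let mut indeg: HashMap<OpId, usize> = HashMap::new();
                for &(a, b) in &self.edges {
                    indeg.entry(a).or_insert(0);
                    *indeg.entry(b).or_insert(0) += 1;
                }
                let mut ready: Vec<OpId> = indeg
                    .iter()
                    .filter(|&(_, &d)| d == 0)
                    .map(|(&n, _)| n)
                    .collect();
                let mut order = Vec::new();
                while let Some(n) = ready.pop() {
                    order.push(n);
                    for &(a, b) in &self.edges {
                        if a == n {
                            let d = indeg.get_mut(&b).unwrap();
                            *d -= 1;
                            if *d == 0 {
                                ready.push(b);
                            }
                        }
                    }
                }
                order
            }
        }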

    1. Instead of explicitly allowing deletion (a non-monotonic construct), tombstones mask immutable values with corresponding immutable tombstone values.

      Logical deletion is monotonic.

    2. Unfortunately, neither the add nor delete operation commutes with checkout—if a checkout message arrives before some insertions into either the Added or Deleted sets, those insertions will be lost.

      checkout is meant to be performed on some state of the cart. When it's issued the exact state is known. So checkout can refer to it. Making it concrete / context-free.

    3. As a traditional approach to avoid such "race conditions," we might bracket every non-monotonic delete operation with a global coordination to agree on which add requests come before it. Can we do better?

      Alternatively, DEL can refer to the exact ADD. As shown in SU-Set paper.

    1. However, this powerful abstraction comes with a hefty cost: concurrent transactions must coordinate in order to prevent read/write conflicts that could compromise equivalence to a serial execution.

      ACID requires coordination of parallel transactions that intersect by their write-sets.

      Using CRDTs as results + I-confluence rules allows them to be executed in parallel.

      However, when a tx intersects by its read-set with the write-set of another, its result may be affected. Such txes may be I-confluent, but application-level correctness is not guaranteed.

    2. we only consider a single set of invariants per database rather than per-operation invariants

      I.e., this paper uses invariants on results, not invariants on the txes themselves.

    3. we explicitly require more information from the application in the form of invariants (Kung and Papadimitriou [38] suggest this information is required for general-purpose non-serializable yet safe execution.)

      I.e., invariants that are I-confluent are required to know what can be safely executed coordination-free.

    4. However, we are not aware of related work that discusses when coordination is strictly required to enforce a given set of invariants

      I.e., when a result is not I-confluent and requires coordination has not been studied before; this paper is the first.

    5. or safety (e.g., the CRDT OR-Set does not—by itself—enforce invariants, such as ensuring that no employee belongs to two departments), and are therefore not sufficient to guarantee correctness for all applications

      I.e., CRDTs allow results to converge. However, converged state is not guaranteed to be I-invariant.

    6. This means that, in our payroll example, we can provide coordination-free support for concurrent salary increments but not concurrent salary decrements

      That's not true. Given an agreed-upon invariant (> or <), users are able to perform inc or dec. So I is an agreed-upon contract that allows for some ops to be performed coordination-free. Moreover, this contract may be agreed upon to change.

      So, I is a consensual constraint that allows for coordination-free execution of some ops.

      Even more, given we know the collaborating parties. Say Alice and Bob are family members that have a shared balance. They may agree that the account balance should not reach a negative number. Then, in order to be able to make coordination-free withdrawals within a given day, they can agree that each won't withdraw more than 1/2 of the money that day. So each is safe to withdraw up to 1/2 of the money within that day. At the end of the day they may coordinate to determine what's left in the account, and so are able to follow the same strategy the next day (see the sketch below).

      I.e., parties can agree to share a resource and act coordination-free on their share. They may coordinate in order to reconcile shares as they see fit.

      E.g., in the payroll example, managers that are in charge of salary adjustment would share an employee's salary and would be able to adjust it down to no less than their share.

      A share holder could be malicious and issue parallel ops that are I-valid but are not I-confluent. E.g., a manager could issue two decrements that would tank the employee's salary beyond the agreed-upon threshold. So there should be consensus within the share/shard.
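
      A minimal sketch of the day's-split idea (hypothetical names; the invariant balance >= 0 holds under any interleaving because the shares sum to the balance):

        /// Each party may withdraw coordination-free, up to its own share.
        struct Share {
            remaining: u64,
        }

        impl Share {
            /// Purely local decision - no coordination with the other party.
            fn withdraw(&mut self, amount: u64) -> bool {
                if amount <= self.remaining {
                    self.remaining -= amount;
                    true
                } else {
                    false // over the share: needs coordination / a re-split
                }
            }
        }

        fn main() {
            let balance = 100u64;
            // Alice and Bob agree on a half-half split for the day.
            let mut alice = Share { remaining: balance / 2 };
            let mut bob = Share { remaining: balance / 2 };
            assert!(alice.withdraw(30));
            assert!(bob.withdraw(50));
            assert!(!bob.withdraw(1)); // denied locally, invariant intact
        }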

    7. To avoid variants of these anomalies, many optimistic, coordination-free database designs have proposed the use of abstract data types (ADTs), providing merge functions for a variety of uses such as counters, sets, and maps [19, 44, 50, 58] that ensure that all updates are reflected in final database state.

      How are ADTs different from op-based/delta-based CRDTs?

      Both capture deltas as a datatype building block. Both converge to a snapshot.

    8. However, the reliance on invariants also has drawbacks. I-confluence analysis only guards against violations of any provided invariants. If invariants are incorrectly or incompletely specified, an I-confluent database system may violate application-level correctness.

      I.e., make sure that you specify invariants correctly, otherwise they won't give you correctness of coordination-free execution.

    9. If two independent decisions to commit can result in invalid converged state, then replicas must coordinate in order to ensure that only one of the decisions is to commit.

      I.e., I-valid?(merge(tx1, tx2)) = false is not I-confluent, even if I-valid?(tx1) && I-valid?(tx2) = true.

      And in such cases coordination is required to determine which tx to adopt first, and whether to adopt the second one on top.

    10. Theorem 1 establishes I-confluence as a necessary and sufficient condition for invariant-preserving, coordination-free execution

      I.e., I-confluence is required to reach a convergent state that is I-valid in coordination-free manner.

    11. A system is globally I-valid iff all replicas always contain I-valid state

      Meaning that I-valid state is preserved by tx results and by merges of tx results - tx results are I-confluent.

      I.e., I-valid?(tx1) && I-valid?(tx2) && I-valid?(merge(tx1, tx2)).

    12. A system is convergent iff, for each pair of servers, in the absence of new writes to the servers and in the absence of indefinite communication delays between the servers, the servers eventually contain the same versions for any item they both store

      I.e., strong eventual consistency of tx results is required for convergence.

    13. On a wide-area network, delay is lower-bounded by the speed of light (worst-case on Earth, around 75ms, or about 13 operations per second [9]). Under network partitions [13], as delay tends towards infinity, these penalties lead to unavailability [9, 28]

      I.e., coordination in the presence of a network partition leads to unavailability.

      Using quorums to acknowledge accepted txes seems to mitigate this, given 50% of acks is required and the tx issuer is able to propagate his tx to 50% of quorum members.

    14. The classic answer to maintaining application-level invariants is to use serializable isolation: execute each user's ordered sequence of operations, or transactions, such that the end result is equivalent to some sequential execution [15, 30, 53].

      I.e., serializability is a property of tx execution such that their result is the same as their sequential execution.

      I.e., serialized?(someTxExecutionScheme, txes) => someTxExecutionScheme(txes) = serialTxExecutionScheme(txes)

    15. Given knowledge of application transactions and correctness criteria (e.g., invariants), it is often possible to avoid this coordination (by executing some transactions without coordination, thus providing availability, low latency, and excellent scalability) while still preserving those correctness criteria. 3. Invariant confluence offers a necessary and sufficient condition for this correctness-preserving, coordination-free execution.

      I.e., invariant preservation as a way to guarantee correctness. It's damn efficient, but not all txes will be invariant-confluent.

    16. Serializable transactions preserve application correctness at the cost of always coordinating between conflicting reads and writes.

      I.e., serialization of txes as a generic way to guarantee correctness. Albeit the most costly way.

    17. In this paper, we address the central question inherent in this trade-off: when is coordination strictly necessary to maintain application-level consistency? To do so, we enlist the aid of application programmers to specify their correctness criteria in the form of invariants.

      I.e., invariants as a way to specify database validity criteria, which grants knowledge about which txes are safe to run in parallel and which need to run on top of each other.

    1. Use cases that require transactional isolation of repeatable reads [5] are not a good fit, as FabricCRDT commits transactions even if their read-set is outdated.

      I.e., having tx results as CRDTs allows merging them. But this is only possible if one tx does not change the read deps of the other tx. Otherwise the result of the other tx would be invalid, and it would be invalid to merge it. It would need to be re-run on top. E.g., for cases where a double-spend can occur.

    2. According to our findings, FabricCRDT successfully merges all conflicting transactions without any failures when all transactions use CRDTs.

      Note: when all txes use CRDTs. When there is a non-CRDT tx it may fail.

    3. CRDT Transactions in a Block

      A non-CRDT tx is akin to a snapshot. Multiple snapshots for the same key cannot exist within a block if you don't have rules on how to merge them (an approach described in the XOX paper for post-order execution).

      CRDT ops can be applied on top of a snapshot.

      A snapshot cannot be applied on top of CRDT ops.

      So a block with non-CRDT and CRDT ops seems to only be fully valid if it has the snapshot first and then CRDT deltas on top.

    4. Once a transaction fails, the only option for clients is to create a new transaction and resubmit, which adds to the complexity of Fabric application development

      As shown by XOX, it's possible to retain conflicting transactions and perform them on top of detected "happened before" ones.

    1. where there exists a happened-before relation VoteRegister1

      A malicious voter may issue votes with no Vote1 "happened-before" Vote2. If we have a 2-of-4 endorsement requirement, then he can send Vote1 to two orgs and Vote2 to the other two. Both sets of orgs will endorse the vote.

      Seems it may be tackled by imposing 3 of 4 as the endorsement threshold.

      Or, in the general case, > 1/2 of orgs.

      Given orgs may be malicious (faulty), we can take it into account by setting a fault threshold. Say to 1/4.

      Then the endorsement threshold will be: (1/2 + 1/4) * number of orgs + 1.

      Or, in the general case, endorsement threshold = (1/2 + fault threshold) * number of orgs + 1.

    1. Rather than moving forward by a fixed number of predecessors in each round trip, this algorithm allows some commits to be skipped, so that it reaches the common predecessors faster.

      Binary search as an alternative

    1. Instead of resolving branches, it permits them, resulting in a Directed Acyclic Graph (DAG) structure of the blockchain rather than a linear one.

      Seems it makes it possible to double-spend.

    1. In this chapter, we address the orthogonal problem of automatically suggesting an efficient set of MV for a given workload of SPARQL queries.

      Can we avoid the problem of selecting MVs by treating each query as an AST, preserving its results? And so a set of queries would form a forest of ASTs, where shared nodes would only be computed once.

      Nodes become MVs.

      Interestingly, a node may be only partially computed.

      E.g., if there is a join node AB that joins the A and B nodes, then A could be fully computed, and B could be computed only to find matching results for the AB pivot.

      Although here we mix the logical representation of node A, node B, and a pivot node AB, pivot(A, B), with its physical representation of join(A, B) and its optimisation join(A, B(A)), where B(A) is B based on the results of A, which resembles embedding A and AB, rather than A, B, AB.

      These are different ASTs. B based on A ≠ B.

      All in all, out of logical queries, physical ASTs can be derived that are optimized for those queries.

      Perhaps having embeddings as joins is optimal. E.g., AB = join(A, B(A)), ABC = join(AB, C(AB)).

    2. Getting the complete set of solutions of q would require to query the residual pattern (?a,p4,?e) over the RDF dataset and join the results with the partial results of i1 and i2.

      Residual patterns could have indexes created for them on-the-fly as overlapping embeddings, making use of the precomputed solutions of those embeddings.

      So during query processing we don't seek matches for all residual patterns and join afterwards, dropping some of the matches. Rather, we query for matches based on embedding matches, so we don't get "to be dropped" matches at all.

      This is an approach to "how to efficiently compute a join of embeddings". And it's about having a "join embedding" that joins based on the most restrictive embedding first. I.e., the embedding that has fewer matches or, if embeddings haven't been computed yet - the embedding that is expected to be most restrictive (e.g., because it has the most restrictive BGP).

    3. We focus on covers with overlapping embeddings since they allow better estimations of the cost savings that can be achieved with them (see Section 2.3).

      I.e., covers with overlapping embeddings allow better cost saving estimation than covers with non-overlapping embeddings.

    4. Query

      This query could have its own index, making use of (c) and (d) to maintain its results by joining their solutions.

      I.e., there can be an index per query. Query patterns would be broken down into embeddings for the most efficient compute of queries.

      I.e., query as an AST of embeddings.

    5. However, for the rest of this thesis we only consider intensional overlaps to avoid the costly pre-computation and maintenance of extensional overlaps

      An extensional overlap is the one not made intensional.

      Extensional overlaps can be made intensional overlaps.

      And maintained in the same way.

    6. Let G be a data graph, I the set of indexes on G, and Q a query against G. We call an index I ∈ I eligible for Q iff P(I) ⊑ P(Q). The set of all eligible indexes for query Q is denoted by IQ.

      The notation is meh.

      Alternative:

      G - graph

      P - pattern

      S - solution

      I - index

      EI - eligible index

      EIs = ei(P, I) - eligible indexes I for P

    7. it is not possible to map variables ?y and ?z from i2 to any variable of q

      ?c would contain all ?y. Since ?y is a more restrictive pattern than ?c.

      I.e., it may be possible to adopt some of the ?y matches for ?c.

    8. During index creation, indexes are computed and persistently materialized and the results are made available for the query processor. In our approach, indexes are defined offline, i.e., created by an administrator before queries are executed.

      An alternative approach is to create indexes at an arbitrary time, e.g., after query execution.

      Then, during query execution available indexes are used. And the result can always be indexed as one needs.

      This would speed up query execution. As there is no cost of index creation and maintenance at this stage.

    9. The frequency of I in G, written as #G(P), is the number of occurrences of P in G.

      |O| would be more uniform in style, akin to |P|.

      Or O(P(Q), G), |P(Q)| - size of query, |O(P(Q), G)| - size of solutions.

      Or, in Polish notation, (O (P Q) G).

    10. The query pattern must be divided into that part that is executed by using materialized results and the rest of the query patterns that are not possible or not beneficial to execute using them

      Perhaps it would be nice to have that as an optimization at the query engine level?

    11. The MV–selection problem can be viewed as a combinatorial optimization problem since its solution has to be chosen from among a finite number of possible configurations. Therefore, explicit and complete enumeration and scoring of all possible MV subsets is, in principle, a possible way to solve it. This method is however impractical in most cases, since the computational effort grows exponentially with the number of candidate MV [25].

      Enumerating all possible MV combinations in order to find the best one is found to be impractical.
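
      Since exhaustive enumeration is exponential, a heuristic is the usual workaround; a greedy sketch of my own, not the thesis's algorithm:

        def greedy_mv_selection(candidates, benefit, cost, budget):
            chosen, spent = [], 0
            # Consider candidate MVs in order of decreasing net benefit.
            for mv in sorted(candidates, key=lambda m: benefit(m) - cost(m),
                             reverse=True):
                if spent + cost(mv) <= budget and benefit(mv) > cost(mv):
                    chosen.append(mv)
                    spent += cost(mv)
            return chosen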

    1. In order to generate globally unique operation identifiers without requiring synchronous coordination between replicas we use Lamport timestamps [42].

      Ehh. Isn't author+deps enough?
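
      A sketch of that "author + deps" alternative: hash the op's author, causal dependencies, and payload into a self-certifying ID. Purely illustrative, not the paper's scheme:

        import hashlib, json

        def op_id(author, deps, payload):
            # Author, causal deps, and content together identify the op;
            # identical duplicates simply collapse into the same ID, so no
            # timestamp is needed for uniqueness.
            blob = json.dumps({"author": author, "deps": sorted(deps),
                               "payload": payload}, sort_keys=True)
            return hashlib.sha256(blob.encode()).hexdigest()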

    2. “done”: true

      Ehh, why?

      This node has been marked as deleted, yet it still has a "done": true field. Why is this field leaked out of the deleted node?

      "Data Laced with History" gets it right by modelling the data structure as an AST: the node carrying this field is itself marked as deleted, so the node and all its fields end up "deleted" together.
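
      A sketch of that node-level deletion (the structure is illustrative): mark the node deleted and everything hanging off it, "done" included, disappears with it.

        node = {"deleted": True,
                "fields": {"title": "buy milk", "done": True}}

        def visible_fields(node):
            # A deleted node exposes none of its fields, "done" included.
            return {} if node.get("deleted") else node["fields"]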


  7. agregore.mauve.moe
    1. Agregore works by letting you load web pages and content from peer to peer protocols the same way you would load them from a website.

      I.e., use Hypercore to maintain a personal website. Use Agregore as the browser.

    1. Hypercore is a secure, distributed append-only log built for sharing large datasets and streams of real-time data. It comes with a secure transport protocol, making it easy to build fast and scalable peer-to-peer applications.

      Hypercore bundles CRDT with p2p.

      It is akin to an OrbitDB database, if only it had a single writer.

      OrbitDB is a toolkit where you can assemble your own CRDT p2p solution, making use of components:

      - ipfs-log as the base for your CRDT
      - libp2p as the base for your p2p
      - auth to manage readers and writers

      Whereas Hypercore seems to bundle it all up.

  8. dat-ecosystem-archive.github.io
    1. Multi-writer will allow Dats to be modified by multiple devices and multiple authors at the same time. Each author will have their own secret key and publish a Dat with their data in it. Multi-writer fuses all of these separate Dats into one “meta-Dat” that is a view of everyone’s data combined
    1. multi-causal operations will prevent the structured log from being linearly interpretable and may severely affect performance, simplicity, and intent

      Welp, what can we do when an op does express intent that depends on multiple others? E.g., transfer money from Bob to Alice.
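
      For illustration, such a transfer op has to name two causal parents at once (the IDs here are placeholders):

        transfer_op = {
            "type": "transfer",
            "from": "Bob", "to": "Alice", "amount": 10,
            # Two causal parents at once: the latest balance op of each
            # account. Such an op has no single place in one linear log.
            "deps": ["<bob-balance-op-id>", "<alice-balance-op-id>"],
        }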

    2. This won’t be convergent by default since the operations don’t have an inherent total order, but it’s easy to fix this by giving each one a globally-unique ID in the form of an owner UUID plus a Lamport timestamp. (In the diagrams below, this is encoded as “SX@TY”, where “SX” represents the UUID for site X, and Y is just the timestamp.) With this scheme, no two operations can have the same ID: operations from the same owner will have different timestamps, while operations from different owners will have different UUIDs

      Welp, it would be nice to have the author inside the operation.

      A Lamport timestamp doesn't seem to add any new information. It's an index of the op, which can be derived from the length of the longest chain of its parents if needed.

      Wall clock is useful. It's akin to "decision time".

      Wall clock + author would be enough for uniqueness of operations. And linking can be done by hash.

    3. Nonetheless, I made one additional tweak to ensure that remapping only happens very rarely. Instead of storing just the UUID in the site map, I also store the wall clock time at which the UUID was added. In the site map, these tuples are sorted first by time, then by UUID. Assuming that modern connected devices tend to have relatively accurate clocks (but not relying on this fact for correctness), we can ensure that new sites almost always get appended to the end of the ordered array and thus avoid shifting any of the existing UUIDs out of their previous spots. The only exception is when multiple sites happen to be added concurrently or when the wall clock on a site is significantly off.

      Perhaps an alternative is to have a tx log on each device, which is only ever accreted with newly observed ops. Then its order never changes and we can use the index as an identifier that stays evergreen.
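
      A sketch of that per-device log (the class and names are illustrative):

        class LocalLog:
            def __init__(self):
                self.ops = []       # append-only, in locally observed order
                self.index_of = {}  # op id -> stable local index

            def observe(self, op):
                # Ops are only ever appended, so an op's index never changes
                # and can serve as its evergreen local identifier.
                if op not in self.index_of:
                    self.index_of[op] = len(self.ops)
                    self.ops.append(op)
                return self.index_of[op]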

    4. I’ve solved this with the help of a secondary CRDT that’s stored and transferred along with the CT: an ordered, insert-only array of known UUIDs called the site map. The 16-bit site identifier corresponding to a UUID is simply its index in the array.

      Reminds me of the technique WasmTree uses, where the internal representation of RDF nodes replaces URLs with integer IDs and keeps a map to resolve the URLs back.

      Except that here we use the index of a log entry in place of its UUID, which requires updating the mapping on merge of new ops. By having integer IDs in the entries themselves, in place of UUIDs, we would be spared that. The con is the storage cost of those integer IDs: lower than the cost of UUIDs, but higher than that of the index approach.

    1. We see further interesting problems around types, schema migrations, and compatibility. Different collaborators may be using different versions of an application, potentially with different features. As there is no central database server, there is no authoritative “current” schema for the data. How can we write software so that varying application versions can safely interoperate, even as data formats evolve?

      Events to the rescue?

    2. CRDTs do not require a peer-to-peer networking layer; using a server for communication is fine for CRDTs. However, to fully realize the longevity goal of local-first software, we want applications to outlive any backend services managed by their vendors, so a decentralized solution is the logical end goal.

      One should not bake a server into local-first software as the means to sync. The software needs to outlive the server.

    3. URLs are a good mechanism for sharing

      URLs as collaboration sites have been found handy.

      They could be treated as "compilation sites" on a subject.

      Where each user could have their own compiled view.

      I.e., they are IDs of subjects. Note: ID ≠ content. Content is user-specific: a personal compilation / view of it.

    4. However, in all the prototypes we developed, we found that the default merge semantics to be sufficient, and we have so far not identified any case requiring customised semantics

      The mentioned case of "one user sets a task to one state and another to another" did happen. As mentioned, it's not clear how to resolve it, so they did encounter merge conflicts. I guess these conflicts are not app-specific, and conflict resolution can be implemented in a generic way as well, such as presenting users with a choice of which delta to adopt.