- Oct 2024
-
-
And, as a nice side note, you happen to now know what the phrase “fully qualified base64 primitives” in KERIpy means. All that means is that your encoded value has been pre-padded, pre-conversion, and has had its type code added to the front, as we did here with substitution, with the exception that some CESR primitives
Bonus! Thanks for pointing this out.
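A quick sketch of what "fully qualified" means in code. This is my own illustration, not KERIpy's actual implementation; the code letter "E" for a Blake3-256 digest is my assumption from the CESR Master Code Table.

```python
import base64

def fully_qualify(raw: bytes, code: str) -> str:
    """Sketch of CESR-style "fully qualified" Base64 encoding.
    Pre-pad with zero bytes up to a 24-bit boundary, convert to Base64,
    then replace the pad-derived leading character(s) with the type code."""
    ps = (3 - len(raw) % 3) % 3          # pad bytes needed for a 24-bit boundary
    padded = bytes(ps) + raw             # pre-pad: zero bytes go on the FRONT
    b64 = base64.urlsafe_b64encode(padded).decode()  # no '=' needed now
    return code + b64[ps:]               # type code replaces the spare chars

digest32 = bytes(32)                     # placeholder 32-byte digest
qualified = fully_qualify(digest32, "E") # "E" assumed Blake3-256 per CESR table
print(len(qualified))                    # 44 characters, no '=' padding
```

Note how the three operations from the quote (pre-pad, convert, prepend type code) each appear as one line.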
-
Along these same lines, SAIDs do not use Base64-style padding because it does not enable separability of individual concatenated
With a SAID embedded in a data format (like a JSON ACDC), it is not obvious why composable concatenation is a desired property. I presume SAIDs are also used in other parts of the stream, where they are concatenated/composed; it just does not make sense in the context of an embedded SAID.
-
TLV encoding formats require the type character to be at the front of the value
This serves a practical purpose: it makes a protocol extensible, i.e. very flexible (the open-closed principle).
-
TLV
From ChatGPT: TLV encoding stands for Type-Length-Value encoding, a format often used in communication protocols to represent data in a structured and efficient way. The format includes three parts:
Type: Identifies the kind of data or the meaning of the value (e.g., a field identifier).
Length: Specifies the length of the data field in bytes.
Value: Contains the actual data associated with the type.
This structure allows flexible, self-describing data communication, where each element can be easily parsed and identified. It is commonly used in protocols like ASN.1 (Abstract Syntax Notation One), EMV (credit card transactions), and many other network or security protocols.
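The definition above is easy to see in miniature. A toy TLV codec (my own sketch, with a 1-byte type and 1-byte length, not any real protocol's wire format):

```python
# Minimal TLV (Type-Length-Value) sketch: 1-byte type, 1-byte length, raw value.
def tlv_encode(t: int, value: bytes) -> bytes:
    return bytes([t, len(value)]) + value

def tlv_decode(buf: bytes):
    """Parse a stream of TLV records. A parser can skip records whose
    type it does not recognize, which is what makes TLV extensible."""
    i = 0
    while i < len(buf):
        t, length = buf[i], buf[i + 1]
        yield t, buf[i + 2 : i + 2 + length]
        i += 2 + length

stream = tlv_encode(0x01, b"hello") + tlv_encode(0x02, b"\x00\x2a")
print(list(tlv_decode(stream)))  # [(1, b'hello'), (2, b'\x00*')]
```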
-
In CESR padding is handled a bit differently because it repurposes the pad characters for type codes in its TLV encoding scheme
OK, bingo, I found a key to understanding this. Because these digests/hashes are always 32 bytes, there is some padding we can repurpose.
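The padding that gets repurposed is easy to see. A 32-byte digest is 256 bits, which is not a multiple of 24, so plain Base64 always needs exactly one pad character for it:

```python
import base64

# 32 bytes = 256 bits, not a multiple of 24, so plain Base64 needs padding:
encoded = base64.b64encode(bytes(32)).decode()
print(encoded[-1])  # '=' : one pad character, i.e. 6 "wasted" bits
# CESR pre-pads instead, so the wasted bits land at the FRONT of the
# encoding, where the spare Base64 character can be swapped for a type code.
```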
-
The way that Base64 handles the need for pad bytes is to split the last byte into two characters, add zero bits to the last Base64 character, and then add the correct number of pad ‘=’ equals sign characters to the final output to end up with groups of 4 Base64 characters, which aligns on a 24 bit boundary because 4 * 6 bits per Base64 character = 24 bits.
I feel like I have read this statement or something similar like 5 times in this blog post already....
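Repetitive or not, the mechanics are concrete enough to show in two lines:

```python
import base64

# One leftover byte (8 bits) is split across two Base64 characters:
# 6 bits go into the first, the remaining 2 bits (plus 4 zero bits) into
# the second, and two '=' characters fill the group out to 4 chars = 24 bits.
print(base64.b64encode(b"\xff").decode())  # '/w=='
```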
-
decoder is the same value you put in
Base64 is a two-way encoding scheme
-
CBOR, MessagePack, arbitrary text, or otherwise.
I presume these need to have an ID field/location in the data schema. Like ACDC has the 'd' for the ID. Right?
-
Get an object with some data and a digest field
A digest field 'd' is ACDC specific... Abstractly it is just an ID field or "location" (like for fixed length data). Right?
-
Calculate
All these references to "calculate" make it seem harder than it is. You should just say "look up" instead of "you may look it up". If we are coupled to CESR in this example, just embrace it.
-
each of the five
Then I see 7 below, and I wonder if this is a typo, but I know there were 5 above. Probably just 7 steps, and I am overthinking this. ;)
-
SAIDs that are encoded in the CESR format. These CESR-style SAIDs
Ok... I guess they ARE coupled in this case. Sounds like it is a design choice for a given SAID. I can get behind that.
-
To understand how SAIDs work
Little confused... this is telling me about how "SAIDs" work... I thought I already learned that... the #1 below seems very CESR related... are SAIDs and CESR tightly coupled? or are they independent concepts? Making an ID with an eye toward how it will be serialized seems... unnecessarily coupled.
-
two-pass
"Two-pass" makes more sense than "two-step". "Steps" to me are procedural/algorithmic steps; a "pass" indicates going over a complete object.
-
(ACDC)
So it is related to ACDC. It is the Data Format (Data Type?) being ID'd
-
rationale behind why CESR encoding is designed the way it is.
Looking forward to this
-
this example:
When I look at this example, I realize it may also be a good starting place for this (or another) blog post. You could start with a JSON object and discuss how you could ID it: GUID, incrementing integer, random value, etc. Talk about the pros and cons of those IDs, then introduce a "magical" ID that is "perfect" for this object because it truly reflects the contents of the JSON object.
-
Step By Step
If the code recognizably matched the "steps" above, that would be better.
-
‘d’
This is neither a SAID thing nor a CESR thing, right? In this context it is an input-data thing: 'd' is for an ACDC, right? Are SAIDs also coupled to ACDCs? Perhaps the implication is that whatever data container you use (in this case ACDC) will have an ID field/location, and this algorithm is applied to that specific container wherever its ID lives. Right?
-
type code from the CESR Master Code Table
Noting another thing that does not make sense at first; I will circle back.
-
You can come back to these examples after reading the post if they don’t make sense to you at first.
YES, THIS IS ABSOLUTELY NEEDED.
-
24 bit boundaries, pad characters, and pad bytes.
Yes, this all sounds like CESR things I have heard. Again, it feels very thrust upon me, with no understanding of CESR. I do understand that you are trying to get to the point quickly, so I can/should gloss over some of these unknowns.
-
pre-padding
This pre-padding concept seems to have been thrust upon me with no warning. What is this?
-
pre-padding
is this the filling of "#"s?
-
How does the generation step work
Which step is the "generation" step? 1 or 2 above?
-
The digest is then calculated, encoded, and placed in the destination field.
2. The ID (digest/hash) is calculated on the normalized data.
3. Embed the ID.
-
During SAID calculation the destination field of the SAID is filled with pound sign filler characters (“#”) up to the same length of the SAID.
I like to think of this as a way to "normalize" or "standardize" the data; data normalization is a common practice.
-
Two
Three steps said simply: 1. Normalize the data (put "#"s where the ID will be). 2. Compute the ID on the normalized data. 3. Embed the ID in the data.
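Those three steps fit in a few lines of Python. This is my own sketch, not KERIpy: it assumes SHA2-256, compact JSON serialization, and the code letter "I" for SHA2-256 from the CESR Master Code Table (an assumption on my part).

```python
import base64, hashlib, json

def saidify(data: dict, label: str = "d") -> dict:
    """Sketch of the three SAID steps (assumes SHA2-256 and CESR-style
    encoding; 'I' as the SHA2-256 type code is assumed, not verified)."""
    # 1. Normalize: fill the ID field with '#' up to the SAID's final length.
    normalized = {**data, label: "#" * 44}
    raw = json.dumps(normalized, separators=(",", ":")).encode()
    # 2. Compute the digest over the normalized serialization.
    digest = hashlib.sha256(raw).digest()                      # 32 bytes
    b64 = base64.urlsafe_b64encode(b"\x00" + digest).decode()  # pre-pad to 24-bit boundary
    said = "I" + b64[1:]                                       # type code replaces pad char
    # 3. Embed the encoded digest back into the field.
    return {**data, label: said}

doc = saidify({"d": "", "name": "example"})
print(len(doc["d"]))  # 44
```

A verifier runs the same normalize-and-digest pass and compares; note that `saidify(doc) == doc`, i.e. the operation is idempotent because step 1 always overwrites the field with "#"s first.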
-
What is a content addressable identifier? A content addressable identifier is an identifier derived from the content being stored which makes a useful lookup key in content addressable storage, such as IPFS or a key-value store database like LevelDB, LMDB, Redis, DynamoDB, Couchbase, Memcached, or Cassandra.
An "address" is a way to find, or search for, something.
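A content-addressable store in miniature, using a plain dict as a stand-in for any of the stores named above:

```python
import hashlib

# The key is derived from the value itself, so anyone holding the content
# can recompute its address.
store = {}

def put(content: bytes) -> str:
    address = hashlib.sha256(content).hexdigest()  # identifier derived from content
    store[address] = content
    return address

addr = put(b"hello world")
assert store[addr] == b"hello world"   # look up by content address
assert put(b"hello world") == addr     # same content, same address, always
```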
-
A note on terminology, sometimes digests are called hashes or hash values. The technical definition of the term hash refers to a hash function. Hash functions transform data into a fixed-size string. This fixed-size string is the digest, the output of a hash function.
Yes. And the key property of hash functions relevant to this discussion is that the hash is unique and is effectively an ID.
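The function/digest distinction in the quote, shown directly:

```python
import hashlib

h = hashlib.sha256()              # SHA2-256 is the hash *function*
h.update(b"any input, any length")
digest = h.digest()               # the *digest* is its fixed-size output
print(len(digest))                # 32 bytes, no matter how large the input was
```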
-
is embedded within the data it is a digest of
The "SA" of SAID means the ID is embedded within the data it is ID'ing
-
SAID is a cryptographic digest of a given set of data
a SAID is an ID
-
SHA2-256, SHA3-256, and Blake3-256
These functions all produce a one-way "hash". The hash, for all intents and purposes, is an ID: something unique that can identify (ID) the given input. Your grandmother would understand this as "giving something a name": the "something" is the input; the "name" is the output hash.
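One wrinkle worth seeing: different hash functions give the same content different "names", which is one reason the encoding also has to say which function produced the digest. (Blake3 is not in Python's standard library, so this sketch compares SHA2-256 and SHA3-256.)

```python
import hashlib

data = b"the same input"
a = hashlib.sha256(data).hexdigest()    # SHA2-256's "name" for the data
b = hashlib.sha3_256(data).hexdigest()  # SHA3-256's "name" for the same data
# Same input, same output length, but completely different digests:
print(a != b)  # True
```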
-
- Sep 2024
-
trustoverip.github.io
-
in addition to value, may include information about the type and, in some cases, the size
Value, Type, and Size of a Cryptographically Verifiable Primitive?
-
type or size
I don't understand why this is important. What is the "Type and Size" of a Cryptographically Verifiable Primitive?
-
One way to better secure Internet communications is to use cryptographically verifiable Primitives and data structures inside Messages and in support of messaging protocols. Cryptographically verifiable Primitives provide essential building blocks for zero-trust computing and networking architectures. Traditionally, Cryptographic Primitives, including but not limited to digests, salts, seeds (private keys), public keys, and digital signatures, have been largely represented in some binary encoding. This limits their usability in domains or protocols that are human-centric or equivalently that only support ASCII text-printable characters RFC20. These domains include source code, documents, system logs, audit logs, legally defensible archives, Ricardian contracts, and human-readable text documents of many types [RFC4627].
Security depends on cryptographically verifiable primitives. Cryptography is native to binary, which makes text-based, human-readable protocols (like JSON) awkward.
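The binary/text mismatch in one small example (my own illustration of the point, not from the spec):

```python
import base64, hashlib

# Cryptographic primitives are natively binary; text-only domains (JSON,
# logs, source code) need a printable representation of them.
raw = hashlib.sha256(b"payload").digest()      # 32 raw bytes, generally not printable text
text = base64.urlsafe_b64encode(raw).decode()  # printable ASCII stand-in
assert text.isprintable()                      # safe in any text-only protocol
assert base64.urlsafe_b64decode(text) == raw   # and fully reversible
```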
-
domains or protocols
I see: some protocols are text-only (because human readability is baked into the protocol).
-
This limits their usability in domains or protocols that are human-centric or equivalently that only support ASCII text-printable characters RFC20. These domains include source code, documents, system logs, audit logs, legally defensible archives, Ricardian contracts, and human-readable text documents of many types [RFC4627].
Ok, this confirms that "text domain" means Human Readable
-
digests, salts, seeds (private keys), public keys, and digital signatures
Nice List of "Cryptographically Verifiable Primitives"
-
Cryptographically verifiable Primitives
When we say "Cryptographic Primitives" are we saying "Cryptographically Verifiable Primitives"?
-
Primitives and data structures inside Messages
Are these on the same level? "Primitives", "Data Structures", "Messages"
-
cryptographically verifiable Primitives
This implies there are non-cryptographic Primitives, right?
-
text domain and binary domain
Assuming we are talking about cryptographic primitives... is there a difference between text domain cryptography and binary domain cryptography?
-
Primitives
Does this mean cryptographic primitives?
-