
Favor regular json encoding instead of json-ld/normalize #979

Open · wants to merge 2 commits into base: main
Conversation

@bplatz (Contributor) commented Feb 25, 2025

Another performance boost comes from skipping json-ld/normalize.

With the other performance gains making stage much faster, the commit process ended up taking longer than stage.

The biggest time sink is json-ld/normalize. Moving to a plain json/stringify process speeds up commits by more than 2x, and staging is now the longer step once again.

Indexing was using json-ld/normalize as well, and removing it there turned out to be the real win. In my prior large-DB tests, the last indexing pass took about 30 minutes; with this change, it takes about 5 minutes.

json-ld/normalize will be important for certain types of verifiable credential proofs, but we can implement that if/when we are writing out those proofs and only pay the penalty then.
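For illustration only (a hypothetical sketch, not code from this PR): hashing the plain JSON serialization looks roughly like this, with the serialized bytes kept as the source of truth. The names `sha256-hex` and `hash-commit` are made up for this example, not Fluree's actual API.

```clojure
;; Requires org.clojure/data.json on the classpath.
(require '[clojure.data.json :as json])
(import '(java.security MessageDigest))

(defn sha256-hex
  "Hex-encoded SHA-256 digest of a UTF-8 string."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" %) digest))))

(defn hash-commit
  "Serialize a commit map once with plain JSON and digest those exact
   bytes. Verifiers re-hash the stored serialization rather than
   re-deriving a canonical form, so no normalization step is needed."
  [commit]
  (let [serialized (json/write-str commit)]
    {:json serialized
     :hash (sha256-hex serialized)}))

;; (hash-commit {:v 1 :data "example"})
;; ;=> {:json "{\"v\":1,\"data\":\"example\"}" :hash "..."}
```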

@bplatz bplatz requested a review from a team February 25, 2025 11:41
@dpetran (Contributor) commented Feb 25, 2025

If we get rid of normalize, I think we give up verifiability - you won't be able to re-hash commits and get the same hash 100% of the time. Is that ok?

@bplatz (Contributor, Author) commented Feb 25, 2025

> If we get rid of normalize, I think we give up verifiability - you won't be able to re-hash commits and get the same hash 100% of the time. Is that ok?

Is there a need to re-hash commits? So long as you have the bytes (or JSON), the hashes will always be equal.

There are some VC proofs that want verifiability over just a 'set of triples'; in that case you need normalization, but that is the only case I'm aware of.
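A sketch of that point, reusing the hypothetical `sha256-hex` and `hash-commit` helpers from the example above: verification is just re-hashing the stored bytes.

```clojure
(defn verify-commit
  "Re-hash the stored commit JSON (the exact bytes as written) and
   compare against the recorded hash. No normalization is involved."
  [{:keys [json hash]}]
  (= hash (sha256-hex json)))

;; (verify-commit (hash-commit {:v 1}))                  ;=> true
;; (verify-commit {:json "{\"v\":2}"
;;                 :hash (:hash (hash-commit {:v 1}))})  ;=> false
```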

@dpetran (Contributor) commented Feb 25, 2025

I thought that the way to verify the contents of a ledger was to re-transact the commit data in order and then check that the commit data hash you generated matches the commit data hash from the original commit.

@bplatz (Contributor, Author) commented Feb 25, 2025

> I thought that the way to verify the contents of a ledger was to re-transact the commit data in order and then check that the commit data hash you generated matches the commit data hash from the original commit.

You verify the hashes are the same, but json-ld/normalize is not needed to do that. The question is what you hashed. In this case we are hashing the JSON itself. With json-ld/normalize you hash a canonicalized form of the JSON that can be derived from any equivalent data structure, which removes the reliance on access to the "source" bytes. E.g., you could start from the same EDN data structure and arrive at the identical hash once serialized to JSON... but that isn't what we use it for.

This now works the same way JWS, IPFS, and other "verification-based" file hashes work. json-ld/normalize is a whole different level of machinery, for a use case we are not currently exercising. My thought is: if/when that use case is needed, pay the price then.

In the meantime, we get a substantially faster database with zero loss of functionality.
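To make the tradeoff concrete (an illustrative sketch, not from the PR): semantically identical JSON can serialize to different bytes, so naive re-serialize-and-hash fails where re-hashing the stored bytes succeeds. Serialization-independence is exactly what json-ld/normalize buys, and what this change defers.

```clojure
;; Same data, different key order: equal as data, unequal as bytes.
(def a "{\"name\":\"ex\",\"v\":1}")
(def b "{\"v\":1,\"name\":\"ex\"}")

(= (json/read-str a) (json/read-str b)) ;=> true  (equal once parsed)
(= (sha256-hex a) (sha256-hex b))       ;=> false (different bytes)

;; Hashing the stored serialization sidesteps this entirely; hashing a
;; canonical form (what json-ld/normalize produces) solves it even
;; without access to the original bytes -- the VC-proof case deferred here.
```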
