
Favor regular json encoding instead of json-ld/normalize #979

Open · wants to merge 2 commits into base: main
Conversation

@bplatz (Contributor) commented Feb 25, 2025

Another performance boost comes from skipping json-ld/normalize.

With the other performance gains making stage much faster, the commit process ended up taking longer than stage.

The biggest time sink is json-ld/normalize. Moving to a plain json/stringify process speeds up commits by more than 2x, and staging is now the longer step once again.

Indexing was using json-ld/normalize as well, and removing it there turned out to be the real win. In my prior large-DB tests, the last indexing pass took about 30 minutes; with this change, it takes about 5 minutes.

json-ld/normalize will be important for certain types of verifiable credential proofs, but we can implement that if/when we are writing out those proofs and only pay the penalty then.
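For illustration only (a hypothetical sketch, not code from this PR): hashing the plain JSON serialization looks roughly like this, with the serialized bytes kept as the source of truth. The names `sha256-hex` and `hash-commit` are made up for this example, not Fluree's actual API.

```clojure
;; Requires org.clojure/data.json on the classpath.
(require '[clojure.data.json :as json])
(import '(java.security MessageDigest))

(defn sha256-hex
  "Hex-encoded SHA-256 digest of a UTF-8 string."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" %) digest))))

(defn hash-commit
  "Serialize a commit map once with plain JSON and digest those exact
   bytes. Verifiers re-hash the stored serialization rather than
   re-deriving a canonical form, so no normalization step is needed."
  [commit]
  (let [serialized (json/write-str commit)]
    {:json serialized
     :hash (sha256-hex serialized)}))

;; (hash-commit {:v 1 :data "example"})
;; ;=> {:json "{\"v\":1,\"data\":\"example\"}" :hash "..."}
```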

@bplatz bplatz requested a review from a team February 25, 2025 11:41
@dpetran (Contributor) commented Feb 25, 2025

If we get rid of normalize, I think we give up verifiability - you won't be able to re-hash commits and get the same hash 100% of the time. Is that ok?

@bplatz (Contributor, Author) commented Feb 25, 2025

> If we get rid of normalize, I think we give up verifiability - you won't be able to re-hash commits and get the same hash 100% of the time. Is that ok?

Is there a need to re-hash commits? So long as you have the bytes (or JSON), the hashes will always be equal.

There are some VC proofs that want verifiability over just a 'set of triples'; in that case you need normalization, but that is the only case I'm aware of.
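A sketch of that point, reusing the hypothetical `sha256-hex` and `hash-commit` helpers from the example above: verification is just re-hashing the stored bytes.

```clojure
(defn verify-commit
  "Re-hash the stored commit JSON (the exact bytes as written) and
   compare against the recorded hash. No normalization is involved."
  [{:keys [json hash]}]
  (= hash (sha256-hex json)))

;; (verify-commit (hash-commit {:v 1}))                  ;=> true
;; (verify-commit {:json "{\"v\":2}"
;;                 :hash (:hash (hash-commit {:v 1}))})  ;=> false
```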

@dpetran (Contributor) commented Feb 25, 2025

I thought that the way to verify the contents of a ledger was to re-transact the commit data in order and then check that the commit data hash you generated matches the commit data hash from the original commit.

@bplatz (Contributor, Author) commented Feb 25, 2025

> I thought that the way to verify the contents of a ledger was to re-transact the commit data in order and then check that the commit data hash you generated matches the commit data hash from the original commit.

You verify the hashes are the same, but json-ld/normalize is not needed to do that. The question is what you hashed. In this case we are hashing the JSON itself. With json-ld/normalize you hash a canonicalized form of the JSON that can be derived from any equivalent data structure, which removes the reliance on access to the "source" bytes. E.g., you could start from the same EDN data structure and arrive at the identical hash once serialized to JSON... but that isn't what we use it for.

This now works the same way JWS, IPFS, and other "verification-based" file hashes work. json-ld/normalize is a whole different level of machinery, for a use case we are not currently exercising. My thought is: if/when that use case is needed, pay the price then.

In the meantime, we get a substantially faster database with zero loss of functionality.
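To make the tradeoff concrete (an illustrative sketch, not from the PR): semantically identical JSON can serialize to different bytes, so naive re-serialize-and-hash fails where re-hashing the stored bytes succeeds. Serialization-independence is exactly what json-ld/normalize buys, and what this change defers.

```clojure
;; Same data, different key order: equal as data, unequal as bytes.
(def a "{\"name\":\"ex\",\"v\":1}")
(def b "{\"v\":1,\"name\":\"ex\"}")

(= (json/read-str a) (json/read-str b)) ;=> true  (equal once parsed)
(= (sha256-hex a) (sha256-hex b))       ;=> false (different bytes)

;; Hashing the stored serialization sidesteps this entirely; hashing a
;; canonical form (what json-ld/normalize produces) solves it even
;; without access to the original bytes -- the VC-proof case deferred here.
```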
