feature: support transcode to/from JSON, and to/from CBOR using different options #441

Open
extemporalgenome opened this issue Nov 13, 2023 · 0 comments

extemporalgenome commented Nov 13, 2023

Is your feature request related to a problem? Please describe.

Because CBOR uses the JSON data model as its starting point, it is often desirable to receive data as JSON and emit semantically identical data as CBOR, or vice versa.

Because this package has selective method-signature compatibility with encoding/json, transcoding support would be a critical piece of CBOR adoption in some applications.

Notably, decoding JSON into any, and then re-encoding that any to CBOR is undesirable because:

  1. It's typically rather expensive. Consider a message publisher that receives JSON and broadcasts equivalent CBOR to save subscribers decode cost: the added cost of any-based transcoding in the publisher may outweigh the combined savings gained by the subscribers, yielding a net loss in total efficiency.
  2. Compressed CBOR isn't necessarily enough smaller than compressed JSON to yield wins when compression is reasonable, and for data of non-trivial size, compressed JSON is typically smaller than the equivalent uncompressed CBOR. Compressing the JSON may well be cheaper than any-based transcoding to uncompressed CBOR.
  3. The conversion through any may be lossy (JSON numbers decode to float64), or lossless but incompatible without extra work (numbers decode to json.Number, which gets CBOR-encoded as a string).

Describe the solution you'd like

Either within this package, or as a subpackage, functionality which:

  1. Iteratively tokenizes JSON and produces a []byte (or writes to an io.Writer) with equivalent CBOR data. The JSON decode would probably internally use (*json.Decoder).Token.
  2. Provides the equivalent CBOR-to-JSON transcoding.
  3. Re-encodes CBOR to CBOR, e.g. to switch from indefinite length to definite length, or to different timestamp formats, modes, etc.
  4. Ideally, offers an option to control JSON timestamp detection (e.g. if a string parses as an RFC 3339 timestamp, encode it as a CBOR timestamp).
  5. Ideally, offers an option to control indefinite- vs. definite-length encoding. Indefinite length would certainly use less memory and encode faster, since it needs no lookahead or buffering. Definite length could rely on tricks such as two-pass decoding or maintaining look-back indices (i.e. allocate every array, map, and (byte) string as if it needed a heuristic n-byte length, then re-encode the initial byte(s) alongside a memcpy to shift the data bytes left or right once the actual size is known).