Using () to delimit objects breaks auto-url-detectors #16

wmertens · 2017-03-28T16:13:26Z

If you embed a jsurl object result in a URL as the last component, you get something like http://example.com/foo?q=~(a~'test), and if you paste that somewhere, there's a good chance that only the URL up to but not including the final ) is recognized.

One option is adding a final ~; would that fix it?
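
To make the failure mode concrete, here is a minimal sketch of the kind of heuristic a URL auto-detector might use. The function naiveDetectUrl and its trimming rule are assumptions made up for illustration; real linkifiers differ in detail.

```javascript
// Hypothetical linkifier: grab a run of non-space characters after the scheme,
// then trim trailing characters that usually belong to the surrounding sentence.
function naiveDetectUrl(text) {
  const match = /https?:\/\/\S+/.exec(text);
  if (match === null) return null;
  return match[0].replace(/[).,;:!?'"]+$/, '');
}

const v1Url = "http://example.com/foo?q=~(a~'test)";

// The closing parenthesis is treated as sentence punctuation and dropped.
const detected = naiveDetectUrl(`see ${v1Url} for details`);

// With a trailing ~ the last character is not in the trim set, so nothing is dropped.
const detectedWithTilde = naiveDetectUrl(`see ${v1Url}~ for details`);
```

Under this assumed heuristic, the first call loses the final ) while the second keeps the whole URL.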

bjouhier · 2017-03-28T19:45:11Z

I could implement this in a v2 but the problem is that a string produced by a v2 will fail to parse with a v1 parser. So far I have resisted making changes because I did not want to break protocols that use jsurl.

wmertens · 2017-03-28T21:42:26Z

Well, I respect that, but you can always call the encoding jsurl2 and make it clear there is no compatibility except in spirit… My usage so far was to encode data for consumption by the same application, and I would guess that that is the major use case…

bjouhier · 2017-03-29T09:03:25Z

Our situation is different because our app has several components that interact with jsurl and it is more difficult to move them all at once (especially as our components are deployed on-premise). So we need to preserve interop.

But I'm not opposed to fixing the issues with a v2. We should solve all the pending issues at once (encoded quote and trailing ~) so that we don't have to move again later.

wmertens · 2017-03-29T12:20:43Z

So, how about changing the initial character for jsurl2? That way, you can parse ~ starting strings as v1 and = (or whatever) as v2.

For the (), I realized that as you descend into a JS value, there are only a few possibilities, so if you drop some robustness, you can use any valid character to delimit blocks.

Furthermore, while parsing the inside of a block, you only need 2 characters: one to stop the block and one to go deeper. Normally these are ) and (, but they could also change on every level. So you could delimit the first block with / (will be part of the url even at end) and then alternate with | and / (for example): =/name~"John*20Doe~age~42~children~|~"Mary~"Bill|/

In fact, at each split point of the JSON structures at http://www.json.org/, you can use a different set of encoding characters. The example could also be e.g. =/name~John*_Doe~age~42~children*Mary~Bill~*/, or even =/!0~John*_Doe~!1~42~!2*Mary~!3~*/ (with pre-shared dictionary):

- /, | and * start objects/arrays depending on level (rotate the set on every level, note that * is not needed for escaping here)
- " or any a-zA-Z start a string. " is only needed if a string does not start with alpha
- -, 0-9 and . start a number, so a decimal can be .5
- !/, !|, !* can be true, false and null. That leaves lots of address space in ! to refer to a pre-shared dictionary. Keys starting with ! could also refer to that dictionary.
- inside properties and strings, *_ encodes a space. All of *x is available if *XX requires uppercase. E.g. *! *~ */ *| **

That should make for shorter encodings that still are fairly readable.

For robustness, a short fixed-size checksum could be added to the end, e.g. 2 characters taking the sum of all character values plus the string length, modulo 64^2, base64 (is that url safe?)
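
The checksum idea can be sketched as follows. This is only an illustration under stated assumptions: the function name checksum2 is made up, and the alphabet is the URL-safe base64 variant from RFC 4648 (with - and _ in place of + and /), which is one way to address the "is that url safe?" question, since standard base64 uses + and /.

```javascript
// URL-safe base64 alphabet (RFC 4648, "base64url"): 64 symbols, none needing escaping.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';

// Hypothetical helper: sum of all character codes plus the string length,
// reduced modulo 64^2 and written as two base64url digits.
function checksum2(s) {
  let sum = s.length;
  for (let i = 0; i < s.length; i++) sum += s.charCodeAt(i);
  sum %= 64 * 64;
  return ALPHABET[Math.floor(sum / 64)] + ALPHABET[sum % 64];
}

// Appending the checksum; a reader would recompute it over everything but the last 2 chars.
function withChecksum(s) {
  return s + checksum2(s);
}

function verifyChecksum(s) {
  if (s.length < 2) return false;
  const body = s.slice(0, -2);
  return checksum2(body) === s.slice(-2);
}
```

A sum-based check like this catches truncation and many single-character changes, but not reordered characters, so it is a cheap sanity check rather than a strong integrity guarantee.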

bjouhier · 2017-03-29T17:05:24Z

I was thinking about less invasive changes. I would like to keep the parentheses. If we add a ~ at the end, do we still have a problem with parentheses?

I want the encoded string to be unaltered by encodeURIComponent (this was a strong requirement for v1). This limits the character set to ascii alpha + ascii digits + - _ . ! ~ * ' ( ) (uriUnescaped in https://www.ecma-international.org/ecma-262/5.1/#sec-15.1.3) and I would restrict even further, and eliminate '. This rules out characters like = / |.
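
The requirement above can be checked directly against encodeURIComponent. The snippet below is a small sketch; the constant and helper names are made up for illustration, while the character set is the uriUnescaped set quoted above.

```javascript
// The uriUnescaped set from ECMA-262 5.1, section 15.1.3.
const URI_UNESCAPED =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789' + "-_.!~*'()";

// Hypothetical helper: true when encodeURIComponent leaves the string untouched.
function survivesEncodeURIComponent(s) {
  return encodeURIComponent(s) === s;
}

const unescapedSurvives = survivesEncodeURIComponent(URI_UNESCAPED);
const v1ExampleSurvives = survivesEncodeURIComponent("~(a~'test)");

// Characters ruled out by the requirement are percent-encoded.
const ruledOut = ['=', '/', '|'].map((ch) => encodeURIComponent(ch));
```

Here ruledOut holds %3D, %2F and %7C, which is why = / | cannot be used as delimiters under this constraint.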

So I'm proposing the following changes:

- add a ~ at the end, to keep the auto-url-detectors happy. This trailing char can also be used to distinguish between v1 and v2
- replace ' by !, to avoid browser encoding.
- maybe a few special * escapes. I like *_ for space, maybe *- for $ (frequent in object keys because it is valid in js identifiers) but I would not go much further because gain is small and result quickly becomes cryptic.

wmertens · 2017-03-31T12:48:11Z

~ at the end is good, but then ~ at the beginning is no longer needed.

I thought some more about it, and I think we can encode using only the unreserved characters of RFC 3986 (https://www.ietf.org/rfc/rfc3986.txt), so ALPHA / DIGIT / "-" / "." / "_" / "~".

Here are the rules:

- all values terminate with ~
- true, false, null become -T~, -F~, -N~
- numbers start with - (+ digit) or a digit and end with ~
- strings start with alpha or * (the only extra non-unreserved character we use) and terminate with ~
- strings internally get space replaced by _ (common and very readable), * by **, _ by *_, ~ by *-, % by *. and any others we like
- I don't think we need *XX and *XXXX encoding, that will be done by uriencoding whenever actually needed. Lots of common characters can be replaced by *+single char
- Empty string is *~
- objects start with _, arrays start with ., both terminate with ~.
- object keys are encoded as strings, so no starting * needed, only * escaping is done
- [1, 2] becomes .1~2~~
- {"a": "fo%o", "_test": "_hm*h~m", "5": [1, true]} becomes _a~fo*.o~*_test~**_hm**h*-m~5~.1~-T~~~

This way, the ending ~ doubles as the value terminator. Any value can be extracted by reading until the next ~. As a bonus, no value starts with ~ so that can distinguish v1.

The * is not actually 100% needed if we want to stay pure; . or - could serve as the escape characters with some adjustments.

wmertens · 2017-03-31T15:51:33Z

one more optimization: change repeating final ~ to a single ~, and to grab a value search until ~ or end of string. Then the standard example becomes _name~John_Doe~age~42~children~.Mary~Bill~
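
A rough sketch of this optimization, assuming the encoding rules from the previous comment (objects start with _, arrays with ., everything terminates with ~). The helper names are hypothetical.

```javascript
// Collapse a run of closing ~ at the very end into a single ~.
function collapseTrailingTildes(encoded) {
  return encoded.replace(/~+$/, '~');
}

// Grab one raw value: read from `start` until the next ~ or the end of the string.
function readValue(encoded, start) {
  const end = encoded.indexOf('~', start);
  return end === -1 ? encoded.slice(start) : encoded.slice(start, end);
}

// Unshortened form of the standard example: array end and object end each add a ~.
const full = '_name~John_Doe~age~42~children~.Mary~Bill~~~';
const shortened = collapseTrailingTildes(full);
```

Here shortened is _name~John_Doe~age~42~children~.Mary~Bill~, matching the example above; the parser then has to treat the end of input as closing every block that is still open.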

bjouhier · 2017-04-01T14:12:23Z

Lots of good ideas here but I want to understand why you want to get rid of parentheses. Lots of URLs have parentheses, and parentheses are a good visual clue for nested substructures.

wmertens · 2017-04-01T14:16:48Z

They are not guaranteed to be left alone, and by making ~ the terminator for everything, parsing is faster…

wmertens · 2017-04-01T14:20:08Z

(plus auto-url detection works better with ~, and we save a few bytes at the end of the string by merging ~s)

bjouhier · 2017-04-01T14:41:25Z

More detailed comments:

- all values terminate with ~ → OK
- true, false, null become -T~, -F~, -N~ → OK
- numbers start with - (+ digit) or a digit and end with ~ → OK
- strings start with alpha or * (the only extra non-unreserved character we use) and terminate with ~ → OK
- strings internally get space replaced by _ (common and very readable), * by **, _ by *_, ~ by *-, % by *. and any others we like → OK for space - others need discussion
- I don't think we need *XX and *XXXX encoding, that will be done by uriencoding whenever actually needed. Lots of common characters can be replaced by *+single char → KO - jsurl shouldn't rely on a uriencoding pass
- Empty string is *~ → OK - clever
- objects start with _, arrays start with ., both terminate with ~. → I'd like to keep parens, at least around objects
- object keys are encoded as strings, so no starting * needed, only * escaping is done → OK
- [1, 2] becomes .1~2~~
- {"a": "fo%o", "_test": "_hm*h~m", "5": [1, true]} becomes _a~fo*.o~*_test~**_hm**h*-m~5~.1~-T~~~

bjouhier · 2017-04-01T14:43:21Z

When would parentheses get escaped? They are uriUnescaped (but ' was too) and I have never seen them being escaped.

bjouhier · 2017-04-01T14:48:31Z

There is a problem with strings starting with a number. How do you encode "0"?

wmertens · 2017-04-01T15:01:38Z

Well, another reason for not using () is that you then need an extra char to start an array, and I wanted to minimize byte length. Plus, they are part of the "reserved" set, and most of those get encoded anyway (so is *, but replacing that with - or _ would make things uglier).

"0" becomes *0~.

bjouhier · 2017-04-01T15:05:48Z

We could keep ! too. Then I'd rather do the following:

- true, false, null become T~, F~, N~ (shorter, and leading - felt strange).
- strings start with !.

*0~ feels like a hack. What about "20"? It cannot be *20~ as this would be space. Is it *2*0~? That would be bad for us because we pass decimal values as strings to avoid precision problems with JS numbers.

bjouhier · 2017-04-01T15:12:43Z

Parentheses are not uriReserved, they are uriUnescaped.

wmertens · 2017-04-01T15:23:51Z

So the code works because at the beginning of a value there are only a limited number of possible characters. All cases are in the if clauses at https://github.com/wmertens/jsurl/blob/4ffcdea624eb29070bd6c44510e438b46799e986/lib/jsurl2.js#L71 - I tried to optimize for stringified length. So strings only start with * (or !) if they are not unambiguously strings.

Parentheses are in section 2.2 "Reserved Characters" https://tools.ietf.org/html/rfc3986#section-2.2 - although wikipedia says that means they can be used. I must say, if I paste ! $ & ' ( ) * + , ; = in the URL bar in Chrome, only ' gets escaped, and behind a # none get escaped.

How about starting objects with ( but still terminating with ~?

wmertens · 2017-04-01T15:29:51Z

I must say, I really like the _ for space, it makes embedded spaces easy to read.

As for the URI encoding, I was reasoning thusly:

- you have no control over URI encoding, and if it happens anyway, why not let the fast native functions do it? It can recover from it in any case.
- If you let native handle it, then embedded unicode is readable in the address bar
- It frees up escaped address space for other purposes; I'd rather escape common encoded chars in 2 chars instead of 3.

wmertens · 2017-04-01T15:42:13Z

Oh, and *20~ is "20". If we still did our own encoding, it would be **20~. * is only an escape inside string values.

bjouhier · 2017-04-01T15:42:29Z

And we could omit the leading ! for object keys if the key starts with alpha.

wmertens · 2017-04-01T15:46:15Z

That already happens, object keys are string context so they don't need a string marker…

bjouhier · 2017-04-01T15:55:44Z

Point taken about the generic URL RFC. I was referring to the specs for JS URL handling functions: https://www.ecma-international.org/ecma-262/5.1/#sec-15.1.3. I care most about the JS functions because that's what JS guys use to encode/decode.

I like _ for embedded space too.

OK for leaving non-ASCII chars as is instead of encoding with **. More compact and more readable.

I'd like to have the closing parenthesis at the end of objects too. The whole point is to trade a bit of compactness (one extra char at the end - wtf) for readability. Without it, it is very difficult to see where the object ends.

I had misunderstood the leading * in strings. I thought that it was the start of an escape sequence.

What about prefixing T, F and N by ! instead of -? I find the "- followed by digit" vs. "- followed by letter" rule a bit too hacky.

bjouhier · 2017-04-01T16:07:07Z

Note: with this, a non-empty object looks like (<...>~)~ and a non-empty array like .<...>~~. So we have an unambiguous end marker for objects, namely )~, and for arrays, namely ~~.

And then we could use _T, _F and _N because _ is not reserved for object start any more.

wmertens · 2017-04-01T16:26:26Z

Right, and actually you can drop ~ before ), if strings cannot contain ). Then ) is unambiguous and the initial parse split can split on ~ or ). So then there is no byte cost, and the string end can replace all ) and ~ with a single ~ still. Actually I like !T etc, it doesn't read a

bjouhier · 2017-04-01T16:26:33Z

Summary of revised proposal:

- all values terminate with ~
- true, false, null become _T~, _F~, _N~
- numbers start with - (+ digit) or a digit and end with ~
- strings start with alpha or * (the only extra non-unreserved character we use) and terminate with ~
- strings internally get space replaced by _ (common and very readable), * by **, _ by *_, ~ by *-, % by *.
- I don't think we need *XX and *XXXX encoding, that will be done by uriencoding whenever actually needed.
- Empty string is *~
- objects start with ( and end with )~
- arrays start with ., and end with ~
- object keys are encoded as strings, so no starting * needed, only * escaping is done
- [1, 2] becomes .1~2~~
- {"a": "fo%o", "_test": "_hm*h~m", "5": [1, true]} becomes (a~fo*.o~*_test~**_hm**h*-m~5~.1~_T~~)~

wmertens · 2017-04-01T16:26:46Z

...as a string.

bjouhier · 2017-04-01T16:32:11Z

What about having arrays start with ~ rather than ., and end with ~? As they usually follow another value, it gives them a nice ~~<...>~~ symmetry.

wmertens · 2017-04-01T16:33:02Z

Also, the "force string start" char could be _. Then the final example becomes (a~fo*.o~*_test~_*_hm**h*-m~5~.1~!T~ (sorry on mobile)

wmertens · 2017-04-01T16:36:58Z

That can work, it would take the ~ special case for true but that's no biggie

bjouhier · 2017-04-01T16:39:17Z

I too was thinking of dropping the ~ after ). Only gotcha is the url-auto-detector issue that started this whole thing 😄.

wmertens · 2017-04-01T16:49:15Z

No, it would drop ending ) too :)

bjouhier · 2017-04-01T16:57:05Z

Summarizing one more time:

- all values terminate with ~ or )
- true, false, null become _T~, _F~, _N~
- numbers start with - (+ digit) or a digit and end with ~
- strings start with alpha or * (the only extra non-unreserved character we use) and terminate with ~
- strings internally get space replaced by _ (common and very readable), * by **, _ by *_, ~ by *-, % by *.
- chars that need escaping are embedded as is. URI percent encoding will take care of them.
- empty string is *~
- objects start with ( and end with )
- arrays start with ~, and end with ~
- object keys are encoded as strings, so no starting * needed, only * escaping is done
- [1, 2] becomes ~1~2~~
- {"a": "fo%o", "_test": "_hm*h~m", "5": [1, true]} becomes (a~fo*.o~*_test~**_hm**h*-m~5~~1~_T~~)

Closing characters (~ and )) could be dropped at the very end? This would solve the original problem but then parentheses are unbalanced.

wmertens · 2017-04-01T17:29:14Z

I just realized you can't use ~ to start an array because then you can't have array-in-array - there would be no difference between start and stop.
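
A small sketch of why this is ambiguous. It encodes only nested arrays, with configurable start and end markers; the function encodeArrays is hypothetical and written just for this comparison.

```javascript
// Encode nested (possibly empty) arrays with the given start and end markers.
function encodeArrays(value, start, end) {
  return start + value.map((item) => encodeArrays(item, start, end)).join('') + end;
}

const twoEmpty = [[], []];   // an array holding two empty arrays
const nestedEmpty = [[[]]];  // an empty array nested two levels deep

// Same character for start and end: both structures encode identically.
const sameMarker = [
  encodeArrays(twoEmpty, '~', '~'),
  encodeArrays(nestedEmpty, '~', '~'),
];

// Distinct start character: the two encodings differ again.
const distinctMarker = [
  encodeArrays(twoEmpty, '!', '~'),
  encodeArrays(nestedEmpty, '!', '~'),
];
```

With ~ on both sides, both inputs come out as ~~~~~~, so a parser cannot tell them apart. With ! as the start marker they become !!~!~~ and !!!~~~.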

bjouhier · 2017-04-01T17:41:11Z

Good point. It also broke the test on leading ~ to distinguish v1 and v2.

I find . a bit too difficult to spot visually. Why not start arrays with ! then?

wmertens · 2017-04-01T17:45:44Z

Sure that is fine…

bjouhier · 2017-04-01T19:26:42Z

Getting there. Here it comes:

- all values terminate with ~ or )
- true, false, null become _T~, _F~, _N~
- numbers start with - (+ digit) or a digit and end with ~
- strings start with alpha or * and terminate with ~
- strings internally get space replaced by _, * by **, _ by *_, ~ by *-, % by *.
- chars that need URL escaping are embedded as is. URI percent encoding will take care of them.
- empty string is *~
- objects start with ( and end with )
- arrays start with !, and end with ~
- object keys are encoded as strings, so no starting * needed, only * escaping is done
- closing characters (~ and )) may be dropped at the very end.

Regarding closing characters, the rule is a "may": stringify has an option to control whether they are emitted or not. The parser has no such option and accepts input with or without them.

Examples:
- [1, 2] becomes !1~2~~ or !1~2
- {"a": "fo%o", "_test": "_hm*h~m", "5": [1, true]} becomes (a~fo*.o~*_test~**_hm**h*-m~5~!1~_T~~) or (a~fo*.o~*_test~**_hm**h*-m~5~!1~_T
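
The rules above can be sketched as a small encoder. This is an illustration, not the jsurl implementation: the names stringifySketch, encodeValue, escapeText and the short option are assumptions, scalars always get their ~ even right before ), and parentheses inside strings are not handled.

```javascript
// Escape table from the summary: space -> _, * -> **, _ -> *_, ~ -> *-, % -> *.
const ESCAPES = { ' ': '_', '*': '**', '_': '*_', '~': '*-', '%': '*.' };

function escapeText(s) {
  return s.replace(/[ *_~%]/g, (ch) => ESCAPES[ch]);
}

function encodeValue(v) {
  if (v === true) return '_T~';
  if (v === false) return '_F~';
  if (v === null) return '_N~';
  if (typeof v === 'number') return String(v) + '~';
  if (typeof v === 'string') {
    // Strings that do not start with a letter get a leading * marker.
    const marker = /^[A-Za-z]/.test(v) ? '' : '*';
    return marker + escapeText(v) + '~';
  }
  if (Array.isArray(v)) return '!' + v.map(encodeValue).join('') + '~';
  if (typeof v === 'object') {
    // Keys are in string context: escaped, but never given the * marker.
    const body = Object.keys(v)
      .map((k) => escapeText(k) + '~' + encodeValue(v[k]))
      .join('');
    return '(' + body + ')';
  }
  throw new TypeError('unsupported value');
}

// Hypothetical entry point; `short` drops the closing ~ and ) at the very end.
function stringifySketch(value, { short = false } = {}) {
  const out = encodeValue(value);
  return short ? out.replace(/[~)]+$/, '') : out;
}

const sample = { a: 'fo%o', _test: '_hm*h~m', list: [1, true] };
const encoded = stringifySketch(sample);
const encodedShort = stringifySketch(sample, { short: true });
```

The sample uses the key list instead of 5 because JavaScript enumerates integer-like keys first, which would reorder the output relative to the example above. With that change, encoded is (a~fo*.o~*_test~**_hm**h*-m~list~!1~_T~~) and encodedShort is the same string without the trailing ~~).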

wmertens · 2017-04-02T12:15:09Z

Alright, I implemented this, look at the tests to see the results. I had to also escape () to allow unambiguous parsing of ), which also allowed me to drop the last ~ in objects.
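
For illustration, a matching parser for the rules summarized earlier could look like the sketch below. It is not the code from the implementation mentioned above: parseSketch and unescapeText are hypothetical names, and because the escape codes chosen for ( and ) are not given in this thread, the sketch simply assumes a raw ) never appears inside a string.

```javascript
// Reverse of the escape table: ** -> *, *_ -> _, *- -> ~, *. -> %, bare _ -> space.
const UNESCAPES = { '*': '*', '_': '_', '-': '~', '.': '%' };

function unescapeText(s) {
  return s.replace(/\*(.)|_/g, (m, x) => (m === '_' ? ' ' : UNESCAPES[x] ?? x));
}

function parseSketch(s) {
  let pos = 0;

  // Read up to the next ~ or ) or the end of input; consume ~ but leave ) in place.
  function readToken() {
    let end = pos;
    while (end < s.length && s[end] !== '~' && s[end] !== ')') end++;
    const token = s.slice(pos, end);
    pos = end;
    if (s[pos] === '~') pos++;
    return token;
  }

  function parseValue() {
    const c = s[pos];
    if (c === '(') {
      pos++;
      const obj = {};
      while (pos < s.length && s[pos] !== ')') {
        const key = unescapeText(readToken());
        obj[key] = parseValue();
      }
      pos++; // skip ")" (harmless if it was dropped at the very end)
      return obj;
    }
    if (c === '!') {
      pos++;
      const arr = [];
      while (pos < s.length && s[pos] !== '~' && s[pos] !== ')') arr.push(parseValue());
      if (s[pos] === '~') pos++; // closing ~ of the array, if present
      return arr;
    }
    const token = readToken();
    if (token === '_T') return true;
    if (token === '_F') return false;
    if (token === '_N') return null;
    if (c === '*') return unescapeText(token.slice(1)); // explicit string marker
    if (c === '-' || (c >= '0' && c <= '9')) return Number(token);
    return unescapeText(token);
  }

  return parseValue();
}
```

Because the end of input is treated like a closing character, the same function accepts both !1~2~~ and !1~2, which is the "parser accepts input with or without them" behavior described earlier.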

wmertens · 2017-04-02T14:50:31Z

I also made that shortening optional. I wonder if we should not leave a terminal ~ at all times, or maybe make that optional too.

I like how an object with booleans now looks like (doFoo~~withBar~~meep)~

bjouhier · 2017-04-02T18:17:16Z

Cool. I'll take a look but only tomorrow. Thanks.

wmertens · 2017-04-02T21:20:26Z

Well, this was fun. I'm extremely happy to report that on my test object in Chrome at least, v2 now outperforms native JSON for both parsing and stringifying 😁

performance.html:15 JSON: 200000 parsed in 731ms, 0.003655ms/item
performance.html:23 JSON: 200000 stringified in 448ms, 0.00224ms/item
performance.html:32 v1: 200000 parsed in 1337ms, 0.006685ms/item
performance.html:40 v1: 200000 stringified in 934ms, 0.00467ms/item
performance.html:49 v2: 200000 parsed in 601ms, 0.003005ms/item
performance.html:57 v2: 200000 stringified in 403ms, 0.002015ms/item

wmertens mentioned this issue Apr 1, 2017

jsurl v2 #17

Open

10 tasks
