Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse/Encode logfmt is not idempotent when values have double quotes #777

Open
srstrickland opened this issue Mar 30, 2024 · 7 comments
Open
Labels
type: bug A code related bug vrl: stdlib Changes to the standard library

Comments

@srstrickland
Copy link

In general, any encode/decode function pairs should be idempotent; that is, this should always be true:

x == decode(encode(x))

It appears to not be the case with encode_logfmt and parse_logfmt, as demonstrated here, where result.key has additional escape characters.

Taken to the extreme:

.obj.key = s'this has a " quote'
parse_logfmt!(encode_logfmt(parse_logfmt!(encode_logfmt(parse_logfmt!(encode_logfmt(parse_logfmt!(encode_logfmt(parse_logfmt!(encode_logfmt(.obj))))))))))

yields:

{ "key": "this has a \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\" quote" }

This isn't really causing me any headaches, and the fact that it hasn't been raised yet probably indicates the same for other folks, just thought it was worth bringing up, as I stumbled on it today.

@jszwedko jszwedko added type: bug A code related bug vrl: stdlib Changes to the standard library labels Apr 1, 2024
@drmason13
Copy link
Contributor

Thanks for raising this, I think you're absolutely right about the principle. I would like to take a stab at fixing this.

@drmason13
Copy link
Contributor

drmason13 commented May 7, 2024

I got a test reproducing the error this evening. The problem is the same in both directions:
encode(decode())
decode(encode())

decoding does not remove escape characters.
encoding escapes " and \ characters. EDIT: also \n characters.
Either decoding should parse and remove the escape characters, or encoding should not add escape characters.

I've been testing the encode/decode value functions, so I don't think it's just the logfmt functions.

@drmason13
Copy link
Contributor

drmason13 commented May 7, 2024

Adding some parsing escape removal to parsing sounds like the right fix to me.

It would be a breaking change for people who are relying on escape characters being preserved when parsing.

github-merge-queue bot pushed a commit that referenced this issue Jul 23, 2024
* fix(stdlib): parse_logfmt performs escaping (#777)

* label changelog fragment as breaking

* explaining comment for Some(_) -> c branch
@Gutawer
Copy link

Gutawer commented Aug 6, 2024

As far as I can tell this is fixed for the case of \" but it seems to still not round-trip properly for the \n escape sequence:
https://playground.vrl.dev/?state=eyJwcm9ncmFtIjoiLnBhcnNlZCA9IHBhcnNlX2xvZ2ZtdCEoLm1lc3NhZ2UpXG4uZW5jb2RlZCA9IGVuY29kZV9sb2dmbXQoLnBhcnNlZClcbi5jaGVjayA9IC5tZXNzYWdlID09IC5lbmNvZGVkIiwiZXZlbnQiOnsibWVzc2FnZSI6Im1lc3NhZ2U9XCJoZWxsbyBcXG4gXFxcIiBoaVwiIn0sImlzX2pzb25sIjpmYWxzZSwiZXJyb3IiOm51bGx9

In fact the output is the same regardless of whether the string contains \n or \\n, when round-tripping both become \\n. The parser seems to work properly though, I think the encoder is now the thing causing the issue

@Gutawer
Copy link

Gutawer commented Aug 6, 2024

'\n' => output.push_str(r"\\n"),

Think maybe this should be changed to r"\n"?

@jszwedko
Copy link
Member

jszwedko commented Aug 6, 2024

As far as I can tell this is fixed for the case of \" but it seems to still not round-trip properly for the \n escape sequence: https://playground.vrl.dev/?state=eyJwcm9ncmFtIjoiLnBhcnNlZCA9IHBhcnNlX2xvZ2ZtdCEoLm1lc3NhZ2UpXG4uZW5jb2RlZCA9IGVuY29kZV9sb2dmbXQoLnBhcnNlZClcbi5jaGVjayA9IC5tZXNzYWdlID09IC5lbmNvZGVkIiwiZXZlbnQiOnsibWVzc2FnZSI6Im1lc3NhZ2U9XCJoZWxsbyBcXG4gXFxcIiBoaVwiIn0sImlzX2pzb25sIjpmYWxzZSwiZXJyb3IiOm51bGx9

In fact the output is the same regardless of whether the string contains \n or \\n, when round-tripping both become \\n. The parser seems to work properly though, I think the encoder is now the thing causing the issue

I think an issue here might be that encode_key_value is escaping the \s but then then encoding the value as a string to render the output is also escaping the \s 🤔

@drmason13
Copy link
Contributor

'\n' => output.push_str(r"\\n"),

Think maybe this should be changed to r"\n"?

I think this is exactly it, silly mistake on my part!

I've added a newline character to the encode_decode / decode_encode cycle tests and they fail:

  left: Object({KeyString("key"): Bytes(b"this has a \" quote and a \n newline")})
 right: Object({KeyString("key"): Bytes(b"this has a \" quote and a \\n newline")})

the decode_encode_cycle test fails a bit more spectacularly!

  left: Bytes(b"key=\"this has a \\\" quote and a \\\n newline\"")
 right: Bytes(b"key=\"this has a \\\" quote and a \\\\\\\\n newline\"")

Making hte change:

 '\n' => output.push_str(r"\n"), 

fixes the encode_decode cycle, but doesn't quite fix the decode_encode_cycle:

The new failure with the change is:

  left: Bytes(b"key=\"this has a \\\" quote and a \\\n newline\"")
 right: Bytes(b"key=\"this has a \\\" quote and a \\\\\\n newline\"")

I am not sure how to fix this to make a full encode -> decode -> encode cycle work.

encoding the value as a string to render the output is also escaping the \s

Perhaps this is causing the remaining test failure?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A code related bug vrl: stdlib Changes to the standard library
Projects
None yet
Development

No branches or pull requests

4 participants