parse_syslog incorrectly adds the previous year for RFC3164 timestamps when close to the new year and the system timezone is not UTC #641

Open
yangshike opened this issue Jan 11, 2024 · 1 comment
Labels
type: bug (A code related bug), vrl: stdlib (Changes to the standard library)

Comments

@yangshike

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When the system time is between 2024-01-01 00:00:00 and 2024-01-01 08:00:00 in a UTC+8 (+0800) timezone, the timestamp field produced by parse_syslog has the year 2022.

This bug is reproducible.
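
The affected window is exactly the period in which the UTC calendar year is still 2023 while the local year is already 2024. A minimal Python sketch of that arithmetic, assuming a fixed +0800 offset (the helper name `utc_year_lags` is made up for illustration and is not part of Vector or VRL):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the reporter's system runs at a fixed +0800 offset.
LOCAL = timezone(timedelta(hours=8))


def utc_year_lags(local_now: datetime) -> bool:
    """Return True when the UTC calendar year is behind the local calendar year."""
    return local_now.astimezone(timezone.utc).year < local_now.year


window_start = datetime(2024, 1, 1, 0, 0, 0, tzinfo=LOCAL)
inside = datetime(2024, 1, 1, 7, 59, 59, tzinfo=LOCAL)
window_end = datetime(2024, 1, 1, 8, 0, 0, tzinfo=LOCAL)

# Local midnight on New Year's Day is still 16:00 on Dec 31 in UTC.
print(window_start.astimezone(timezone.utc).isoformat())  # 2023-12-31T16:00:00+00:00
print(utc_year_lags(window_start), utc_year_lags(inside), utc_year_lags(window_end))
# True True False
```

Any year inference that relies on the current UTC time would therefore see 2023 for the whole eight-hour window.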

screenshot:

image
image

Configuration

# Read syslog lines from the system log file.
[sources.message_log]
type = "file"
include = ["/var/log/messages"]
start_at_beginning = true

# Parse each line with VRL's parse_syslog.
[transforms.modify_message_log]
type = "remap"
inputs = ["message_log"]
source = '''
. = parse_syslog!(.message)
'''

# Print the parsed events as JSON.
[sinks.console]
type = "console"
inputs = ["modify_message_log"]
encoding.codec = "json"

Version

0.33

Debug Output

No response

Example Data

Comparison of the original log and the log processed by Vector:
image

Additional Context

No response

References

No response

@yangshike added the type: bug label on Jan 11, 2024
@jszwedko transferred this issue from vectordotdev/vector on Jan 12, 2024
@drmason13
Contributor

The year is inferred using the current time in UTC, and the month as parsed from the log message (without any conversion).

This seems to be where the problem occurs: if the message was written in a different timezone, its month could be a different month in UTC. The logic is split between Vector and one of its dependencies, syslog_loose.

  • syslog_loose takes a resolve_year closure, runs it, builds a datetime with the resulting year, and then converts that datetime using an optional offset.
  • Vector defines the resolve_year closure and passes it in.
  • Worth noting: the IncompleteDate passed to resolve_year is not translated to or from any timezone. That is, a month is read from a string and a year is derived from it with no knowledge of which timezone the message time is in.

I don't understand exactly what went wrong or how the result ended up as 2022, but the cause is likely in the interaction between this resolve_year closure and syslog_loose.
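
One way that interaction could produce 2022 is sketched below in Python (not the Rust used by Vector and syslog_loose). The year rule in `resolve_year` is an assumption made for illustration: take the current UTC year, minus one if the current UTC month is January and the message month is December. The helper `parse_partial`, the +0800 offset, and the sample message time are likewise hypothetical.

```python
from datetime import datetime, timedelta, timezone


def resolve_year(msg_month: int, now_utc: datetime) -> int:
    # ASSUMED rule (not copied from Vector): use the current UTC year, and only
    # step back a year for a December message seen in January.
    if now_utc.month == 1 and msg_month == 12:
        return now_utc.year - 1
    return now_utc.year


def parse_partial(month, day, hour, minute, second, offset, now_utc):
    # Hypothetical stand-in for the parser: pick a year, build the datetime in
    # the message's own offset, then convert the result to UTC.
    year = resolve_year(month, now_utc)
    local = datetime(year, month, day, hour, minute, second, tzinfo=offset)
    return local.astimezone(timezone.utc)


PLUS8 = timezone(timedelta(hours=8))

# System clock: 2024-01-01 00:05:00 at +0800, which is still 2023 in UTC.
now_utc = datetime(2023, 12, 31, 16, 5, 0, tzinfo=timezone.utc)

# Hypothetical message stamped "Jan  1 00:04:59" in local (+0800) time.
result = parse_partial(1, 1, 0, 4, 59, PLUS8, now_utc)
print(result.isoformat())  # 2022-12-31T16:04:59+00:00
```

Under these assumptions the January message is given the year 2023 (the current UTC year), and converting 2023-01-01 00:04:59 +0800 to UTC then moves it back into 2022. Whether this matches the real code path would need to be confirmed against the Vector and syslog_loose sources.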

Timezones are hard.

@jszwedko changed the title from "Bug Report for function parse_syslog" to "parse_syslog incorrectly adds the previous year for RFC3164 timestamps when close to the new year and the system timezone is not UTC" on May 22, 2024
@jszwedko added the vrl: stdlib label on May 22, 2024