Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accumulation of gigabytes of .tmp files after daily rollover #921

Open
gtaylor1981 opened this issue Feb 5, 2025 · 3 comments
Open

Accumulation of gigabytes of .tmp files after daily rollover #921

gtaylor1981 opened this issue Feb 5, 2025 · 3 comments

Comments

@gtaylor1981
Copy link

gtaylor1981 commented Feb 5, 2025

Description

We are logging on software running on devices with intermittent connectivity.
We are using SizeAndTimeBasedRollingPolicy to roll over the log file either daily, or when it gets too big.

Recently we noticed some devices with completely full storage. This was caused by the presence of thousands of .tmp files in the logging directory, consuming many gigabytes of space.

Steps to Reproduce

The devices occasionally restart after a few days, which also causes their system clock to reset. The limited connectivity means they cannot sync their time correctly with NTP. This causes logs to be written out with the same date each time as last time.

This does not immediately cause a problem, because when the app restarts SizeAndTimeBasedFileNamingAndTriggeringPolicy::computeCurrentPeriodsHighestCounterValue() finds the latest counter value and continues logging from there.
However, when the time-based rollover occurs at midnight, then internally the counter is just set back to 0. But because the device has restarted many times before, logs already exist for this new day.

Then, when the next size-based rollover occurs, the main log file (app.log) is renamed to app.0.log + <nanoTime> + .tmp. This file is sent for compression, with a target name of app.0.log.gz. Normally, the compressor will delete this .tmp file on success. However, the target file already exists (having been written before at some point in the past). Its existence is checked for in Compressor::gzCompress(), where a warning is logged and the method returns immediately. This means the '.tmp' file is never removed.

The logic there is understandable, as the Compressor should probably not delete the file upon error.
It is probably the responsibility of the TimeBasedRollingPolicy class to ensure its temp file is removed.

Expected Behavior

Firstly, erratic clocks could be compensated for by checking what the next counter should be on date rollover, rather than assuming it is 0.

The expected behaviour if the target file already exists during compression is harder to define. As compression occurs asynchronously, the .tmp file can no longer be renamed back to app.log as this probably already exists with new entries. Maybe they could be compressed to the target name plus a suffix?

Leaving the .tmp files to accumulate indefinitely is certainly not the expected behavior. They could periodically be cleaned up instead perhaps.

Bonus problem

When the devices' connectivity is eventually restored, the time will synchronise and be persisted to a time point a few months in the future. However, then the reset-rollover cycle continues again.

This time, any old log files from the previous month are also kept indefinitely. This seems to be because TimeBasedArchiveRemover has a constant INACTIVITY_TOLERANCE_IN_MILLIS of 32 days. Files much older than this do not get considered for removal, despite the <maxHistory>100</maxHistory> in our config. Our expectation was that we would have a maximum of 100 files, whatever happens.

Logs and Configuration

<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>../log/app.log</file>

    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>../log/app-%d{yyyy-MM-dd, UTC}.%i.log.gz</fileNamePattern>
        <maxFileSize>3MB</maxFileSize>
        <maxHistory>100</maxHistory>
        <totalSizeCap>500MB</totalSizeCap>
    </rollingPolicy>

    <encoder>
        <pattern>%date{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX", UTC} %-5level %logger{36}[%line] %X{sourceThread} - %msg%n</pattern>
    </encoder>
</appender>

Environment

Logback Version: seen in 1.5.6 but likely present in current (v1.5.16) too
Java Version: 17
Operating System: Ubuntu Core 22.04

Related

Other issues with .tmp files are:

@gtaylor1981 gtaylor1981 changed the title Issues during rollover Leftover .tmp files after rollover Feb 5, 2025
@gtaylor1981 gtaylor1981 changed the title Leftover .tmp files after rollover Accumulation of gigabytes of .tmp files after daily rollover Feb 5, 2025
@gtaylor1981
Copy link
Author

gtaylor1981 commented Feb 7, 2025

While testing the above, I used a very simple config:

<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/diagnostic.log</file>

    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>logs/diagnostic-%d{yyyy-MM-dd'T'HHmmss, UTC}.%i.log.gz</fileNamePattern> <!-- rollover every second -->
        <maxFileSize>1KB</maxFileSize>
        <maxHistory>10</maxHistory>
        <totalSizeCap>10MB</totalSizeCap>
    </rollingPolicy>

    <encoder>
        <pattern>%date{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX", UTC} %-5level [%thread] %logger{36}.%M\(%line\) - %msg%n</pattern>
    </encoder>
</appender>

I then used this config in a program like

for (int i = 0; i < 1_000_000; i++) {
    LOG.info("log message here - {}", UUID.randomUUID());
    Thread.sleep(5);
}

This would occasionally (i.e. a few times in 100 rollovers) leave .tmp files lying round, with a log like:

16:38:00,244 |-INFO in c.q.l.co.rolling.helper.RenameUtil - Renaming file [logs\diagnostic.log] to [logs\diagnostic-2025-02-06T1537.1.log30770405312400.tmp]
16:38:00,247 |-INFO in ch.qos.logback.core.rolling.helper.Compressor - GZ compressing [logs\diagnostic-2025-02-06T1537.1.log30770405312400.tmp] as [logs\diagnostic-2025-02-06T1537.1.log.gz]
16:38:00,248 |-INFO in c.q.l.core.rolling.helper.TimeBasedArchiveRemover - deleting c:\GIT\experiments\logs\diagnostic-2025-02-06T1527.0.log.gz
16:38:00,249 |-INFO in c.q.l.core.rolling.helper.TimeBasedArchiveRemover - deleting c:\GIT\experiments\logs\diagnostic-2025-02-06T1527.1.log.gz
16:38:00,252 |-INFO in c.q.l.core.rolling.helper.TimeBasedArchiveRemover - No removal attempts were made.
16:38:00,256 |-INFO in ch.qos.logback.core.rolling.helper.Compressor - Done GZ compressing [logs\diagnostic-2025-02-06T1537.1.log30770405312400.tmp] as [logs\diagnostic-2025-02-06T1537.1.log.gz]
16:38:49,297 |-INFO in c.q.l.co.rolling.helper.RenameUtil - Renaming file [logs\diagnostic.log] to [logs\diagnostic-2025-02-06T1538.0.log30819458461000.tmp]
16:38:49,299 |-WARN in ch.qos.logback.core.rolling.helper.Compressor - The target compressed file named [logs/diagnostic-2025-02-06T1538.0.log.gz] exist already. Aborting file compression.
16:38:49,302 |-INFO in c.q.l.core.rolling.helper.TimeBasedArchiveRemover - No removal attempts were made.
16:39:00,009 |-INFO in c.q.l.co.rolling.helper.RenameUtil - Renaming file [logs\diagnostic.log] to [logs\diagnostic-2025-02-06T1538.1.log30830169919500.tmp]

It seems this is related to running the Compressor asynchronously, i.e. by the time the Compressor starts, the log file has already rolled over and created a new file.

@ceki
Copy link
Member

ceki commented Feb 7, 2025

As of version 1.5.8, SizeAndTimeBasedRollingPolicy uses an internal size counter instead of occasionally calling the OS to obtain file size. The new approach is more precise and with less jitter.

Which version of logback-classic are you using in the above test?

In addition, I would also recommend leaving the file property unset and only set the fileNamePattern.

@gtaylor1981
Copy link
Author

Thanks for your comment.
I was using an older version 1.5.6. The latest 1.5.16 does indeed seem more stable. However, I did still see the problem a few times, typically on startup.
The file property does seem to be the cause. We are looking into whether we can change this, as it is useful for us to have a non-date based filename (in case of erratic system clocks as mentioned above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants