Accumulation of gigabytes of .tmp files after daily rollover #921

gtaylor1981 · 2025-02-05T17:26:37Z

Description

We are logging on software running on devices with intermittent connectivity.
We are using SizeAndTimeBasedRollingPolicy to roll over the log file either daily, or when it gets too big.

Recently we noticed some devices with completely full storage. This was caused by the presence of thousands of .tmp files in the logging directory, consuming many gigabytes of space.

Steps to Reproduce

The devices occasionally restart after a few days, which also causes their system clock to reset. The limited connectivity means they cannot sync their time correctly with NTP. This causes logs to be written out with the same date each time as last time.

This does not immediately cause a problem, because when the app restarts SizeAndTimeBasedFileNamingAndTriggeringPolicy::computeCurrentPeriodsHighestCounterValue() finds the latest counter value and continues logging from there.
However, when the time-based rollover occurs at midnight, then internally the counter is just set back to 0. But because the device has restarted many times before, logs already exist for this new day.

Then, when the next size-based rollover occurs, the main log file (app.log) is renamed to app.0.log + <nanoTime> + .tmp. This file is sent for compression, with a target name of app.0.log.gz. Normally, the compressor will delete this .tmp file on success. However, the target file already exists (having been written before at some point in the past). Its existence is checked for in Compressor::gzCompress(), where a warning is logged and the method returns immediately. This means the '.tmp' file is never removed.

The logic there is understandable, as the Compressor should probably not delete the file upon error.
It is probably the responsibility of the TimeBasedRollingPolicy class to ensure its temp file is removed.

Expected Behavior

Firstly, erratic clocks could be compensated for by checking what the next counter should be on date rollover, rather than assuming it is 0.

The expected behaviour if the target file already exists during compression is harder to define. As compression occurs asynchronously, the .tmp file can no longer be renamed back to app.log as this probably already exists with new entries. Maybe they could be compressed to the target name plus a suffix?

Leaving the .tmp files to accumulate indefinitely is certainly not the expected behavior. They could periodically be cleaned up instead perhaps.

Bonus problem

When the devices' connectivity is eventually restored, the time will synchronise and be persisted to a time point a few months in the future. However, then the reset-rollover cycle continues again.

This time, any old log files from the previous month are also kept indefinitely. This seems to be because TimeBasedArchiveRemover has a constant INACTIVITY_TOLERANCE_IN_MILLIS of 32 days. Files much older than this do not get considered for removal, despite the <maxHistory>100</maxHistory> in our config. Our expectation was that we would have a maximum of 100 files, whatever happens.

Logs and Configuration

<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>../log/app.log</file>

    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>../log/app-%d{yyyy-MM-dd, UTC}.%i.log.gz</fileNamePattern>
        <maxFileSize>3MB</maxFileSize>
        <maxHistory>100</maxHistory>
        <totalSizeCap>500MB</totalSizeCap>
    </rollingPolicy>

    <encoder>
        <pattern>%date{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX", UTC} %-5level %logger{36}[%line] %X{sourceThread} - %msg%n</pattern>
    </encoder>
</appender>

Environment

Logback Version: seen in 1.5.6 but likely present in current (v1.5.16) too
Java Version: 17
Operating System: Ubuntu Core 22.04

<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/diagnostic.log</file>

    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>logs/diagnostic-%d{yyyy-MM-dd'T'HHmmss, UTC}.%i.log.gz</fileNamePattern> <!-- rollover every second -->
        <maxFileSize>1KB</maxFileSize>
        <maxHistory>10</maxHistory>
        <totalSizeCap>10MB</totalSizeCap>
    </rollingPolicy>

    <encoder>
        <pattern>%date{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX", UTC} %-5level [%thread] %logger{36}.%M\(%line\) - %msg%n</pattern>
    </encoder>
</appender>

I then used this config in a program like

for (int i = 0; i < 1_000_000; i++) {
    LOG.info("log message here - {}", UUID.randomUUID());
    Thread.sleep(5);
}

This would occasionally (i.e. a few times in 100 rollovers) leave .tmp files lying round, with a log like:

16:38:00,244 |-INFO in c.q.l.co.rolling.helper.RenameUtil - Renaming file [logs\diagnostic.log] to [logs\diagnostic-2025-02-06T1537.1.log30770405312400.tmp]
16:38:00,247 |-INFO in ch.qos.logback.core.rolling.helper.Compressor - GZ compressing [logs\diagnostic-2025-02-06T1537.1.log30770405312400.tmp] as [logs\diagnostic-2025-02-06T1537.1.log.gz]
16:38:00,248 |-INFO in c.q.l.core.rolling.helper.TimeBasedArchiveRemover - deleting c:\GIT\experiments\logs\diagnostic-2025-02-06T1527.0.log.gz
16:38:00,249 |-INFO in c.q.l.core.rolling.helper.TimeBasedArchiveRemover - deleting c:\GIT\experiments\logs\diagnostic-2025-02-06T1527.1.log.gz
16:38:00,252 |-INFO in c.q.l.core.rolling.helper.TimeBasedArchiveRemover - No removal attempts were made.
16:38:00,256 |-INFO in ch.qos.logback.core.rolling.helper.Compressor - Done GZ compressing [logs\diagnostic-2025-02-06T1537.1.log30770405312400.tmp] as [logs\diagnostic-2025-02-06T1537.1.log.gz]
16:38:49,297 |-INFO in c.q.l.co.rolling.helper.RenameUtil - Renaming file [logs\diagnostic.log] to [logs\diagnostic-2025-02-06T1538.0.log30819458461000.tmp]
16:38:49,299 |-WARN in ch.qos.logback.core.rolling.helper.Compressor - The target compressed file named [logs/diagnostic-2025-02-06T1538.0.log.gz] exist already. Aborting file compression.
16:38:49,302 |-INFO in c.q.l.core.rolling.helper.TimeBasedArchiveRemover - No removal attempts were made.
16:39:00,009 |-INFO in c.q.l.co.rolling.helper.RenameUtil - Renaming file [logs\diagnostic.log] to [logs\diagnostic-2025-02-06T1538.1.log30830169919500.tmp]

It seems this is related to running the Compressor asynchronously, i.e. by the time the Compressor starts, the log file has already rolled over and created a new file.

ceki · 2025-02-07T12:58:11Z

As of version 1.5.8, SizeAndTimeBasedRollingPolicy uses an internal size counter instead of occasionally calling the OS to obtain file size. The new approach is more precise and with less jitter.

Which version of logback-classic are you using in the above test?

In addition, I would also recommend leaving the file property unset and only set the fileNamePattern.

gtaylor1981 · 2025-02-07T13:46:08Z

Thanks for your comment.
I was using an older version 1.5.6. The latest 1.5.16 does indeed seem more stable. However, I did still see the problem a few times, typically on startup.
The file property does seem to be the cause. We are looking into whether we can change this, as it is useful for us to have a non-date based filename (in case of erratic system clocks as mentioned above).

gtaylor1981 changed the title ~~Issues during rollover~~ Leftover .tmp files after rollover Feb 5, 2025

gtaylor1981 changed the title ~~Leftover .tmp files after rollover~~ Accumulation of gigabytes of .tmp files after daily rollover Feb 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accumulation of gigabytes of .tmp files after daily rollover #921

Accumulation of gigabytes of .tmp files after daily rollover #921

gtaylor1981 commented Feb 5, 2025 •

edited

Loading

gtaylor1981 commented Feb 7, 2025 •

edited

Loading

ceki commented Feb 7, 2025 •

edited

Loading

gtaylor1981 commented Feb 7, 2025

Accumulation of gigabytes of .tmp files after daily rollover #921

Accumulation of gigabytes of .tmp files after daily rollover #921

Comments

gtaylor1981 commented Feb 5, 2025 • edited Loading

Description

Steps to Reproduce

Expected Behavior

Bonus problem

Logs and Configuration

Environment

Related

gtaylor1981 commented Feb 7, 2025 • edited Loading

ceki commented Feb 7, 2025 • edited Loading

gtaylor1981 commented Feb 7, 2025

gtaylor1981 commented Feb 5, 2025 •

edited

Loading

gtaylor1981 commented Feb 7, 2025 •

edited

Loading

ceki commented Feb 7, 2025 •

edited

Loading