Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Enable automated valgrind tests #280

Open
wants to merge 20 commits into
base: bag71
Choose a base branch
from

Conversation

rousskov
Copy link

@rousskov rousskov commented Jan 7, 2025

TBD

It is best to version valgrind suppression file in Squid repository (as
opposed to some CI tool repository) because suppressions are
Squid-version specific. The suppression file should not rot too much
because automated CI tests will flag pull requests that add leaks.

It would be nice if CI tests could also warn about no-longer-needed
suppressions, but implementing that is difficult because it requires
analyzing valgrind logs across all CI tests that use valgrind (including
across all tested Squid build environments/configurations).

SquidMain() was split into two functions (and two AsyncJob::Start()
calls were moved) to be able to ignore leaks that happen, roughly
speaking, _before_ the start of the main loop. See
initialization-before-primary-loop suppression. This split would not be
necessary if valgrind supported "negative" suppressions (i.e. if
valgrind could suppress call stacks that do _not_ include Foo()).
* lack of ip_table cleanup affects DNS queries, not just /etc/hosts
* valgrind reports "Invalid read of size 8" errors for OpenSSL code
  used by Security::ServerOptions::updateContextEecdh()

Also fixed a spelling error in a comment and removed a placeholder(?).
IP cache entries store callbacks. Callbacks may lock objects that are
awaiting them. If that happens, Valgrind will show a memory leak.
Eliminating these false leak signals helps in leak triage by removing
difficult-to-rule-out suspects.

Context: Looking for an object that holds leaking HttpRequest that holds
leaking ConnStateData. Seeing leaking ipcache_entry objects that were
allocated after ConnStateData was. Suspecting that an ipcache_entry may
keep a shared pointer lock on something that is related to HttpRequest.
This change rejected that suspicion, but stopped ipcache_entry leaks.
This change partially addresses XXX in the beginning of SquidShutdown()
by allowing connection closure handlers in helper code to run for two
TLS-related helpers. TODO: Apply similar changes to all helpers.

Without this change, Helper::Session objects are not destroyed at Squid
exit, and Valgrind reports at least two MemBuf memory leaks associated
with Helper::Session fields:

    32 bytes in 1 blocks are definitely lost in loss record 354 of 1,458
        at 0x4848899: malloc (vg_replace_malloc.c:381)
        by 0xAE4247: xmalloc (xalloc.cc:105)
        by 0xAC4202: MemPoolMalloc::allocate() (PoolMalloc.cc:39)
        by 0x695C19: Mem::Allocator::alloc() (Allocator.h:47)
        by 0xAC0A95: Mem::AllocatorProxy::alloc() (AllocatorProxy.cc:18)
        by 0x695EA6: cbdata::operator new(unsigned long) (cbdata.cc:48)
        by 0x694FC0: cbdataInternalAlloc(int) (cbdata.cc:154)
        by 0x585932: MemBuf::operator new(unsigned long) (MemBuf.h:25)
        by 0x70A053: helperDispatch(Helper::Session*, Helper::Xaction*) (helper.cc:1478)
        by 0x703140: Helper::Client::submitRequest(Helper::Xaction*) (helper.cc:464)

    32 bytes in 1 blocks are definitely lost in loss record 355 of 1,458
        at 0x4848899: malloc (vg_replace_malloc.c:381)
        by 0xAE4247: xmalloc (xalloc.cc:105)
        by 0xAC4202: MemPoolMalloc::allocate() (PoolMalloc.cc:39)
        by 0x695C19: Mem::Allocator::alloc() (Allocator.h:47)
        by 0xAC0A95: Mem::AllocatorProxy::alloc() (AllocatorProxy.cc:18)
        by 0x695EA6: cbdata::operator new(unsigned long) (cbdata.cc:48)
        by 0x694FC0: cbdataInternalAlloc(int) (cbdata.cc:154)
        by 0x585932: MemBuf::operator new(unsigned long) (MemBuf.h:25)
        by 0x70A053: helperDispatch(Helper::Session*, Helper::Xaction*) (helper.cc:1478)
        by 0x703140: Helper::Client::submitRequest(Helper::Xaction*) (helper.cc:464)
…er I/O

One positive side effect of this workaround is that ALE and/or other
CodeContext objects have fewer chances to get stuck at exit (and be
declared as leaking by valgrind or find-alive.pl).
It is not yet clear whether cleaning IP cache on shutdown is a good idea
in general, but it should _not_ help with Valgrind memory leak reports
related to lookup callbacks (i.e. the primary motivation behind adding
this cleanup code) because the corresponding IP cache entries ought to
be locked[^1], and the new cache cleaning code skips locked entries.

[^1]: If entries with callbacks are not locked, then the requestor will
get stuck when the entry is purged (e.g., for cache capacity reasons,
during regular Squid operations; see ipcache_purgelru()).

This reverts commit 9ab2f85.
This suppression is not necessary to pass current tests, but it is a
good idea to suppress this known case of a limited-allocations "global",
especially since keeping those allocations "exposed" complicates triage
of other leak suspects by inciting developers to chaise this red
herring.
@rousskov rousskov changed the base branch from master to bag71 January 7, 2025 16:04
rousskov and others added 12 commits January 7, 2025 11:14
When Squid is built with optimizations, this function is usually
inlined. Valgrind does not mangle inlined functions when checking
suppressions, so the original fun:_ZL14initializeOnceiPPc suppression
does not match in optimized builds!

To combat the above problem, I added fun:initializeOnce, a second
initializeOnce() suppression. This one was targeting optimized builds.
After doing that, I began to worry that this function name is too
generic for its essentially "global" suppression context and renamed it
to SquidMainInitializeOnce().

Further testing suggests that dealing with optimized builds is probably
too difficult -- too many suppressions need to be "duplicated" to
accommodate optimization variations. The next change will commit to
non-optimized builds and will remove
initialization-before-primary-loop-optimized suppression added here, but
I wanted to keep this change for the record.

Also polished initializeOnce() description.
As detailed in the previous branch commit, testing optimized builds
requires maintaining multiple sets of suppressions. Technically, the
number of suppressions per "leak" can get large due to optimization
variations. The number of variations can be cut down using function name
masks, but such masks decrease suppression precision, increasing the
risk of accidentally matching an unknown/true/undisclosed leak.

Non-optimized builds also take less time to compile and ease triage of
coredumps and similar problems.

The drawback of using non-optimized builds is that deployed Squids run
optimized code. Ideally, we want to test deployed code.

TODO: Consider testing both builds, one with valgrind and one without.
Without this option, Valgrind reports too many false leaks. I did not
investigate why, assuming that this has something to do with cbdata
memory allocation trick in regular builds (and that this ./configure
option was added primarily to avoid those false positives).
The old "start as nobody" scheme was too awkward to support, especially
as we start adding more configuration files (e.g., suppression file)
because, on GitHub Actions runners, user "nobody" cannot read cloned
sources.

Moreover, it is arguably better to start Squid as root because many
(most?) deployments probably do that, and some Squid features require
that.

Latest squid-overlord agents listen on localhost (by default) and,
hence, are far more difficult to abuse, even when running as root, in
most test environments.

Also reduce chances that a developer starts Overlord accidentally by
effectively requiring passwordless sudo (which is typical for CI
environments). Folks can still start Overlord manually (as any user),
and test-functionality.sh will just use that running agent, of course.

Requires squid-overlord commit 953ef9b6.
Benefits from squid-overlord commit 47cf864e.
thus preventing Valgrind from reporting leak problems.
This reverts commit 08ec76e. That
(experimental) commit was wrong -- HeaderUpdater leak was caused by a
different problem. The correct fix will be committed next.
This fix also reduces memory leak false positives,
reported by Valgrind.

----

Cherry-picked SQUID-1034-convert-cbdata-htable-global commit 2d2ffd0
that became master/v8 commit bbcef1c (with an adjusted commit message).
... fixed by the previous branch commit.

Adjusted Ipc::UdsSender leak suppressions. Those leaks were (evidently
incorrectly) suspected to be caused by HeaderUpdater leaks, but they are
still present after HeaderUpdater leaks were fixed.
Resolved minor conflicts to facilitate GitHub Actions branch tests and
future bag71 updates with recent branch fixes.
@rousskov rousskov changed the base branch from bag71 to master February 12, 2025 14:41
@rousskov rousskov changed the base branch from master to bag71 February 12, 2025 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants