forked from squid-cache/squid
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: Enable automated valgrind tests #280
Open
rousskov
wants to merge
20
commits into
bag71
Choose a base branch
from
SQUID-1011-ci-test-reconfiguration
base: bag71
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
+13
−44
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
It is best to version valgrind suppression file in Squid repository (as opposed to some CI tool repository) because suppressions are Squid-version specific. The suppression file should not rot too much because automated CI tests will flag pull requests that add leaks. It would be nice if CI tests could also warn about no-longer-needed suppressions, but implementing that is difficult because it requires analyzing valgrind logs across all CI tests that use valgrind (including across all tested Squid build environments/configurations). SquidMain() was split into two functions (and two AsyncJob::Start() calls were moved) to be able to ignore leaks that happen, roughly speaking, _before_ the start of the main loop. See initialization-before-primary-loop suppression. This split would not be necessary if valgrind supported "negative" suppressions (i.e. if valgrind could suppress call stacks that do _not_ include Foo()).
* lack of ip_table cleanup affects DNS queries, not just /etc/hosts * valgrind reports "Invalid read of size 8" errors for OpenSSL code used by Security::ServerOptions::updateContextEecdh() Also fixed a spelling error in a comment and removed a placeholder(?).
IP cache entries store callbacks. Callbacks may lock objects that are awaiting them. If that happens, Valgrind will show a memory leak. Eliminating these false leak signals helps in leak triage by removing difficult-to-rule-out suspects. Context: Looking for an object that holds leaking HttpRequest that holds leaking ConnStateData. Seeing leaking ipcache_entry objects that were allocated after ConnStateData was. Suspecting that an ipcache_entry may keep a shared pointer lock on something that is related to HttpRequest. This change rejected that suspicion, but stopped ipcache_entry leaks.
This change partially addresses XXX in the beginning of SquidShutdown() by allowing connection closure handlers in helper code to run for two TLS-related helpers. TODO: Apply similar changes to all helpers. Without this change, Helper::Session objects are not destroyed at Squid exit, and Valgrind reports at least two MemBuf memory leaks associated with Helper::Session fields: 32 bytes in 1 blocks are definitely lost in loss record 354 of 1,458 at 0x4848899: malloc (vg_replace_malloc.c:381) by 0xAE4247: xmalloc (xalloc.cc:105) by 0xAC4202: MemPoolMalloc::allocate() (PoolMalloc.cc:39) by 0x695C19: Mem::Allocator::alloc() (Allocator.h:47) by 0xAC0A95: Mem::AllocatorProxy::alloc() (AllocatorProxy.cc:18) by 0x695EA6: cbdata::operator new(unsigned long) (cbdata.cc:48) by 0x694FC0: cbdataInternalAlloc(int) (cbdata.cc:154) by 0x585932: MemBuf::operator new(unsigned long) (MemBuf.h:25) by 0x70A053: helperDispatch(Helper::Session*, Helper::Xaction*) (helper.cc:1478) by 0x703140: Helper::Client::submitRequest(Helper::Xaction*) (helper.cc:464) 32 bytes in 1 blocks are definitely lost in loss record 355 of 1,458 at 0x4848899: malloc (vg_replace_malloc.c:381) by 0xAE4247: xmalloc (xalloc.cc:105) by 0xAC4202: MemPoolMalloc::allocate() (PoolMalloc.cc:39) by 0x695C19: Mem::Allocator::alloc() (Allocator.h:47) by 0xAC0A95: Mem::AllocatorProxy::alloc() (AllocatorProxy.cc:18) by 0x695EA6: cbdata::operator new(unsigned long) (cbdata.cc:48) by 0x694FC0: cbdataInternalAlloc(int) (cbdata.cc:154) by 0x585932: MemBuf::operator new(unsigned long) (MemBuf.h:25) by 0x70A053: helperDispatch(Helper::Session*, Helper::Xaction*) (helper.cc:1478) by 0x703140: Helper::Client::submitRequest(Helper::Xaction*) (helper.cc:464)
…er I/O One positive side effect of this workaround is that ALE and/or other CodeContext objects have fewer chances to get stuck at exit (and be declared as leaking by valgrind or find-alive.pl).
It is not yet clear whether cleaning IP cache on shutdown is a good idea in general, but it should _not_ help with Valgrind memory leak reports related to lookup callbacks (i.e. the primary motivation behind adding this cleanup code) because the corresponding IP cache entries ought to be locked[^1], and the new cache cleaning code skips locked entries. [^1]: If entries with callbacks are not locked, then the requestor will get stuck when the entry is purged (e.g., for cache capacity reasons, during regular Squid operations; see ipcache_purgelru()). This reverts commit 9ab2f85.
This suppression is not necessary to pass current tests, but it is a good idea to suppress this known case of a limited-allocations "global", especially since keeping those allocations "exposed" complicates triage of other leak suspects by inciting developers to chaise this red herring.
When Squid is built with optimizations, this function is usually inlined. Valgrind does not mangle inlined functions when checking suppressions, so the original fun:_ZL14initializeOnceiPPc suppression does not match in optimized builds! To combat the above problem, I added fun:initializeOnce, a second initializeOnce() suppression. This one was targeting optimized builds. After doing that, I began to worry that this function name is too generic for its essentially "global" suppression context and renamed it to SquidMainInitializeOnce(). Further testing suggests that dealing with optimized builds is probably too difficult -- too many suppressions need to be "duplicated" to accommodate optimization variations. The next change will commit to non-optimized builds and will remove initialization-before-primary-loop-optimized suppression added here, but I wanted to keep this change for the record. Also polished initializeOnce() description.
As detailed in the previous branch commit, testing optimized builds requires maintaining multiple sets of suppressions. Technically, the number of suppressions per "leak" can get large due to optimization variations. The number of variations can be cut down using function name masks, but such masks decrease suppression precision, increasing the risk of accidentally matching an unknown/true/undisclosed leak. Non-optimized builds also take less time to compile and ease triage of coredumps and similar problems. The drawback of using non-optimized builds is that deployed Squids run optimized code. Ideally, we want to test deployed code. TODO: Consider testing both builds, one with valgrind and one without.
Without this option, Valgrind reports too many false leaks. I did not investigate why, assuming that this has something to do with cbdata memory allocation trick in regular builds (and that this ./configure option was added primarily to avoid those false positives).
The old "start as nobody" scheme was too awkward to support, especially as we start adding more configuration files (e.g., suppression file) because, on GitHub Actions runners, user "nobody" cannot read cloned sources. Moreover, it is arguably better to start Squid as root because many (most?) deployments probably do that, and some Squid features require that. Latest squid-overlord agents listen on localhost (by default) and, hence, are far more difficult to abuse, even when running as root, in most test environments. Also reduce chances that a developer starts Overlord accidentally by effectively requiring passwordless sudo (which is typical for CI environments). Folks can still start Overlord manually (as any user), and test-functionality.sh will just use that running agent, of course. Requires squid-overlord commit 953ef9b6. Benefits from squid-overlord commit 47cf864e.
thus preventing Valgrind from reporting leak problems.
This reverts commit 08ec76e. That (experimental) commit was wrong -- HeaderUpdater leak was caused by a different problem. The correct fix will be committed next.
... fixed by the previous branch commit. Adjusted Ipc::UdsSender leak suppressions. Those leaks were (evidently incorrectly) suspected to be caused by HeaderUpdater leaks, but they are still present after HeaderUpdater leaks were fixed.
Resolved minor conflicts to facilitate GitHub Actions branch tests and future bag71 updates with recent branch fixes.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
TBD