tls_mgm: don't free anything in mod_destroy() #3269
Any updates here? No progress has been made in the last 30 days, marking as stale. |
No updates here. |
@jes, even though I agree with the issue, I do not agree with the solution. You cannot destroy infrastructure resources before giving the modules a chance to do a proper shutdown. For example, a module may require a TCP conn for its shutdown. This is a theoretical example: I'm not sure whether such a module exists, or even whether the TCP layer can actually be used in the shutdown sequence, but let's check this first.
OK, good point. Then what about making If you think that might be OK then I'll update the PR. |
Yeah, indeed, it is a "bit" broken for the tls_mgm module to trash its data while it is still in use by ongoing connections. So, a quick workaround here would be for the tls_mgm module NOT to free anything that may be used by the connections.
This fixes a possible double-free during shutdown. The issue was that the `tls_mgm` module was unloaded (free'ing all of its "domains") before the connections were destroyed. This left connections with dangling pointers to the domains, and when the connections are cleaned up, if the reference count is 0, `tls_mgm` can then try to free the domain again, causing a crash.
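To make the ownership problem concrete, here is a minimal sketch of refcount-controlled freeing, assuming each connection holds its own counted reference to a domain. All names (`demo_domain`, `demo_domain_ref`, `demo_domain_unref`, `demo_shutdown_frees`) are hypothetical and are not the `tls_mgm` API. The sketch also ignores locking and shared memory, which the real module would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a TLS domain; NOT the real tls_mgm structure. */
struct demo_domain {
    int refcnt; /* number of live owners: the module plus each connection */
};

/* Counts how many times a domain was actually released with free(). */
static int demo_frees = 0;

/* Allocate a domain that starts with one reference, owned by the module. */
struct demo_domain *demo_domain_new(void)
{
    struct demo_domain *d = malloc(sizeof *d);
    if (d)
        d->refcnt = 1;
    return d;
}

/* A connection takes its own reference before storing the pointer. */
struct demo_domain *demo_domain_ref(struct demo_domain *d)
{
    d->refcnt++;
    return d;
}

/* Drop one reference; free only when the last owner lets go.
 * Returns 1 if the domain was freed, 0 otherwise. */
int demo_domain_unref(struct demo_domain *d)
{
    assert(d->refcnt > 0); /* an unref without a matching ref is a bug */
    if (--d->refcnt > 0)
        return 0;
    free(d);
    demo_frees++;
    return 1;
}

/* Simulate the shutdown order from the report: the module is destroyed
 * first, the connection afterwards. Returns the number of frees, or -1
 * if allocation failed. */
int demo_shutdown_frees(void)
{
    struct demo_domain *d = demo_domain_new();
    if (!d)
        return -1;
    struct demo_domain *conn_dom = demo_domain_ref(d); /* connection opens */
    int before = demo_frees;
    demo_domain_unref(d);        /* module teardown drops only its own ref */
    demo_domain_unref(conn_dom); /* connection teardown drops the last ref */
    return demo_frees - before;
}
```

In this model the module teardown never calls `free()` directly, so the order in which the module and the connections are destroyed does not matter: whichever owner releases last performs the single free.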
Force-pushed from 5e78cb7 to f1838f2
@bogdan-iancu thanks, I've force-pushed a commit that does that, and I'll change the title of the pull request. |
@jes, revisiting this issue a bit and brainstorming with @liviuchircu and @razvancrainea, we came to the conclusion that the code, as it is right now, should work OK.
@jes, there was a similar report to yours, see #3338. So, I'm trying to see how to investigate this further (as per my previous comment, the free'ing should be controlled by refcnt, so it should be OK). Can you somehow (even if randomly) reproduce this crash? I can eventually create a quick patch for troubleshooting those refcnts.
The easiest way for me to reproduce it was to put a deliberate segfault in I could be wrong but if you have TLS domains setup I don't think there is any way to get OpenSIPS to try to shutdown without triggering the double free. But maybe it only gets detected if you are using the debug allocator? |
@jes, could you please provide the simplest possible cfg for reproducing this, and the patch you used for "crashing" the uac_registrant module? I will try to put some time into reproducing the issue. BTW, are you still able to crash the latest head?
@bogdan-iancu My patch to make it segfault after opening TLS contexts is:
And then add a TLS user and wait for it to be registered. However after |
@jes , thanks for the update. Agreed, let's close this one, feel free to open a new report if the problem re-surfaces. |
Summary
We were observing a crash on shutdown caused by a double-free of the TLS "domains" in `tls_mgm`. Not massively important because it only happens on shutdown, but still worth fixing.
Details
The issue was that the `tls_mgm` module was unloaded (free'ing all of its "domains") before the connections were destroyed. This left connections with dangling pointers to the domains, and when the connections are cleaned up, if the reference count is 0, `tls_mgm` can then try to free the domain again, causing a crash.
Solution
The solution is to defer `destroy_modules()` until after `tcp_destroy()`.
Compatibility
It is possible that there are other related bugs that could be solved in the same way (i.e. that `destroy_modules()` should move even further down) but I've tried to be conservative.
Closing issues