-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
videoroom: Release some references created by remote publishers #3359
Conversation
I don't think that's wise: the map you see in the publisher's struct is only a shallow reference, since the actual map is stored in the publisher_stream struct, and we do have the |
Thanks for your feedback. I had a problem where the forwarders were not released properly when my controller websocket connection was closed and the patch fixed this. So maybe my problem must be fixed differently - I'll do some more tests and provide an update here. |
I'd check if the cause is that there are publisher streams (and maybe entire publishers) whose references never get to zero causing their forwarders not to be destroyed either. Did you check by uncommenting the |
508e84d
to
00e3d3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some more tests with refcount debugging enabled. My initial assumption was wrong, and the problem was actually in the remote publishers that stayed active.
@@ -13421,7 +13428,7 @@ static void *janus_videoroom_remote_publisher_thread(void *user_data) { | |||
} | |||
g_list_free(subscribers); | |||
/* Free streams */ | |||
g_list_free(publisher->streams); | |||
g_list_free_full(publisher->streams, (GDestroyNotify)(janus_videoroom_publisher_stream_unref)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other places use janus_videoroom_publisher_stream_destroy
but as this is also used for streams_byid
, the reference would not be decreased again, resulting in a leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this should be a destroy, then, and the related maps both be unrefs. Besides, if that's the rationale for the list, then it should not only be fixed for remote publishers, but for all of them, since we do a simple g_list_free(participant->streams)
in janus_videoroom_hangup_media_internal
too. That said, we should thread carefully here, as I don't remember seeing leaks normally (all my local builds use libasan).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some debugging and it looks like that at the time janus_videoroom_subscriber_free
or janus_videoroom_publisher_free
are called, the streams
list and the streams_byid
/ streams_bymid
hashmaps are empty - at least in the scenarios I tested.
However at the end of the remove publisher thread, the streams list is not empty, resulting in the leak if janus_videoroom_publisher_stream_destroy
is used.
@@ -2433,6 +2433,8 @@ static void janus_videoroom_publisher_dereference_nodebug(janus_videoroom_publis | |||
janus_refcount_decrease_nodebug(&p->ref); | |||
} | |||
|
|||
static void janus_videoroom_session_destroy(janus_videoroom_session *session); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you want the forward declaration or if one of the two functions should be moved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a couple of notes inline.
@@ -2468,6 +2470,9 @@ static void janus_videoroom_publisher_destroy(janus_videoroom_publisher *p) { | |||
} | |||
} | |||
janus_mutex_unlock(&p->rtp_forwarders_mutex); | |||
/* Release dummy session of the forwarder */ | |||
if(p->session && p->session->handle == NULL) | |||
janus_videoroom_session_destroy(p->session); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this here, and why does this mention a dummy session? This is not a function only used for dummy publishers: it's the destroy function for all publishers, and is not meant to be called directly, it's the callback for the cleanup from sessions
. At the very least, this should have a p->dummy
check too to ensure it's not done on regular publishers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the session created here:
janus-gateway/src/plugins/janus_videoroom.c
Line 7379 in 195bac0
/* We create a dummy session first, that's not actually bound to anything */ |
It belongs to a regular publisher, so p->dummy
will be false, I'll check for p->remote
which seems to be the flag to use here.
@@ -13421,7 +13428,7 @@ static void *janus_videoroom_remote_publisher_thread(void *user_data) { | |||
} | |||
g_list_free(subscribers); | |||
/* Free streams */ | |||
g_list_free(publisher->streams); | |||
g_list_free_full(publisher->streams, (GDestroyNotify)(janus_videoroom_publisher_stream_unref)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this should be a destroy, then, and the related maps both be unrefs. Besides, if that's the rationale for the list, then it should not only be fixed for remote publishers, but for all of them, since we do a simple g_list_free(participant->streams)
in janus_videoroom_hangup_media_internal
too. That said, we should thread carefully here, as I don't remember seeing leaks normally (all my local builds use libasan).
00e3d3d
to
e02d752
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the really late reply, I added a comment to your question and updated the dummy session check.
@@ -2468,6 +2470,9 @@ static void janus_videoroom_publisher_destroy(janus_videoroom_publisher *p) { | |||
} | |||
} | |||
janus_mutex_unlock(&p->rtp_forwarders_mutex); | |||
/* Release dummy session of the forwarder */ | |||
if(p->session && p->session->handle == NULL) | |||
janus_videoroom_session_destroy(p->session); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the session created here:
janus-gateway/src/plugins/janus_videoroom.c
Line 7379 in 195bac0
/* We create a dummy session first, that's not actually bound to anything */ |
It belongs to a regular publisher, so p->dummy
will be false, I'll check for p->remote
which seems to be the flag to use here.
@@ -13421,7 +13428,7 @@ static void *janus_videoroom_remote_publisher_thread(void *user_data) { | |||
} | |||
g_list_free(subscribers); | |||
/* Free streams */ | |||
g_list_free(publisher->streams); | |||
g_list_free_full(publisher->streams, (GDestroyNotify)(janus_videoroom_publisher_stream_unref)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some debugging and it looks like that at the time janus_videoroom_subscriber_free
or janus_videoroom_publisher_free
are called, the streams
list and the streams_byid
/ streams_bymid
hashmaps are empty - at least in the scenarios I tested.
However at the end of the remove publisher thread, the streams list is not empty, resulting in the leak if janus_videoroom_publisher_stream_destroy
is used.
@lminiero not sure if you have seen my latest comments. Please let me know you want more changes or if / how I should debug this further. |
@fancycode apologies, I haven't delved in detail in them yet, and I've been a bit busy lately. I'll have a look next week and get back to you: if that doesn't happen, please feel free to shout at me 😄 |
@lminiero No worries, take your time. |
@fancycode could you resolve the conflict? I think it was caused by #3447 since that change impacted remote publishers as well. |
e02d752
to
3167a36
Compare
Done |
Adding the report from ASan (commit 51fe38a)
|
adding the refcount debug output |
Just to be sure, you are testing the current |
@fancycode yes, I mentioned the commit hash, it's from current master. |
Ok, perfect :-) Thanks for confirming. |
@fancycode We prepared a patch according to the issues reported by ASan, UBSan and ref counting. Can you try the attached diff and report? (on top of current master).
|
We started tracking an alternative patch in #3475. @fancycode could you give that a try? |
Thanks!
Not yet, I hope to have feedback sometime the next days. |
Mentioning @spscream too, since they started using remote publishers extensively I think. |
@lminiero we don't pushed remote publishers to production yet, we will start use it in production under load within weeks. Anyway if any crashes or leaks will appear on dev/staging I'll let you know. |
I did some tests with #3475 and could no longer reproduce the leaks I had before. Thanks! |
Superseeded by #3475 |
Fixed some cases where thertp_forwarders
map was created without the destructor function. This resulted in the forwarders not being released when the publisher got destroyed.