Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.x] Performance issue/improvement audiobridge #3394

Closed
robby2016 opened this issue Jun 12, 2024 · 9 comments
Closed

[1.x] Performance issue/improvement audiobridge #3394

robby2016 opened this issue Jun 12, 2024 · 9 comments
Labels
enhancement multistream Related to Janus 1.x

Comments

@robby2016
Copy link
Contributor

We recently ran into the issue that Janus is consuming a lot of resources even if the instance isn't used and also after a fresh restart. After some investigation we figured out that the problem is probably related to a lot of permanent audiobridge rooms (saved in the config file). As far as I understand it the issue is, that on startup Janus starts one "janus_audiobridge_mixer_thread" per created audiobridge room. Sadly I can't really provide a usable PR but maybe I can suggest to move the creation of "audiobridge->thread" into the "janus_audiobridge_handler" and create the mixer only when a user joins. Ideally you could also stop the mixer once the last participant leaves, but that can be solved for now by simply restarting Janus.

For testing I move the creation of the mixer here.

if (audiobridge->thread == NULL) {
        char myname[16];
        GError *myerror = NULL;
        audiobridge->thread = g_thread_try_new(myname, &janus_audiobridge_mixer_thread, audiobridge, &myerror);
}
@robby2016 robby2016 added the multistream Related to Janus 1.x label Jun 12, 2024
@lminiero
Copy link
Member

I doubt the problem is the thread. The problem is likely that you have something like "always on" set on the mixer, meaning it will work even when there's no one in. That's something you can disable, and basically already what you were planning to add.

@robby2016
Copy link
Contributor Author

robby2016 commented Jun 12, 2024

Wow, thanks for the fast reply. I did see the "always_on" variable but thought that was only if you use rtp_forwarders. In our case the rooms don't use rtp forwarding and have a pretty minimal setup.
Thank you I'll take a closer look at the "always_on" property.

e.g.


room-name :
{
  description = "Room desc";
  is_private = "yes";
  sampling_rate = "16000";
  secret = "123456";
  pin = "123456";
  audiolevel_ext = "yes";
  audiolevel_event = "yes";
  audio_active_packets = "50";
  audio_level_average = "50";
  record = "false";
};

@atoppi
Copy link
Member

atoppi commented Jun 12, 2024

Could you please quantify both "a lot of resources" and "a lot of permanent audiobridge rooms" ?

The only thing that an idle mixer thread does is waking up every 5ms, checking every 15ms if there is some work to do:

if(passed < 15000) { /* Let's wait about 15ms at max */
g_usleep(5000);
continue;
}
/* If we're recording to a wav file, update the info */
if(g_atomic_int_get(&audiobridge->record) && !g_atomic_int_get(&audiobridge->wav_header_added)) {
JANUS_LOG(LOG_VERB, "Adding WAV header for recording %s (%s)...\n", audiobridge->room_id_str, audiobridge->room_name);
janus_audiobridge_rec_add_wav_header(audiobridge);
}
if(!g_atomic_int_get(&audiobridge->record) && g_atomic_int_get(&audiobridge->wav_header_added)) {
JANUS_LOG(LOG_VERB, "Updating WAV header for recording %s (%s)...\n", audiobridge->room_id_str, audiobridge->room_name);
janus_audiobridge_update_wav_header(audiobridge);
}
/* Update the reference time */
before.tv_usec += 20000;
if(before.tv_usec > 1000000) {
before.tv_sec++;
before.tv_usec -= 1000000;
}
/* Do we need to mix at all? */
janus_mutex_lock_nodebug(&audiobridge->mutex);
count = g_hash_table_size(audiobridge->participants);
rf_count = g_hash_table_size(audiobridge->rtp_forwarders);
pf_count = g_hash_table_size(audiobridge->anncs);
if((count+rf_count+pf_count) == 0) {
janus_mutex_unlock_nodebug(&audiobridge->mutex);
/* No participant and RTP forwarders, do nothing */
if(prev_count > 0) {
JANUS_LOG(LOG_INFO, "Last user/forwarder/file just left room %s, going idle...\n", audiobridge->room_id_str);
prev_count = 0;
}
continue;
}

I'd not expect a high usage of resources for this bunch of operations, even though this is not the best design in terms of efficiency when dealing with your scenario (many idle static rooms).

@robby2016
Copy link
Contributor Author

Sorry, you're right I should have given some more information about the setup and performance. On a small testing VPC (2 Cores) with 400 rooms configured, causing a load of around 5, basically making the system unusable. On a much larger system (dedicated) with 64 Cores I have tested it with around 5000 rooms, causing a load of around 10.

@atoppi
Copy link
Member

atoppi commented Jun 12, 2024

Confirmed the behavior. On my 16 cores machine, 10k idle rooms basically melt the CPU.

Prepared a flame graph with 2500 bridges:
perf-1718206307

The issue seems to be the scheduling of the tasks due to the huge amount of sleep calls.

@lminiero
Copy link
Member

I'd argue that this is not an issue. The AudioBridge implements a mixer, and hundreds or thousands of idle ones make little sense to me, and load is to be expected. At any rate, the fix would not IMHO be moving when you create the thread: the moment someone gets in and leaves a second later, and you're back to square one. Maybe this could be solved using conditions and signals, but I'm not sure, as I haven't given this enough thought. That said, to be honest this has a low priority for me: if you're willing to prepare a PR that doesn't involve moving the thread creation, I'd be happy to review it ✌️

@robby2016
Copy link
Contributor Author

First let me say thanks for the great help. I do understand, that this is maybe more of an "edge" case and doesn't have a high priority. Maybe we could mitigate the problem accordingly. Let's leave the threads alone and simply adjust the sleeping time of the mixer based on usage?

Something like this maybe?

@lminiero
Copy link
Member

lminiero commented Sep 4, 2024

@robby2016 apologies for the late answer. I don't think adjusting the sleep time would help, as it would make rooms poorly responsive in those transitions. I personally still think conditions/signals would be the way to go. If you're willing to explore that road in a PR, please do let us know and we'll review it.

@robby2016
Copy link
Contributor Author

@lminiero Agreed, a better solution would be something that would use conditions or signals somehow. I'll look into that. Thanks again for the great help :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement multistream Related to Janus 1.x
Projects
None yet
Development

No branches or pull requests

3 participants