Workaround for Linux CGroup V2 causing dropped packets with recording and a memory limit set #3004
base: 0.x
Conversation
Thanks for the detailed analysis of the problem and the root causes, this was an interesting read! I agree that it's probably not something we can merge as it is, mostly because I don't know what the implications and impact on performance in general could be on systems not affected by this. That said, I think it does make sense to add a way to simply enable this behaviour, e.g., at compile time. A simple way might be to make this functionality dependent on a define, e.g., one that defines the maximum value you use to trigger the action (100 in your case). I'm imagining something like
(where You can also add a commented-out define in
which is what we do in
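For illustration, here is a minimal sketch of what such an opt-in, define-gated behaviour might look like; the define name, the helper function, and the write counter are hypothetical placeholders, not the actual proposal:

```c
#include <stdio.h>

/* Hypothetical compile-time switch: leave it undefined and builds behave
 * exactly as today; define it to the number of writes after which the
 * recorder should try to release the page cache (100 in cb22's case). */
/* #define JANUS_RECORDING_DROP_CACHE_EVERY 100 */

/* Hypothetical helper that flushes the file and drops its page cache
 * (a sketch of one possible implementation appears later in the thread). */
extern int drop_written_cache(FILE *file);

static void janus_recorder_after_write(FILE *file) {
#ifdef JANUS_RECORDING_DROP_CACHE_EVERY
	static int writes_since_drop = 0;
	if(++writes_since_drop >= JANUS_RECORDING_DROP_CACHE_EVERY) {
		writes_since_drop = 0;
		drop_written_cache(file);
	}
#else
	(void)file;	/* no-op on systems not affected by the issue */
#endif
}
```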
@cb22 first of all, thanks for the very interesting analysis! Let me clarify that I'm not an expert on CGroups / the Linux page cache, and honestly this is the first time I've had to deal with this kind of challenge :-) According to the Linux kernel docs
so IIUC this system call is useful for (micro?) optimizations and is not guaranteed to work. You mentioned that this is not a bug in Janus, but I'm wondering if this is an issue that could potentially arise in cgroups v2 Docker instances with "common" memory limits (e.g. ~GB).
Thanks! @lminiero your solution does look good to me, it'd certainly be perfect for our use case. I'd be happy to make the changes and update this PR, depending on your thoughts on the next section. @atoppi I had so many false starts when debugging this, it's not every day one hits strange edge cases in the kernel :) Regarding
Of course, whether we should rely on this behaviour is indeed a very valid question. The only other alternatives I saw were:
(2) would be the best choice, but it might take quite a while, or might not be acceptable to the kernel maintainers depending on their design decisions.
Yes - the reduced memory limit just causes you to hit it faster. If you give Janus, say, 2GB of memory, you'd then require ~2GB of video data to be written (without any participants leaving, since this would cause files to get
I don't understand why a cache page should increase its size to the actual file size.
You can see this by running the following:
@cb22 would calling
Oh, got it now. Thanks!
@cb22 are you limiting the memory for the instance with the |
Never mind, I see that
@cb22 I'm unable to reproduce. Steps done:
After a certain amount of time:
By inspecting
I guess that also takes into account the kernel page cache you are mentioning. I'm attaching a screenshot of the test.
@atoppi apologies for the delay, I'll look over this thread and get back to you properly soon - it's just been a busy past few days! One thing off the top of my head: what kernel version are you running there? My tests were specifically on Debian, with 5.10 and 5.16. @tmatth I remember thinking
Yeah, given your description I would honestly be surprised if it mitigated the issue, but it did occur to me that it could. Also, IIUC the
The host machine is running
The Docker instance is running
@cb22 any update?
Hey @atoppi - finally got around to this! I managed to reproduce it reliably on my local system running Arch (5.18.2-arch1-1), after splitting out most of our application-specific signalling stuff etc. It happens both with host mode networking and with Docker mode networking (although, with Docker bridge networking, you won't see the details of the losses in the host's
I've attached this here - janus-docker.tar.gz - feel free to run it as is, or use parts of it. It should build cleanly, all commits / deps are pinned, and I just pointed the videoroom demo at it. Built using:
Some perhaps relevant deviations from the usual configs:
Another thought is that perhaps Ubuntu has patched this issue in their kernel. (FWIW, we've been running my hack in production for a bit now, and we've seen our UDP packet loss go basically to 0 since the rollout. It used to be significant before that.)
I've been getting "Timeout expired for session ..." errors even for just 1 publisher after a few seconds.
Oh, I noticed you set the session timeout to 20 in janus.jcfg; I increased it to 60 and now it's fixed.
I've managed to reproduce the issue with your Dockerfile.
With my setup I got 0 drops, whereas with the one you shared I consistently get plenty of NACKs
and many "drops", as read by
as soon as the memory usage reaches the top limit. Since the host kernel is the same, that is not part of the problem.
@atoppi that's really interesting - can you share your build and config, and how you're running it? I'll try to reproduce on my end too.
@cb22 I finally managed to reproduce it with my Dockerfile as well. So the issue is definitely confirmed.
EDIT: I'm keeping the label (the PR is for 0.x), but bear in mind the issue is also present on
That is a very unusual setting, though, and no browser will send more than 2 or 2.5 Mbps anyway in a PeerConnection, no matter what we advertise via REMB.
True - I put that there to speed up reproducing the issue. For us, in production and with my original testing, it was definitely occurring with our standard bitrate cap of ~512kbps.
Very likely because you have dozens of streams to handle (while I'm testing with just a couple of tabs).
By inspecting
@cb22 any suggestion on how to dig further into the reason for these drops?
Sorry @atoppi, somehow missed this. Right, so that's the same output I saw. A big clue came when I tried a newer kernel that had torvalds/linux@a3ce2b1 in it. If you're running a kernel with that feature (
My UDP memory sysctls were set sky high, so I was quite sure those weren't a problem. The call chain in the kernel is something like (https://github.com/torvalds/linux/blob/a3ce2b109a59ee9670706ae8126dcc04cfe261cd/net/ipv4/udp.c):
Unfortunately I don't have that feature on my kernel.
@cb22 How did you check this? I see that
As for the general progress on the issue, we are concerned that this dropping is going to be a major problem in the near future, both for Janus users and for people using cgroups v2 with other services.
When
It should actually be easy enough to confirm by adding a
Just a little update: I've been hit by the same issue while working on another project (unrelated to Janus) that was writing a huge log file to the disk. So I guess this is still something to consider even with an updated environment:
It looks like Redis did a similar fix recently.
According to the Redis docs and comments, the first two calls are used to "make sure data will not remain on the OS's output buffers", since "in Linux posix_fadvise only take effect on writeback-ed pages, so a sync (or fsync, fdatasync) is needed to flush the dirty page before posix_fadvise if we reclaim write cache". They also added a very useful macro to identify the OS support:
What is not very clear to me is the timing of the cache reclaim they are doing.
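To make that flush-then-advise pattern concrete, here is a minimal sketch assuming a plain FILE* writer like the recorder uses; the function name, the feature-detection macro, and the choice to advise over the whole file are assumptions for illustration, not the actual Redis or Janus code:

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Only build the reclaim path where the calls exist, similar in spirit
 * to the feature-detection macro Redis added. */
#if defined(__linux__) && defined(POSIX_FADV_DONTNEED)
#define HAVE_RECLAIM_WRITE_CACHE 1
#endif

/* Hypothetical helper: flush what has been written so far, then advise the
 * kernel that those pages can be dropped from the page cache. */
int drop_written_cache(FILE *file) {
#ifdef HAVE_RECLAIM_WRITE_CACHE
	int fd = fileno(file);
	if(fd < 0)
		return -1;
	/* posix_fadvise only affects pages that have already been written back,
	 * so flush the stdio buffer and sync the data first (fdatasync skips
	 * the metadata flush a full fsync would do). */
	if(fflush(file) != 0 || fdatasync(fd) != 0)
		return -1;
	/* Advise over the whole file (len 0 means "to end of file"); this is
	 * only a hint, which the kernel is free to ignore. */
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#else
	(void)file;
	return 0;	/* nothing to do where unsupported */
#endif
}
```

How often to call something like this (every write, every N writes, or only when approaching the memory limit) is exactly the timing question raised above.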
What I was thinking of is something like this:
Basically I'd change the draft by:
@cb22 in case you are still interested in the patch, could you please take a look and provide some feedback?
@atoppi at first glance, looks good to me! I can run it in production on a single instance for a few days and give some more feedback, if you'd like. We have monitoring set up for packet drops, so we should be able to catch any issues.
@cb22 that would be great! Thanks!
@cb22 have you had the chance to test the patch?
@cb22 just noticed that we never got feedback. Any update on this?
I'd like to preface this by saying this is not a bug in Janus - but rather subtle / broken behaviour from CGroup V2 on Linux. With a memory limit set, the page cache is considered as 'in use' when allocating network buffers. This PR is a workaround that's worked so far for us, but it's likely not the best solution and might not be suitable for inclusion. This is more to get a discussion going, or even just point anyone who's had the same problem in the right direction!
We deploy Janus in Docker containers with a 512MB memory limit, and started noticing that whenever we have a large-ish number of participants in a room (>6) and recording turned on, within 30 minutes, incoming video packets would start getting lost by Janus. This would manifest as glitchy video (NACKs would be sent and they'd recover, but the cycle was continuous) and slowlink events.
Digging further, these losses would show up as drops in /proc/net/udp and with dropwatch. My first thought was that there was something to do with UDP receive buffer sizes going on here, but experimentation with manually tweaking SO_RCVBUF in libnice didn't yield any results.

It seemed related to recording; turning it on / off started and stopped the issue, and the issue disappeared with simply the fwrite calls in record.c commented out. I tried a few combinations, such as a different FS, and tmpfs. Interestingly, running on tmpfs did not replicate the problem, but running an ext4 FS image backed by tmpfs did.

Eventually, I traced it down to __sk_mem_raise_allocated failing in the kernel because it couldn't charge the CGroup for the usage. CGroup V2 changed the behaviour, and now keeps track of the page cache on a per-CGroup basis, rather than globally as in CGroup V1 (which is why we hadn't hit this before!). That, combined with Linux not freeing the page cache when trying to grow a buffer, led to video packets being dropped. I figured audio was unaffected since those packets are smaller and the buffer didn't need to be reallocated.

This was easy to confirm; once the server had gotten into the dropping state, running sync; echo 3 > /proc/sys/vm/drop_caches fixed it completely, up until the page cache filled up again.

If you'd like to recreate this, it should be easy enough: