WorkerSettings: Add disableLiburing option (enable_liburing in Rust) #1442
### Details
- `createWorker({ disableLiburing: true })` disables liburing usage even if it is supported by the worker and the current host.
- Related (still to be fixed) issue, which gives a lot of context: #1435
OMG Rust tests fail because of this:
I hate that. And BTW it fails due to this obviously wrong code that I'm fixing now, but it doesn't justify the problem: if enable_liburing {
spawn_args.push("--disable_liburing".to_string());
} |
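For reference, a minimal sketch of the corrected condition: the flag should be pushed only when liburing is not enabled. The function name `build_spawn_args` is hypothetical, and the flag spelling `--disableLiburing=true` is an assumption based on the later decision in this thread to give every argument a value; the real code in the Rust `Worker` may differ.

```rust
/// Hypothetical helper, not the actual mediasoup code: builds the extra
/// worker arguments derived from the `enable_liburing` setting.
fn build_spawn_args(enable_liburing: bool) -> Vec<String> {
    let mut spawn_args = Vec::new();

    // The disable flag must be sent only when liburing is NOT enabled.
    // The buggy version quoted above had this condition inverted.
    if !enable_liburing {
        // Assumed flag spelling; every argument carries an explicit value.
        spawn_args.push("--disableLiburing=true".to_string());
    }

    spawn_args
}
```

With this shape, `build_spawn_args(true)` adds nothing and `build_spawn_args(false)` adds exactly one argument.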
Co-authored-by: Nazar Mokrynskyi <[email protected]>
Ok, this is done. |
Assuming CI passes this looks fine to me
Why is this happening in CI???? https://github.com/versatica/mediasoup/actions/runs/10323061408/job/28579657794?pr=1442
|
OMG I don't want to spend all the weekend with this. |
Ok, |
I suspect you might be forcing a different version locally. It wants you to write this idiomatic Rust instead: let settings = WorkerSettings {
enable_liburing: false,
..WorkerSettings::default()
}; |
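A small self-contained comparison of the two styles, assuming the complaint comes from Clippy's `field_reassign_with_default` lint. `DemoSettings` is a hypothetical stand-in for `WorkerSettings`, not the real type.

```rust
/// Hypothetical stand-in for `WorkerSettings` with two fields.
#[derive(Debug, Clone, PartialEq)]
struct DemoSettings {
    enable_liburing: bool,
    log_tags: Vec<String>,
}

impl Default for DemoSettings {
    fn default() -> Self {
        Self {
            enable_liburing: true,
            log_tags: Vec::new(),
        }
    }
}

/// Style flagged by the lint: create the default, then reassign a field.
#[allow(clippy::field_reassign_with_default)]
fn mutate_after_default() -> DemoSettings {
    let mut settings = DemoSettings::default();
    settings.enable_liburing = false;
    settings
}

/// Suggested style: struct update syntax.
fn struct_update() -> DemoSettings {
    DemoSettings {
        enable_liburing: false,
        ..DemoSettings::default()
    }
}
```

Both functions produce the same value, so the choice between them is purely stylistic.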
Why does it complain about this?
This is the code: worker_manager
.create_worker({
let mut settings = WorkerSettings::default();
settings.enable_liburing = false;
settings
})
.await
.expect("Failed to create worker") We do THE SAME in many other files such as here: let worker = worker_manager
.create_worker({
let mut settings = WorkerSettings::default();
settings.log_level = WorkerLogLevel::Debug;
settings.log_tags = vec![WorkerLogTag::Info];
settings.rtc_port_range = 0..=9999;
settings.dtls_files = Some(WorkerDtlsFiles {
certificate: "tests/integration/data/dtls-cert.pem".into(),
private_key: "tests/integration/data/dtls-key.pem".into(),
});
settings.libwebrtc_field_trials =
Some("WebRTC-Bwe-AlrLimitedBackoff/Disabled/".to_string());
settings.app_data = AppData::new(CustomAppData { bar: 456 });
settings
})
.await
.expect("Failed to create worker with custom settings"); |
I suspect it has a heuristic about number of fields or just didn't learn to handle more complex cases, not sure which one of those two |
Pushed, should be green now. |
Generally what it suggests does look a bit nicer and even shorter, but not always |
OMG now tests failing... |
WTH??? https://github.com/versatica/mediasoup/actions/runs/10323816619/job/28581996102?pr=1442
|
Memory issues... again 😕 |
When did this happen in a test recently? I don't remember. |
It fails consistently, this is depressing, no changes have been done that should affect this AT ALL: |
No idea why the issue happens consistently, but indeed it's the already known issue in Rust due to the separate usrsctp thread. I worked on this some months ago and there is an unfinished and very complex PR which, honestly, I don't have time to revive now. I think I will disable these failing tests because we do know that the underlying code can fail. Well, all this assuming that the failing test is using two workers with pipe transports. If not, ignore all I said here. |
Those are the ways Rust tests systematically fail. Since they run in parallel and taking into account that memory issues could make tests fail at any time once
I would like to say that this is pipe transport related but the third test attempt doesn't even execute any "pipe" test. |
This is completely amazing. Rust tests in the v3 branch don't fail, but Rust tests in this PR branch fail consistently. Changes in this PR cannot be the culprit at all; they do not affect anything in Rust. |
@nazar-pc help please? I doubt that Rust tests are even failing due to this well-known issue: in a comment above you can see that sometimes pipe transports do not even run and tests still fail. Maybe I'm wrong, but AFAIK we only run more than one worker in Rust tests in those pipe transport related tests, so I'm completely lost. |
Either changes in this PR are the culprit (however unlikely that seems) or they trigger an issue that existed before as well, but was hard to reproduce.
There are as many tests running concurrently as CPU cores on the machine by default, so we have many workers with potentially different settings running at the same time and order is not deterministic. |
worker/src/Settings.cpp
Outdated
if (!optarg) | ||
{ | ||
MS_THROW_TYPE_ERROR("unknown configuration parameter: %s", optarg); | ||
} |
Why was this deleted? It seems like an important safety check that caller provided something meaningful as an input.
Because the new `--disableLiburing` command line argument doesn't take a value, so that check throws whenever the argument is present. I could check that `optarg` exists for all the other arguments, but I didn't consider it necessary.
Well, but other options do require a value. Maybe we have a test that checks that and it causes memory corruption because you suddenly create a value out of null pointer?
I've made this change:
Is it enough? What do you mean with "you suddenly create a value out of null pointer"? Command line arguments are created by Node and Rust layers in their Worker
classes. Tests can not trigger wrong arguments passed to the worker.
Just wondering about this: In worker/utils.rs
:
pub(super) fn run_worker_with_channels<OE>(
id: WorkerId,
thread_initializer: Option<Arc<dyn Fn() + Send + Sync>>,
args: Vec<String>,
worker_closed: Arc<AtomicBool>,
on_exit: OE,
) -> WorkerRunResult
where
OE: FnOnce(Result<(), ExitError>) + Send + 'static,
{
let (channel, prepared_channel_read, prepared_channel_write) =
Channel::new(Arc::clone(&worker_closed));
let buffer_worker_messages_guard =
channel.buffer_messages_for(SubscriptionTarget::String(std::process::id().to_string()));
std::thread::Builder::new()
.name(format!("mediasoup-worker-{id}"))
.spawn(move || {
if let Some(thread_initializer) = thread_initializer {
thread_initializer();
}
let argc = args.len() as c_int;
let args_cstring = args
.into_iter()
.map(|s| -> CString { CString::new(s).unwrap() })
.collect::<Vec<_>>();
let argv = args_cstring
.iter()
.map(|arg| arg.as_ptr().cast::<c_char>())
.collect::<Vec<_>>();
let version = CString::new(env!("CARGO_PKG_VERSION")).unwrap();
let status_code = unsafe {
let (channel_read_fn, channel_read_ctx, _channel_write_callback) =
prepared_channel_read.deconstruct();
let (channel_write_fn, channel_write_ctx, _channel_read_callback) =
prepared_channel_write.deconstruct();
mediasoup_sys::mediasoup_worker_run(
argc,
argv.as_ptr(),
version.as_ptr(),
0,
0,
channel_read_fn,
channel_read_ctx,
channel_write_fn,
channel_write_ctx,
)
};
Here args
is a command line arguments string, something like:
"--logLevel=warn --disableLiburing"
Maybe something dangerous when doing this?:
let args_cstring = args
.into_iter()
.map(|s| -> CString { CString::new(s).unwrap() })
.collect::<Vec<_>>();
let argv = args_cstring
.iter()
.map(|arg| arg.as_ptr().cast::<c_char>())
.collect::<Vec<_>>();
Nope, no problem there IMHO. It just splits the string into these strings:
- "--logLevel=warn"
- "--disableLiburing"
It doesn't do anything like assuming/expecting a "=" symbol, so no danger here.
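To illustrate the point, a minimal sketch of the same conversion in isolation, assuming `args` already holds one element per argument. The helper names `to_c_strings`, `to_argv` and `read_back` are hypothetical. Nothing here parses `=`; the only requirement is that the `CString` values outlive the raw pointers derived from them.

```rust
use std::ffi::{c_char, CStr, CString};

/// Owns the C strings. Must stay alive while any pointer from `to_argv` is in use.
fn to_c_strings(args: Vec<String>) -> Vec<CString> {
    args.into_iter()
        // `CString::new` fails only if the string has an interior NUL byte.
        .map(|s| CString::new(s).expect("argument contains a NUL byte"))
        .collect()
}

/// Borrowed raw pointers, one per argument, in the same order.
fn to_argv(c_strings: &[CString]) -> Vec<*const c_char> {
    c_strings.iter().map(|arg| arg.as_ptr()).collect()
}

/// Reads an argument back from `argv` (for demonstration only).
fn read_back(argv: &[*const c_char], index: usize) -> String {
    // SAFETY: the pointers come from `to_argv` and the owning `CString`s are
    // still alive in the caller, so each one is a valid NUL-terminated string.
    let c_str = unsafe { CStr::from_ptr(argv[index]) };
    c_str.to_string_lossy().into_owned()
}
```

An argument without `=` passes through unchanged, exactly like one with a value.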
I don't see why commit db2252c should fix this problem. It probably won't and, instead of wasting more time on this, I will change the new command line argument and add a value to it. No time to deal with ancient command line args stuff.
I know we're not calling it incorrectly right now, but we could. And that would blow up instead of crashing with a nice message. Do not trust input, at least not to the degree that impacts memory safety.
Now it's safe and we don't assume anything. Arg values are now mandatory. See latest changes.
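The idea of making every value mandatory can be sketched as follows. This is an illustration in Rust, not the worker's actual C++ parser; `parse_worker_args` and its error messages are hypothetical. A missing value becomes an ordinary error instead of something the parser later reads through a null pointer.

```rust
use std::collections::HashMap;

/// Hypothetical parser: accepts only `--name=value` arguments.
fn parse_worker_args(args: &[&str]) -> Result<HashMap<String, String>, String> {
    let mut parsed = HashMap::new();

    for arg in args {
        let body = arg
            .strip_prefix("--")
            .ok_or_else(|| format!("unexpected argument: {arg}"))?;

        // Every argument must carry a non-empty value after `=`.
        match body.split_once('=') {
            Some((name, value)) if !name.is_empty() && !value.is_empty() => {
                parsed.insert(name.to_string(), value.to_string());
            }
            _ => return Err(format!("missing value for argument: {arg}")),
        }
    }

    Ok(parsed)
}
```

Under these assumptions `--disableLiburing=true` is accepted, while a bare `--disableLiburing` is rejected with a readable message.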
Do you mean that different tests run in parallel in the same Rust process? I mean, of course! Ok, I assume there is no way in Rust to create a separate process for each test file, right? |
Yes, it does run in the same process by default. You could probably make it run in separate processes somehow, but we're dealing with memory safe language, so why, right? That is not a solution to the problem we have here anyway 🙂 |
…ents that require a value
No, trust me, I couldn't.
Because there is also C code running its own separate thread: #1352 |
Ok, some news:
So it seems that we are sending some command line arg without value. I will print things to debug. |
Ignore, I forgot to add |
This is strongly depressing. Now WHAT????
|
I am 96% sure that the memory issue is here: |
Ohhh, so we are considering that liburing is enabled or disabled for all workers, but Rust tests run in parallel in the same process and in just one of those tests we were disabling liburing. In the other ones liburing is enabled. So here is the problem: void DepLibUring::ClassInit()
{
const auto mayor = io_uring_major_version();
const auto minor = io_uring_minor_version();
MS_DEBUG_TAG(info, "liburing version: \"%i.%i\"", mayor, minor);
if (Settings::configuration.liburingDisabled)
{
MS_DEBUG_TAG(info, "liburing disabled by user settings");
return;
}
// This must be called first.
DepLibUring::CheckRuntimeSupport();
if (DepLibUring::IsEnabled())
{
DepLibUring::liburing = new LibUring();
MS_DEBUG_TAG(info, "liburing enabled");
}
else
{
MS_DEBUG_TAG(info, "liburing not enabled");
}
}
|
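A simplified Rust sketch of the hazard described above, assuming the mismatch is between state shared by the whole process and state owned by each worker thread. It does not reproduce the actual `DepLibUring` code; `PROCESS_WIDE_ENABLED`, `RING` and all function names are hypothetical.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Shared by every worker thread in the process.
static PROCESS_WIDE_ENABLED: AtomicBool = AtomicBool::new(false);

thread_local! {
    /// Per-worker resource; `None` when this worker never initialized it.
    static RING: RefCell<Option<u32>> = RefCell::new(None);
}

/// Mimics a per-worker init that may return early.
fn class_init(disabled_by_user: bool) {
    if disabled_by_user {
        return; // This worker leaves its own RING as None.
    }
    PROCESS_WIDE_ENABLED.store(true, Ordering::SeqCst);
    RING.with(|ring| *ring.borrow_mut() = Some(1));
}

/// Risky assumption: "enabled" for one worker means enabled for all.
fn global_says_enabled() -> bool {
    PROCESS_WIDE_ENABLED.load(Ordering::SeqCst)
}

/// Safer check: look at what this worker thread actually owns.
fn this_worker_has_ring() -> bool {
    RING.with(|ring| ring.borrow().is_some())
}

/// Runs one enabled worker, then one disabled worker, and reports what the
/// disabled worker observes: (global flag, its own ring).
fn observe_disabled_worker() -> (bool, bool) {
    thread::spawn(|| class_init(false)).join().unwrap();
    thread::spawn(|| {
        class_init(true);
        (global_says_enabled(), this_worker_has_ring())
    })
    .join()
    .unwrap()
}
```

The disabled worker sees the global flag set by another worker while owning no ring of its own. Code that trusts the flag and then uses the per-worker resource would, in C++, end up touching a pointer that was never initialized for that worker.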
.create_worker(WorkerSettings { | ||
enable_liburing: false, | ||
..WorkerSettings::default() | ||
}) |
@nazar-pc, although this works as intended (default settings are used, except for `enable_liburing`, which is set to `false`), is this correct idiomatic syntax in Rust?
In Node the default object must be added first, then those fields whose value must be different:
options = { ...options, foo: false }
It surprises me that in Rust it works if the changes are added first before the object with default values.
It is. I believe `..` must be used at the very end; I don't think it compiles otherwise.
Ok. For me idiomatic would be "here some values and after them those that override them" but it's ok. Rust guys have their own anti idiomatic concept of what idiomatic should mean.
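For what it's worth, a small sketch of how struct update syntax behaves: `..base` is not an override step, it only supplies the fields that were not listed explicitly, which is why it goes last. `Opts` and `with_foo_disabled` are hypothetical names used only for this illustration.

```rust
/// Hypothetical options type used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct Opts {
    foo: bool,
    bar: u32,
}

/// Builds `Opts` with `foo` forced to `false`; all other fields come from `base`.
fn with_foo_disabled(base: Opts) -> Opts {
    Opts {
        foo: false,
        // Fills only the fields not named above (here: `bar`).
        ..base
    }
}
```

So the result matches the JavaScript `{ ...options, foo: false }`, even though the spelling order is reversed.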
Finally green. Merging! |
Details
- `createWorker({ disableLiburing: true })` disables liburing usage even if it is supported by the worker and the current host.
- `create_worker({ enable_liburing: false })` disables liburing usage even if it is supported by the worker and the current host.