WorkerSettings: Add disableLiburing option (enable_liburing in Rust) #1442

ibc · 2024-08-09T16:34:30Z

Details

Node: createWorker({ disableLiburing: true }) disables liburing usage despite it's supported by the worker and current host.
Rust: create_worker({ enable_liburing: false }) disables liburing usage despite it's supported by the worker and current host.
Related (still to be fixed) issue which brings lot of context: Docker and io_uring don't work together - add prebuilt binary without liburing #1435

### Details - `createWorker({ disableLiburing: true })` disables LibUring usage despite it's supported by the worker and current host. - Related (still to be fixed) issue which brings lot of context: #1435

rust/src/worker.rs

ibc · 2024-08-09T17:08:07Z

OMG Rust tests fail because of this:

mediasoup-worker::mediasoup_worker_run() | settings error: unknown long option given as argument

I hate that getopt.h ancient thing...

And BTW it fails due to this obviously wrong code that I'm fixing now, but it doesn't justify the problem:

if enable_liburing {
    spawn_args.push("--disable_liburing".to_string());
}

rust/src/worker.rs

Co-authored-by: Nazar Mokrynskyi <[email protected]>

ibc · 2024-08-09T17:15:15Z

Ok, this is done.

nazar-pc

Assuming CI passes this looks fine to me

ibc · 2024-08-09T18:08:02Z

Why is this happening in CI????

https://github.com/versatica/mediasoup/actions/runs/10323061408/job/28579657794?pr=1442

cargo clippy works in my machine and we force Rust version!!!!

ibc · 2024-08-09T18:08:16Z

OMG I don't want to spend all the weekend with this.

ibc · 2024-08-09T18:10:24Z

Ok, cargo clippy --all-targets -- -D warnings fails in local too.

nazar-pc · 2024-08-09T18:10:41Z

I suspect you might be forcing a different version locally. It wants you to write this idiomatic Rust instead:

let settings = WorkerSettings {
    enable_liburing: false,
    ..WorkerSettings::default()
};

ibc · 2024-08-09T18:14:45Z

Why does it complain about this?

error: field assignment outside of initializer for an instance created with Default::default()
  --> rust/src/router/tests.rs:21:13
   |
21 |             settings.enable_liburing = false;
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: consider initializing the variable with `worker::WorkerSettings { enable_liburing: false, ..Default::default() }` and removing relevant reassignments
  --> rust/src/router/tests.rs:20:13
   |
20 |             let mut settings = WorkerSettings::default();
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is the code:

worker_manager
        .create_worker({
            let mut settings = WorkerSettings::default();
            settings.enable_liburing = false;

            settings
        })
        .await
        .expect("Failed to create worker")

We do THE SAME in many other files such as here:

let worker = worker_manager
            .create_worker({
                let mut settings = WorkerSettings::default();

                settings.log_level = WorkerLogLevel::Debug;
                settings.log_tags = vec![WorkerLogTag::Info];
                settings.rtc_port_range = 0..=9999;
                settings.dtls_files = Some(WorkerDtlsFiles {
                    certificate: "tests/integration/data/dtls-cert.pem".into(),
                    private_key: "tests/integration/data/dtls-key.pem".into(),
                });
                settings.libwebrtc_field_trials =
                    Some("WebRTC-Bwe-AlrLimitedBackoff/Disabled/".to_string());
                settings.app_data = AppData::new(CustomAppData { bar: 456 });

                settings
            })
            .await
            .expect("Failed to create worker with custom settings");

nazar-pc · 2024-08-09T18:16:15Z

I suspect it has a heuristic about number of fields or just didn't learn to handle more complex cases, not sure which one of those two

ibc · 2024-08-09T18:17:33Z

Pushed, should be green now.

nazar-pc · 2024-08-09T18:19:03Z

Generally what it suggests does look a bit nicer and even shorter, but not always

ibc · 2024-08-09T18:31:43Z

OMG now tests failing...

ibc · 2024-08-09T18:32:35Z

WTH???

https://github.com/versatica/mediasoup/actions/runs/10323816619/job/28581996102?pr=1442

  process didn't exit successfully: `/home/runner/work/mediasoup/mediasoup/target/debug/deps/mediasoup-8f5b5278ff2f31a1` (signal: 11, SIGSEGV: invalid memory reference)

nazar-pc · 2024-08-09T18:33:00Z

Memory issues... again 😕

ibc · 2024-08-09T18:33:30Z

Memory corruption... again 😕

When did this happen in a test recently? I don't remember.

ibc · 2024-08-09T19:18:14Z

It fails consistently, this is depressing, no changes have been done that should affect this AT ALL:

ibc · 2024-08-09T19:55:22Z

No idea why the issue happens consistently but indeed it's the already know issue in Rust due to the separate usrsctp thread. I worked on this some months ago and there is an unfinished and very complex PR for which honestly I don't have time to revive now. I think I will disable these failing tests because we do know that the underlying code can fail. Well, all this assuming that the failing test is using two workers with pipe transports. If not, ignore all I said here.

ibc · 2024-08-12T09:03:56Z

Those are the ways Rust tests systematically fail. Since they run in parallel and taking into account that memory issues could make tests fail at any time once

running 27 tests
test data_structures::tests::dtls_fingerprint ... ok
test ortc::tests::generate_router_rtp_capabilities_succeeds ... ok
test ortc::tests::generate_router_rtp_capabilities_too_many_codecs ... ok
test ortc::tests::generate_router_rtp_capabilities_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_get_consumable_rtp_parameters_get_consumer_rtp_parameters_get_pipe_consumer_rtp_parameters_succeeds ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_unsupported ... ok
test router::audio_level_observer::tests::router_close_event ... ok
test router::active_speaker_observer::tests::router_close_event ... ok
test router::consumer::tests::transport_close_event ... ok
test router::consumer::tests::producer_close_event ... ok
test router::direct_transport::tests::router_close_event ... ok
test router::data_producer::tests::transport_close_event ... ok
test router::data_consumer::tests::transport_close_event ... ok
test router::data_consumer::tests::data_producer_close_event ... ok
test router::pipe_transport::tests::data_producer_close_is_transmitted_to_pipe_data_consumer ... ok
test router::plain_transport::tests::router_close_event ... ok
test router::producer::tests::transport_close_event ... ok
error: test failed, to rerun pass `-p mediasoup --lib`

Caused by:
  process didn't exit successfully: `/home/runner/work/mediasoup/mediasoup/target/debug/deps/mediasoup-8f5b5278ff2f31a1` (signal: 11, SIGSEGV: invalid memory reference)
Error: Process completed with exit code 101.
##[debug]Finishing: cargo test

running 27 tests
test data_structures::tests::dtls_fingerprint ... ok
test ortc::tests::generate_router_rtp_capabilities_succeeds ... ok
test ortc::tests::generate_router_rtp_capabilities_too_many_codecs ... ok
test ortc::tests::generate_router_rtp_capabilities_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_get_consumable_rtp_parameters_get_consumer_rtp_parameters_get_pipe_consumer_rtp_parameters_succeeds ... ok
test router::audio_level_observer::tests::router_close_event ... ok
test router::active_speaker_observer::tests::router_close_event ... ok
test router::consumer::tests::producer_close_event ... ok
test router::data_consumer::tests::data_producer_close_event ... ok
test router::data_consumer::tests::transport_close_event ... ok
test router::consumer::tests::transport_close_event ... ok
test router::data_producer::tests::transport_close_event ... ok
test router::direct_transport::tests::router_close_event ... ok
test router::plain_transport::tests::router_close_event ... ok
test router::pipe_transport::tests::data_producer_close_is_transmitted_to_pipe_data_consumer ... ok
test router::producer::tests::transport_close_event ... ok
error: test failed, to rerun pass `-p mediasoup --lib`

Caused by:
  process didn't exit successfully: `/home/runner/work/mediasoup/mediasoup/target/debug/deps/mediasoup-8f5b5278ff2f31a1` (signal: 11, SIGSEGV: invalid memory reference)
Error: Process completed with exit code 101.

running 27 tests
test data_structures::tests::dtls_fingerprint ... ok
test ortc::tests::generate_router_rtp_capabilities_succeeds ... ok
test ortc::tests::generate_router_rtp_capabilities_too_many_codecs ... ok
test ortc::tests::generate_router_rtp_capabilities_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_get_consumable_rtp_parameters_get_consumer_rtp_parameters_get_pipe_consumer_rtp_parameters_succeeds ... ok
test router::active_speaker_observer::tests::router_close_event ... ok
test router::audio_level_observer::tests::router_close_event ... ok
test router::consumer::tests::transport_close_event ... ok
test router::consumer::tests::producer_close_event ... ok
test router::data_consumer::tests::transport_close_event ... ok
test router::direct_transport::tests::router_close_event ... ok
test router::data_producer::tests::transport_close_event ... ok
test router::data_consumer::tests::data_producer_close_event ... ok
test router::plain_transport::tests::router_close_event ... ok
error: test failed, to rerun pass `-p mediasoup --lib`

Caused by:
  process didn't exit successfully: `/home/runner/work/mediasoup/mediasoup/target/debug/deps/mediasoup-8f5b5278ff2f31a1` (signal: 11, SIGSEGV: invalid memory reference)
Error: Process completed with exit code 101.

I would like to say that this is pipe transport related but the third test attempt doesn't even execute any "pipe" test.

ibc · 2024-08-12T09:11:39Z

This is completely amazing. Rust tests in v3 branch don't fail. But Rust tests in this PR branch fial consistently. Changes in this PR cannot be the culprit at all, they do not affect anything in Rust.

ibc · 2024-08-12T09:13:27Z

@nazar-pc help please? I doubt that Rust tests are even failing due to this well know issue:

#1352

In a comment above you can see that sometimes pipe transports do not even run and tests still fail. Maybe I'm wrong but AFAIK we only run more than 1 workers in Rust tests in those pipe transport related tests, so I'm completely lost.

nazar-pc · 2024-08-12T09:20:15Z

Changes in this PR cannot be the culprit at all, they do not affect anything in Rust.

Either changes in this PR are the culprit (however unlikely that seems) or they trigger an issue that existed before as well, but was hard to reproduce.

Maybe I'm wrong but AFAIK we only run more than 1 workers in Rust tests in those pipe transport related tests, so I'm completely lost.

There are as many tests running concurrently as CPU cores on the machine by default, so we have many workers with potentially different settings running at the same time and order is not deterministic.

nazar-pc · 2024-08-12T09:23:50Z

worker/src/Settings.cpp

-		if (!optarg)
-		{
-			MS_THROW_TYPE_ERROR("unknown configuration parameter: %s", optarg);
-		}


Why was this deleted? It seems like an important safety check that caller provided something meaningful as an input.

Because the new --disableLiburing command line argument doesn't have any value so such a check throws if present. I can check that optargs exist for all the other arguments but didn't consider it necessary.

Well, but other options do require a value. Maybe we have a test that checks that and it causes memory corruption because you suddenly create a value out of null pointer?

I've made this change:

db2252c

Is it enough? What do you mean with "you suddenly create a value out of null pointer"? Command line arguments are created by Node and Rust layers in their Worker classes. Tests can not trigger wrong arguments passed to the worker.

Just wondering about this: In worker/utils.rs:

pub(super) fn run_worker_with_channels<OE>( id: WorkerId, thread_initializer: Option<Arc<dyn Fn() + Send + Sync>>, args: Vec<String>, worker_closed: Arc<AtomicBool>, on_exit: OE, ) -> WorkerRunResult where OE: FnOnce(Result<(), ExitError>) + Send + 'static, { let (channel, prepared_channel_read, prepared_channel_write) = Channel::new(Arc::clone(&worker_closed)); let buffer_worker_messages_guard = channel.buffer_messages_for(SubscriptionTarget::String(std::process::id().to_string())); std::thread::Builder::new() .name(format!("mediasoup-worker-{id}")) .spawn(move || { if let Some(thread_initializer) = thread_initializer { thread_initializer(); } let argc = args.len() as c_int; let args_cstring = args .into_iter() .map(|s| -> CString { CString::new(s).unwrap() }) .collect::<Vec<_>>(); let argv = args_cstring .iter() .map(|arg| arg.as_ptr().cast::<c_char>()) .collect::<Vec<_>>(); let version = CString::new(env!("CARGO_PKG_VERSION")).unwrap(); let status_code = unsafe { let (channel_read_fn, channel_read_ctx, _channel_write_callback) = prepared_channel_read.deconstruct(); let (channel_write_fn, channel_write_ctx, _channel_read_callback) = prepared_channel_write.deconstruct(); mediasoup_sys::mediasoup_worker_run( argc, argv.as_ptr(), version.as_ptr(), 0, 0, channel_read_fn, channel_read_ctx, channel_write_fn, channel_write_ctx, ) };

Here args is a command line arguments string, something like:

"--logLevel=warn --disableLiburing"

Maybe something dangerous when doing this?:

let args_cstring = args .into_iter() .map(|s| -> CString { CString::new(s).unwrap() }) .collect::<Vec<_>>(); let argv = args_cstring .iter() .map(|arg| arg.as_ptr().cast::<c_char>()) .collect::<Vec<_>>();

Nope, no problem there IMHO. It just splits the string into these strings:

"--logLevel=warn"

"--disableLiburing"

It doesn't do anything like assuming/expecting a "=" symbol, so no danger here.

I don't see why commit db2252c should fix this problem. It probably won't and, instead of wasting more time on this, I will change the new command line argument and add a value to it. No time to deal with ancient command line args stuff.

I know we're not calling it incorrectly right now, but we could. And that would blow up instead of crashing with a nice message. Do not trust input, at least not to the degree that impacts memory safety.

Now it's safe and we don't assume anything. Arg values are now mandatory. See latest changes.

ibc · 2024-08-12T09:29:04Z

Maybe I'm wrong but AFAIK we only run more than 1 workers in Rust tests in those pipe transport related tests, so I'm completely lost.

There are as many tests running concurrently as CPU cores on the machine by default, so we have many workers with potentially different settings running at the same time and order is not deterministic.

Do you mean that different tests run in parallel in the same Rust process? I mean, of course! Ok, I assume there is no way in Rust to create a separate process for each test file, right?

nazar-pc · 2024-08-12T09:31:07Z

Do you mean that different tests run in parallel in the same Rust process? I mean, of course! Ok, I assume there is no way in Rust to create a separate process for each test file, right?

Yes, it does run in the same process by default. You could probably make it run in separate processes somehow, but we're dealing with memory safe language, so why, right? That is not a solution to the problem we have here anyway 🙂

…ents that require a value

ibc · 2024-08-12T09:53:00Z

You could probably make it run in separate processes somehow

No, trust me, I couldn't.

but we're dealing with memory safe language, so why, right?

Because there is also C code running its own separate thread: #1352

ibc · 2024-08-12T10:02:44Z

Ok, some news:

I can reproduce the Rust test failure in a real Linux host, no need to wait for CI.
I've made command line argument mandatory here: 669ddca

Now Rust tests fail even in macOS with this error:

mediasoup-worker::mediasoup_worker_run() | settings error: missing value in command line 
argument: (null)
test router::data_producer::tests::transport_close_event ... ok
test router::tests::worker_close_event ... FAILED

So it seems that we are sending some command line arg without value. I will print things to debug.

ibc · 2024-08-12T10:07:48Z

Ok, some news:
I can reproduce the Rust test failure in a real Linux host, no need to wait for CI.

I've made command line argument mandatory here: 669ddca
Now Rust tests fail even in macOS with this error:
mediasoup-worker::mediasoup_worker_run() | settings error: missing value in command line 
argument: (null)
test router::data_producer::tests::transport_close_event ... ok
test router::tests::worker_close_event ... FAILED
So it seems that we are sending some command line arg without value. I will print things to debug.

Ignore, I forgot to add =true in Rust. Now things should work.

ibc · 2024-08-12T10:16:37Z

This is strongly depressing. Now WHAT????

running 27 tests
test data_structures::tests::dtls_fingerprint ... ok
test ortc::tests::generate_router_rtp_capabilities_succeeds ... ok
test ortc::tests::generate_router_rtp_capabilities_too_many_codecs ... ok
test ortc::tests::generate_router_rtp_capabilities_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_get_consumable_rtp_parameters_get_consumer_rtp_parameters_get_pipe_consumer_rtp_parameters_succeeds ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_unsupported ... ok
test router::audio_level_observer::tests::router_close_event ... ok
test router::active_speaker_observer::tests::router_close_event ... ok
test router::data_consumer::tests::transport_close_event ... ok
test router::consumer::tests::producer_close_event ... ok
test router::consumer::tests::transport_close_event ... ok
test router::data_consumer::tests::data_producer_close_event ... ok
test router::data_producer::tests::transport_close_event ... ok
error: test failed, to rerun pass `-p mediasoup --lib`

Caused by:
  process didn't exit successfully: `/home/deploy/src/mediasoup-v3-deploy/target/debug/deps/mediasoup-8f5b5278ff2f31a1` (signal: 11, SIGSEGV: invalid memory reference)

ibc · 2024-08-12T10:22:31Z

I am 96% sure that the memory issue is here:

https://github.com/versatica/mediasoup/pull/1442/files#diff-5a9b1b4b01d5764dd21b1d1b7533d5795430dead60f84d450656db89da4e3930R18

ibc · 2024-08-12T10:30:08Z

Ohhh, so we are considering that liburing is enabled or disabled for all workers, but Rust tests run in parallel in same process and in just one of those tests we were disabling liburing. In the other ones liburing is enabled. So here the problem:

void DepLibUring::ClassInit()
{
	const auto mayor = io_uring_major_version();
	const auto minor = io_uring_minor_version();

	MS_DEBUG_TAG(info, "liburing version: \"%i.%i\"", mayor, minor);

	if (Settings::configuration.liburingDisabled)
	{
		MS_DEBUG_TAG(info, "liburing disabled by user settings");

		return;
	}

	// This must be called first.
	DepLibUring::CheckRuntimeSupport();

	if (DepLibUring::IsEnabled())
	{
		DepLibUring::liburing = new LibUring();

		MS_DEBUG_TAG(info, "liburing enabled");
	}
	else
	{
		MS_DEBUG_TAG(info, "liburing not enabled");
	}
}

DepLibUring::liburing singleton is thread_local static.
However we have static bool enabled.

ibc · 2024-08-12T10:36:05Z

rust/src/router/tests.rs

+        .create_worker(WorkerSettings {
+            enable_liburing: false,
+            ..WorkerSettings::default()
+        })


@nazar-pc, despite this works at intended (default settings are used but enable_liburing which is set to false instead), is this correct idiomatic syntax in Rust?

In Node the default object must be added first, then those fields whose value must be different:

options = { ...options, foo: false }

It surprises me that in Rust it works if the changes are added first before the object with default values.

It is, I believe .. should be used at the very end, I don't think it compiles otherwise at all

Ok. For me idiomatic would be "here some values and after them those that override them" but it's ok. Rust guys have their own anti idiomatic concept of what idiomatic should mean.

ibc · 2024-08-12T11:00:45Z

Finally green. Merging!

WorkerSettings: Add disableLiburing option

0799e37

### Details - `createWorker({ disableLiburing: true })` disables LibUring usage despite it's supported by the worker and current host. - Related (still to be fixed) issue which brings lot of context: #1435

ibc requested review from jmillan and nazar-pc August 9, 2024 16:34

ibc commented Aug 9, 2024

View reviewed changes

rust/src/worker.rs Outdated Show resolved Hide resolved

nazar-pc reviewed Aug 9, 2024

View reviewed changes

rust/src/worker.rs Outdated Show resolved Hide resolved

fix Rust

0529246

ibc requested a review from nazar-pc August 9, 2024 17:01

ibc changed the title ~~WorkerSettings: Add disableLiburing option~~ WorkerSettings: Add disableLiburing option (enable_liburing in Rust) Aug 9, 2024

ibc mentioned this pull request Aug 9, 2024

Docker and io_uring don't work together - add prebuilt binary without liburing #1435

Closed

nazar-pc reviewed Aug 9, 2024

View reviewed changes

rust/src/worker.rs Outdated Show resolved Hide resolved

I shouldn't write more code today

80fbae0

nazar-pc reviewed Aug 9, 2024

View reviewed changes

rust/src/worker.rs Outdated Show resolved Hide resolved

ibc and others added 3 commits August 9, 2024 19:13

make some existing tests disable liburing

975c5ac

Update rust/src/worker.rs

cc97c1a

Co-authored-by: Nazar Mokrynskyi <[email protected]>

stop please

53b68bc

nazar-pc approved these changes Aug 9, 2024

View reviewed changes

are you happy now Rust???

2893da0

cosmetic to see things

5ce4ff8

ibc mentioned this pull request Aug 12, 2024

Test foo #1443

Closed

nazar-pc reviewed Aug 12, 2024

View reviewed changes

Settings.cpp: assert that value is given for those command line argum…

db2252c

…ents that require a value

Force command line argument value

669ddca

fixes

6100cd1

just testing

ebaa031

ibc added 2 commits August 12, 2024 12:30

revert test

6011976

real fix

cd06123

ibc commented Aug 12, 2024

View reviewed changes

update CHANGELOGs

c44c71a

ibc merged commit 43ef325 into v3 Aug 12, 2024
34 checks passed

ibc deleted the worker-add-option-to-disable-liburing branch August 12, 2024 11:02

WorkerSettings: Add disableLiburing option (enable_liburing in Rust) #1442

WorkerSettings: Add disableLiburing option (enable_liburing in Rust) #1442

Conversation

ibc commented Aug 9, 2024 • edited Loading

Details

ibc commented Aug 9, 2024 • edited Loading

ibc commented Aug 9, 2024

nazar-pc left a comment

Choose a reason for hiding this comment

ibc commented Aug 9, 2024

ibc commented Aug 9, 2024

ibc commented Aug 9, 2024

nazar-pc commented Aug 9, 2024

ibc commented Aug 9, 2024

nazar-pc commented Aug 9, 2024

ibc commented Aug 9, 2024

nazar-pc commented Aug 9, 2024

ibc commented Aug 9, 2024

ibc commented Aug 9, 2024

nazar-pc commented Aug 9, 2024 • edited Loading

ibc commented Aug 9, 2024

ibc commented Aug 9, 2024 • edited Loading

ibc commented Aug 9, 2024

ibc commented Aug 12, 2024 • edited Loading

ibc commented Aug 12, 2024

ibc commented Aug 12, 2024

nazar-pc commented Aug 12, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ibc commented Aug 12, 2024

nazar-pc commented Aug 12, 2024

ibc commented Aug 12, 2024

ibc commented Aug 12, 2024

ibc commented Aug 12, 2024

ibc commented Aug 12, 2024

ibc commented Aug 12, 2024

ibc commented Aug 12, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ibc commented Aug 12, 2024

ibc commented Aug 9, 2024 •

edited

Loading

ibc commented Aug 9, 2024 •

edited

Loading

nazar-pc commented Aug 9, 2024 •

edited

Loading

ibc commented Aug 9, 2024 •

edited

Loading

ibc commented Aug 12, 2024 •

edited

Loading