[feature request] Cumulative metric #23

crusaderky · 2024-03-07T12:41:31Z

XREF GIL prometheus metrics are misleading dask/distributed#8557

distributed currently runs (simplified):

from time import time

from gilknocker import KnockKnock

knock_knock = KnockKnock(polling_interval_micros=1000)
knock_knock.start()
cumulative_gil_contention = 0
prev_ts = now = time()

def system_monitor_periodic_callback():
    global cumulative_gil_contention, prev_ts, now

    prev_ts, now = now, time()
    # snip: fairly long list of pure-python code
    # (user code may acquire the GIL here, after time() was sampled)
    cumulative_gil_contention += (now - prev_ts) * knock_knock.contention_metric
    # (user code may also acquire the GIL here, before the reset runs)
    knock_knock.reset_contention_metric()

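This is racy: a user's function can acquire the GIL either between the sampling of time() and the sampling of contention_metric, or between sampling contention_metric and calling reset_contention_metric(). In the first case the elapsed time no longer matches the window that the metric covers; in the second case, contention recorded before the reset is never added to the cumulative total.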
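
The second window can be reproduced with a toy model. The sketch below assumes, purely for illustration, that the metric is the fraction of contended polls since the last reset; `FakeKnocker` and `record` are hypothetical stand-ins and not part of gilknocker.

```python
class FakeKnocker:
    """Hypothetical stand-in for KnockKnock (assumed model, not the real one)."""

    def __init__(self) -> None:
        self.contended = 0
        self.samples = 0

    def record(self, contended: bool) -> None:
        # One poll by the (simulated) background thread.
        self.samples += 1
        self.contended += contended

    @property
    def contention_metric(self) -> float:
        return self.contended / self.samples if self.samples else 0.0

    def reset_contention_metric(self) -> None:
        self.contended = 0
        self.samples = 0


knocker = FakeKnocker()

# Quiet period, then the monitor callback reads the metric.
for _ in range(10):
    knocker.record(False)
reported = knocker.contention_metric

# A task blocks the GIL before the callback reaches the reset;
# the poller keeps recording contention in the meantime.
for _ in range(1000):
    knocker.record(True)
lost = knocker.contended

# The callback finally runs the reset and wipes those polls.
knocker.reset_contention_metric()

# Next quiet period and next read.
for _ in range(10):
    knocker.record(False)
next_reported = knocker.contention_metric

print(reported, next_reported, lost)
```

In this model neither read ever sees the 1000 contended polls, which is consistent with a data point at 0 in the middle of a heavily contended phase.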

This is a screenshot of a torture test (code here) where:

phase 1: task runs a C extension that completely blocks the GIL for 1 second; wait 10~100ms for the next task; repeat
phase 2: task runs a C extension that completely blocks the GIL for 4 seconds; wait 10~100ms for the next task; repeat
phase 3: task runs a C extension that completely blocks the GIL for 15 seconds; wait 10~100ms for the next task; repeat
phase 4: task runs a hot pure-python for loop constantly

Those points at 0 in the middle of phases 1, 2, and 3 are incorrect.

Proposed design

GILKnocker should offer an API to read its own cumulative metric. When updating this metric, it should hold the GIL between the moment it samples the timestamp and the moment it samples the GIL use, so that the two readings always describe the same window. reset_contention_metric should not clear the cumulative metric.
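
A minimal sketch of what such an API could look like, written in pure Python. Everything here is an assumption made for illustration: the class name `CumulativeKnocker`, the `record` method, the `cumulative_contention_seconds` property, and the use of a `threading.Lock` as a stand-in for holding the GIL. It is not gilknocker's actual implementation or interface.

```python
import threading
from typing import Callable


class CumulativeKnocker:
    """Hypothetical sketch of a knocker exposing a cumulative metric."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._lock = threading.Lock()  # stands in for "holding the GIL"
        self._last_ts = clock()
        self._window = [0, 0]   # [contended, total] behind contention_metric
        self._pending = [0, 0]  # [contended, total] not yet folded into the total
        self._cumulative = 0.0

    def record(self, contended: bool) -> None:
        # One poll by the background thread.
        with self._lock:
            for counts in (self._window, self._pending):
                counts[0] += contended
                counts[1] += 1

    @property
    def contention_metric(self) -> float:
        with self._lock:
            contended, total = self._window
            return contended / total if total else 0.0

    def reset_contention_metric(self) -> None:
        # Clears only the resettable window, never the cumulative total.
        with self._lock:
            self._window = [0, 0]

    @property
    def cumulative_contention_seconds(self) -> float:
        # Timestamp and contention are sampled under the same lock,
        # so no other code can run between the two readings.
        with self._lock:
            now = self._clock()
            contended, total = self._pending
            if total:
                self._cumulative += (now - self._last_ts) * contended / total
            self._last_ts = now
            self._pending = [0, 0]
            return self._cumulative


# Usage with a fake clock so the arithmetic is easy to follow.
fake_now = [0.0]
knocker = CumulativeKnocker(clock=lambda: fake_now[0])

for contended in (True, True, True, False):
    knocker.record(contended)
fake_now[0] = 2.0

ratio = knocker.contention_metric
knocker.reset_contention_metric()
total_after_reset = knocker.cumulative_contention_seconds
```

With an API of this shape, a caller such as the system monitor would only read a monotonically increasing counter and would not need to pair time() with contention_metric itself.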

crusaderky mentioned this issue Mar 7, 2024

GIL prometheus metrics are misleading dask/distributed#8557

Closed
