[Feature]: rocprofv3 output #40

Open
maartenarnst opened this issue Jan 22, 2025 · 2 comments

Comments

@maartenarnst

maartenarnst commented Jan 22, 2025

Suggestion Description

We have been experimenting with the new rocprofv3 provided by rocprofiler-sdk, using ROCm 6.3.1. It's really great to see such good progress! We have been using it for "application tracing" and "kernel profiling" of a Kokkos-based code, and we use the Perfetto trace processor in Python for post-processing.

These are a few suggestions for the output from rocprofv3:

i) The workgroup and grid sizes are currently provided as a reduced product, i.e. a single integer equal to the product of the x, y and z dimensions. It may be interesting to provide them as triples of integers instead.
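To see why a single product loses information, here is a minimal Python sketch. The `Dim3` type is hypothetical (it mirrors the x/y/z fields of HIP's `dim3`) and is not part of rocprofv3's output: two launch geometries with very different shapes reduce to the same number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim3:
    """Hypothetical (x, y, z) launch dimension, mirroring HIP's dim3."""
    x: int
    y: int = 1
    z: int = 1

    def product(self) -> int:
        # The single reduced value that is currently reported.
        return self.x * self.y * self.z


# A 1D block of 256 threads and a 16 x 16 2D block.
linear = Dim3(256)
square = Dim3(16, 16)

# Both collapse to 256, so the shape cannot be recovered from the product.
print(linear.product(), square.product())  # prints "256 256"
print(linear == square)  # prints "False"
```

Reporting the triple keeps the shape, and the product can still be derived from it when needed.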

ii) The counters are currently provided in the csv output, but not in the pftrace output. It may be interesting to output them in the pftrace tables too, e.g. in the args table, in the same way the corr_id is currently handled.
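For illustration, here is a minimal Python sketch of how counters stored as key/value args could then be queried next to each kernel slice. It uses an in-memory sqlite3 database with a simplified, hypothetical schema loosely modeled on Perfetto's `slice` and `args` tables (the real tables have more columns, and the key names and values here are made up). It is not the actual Perfetto trace processor.

```python
import sqlite3

# Simplified, hypothetical schema loosely modeled on Perfetto's slice and
# args tables: each slice points to a set of key/value arguments.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE slice (id INTEGER PRIMARY KEY, name TEXT, arg_set_id INTEGER);
    CREATE TABLE args (arg_set_id INTEGER, key TEXT, int_value INTEGER);
    """
)
conn.executemany(
    "INSERT INTO slice VALUES (?, ?, ?)",
    [(1, "kernel_a", 10), (2, "kernel_b", 20)],
)
# Made-up values: a corr_id arg plus one counter value per kernel slice.
conn.executemany(
    "INSERT INTO args VALUES (?, ?, ?)",
    [
        (10, "corr_id", 1), (10, "SQ_WAVES", 128),
        (20, "corr_id", 2), (20, "SQ_WAVES", 64),
    ],
)

# One row per kernel slice, with the counter value pulled out of its args.
rows = conn.execute(
    """
    SELECT s.name, a.int_value
    FROM slice AS s
    JOIN args AS a ON a.arg_set_id = s.arg_set_id
    WHERE a.key = 'SQ_WAVES'
    ORDER BY s.id
    """
).fetchall()
print(rows)  # prints "[('kernel_a', 128), ('kernel_b', 64)]"
```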

iii) The way the counters are currently provided in the csv output appears somewhat inefficient when multiple counters are requested. In that case, each row of the csv file is repeated once per counter, with only the counter name and value differing between the copies. It may be interesting to provide the counter results by extending each row with multiple (name, value) pairs, thus avoiding the duplication.
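A minimal Python sketch of the suggested layout: pivot the long rows into one row per dispatch, with one column per counter. The column names (`Dispatch_Id`, `Counter_Name`, `Counter_Value`, ...) and the values are illustrative assumptions, not taken from a real rocprofv3 run, so check them against the actual csv header before reusing this.

```python
import csv
import io

# Hypothetical column names; check them against the real csv header.
NAME_COL, VALUE_COL = "Counter_Name", "Counter_Value"

# Made-up long-format data: one row per (dispatch, counter) pair.
long_csv = """Dispatch_Id,Kernel_Name,Grid_Size,Counter_Name,Counter_Value
1,kernel_a,1024,SQ_WAVES,16
1,kernel_a,1024,SQ_INSTS_VALU,4096
2,kernel_b,2048,SQ_WAVES,32
2,kernel_b,2048,SQ_INSTS_VALU,8192
"""


def widen(text: str) -> list[dict[str, str]]:
    """Collapse rows that differ only in counter name/value into one row."""
    merged: dict[tuple, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        name, value = row.pop(NAME_COL), row.pop(VALUE_COL)
        # All remaining columns identify the dispatch; use them as the key.
        key = tuple(row.items())
        # Create the merged row on first sight, then add this counter column.
        merged.setdefault(key, dict(row))[name] = value
    return list(merged.values())


wide = widen(long_csv)
for row in wide:
    print(row)
```

This keeps the long layout as the source of truth, which, per the maintainer reply below, is easier to combine across nodes and replayed runs; the wide view is derived only when needed.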

iv) It may be interesting to highlight in the docs the use of the Perfetto trace processor in Python as a way to post-process the results from the pftrace file. As far as I can tell, the docs currently mention the Perfetto UI but not yet the trace processor.

Operating System

Ubuntu 24.04

GPU

VEGA906

ROCm Component

HPC using Kokkos with HIP backend

@jrmadsen
Contributor

jrmadsen commented Feb 4, 2025

We are working on (ii). For (iii), yes, it is inefficient, but this layout makes it trivial to combine multi-node data and application-replay data when the counters change between runs; we are working on a conversion script. For (iv), we are working on a SQL database schema and a Python package for post-processing. We do not intend to rely on Perfetto, for numerous reasons. I'll make a note of (i).

@maartenarnst
Author

Sounds great! Thanks for the feedback!
