We have been experimenting with the new rocprofv3 provided by rocprofiler-sdk. We use ROCm 6.3.1. It's really great to see such good progress! We have been using it for "application tracing" and "kernel profiling" of a Kokkos-based code. We have been using the Perfetto traceprocessor in Python for post-processing.
These are a few suggestions for the output from rocprofv3:
i) The workgroup and grid sizes are currently provided as a reduced product (see rocprofiler-sdk/source/lib/output/generatePerfetto.cpp, lines 518 to 521 at commit 042c761).
It may be interesting to pass them as triples of integers.
ii) The counters are currently provided in the csv output, but not in the pftrace output. It may be interesting to output them in the pftrace table too, e.g. in the args table, similar to how the corr_id is currently handled.
iii) The way the counters are currently provided in the csv output appears somewhat inefficient when multiple counters are requested. In that case, each row in the csv file is replicated once per counter, with only the counter name and value differing. It may be interesting to provide the counter results by extending the rows with multiple (name, value) pairs, thus avoiding duplication.
iv) It may be interesting to highlight in the docs the use of the Perfetto traceprocessor in Python as a way to post-process the results from the pftrace file. As far as I can tell, the docs currently mention the Perfetto UI, but they do not yet mention the traceprocessor.
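To illustrate (iv), here is a minimal sketch of the kind of post-processing we mean, using the `perfetto` Python package. It joins the `slice` table with the `args` table to recover the correlation id for each slice. The key pattern `'%corr_id'` and the assumption that the value is stored in `int_value` are guesses about how the trace is written, and `load` is just a hypothetical helper name, so adjust these to the actual trace contents.

```python
# Assumption: the correlation id is stored as an integer arg whose key
# ends in "corr_id". Check the args table of a real trace to confirm.
QUERY = """
SELECT s.name AS name, s.ts AS ts, s.dur AS dur, a.int_value AS corr_id
FROM slice AS s
JOIN args AS a USING (arg_set_id)
WHERE a.key LIKE '%corr_id'
ORDER BY s.ts
"""


def kernel_slices(tp):
    """Return (name, ts, dur, corr_id) tuples from an open trace processor."""
    return [(row.name, row.ts, row.dur, row.corr_id) for row in tp.query(QUERY)]


def load(path):
    """Hypothetical helper: open a pftrace file, run the query, and close it."""
    # Imported here so the query logic can be reused without the package.
    from perfetto.trace_processor import TraceProcessor  # pip install perfetto

    tp = TraceProcessor(trace=path)
    try:
        return kernel_slices(tp)
    finally:
        tp.close()
```

If the counters were written to the `args` table as suggested in (ii), the same join could return them alongside the timing data by widening the `WHERE` clause.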
Operating System
Ubuntu 24.04
GPU
VEGA906
ROCm Component
HPC using Kokkos with HIP backend
We are working on (ii). Yes, for (iii) the current layout is inefficient, but it makes it trivial to combine data from multi-node runs and from application replay, where the counters change between runs. We are working on a conversion script. For (iv), we are working on a SQL database schema plus a Python package for post-processing. We do not intend to rely on Perfetto, for numerous reasons. I'll make a note of (i).
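Until a conversion script is available, the long-format rows discussed in (iii) can be pivoted on the user side. The following is a minimal sketch using only the standard library. The column names `Counter_Name` and `Counter_Value` are assumptions about the csv header, and the sample data is made up for illustration, not real rocprofv3 output.

```python
import csv
import io
from collections import OrderedDict

# Assumed column names; adjust to match the actual header of the counter csv.
NAME_COL = "Counter_Name"
VALUE_COL = "Counter_Value"


def pivot_counters(rows, name_col=NAME_COL, value_col=VALUE_COL):
    """Collapse one-row-per-counter records into one row per dispatch."""
    merged = OrderedDict()
    for row in rows:
        # Every column except the counter name/value identifies the dispatch.
        key = tuple((k, v) for k, v in row.items() if k not in (name_col, value_col))
        wide = merged.setdefault(key, dict(key))
        # Each counter becomes its own column in the wide row.
        wide[row[name_col]] = row[value_col]
    return list(merged.values())


# Made-up sample data for illustration only.
SAMPLE = """Dispatch_Id,Kernel_Name,Counter_Name,Counter_Value
1,axpy,SQ_WAVES,64
1,axpy,GRBM_COUNT,1200
2,dot,SQ_WAVES,32
2,dot,GRBM_COUNT,900
"""

wide_rows = pivot_counters(csv.DictReader(io.StringIO(SAMPLE)))
for r in wide_rows:
    print(r)
```

Note that if the same counter name appears more than once for a dispatch, the later value overwrites the earlier one in this sketch, and the values are kept as strings.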