#2387: Gather hashed trace user events at the end of run #2395

Open
wants to merge 20 commits into
base: develop

Contributor

@cwschilly cwschilly commented Jan 10, 2025

Fixes #2387


github-actions bot commented Jan 10, 2025

Pipelines results

PR tests (gcc-12, ubuntu, mpich, verbose, kokkos)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (intel icpx, ubuntu, mpich, verbose)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)



The following tests FAILED:
    1 - vt_example:tutorial_1 (Failed)
    2 - vt_example:tutorial_2 (Failed)
    3 - vt_example:callback_1 (Failed)
    4 - vt_example:callback_2 (Failed)
    5 - vt_example:callback_context_1 (Failed)
    6 - vt_example:callback_context_2 (Failed)
    7 - vt_example:lb_iter_1 (Failed)
    8 - vt_example:lb_iter_2 (Failed)
    9 - vt_example:jacobi1d_vt_1 (Failed)
   10 - vt_example:jacobi1d_vt_2 (Failed)
   11 - vt_example:jacobi2d_vt_1 (Failed)
   12 - vt_example:jacobi2d_vt_2 (Failed)
   13 - vt_example:jacobi3d_vt_1 (Failed)
   14 - vt_example:jacobi3d_vt_2 (Failed)
   15 - vt_example:migrate_collection_1 (Failed)
   16 - vt_example:migrate_collection_2 (Failed)
   17 - vt_example:polymorphic_collection_1 (Failed)
   18 - vt_example:polymorphic_collection_2 (Failed)
   19 - vt_example:insertable_collection_1 (Failed)
   20 - vt_example:insertable_collection_2 (Failed)
   21 - vt_example:reduce_integral_1 (Failed)
   22 - vt_example:reduce_integral_2 (Failed)
   23 - vt_example:transpose_1 (Failed)
   24 - vt_example:transpose_2 (Failed)
   25 - vt_example:4d_collection_1 (Failed)
   26 - vt_example:4d_collection_2 (Failed)
   27 - vt_example:group_rooted_1 (Failed)
   28 - vt_example:group_rooted_2 (Failed)
   29 - vt_example:group_collective_1 (Failed)
   30 - vt_example:group_collective_2 (Failed)
   31 - vt_example:hello_world_1 (Failed)
   32 - vt_example:hello_world_2 (Failed)
   33 - vt_example:hello_world_functor_1 (Failed)
   34 - vt_example:hello_world_functor_2 (Failed)
   35 - vt_example:hello_world_collection_1 (Failed)
   36 - vt_example:hello_world_collection_2 (Failed)
   37 - vt_example:hello_world_collection_collective_1 (Failed)
   38 - vt_example:hello_world_collection_collective_2 (Failed)
   39 - vt_example:hello_world_collection_staged_insert_1 (Failed)
   40 - vt_example:hello_world_collection_staged_insert_2 (Failed)
   41 - vt_example:hello_world_collection_reduce_1 (Failed)
   42 - vt_example:hello_world_collection_reduce_2 (Failed)
   43 - vt_example:hello_world_virtual_context_1 (Failed)
   44 - vt_example:hello_world_virtual_context_2 (Failed)
   45 - vt_example:hello_world_virtual_context_remote_1 (Failed)
   46 - vt_example:hello_world_virtual_context_remote_2 (Failed)
   47 - vt_example:ring_1 (Failed)
   48 - vt_example:ring_2 (Failed)
   49 - vt_example:objgroup_1 (Failed)
   50 - vt_example:objgroup_2 (Failed)
   51 - vt_example:hello_reduce_1 (Failed)
   52 - vt_example:hello_reduce_2 (Failed)
   53 - vt_example:lb_data_file_generator_1 (Failed)
   54 - vt_example:lb_data_file_generator_2 (Failed)
   55 - vt_example:rdma_simple_get_1 (Failed

==> And there is more. Read log. <==

Build log


PR tests (clang-9, ubuntu, mpich)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-12, ubuntu, mpich)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-9, ubuntu, mpich, zoltan)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-11, ubuntu, mpich)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-10, ubuntu, openmpi, no LB)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-13, ubuntu, mpich)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-14, ubuntu, mpich, verbose)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)



The following tests FAILED:
    1 - vt_example:tutorial_1 (Failed)
    2 - vt_example:tutorial_2 (Failed)
    3 - vt_example:callback_1 (Failed)
    4 - vt_example:callback_2 (Failed)
    5 - vt_example:callback_context_1 (Failed)
    6 - vt_example:callback_context_2 (Failed)
    7 - vt_example:lb_iter_1 (Failed)
    8 - vt_example:lb_iter_2 (Failed)
    9 - vt_example:jacobi1d_vt_1 (Failed)
   10 - vt_example:jacobi1d_vt_2 (Failed)
   11 - vt_example:jacobi2d_vt_1 (Failed)
   12 - vt_example:jacobi2d_vt_2 (Failed)
   13 - vt_example:jacobi3d_vt_1 (Failed)
   14 - vt_example:jacobi3d_vt_2 (Failed)
   15 - vt_example:migrate_collection_1 (Failed)
   16 - vt_example:migrate_collection_2 (Failed)
   17 - vt_example:polymorphic_collection_1 (Failed)
   18 - vt_example:polymorphic_collection_2 (Failed)
   19 - vt_example:insertable_collection_1 (Failed)
   20 - vt_example:insertable_collection_2 (Failed)
   21 - vt_example:reduce_integral_1 (Failed)
   22 - vt_example:reduce_integral_2 (Failed)
   23 - vt_example:transpose_1 (Failed)
   24 - vt_example:transpose_2 (Failed)
   25 - vt_example:4d_collection_1 (Failed)
   26 - vt_example:4d_collection_2 (Failed)
   27 - vt_example:group_rooted_1 (Failed)
   28 - vt_example:group_rooted_2 (Failed)
   29 - vt_example:group_collective_1 (Failed)
   30 - vt_example:group_collective_2 (Failed)
   31 - vt_example:hello_world_1 (Failed)
   32 - vt_example:hello_world_2 (Failed)
   33 - vt_example:hello_world_functor_1 (Failed)
   34 - vt_example:hello_world_functor_2 (Failed)
   35 - vt_example:hello_world_collection_1 (Failed)
   36 - vt_example:hello_world_collection_2 (Failed)
   37 - vt_example:hello_world_collection_collective_1 (Failed)
   38 - vt_example:hello_world_collection_collective_2 (Failed)
   39 - vt_example:hello_world_collection_staged_insert_1 (Failed)
   40 - vt_example:hello_world_collection_staged_insert_2 (Failed)
   41 - vt_example:hello_world_collection_reduce_1 (Failed)
   42 - vt_example:hello_world_collection_reduce_2 (Failed)
   43 - vt_example:hello_world_virtual_context_1 (Failed)
   44 - vt_example:hello_world_virtual_context_2 (Failed)
   45 - vt_example:hello_world_virtual_context_remote_1 (Failed)
   46 - vt_example:hello_world_virtual_context_remote_2 (Failed)
   47 - vt_example:ring_1 (Failed)
   48 - vt_example:ring_2 (Failed)
   49 - vt_example:objgroup_1 (Failed)
   50 - vt_example:objgroup_2 (Failed)
   51 - vt_example:hello_reduce_1 (Failed)
   52 - vt_example:hello_reduce_2 (Failed)
   53 - vt_example:lb_data_file_generator_1 (Failed)
   54 - vt_example:lb_data_file_generator_2 (Failed)
   55 - vt_example:rdma_simple_get_1 (Failed

==> And there is more. Read log. <==

Build log


PR tests (clang-10, ubuntu, mpich)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (nvidia cuda 11.2, gcc-9, ubuntu, mpich)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

/vt/src/vt/pipe/pipe_manager.impl.h(135): warning: missing return statement at end of non-void function "vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&vt::vrt::collection::lb::GreedyLB::collectHandler, Target=vt::objgroup::proxy::ProxyElm<vt::vrt::collection::lb::GreedyLB>]"
          detected during:
            instantiation of "auto vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&vt::vrt::collection::lb::GreedyLB::collectHandler, Target=vt::objgroup::proxy::ProxyElm<vt::vrt::collection::lb::GreedyLB>]" 
/vt/src/vt/objgroup/proxy/proxy_objgroup.impl.h(221): here
            instantiation of "vt::objgroup::proxy::Proxy<ObjT>::PendingSendType vt::objgroup::proxy::Proxy<ObjT>::reduce<f,Op,Target,Args...>(Target, Args &&...) const [with ObjT=vt::vrt::collection::lb::GreedyLB, f=&vt::vrt::collection::lb::GreedyLB::collectHandler, Op=vt::collective::PlusOp, Target=vt::objgroup::proxy::ProxyElm<vt::vrt::collection::lb::GreedyLB>, Args=<vt::vrt::collection::lb::GreedyPayload>]" 
/vt/src/vt/vrt/collection/balance/greedylb/greedylb.cc(222): here

/vt/src/vt/pipe/pipe_manager.impl.h(135): warning: missing return statement at end of non-void function "vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&MyObj::handler, Target=vt::objgroup::proxy::ProxyElm<MyObj>]"
          detected during instantiation of "auto vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&MyObj::handler, Target=vt::objgroup::proxy::ProxyElm<MyObj>]" 
/vt/examples/callback/callback.cc(147): here

/vt/src/vt/pipe/pipe_manager.impl.h(135): warning: missing return statement at end of non-void function "vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&colHan, Target=vt::vrt::collection::VrtElmProxy<MyCol, vt::Index1D>]"
          detected during instantiation of "auto vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&colHan, Target=vt::vrt::collection::VrtElmProxy<MyCol, vt::Index1D>]" 
/vt/examples/callback/callback.cc(153): here

/vt/src/vt/pipe/pipe_manager.impl.h(135): warning: missing return statement at end of non-void function "vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&MyObj::handler, Target=vt::objgroup::proxy::ProxyElm<MyObj>]"
          detected during instantiation of "auto vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&MyObj::handler, Target=vt::objgroup::proxy::ProxyElm<MyObj>]" 
/vt/examples/callback/callback.cc(147): here

/vt/src/vt/pipe/pipe_manager.impl.h(135): warning: missing return statement at end of non-void function "vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&colHan, Target=vt::vrt::collection::VrtElmProxy<MyCol, vt::Index1D>]"
          detected during instantiation of "auto vt::pipe::PipeManager::makeSend<f,Target>(Target) [with f=&colHan, Target=vt::vrt::collection::VrtElmProxy<MyCol, vt::Index1D>]" 
/vt/examples/callback/callback.cc(153

==> And there is more. Read log. <==

Build log


PR tests (intel icpc, ubuntu, mpich)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhibited by limit max-total-size 
remark #11076: To get full report use -qopt-report=4 -qopt-report-phase ipo
remark #11074: Inlining inhibited by limit max-size 
remark #11074: Inlining inhi

==> And there is more. Read log. <==

Build log


PR tests (gcc-11, ubuntu, mpich, trace runtime, coverage)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)



The following tests FAILED:
    1 - vt_example:tutorial_1 (Failed)
    2 - vt_example:tutorial_2 (Failed)
    3 - vt_example:callback_1 (Failed)
    4 - vt_example:callback_2 (Failed)
    5 - vt_example:callback_context_1 (Failed)
    6 - vt_example:callback_context_2 (Failed)
    7 - vt_example:lb_iter_1 (Failed)
    8 - vt_example:lb_iter_2 (Failed)
    9 - vt_example:jacobi1d_vt_1 (Failed)
   10 - vt_example:jacobi1d_vt_2 (Failed)
   11 - vt_example:jacobi2d_vt_1 (Failed)
   12 - vt_example:jacobi2d_vt_2 (Failed)
   13 - vt_example:jacobi3d_vt_1 (Failed)
   14 - vt_example:jacobi3d_vt_2 (Failed)
   15 - vt_example:migrate_collection_1 (Failed)
   16 - vt_example:migrate_collection_2 (Failed)
   17 - vt_example:polymorphic_collection_1 (Failed)
   18 - vt_example:polymorphic_collection_2 (Failed)
   19 - vt_example:insertable_collection_1 (Failed)
   20 - vt_example:insertable_collection_2 (Failed)
   21 - vt_example:reduce_integral_1 (Failed)
   22 - vt_example:reduce_integral_2 (Failed)
   23 - vt_example:transpose_1 (Failed)
   24 - vt_example:transpose_2 (Failed)
   25 - vt_example:4d_collection_1 (Failed)
   26 - vt_example:4d_collection_2 (Failed)
   27 - vt_example:group_rooted_1 (Failed)
   28 - vt_example:group_rooted_2 (Failed)
   29 - vt_example:group_collective_1 (Failed)
   30 - vt_example:group_collective_2 (Failed)
   31 - vt_example:hello_world_1 (Failed)
   32 - vt_example:hello_world_2 (Failed)
   33 - vt_example:hello_world_functor_1 (Failed)
   34 - vt_example:hello_world_functor_2 (Failed)
   35 - vt_example:hello_world_collection_1 (Failed)
   36 - vt_example:hello_world_collection_2 (Failed)
   37 - vt_example:hello_world_collection_collective_1 (Failed)
   38 - vt_example:hello_world_collection_collective_2 (Failed)
   39 - vt_example:hello_world_collection_staged_insert_1 (Failed)
   40 - vt_example:hello_world_collection_staged_insert_2 (Failed)
   41 - vt_example:hello_world_collection_reduce_1 (Failed)
   42 - vt_example:hello_world_collection_reduce_2 (Failed)
   43 - vt_example:hello_world_virtual_context_1 (Failed)
   44 - vt_example:hello_world_virtual_context_2 (Failed)
   45 - vt_example:hello_world_virtual_context_remote_1 (Failed)
   46 - vt_example:hello_world_virtual_context_remote_2 (Failed)
   47 - vt_example:ring_1 (Failed)
   48 - vt_example:ring_2 (Failed)
   49 - vt_example:objgroup_1 (Failed)
   50 - vt_example:objgroup_2 (Failed)
   51 - vt_example:hello_reduce_1 (Failed)
   52 - vt_example:hello_reduce_2 (Failed)
   53 - vt_example:lb_data_file_generator_1 (Failed)
   54 - vt_example:lb_data_file_generator_2 (Failed)
   55 - vt_example:rdma_simple_get_1 (Failed

==> And there is more. Read log. <==

Build log


PR tests (nvidia cuda 12.2.0, gcc-9, ubuntu, mpich, verbose)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

/vt/lib/CLI/CLI/CLI11.hpp(1029): warning #2361-D: invalid narrowing conversion from "double" to "unsigned long"
          TT { std::declval<CC>() }
               ^
          detected during:
            instantiation of "vt::CLI::detail::is_direct_constructible<T, C>::test [with T=std::vector<std::string, std::allocator<std::string>>, C=double]" based on template arguments <std::vector<std::string, std::allocator<std::string>>, double> at line 1041
            instantiation of class "vt::CLI::detail::is_direct_constructible<T, C> [with T=std::vector<std::string, std::allocator<std::string>>, C=double]" at line 5005
            instantiation of "void vt::CLI::Option::results(T &) const [with T=std::vector<std::string, std::allocator<std::string>>]" at line 5034
            instantiation of "T vt::CLI::Option::as<T>() const [with T=std::vector<std::string, std::allocator<std::string>>]" at line 7315

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

/vt/lib/CLI/CLI/CLI11.hpp(1029): warning #2361-D: invalid narrowing conversion from "int" to "unsigned long"
          TT { std::declval<CC>() }
               ^
          detected during:
            instantiation of "vt::CLI::detail::is_direct_constructible<T, C>::test [with T=std::vector<std::string, std::allocator<std::string>>, C=int]" based on template arguments <std::vector<std::string, std::allocator<std::string>>, int> at line 1041
            instantiation of class "vt::CLI::detail::is_direct_constructible<T, C> [with T=std::vector<std::string, std::allocator<std::string>>, C=int]" at line 5005
            instantiation of "void vt::CLI::Option::results(T &) const [with T=std::vector<std::string, std::allocator<std::string>>]" at line 5034
            instantiation of "T vt::CLI::Option::as<T>() const [with T=std::vector<std::string, std::allocator<std::string>>]" at line 7315

The following tests FAILED:
    1 - vt_example:tutorial_1 (Failed)
    2 - vt_example:tutorial_2 (Failed)
    3 - vt_example:tutorial_4 (Failed)
    4 - vt_example:callback_1 (Failed)
    5 - vt_example:callback_2 (Failed)
    6 - vt_example:callback_4 (Failed)
    7 - vt_example:callback_context_1 (Failed)
    8 - vt_example:callback_context_2 (Failed)
    9 - vt_example:callback_context_4 (Failed)
   10 - vt_example:lb_iter_1 (Failed)
   11 - vt_example:lb_iter_2 (Failed)
   12 - vt_example:lb_iter_4 (Failed)
   13 - vt_example:jacobi1d_vt_1 (Failed)
   14 - vt_example:jacobi1d_vt_2 (Failed)
   15 - vt_example:jacobi1d_vt_4 (Failed)
   16 - vt_example:jacobi2d_vt_1 (Failed)
   17 - vt_example:jacobi2d_vt_2 (Failed)
   18 - vt_example:jacobi2d_vt_4 (Failed)
   19 - vt_example:jacobi3d_vt_1 (Failed)
   20 - vt_example:jacobi3d_vt_2 (Failed)
   21 -

==> And there is more. Read log. <==

Build log


PR tests (clang-13, alpine, mpich)

Build for 56b67a9 (2025-02-14 16:19:23 UTC)

Compilation - successful

Testing - passed

Build log


@cwschilly cwschilly changed the title from "#2387: Allreduce hashed trace user events at the end of run" to "#2387: Gather hashed trace user events at the end of run" on Jan 13, 2025
@cwschilly cwschilly force-pushed the 2387-hashed-trace-user-events-should-do-an-all-reduce-at-the-end-instead-of-sending-pt2pts-to-rank-0-during-the-run branch from 1742430 to 7ee3b85 on January 20, 2025 17:16
@cwschilly cwschilly force-pushed the 2387-hashed-trace-user-events-should-do-an-all-reduce-at-the-end-instead-of-sending-pt2pts-to-rank-0-during-the-run branch from ef7edca to 9e40158 on February 3, 2025 20:40
@cwschilly cwschilly marked this pull request as ready for review February 10, 2025 21:37
@cwschilly cwschilly force-pushed the 2387-hashed-trace-user-events-should-do-an-all-reduce-at-the-end-instead-of-sending-pt2pts-to-rank-0-during-the-run branch from 067d4a3 to 27b7d4c on February 10, 2025 21:37
@cwschilly
Contributor Author

cwschilly commented Feb 12, 2025

@lifflander @nlslatt

This PR does not compile in trace_only mode because trace.h now includes vt/objgroup/proxy/proxy_objgroup.h, which transitively includes headers from magistrate:

FAILED: src/CMakeFiles/vt-trace.dir/vt/trace/trace_lite.cc.o 
/usr/bin/ccache /usr/lib/ccache/g++  -I/vt/src -I/build/vt/release/vt-trace -isystem /vt/lib/fmt/include -isystem /vt/lib/EngFormat-Cpp/include -isystem /vt/lib/yaml-cpp/include -O3 -DNDEBUG -fdiagnostics-color=always -std=c++17 -MD -MT src/CMakeFiles/vt-trace.dir/vt/trace/trace_lite.cc.o -MF src/CMakeFiles/vt-trace.dir/vt/trace/trace_lite.cc.o.d -o src/CMakeFiles/vt-trace.dir/vt/trace/trace_lite.cc.o -c /vt/src/vt/trace/trace_lite.cc
In file included from /vt/src/vt/messaging/message/message.h:51,
                 from /vt/src/vt/messaging/message.h:47,
                 from /vt/src/vt/objgroup/active_func/active_func.h:48,
                 from /vt/src/vt/objgroup/proxy/proxy_objgroup_elm.h:50,
                 from /vt/src/vt/objgroup/proxy/proxy_objgroup.h:50,
                 from /vt/src/vt/trace/trace.h:52,
                 from /vt/src/vt/runtime/runtime.h:55,
                 from /vt/src/vt/trace/trace_user.h:49,
                 from /vt/src/vt/trace/trace_lite.cc:52:
/vt/src/vt/messaging/message/message_serialize.h:51:10: fatal error: checkpoint/checkpoint.h: No such file or directory
   51 | #include <checkpoint/checkpoint.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Is it possible to use proxy in theTrace() while in trace_only mode?

@lifflander
Collaborator

No, because in trace_only mode you might not have VT enabled (and thus would not have any components). I think we can just disable the hashed events in trace-only mode. If we move this to trace.cc instead of trace_lite.cc (where I'm assuming it is now), we should be fine.
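
As a rough illustration of the suggestion above, the sketch below compiles the end-of-run gather only when the build is not trace-only. Everything in it is an assumption made for illustration: `SKETCH_TRACE_ONLY`, `TraceSketch`, `registerHashedEvent`, and this `gatherUserEvents` signature are hypothetical stand-ins, not VT's actual macro, class, or API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical build switch standing in for a trace-only configuration.
// Define it to 1 on the compiler command line to mimic a trace-only build.
#ifndef SKETCH_TRACE_ONLY
#define SKETCH_TRACE_ONLY 0
#endif

// Hypothetical stand-in for a trace component; not VT's Trace class.
struct TraceSketch {
  std::vector<std::string> local_hashed_events;

  void registerHashedEvent(std::string const& name) {
    assert(!name.empty());
    local_hashed_events.push_back(name);
  }

  // Returns the number of local events handed to the end-of-run gather.
  // In a trace-only build no runtime components exist, so nothing is gathered.
  std::size_t gatherUserEvents() const {
#if !SKETCH_TRACE_ONLY
    // Full build: a collective over all ranks would be issued here.
    return local_hashed_events.size();
#else
    // Trace-only build: hashed events are disabled.
    return 0;
#endif
  }
};
```

Keeping the guarded code in a translation unit that is only built with the full runtime would also keep the serialization headers out of the trace-only include chain, which is the compile failure reported above.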

@cwschilly cwschilly force-pushed the 2387-hashed-trace-user-events-should-do-an-all-reduce-at-the-end-instead-of-sending-pt2pts-to-rank-0-during-the-run branch from 8665c22 to 444d54e on February 14, 2025 16:15
Contributor

@cz4rs cz4rs left a comment

For what it's worth, I can reproduce the segfault easily:

Thread 1 "hello_world" received signal SIGSEGV, Segmentation fault.
vt::objgroup::ObjGroupManager::makeCollectiveImpl (this=0x0, label="Trace", base=std::unique_ptr<vt::objgroup::holder::HolderBase> = {...}, obj_ptr=0x3096ed0)
    at /home/cz4rs/code/vt/src/vt/objgroup/manager.cc:89
89        auto const id = cur_obj_id_++;
(gdb) bt
#0  vt::objgroup::ObjGroupManager::makeCollectiveImpl (this=0x0, label="Trace", base=std::unique_ptr<vt::objgroup::holder::HolderBase> = {...}, obj_ptr=0x3096ed0)
    at /home/cz4rs/code/vt/src/vt/objgroup/manager.cc:89
#1  0x0000000001c2dcf6 in vt::objgroup::ObjGroupManager::makeCollectiveObj<vt::trace::Trace> (this=0x0, label="Trace", obj=0x3096ed0, 
    holder=std::unique_ptr<vt::objgroup::holder::HolderBase> = {...}) at /home/cz4rs/code/vt/src/vt/objgroup/manager.impl.h:107
#2  0x0000000001c2a022 in vt::objgroup::ObjGroupManager::makeCollective<vt::trace::Trace> (this=0x0, obj=0x3096ed0, label="Trace")
    at /home/cz4rs/code/vt/src/vt/objgroup/manager.impl.h:82
#3  0x0000000001bb309f in vt::trace::Trace::construct (in_prog_name="hello_world") at /home/cz4rs/code/vt/src/vt/trace/trace.cc:133
#4  0x00000000016a6f30 in vt::runtime::component::ComponentConstructor<vt::trace::Trace, void, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>::apply<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&> (args="hello_world")
    at /home/cz4rs/code/vt/src/vt/runtime/component/component.h:73
#5  0x00000000016a6ee0 in vt::runtime::component::Component<vt::trace::Trace>::staticInit<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&> (args="hello_world") at /home/cz4rs/code/vt/src/vt/runtime/component/component.h:123
#6  0x0000000001672470 in vt::runtime::component::(anonymous namespace)::tupleConsImpl<vt::trace::Trace, std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>, 0ul> (tup=..., seq=...) at /home/cz4rs/code/vt/src/vt/runtime/component/component_pack.impl.h:57
#7  0x0000000001672430 in vt::runtime::component::(anonymous namespace)::tupleCons<vt::trace::Trace, std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&> > (tup=...) at /home/cz4rs/code/vt/src/vt/runtime/component/component_pack.impl.h:64
#8  0x00000000016a6aec in vt::runtime::component::ComponentPack::registerComponent<vt::trace::Trace, vt::ctx::Context, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(vt::trace::Trace**, vt::runtime::component::BaseComponent::DepsPack<vt::ctx::Context>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1}::operator()() (this=0x3075488) at /home/cz4rs/code/vt/src/vt/runtime/component/component_pack.impl.h:93
#9  0x00000000016a6ab9 in vt::runtime::component::MovableFnTyped<vt::runtime::component::ComponentPack::registerComponent<vt::trace::Trace, vt::ctx::Context, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(vt::trace::Trace**, vt::runtime::component::BaseComponent::DepsPack<vt::ctx::Context>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1}>::invoke() (this=0x3075480)
    at /home/cz4rs/code/vt/src/vt/runtime/component/movable_fn.h:78
#10 0x000000000179d6af in vt::runtime::component::ComponentPack::construct (this=0x3067d20) at /home/cz4rs/code/vt/src/vt/runtime/component/component_pack.cc:63
#11 0x000000000165facb in vt::runtime::Runtime::initializeComponents (this=0x2f1f510) at /home/cz4rs/code/vt/src/vt/runtime/runtime.cc:963
#12 0x000000000165eca8 in vt::runtime::Runtime::initialize (this=0x2f1f510, force_now=true) at /home/cz4rs/code/vt/src/vt/runtime/runtime.cc:453
#13 0x000000000165ec5e in vt::runtime::Runtime::tryInitialize (this=0x2f1f510) at /home/cz4rs/code/vt/src/vt/runtime/runtime.cc:402
#14 0x000000000165ee4a in vt::runtime::Runtime::initialize (this=0x2f1f510, force_now=false) at /home/cz4rs/code/vt/src/vt/runtime/runtime.cc:491
#15 0x00000000016586f6 in vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::initialize (argc=@0x7fffffffc988: 1, argv=@0x7fffffffc980: 0x307dea0, is_interop=false, 
    comm=0x0, appConfig=0x0) at /home/cz4rs/code/vt/src/vt/collective/collective_ops.cc:238
#16 0x000000000165a35d in vt::initialize (argc=@0x7fffffffc988: 1, argv=@0x7fffffffc980: 0x307dea0, comm=0x0, appConfig=0x0)
    at /home/cz4rs/code/vt/src/vt/collective/startup.cc:78
#17 0x0000000001609692 in main (argc=1, argv=0x307dea0) at /home/cz4rs/code/vt/examples/hello_world/hello_world.cc:52
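
Every `ObjGroupManager` frame in the backtrace has `this=0x0`, and frame #8 shows `Trace` registered with only `ctx::Context` as a dependency. That is consistent with `Trace::construct` running before the object-group manager exists, though the backtrace alone does not prove it. The sketch below shows the general idea of dependency-ordered construction with a depth-first topological sort. `RegistrySketch` and its methods are hypothetical and are not VT's `ComponentPack`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry of components and their dependencies.
// This is a sketch of the idea only, not VT's ComponentPack.
class RegistrySketch {
public:
  void add(std::string name, std::vector<std::string> deps) {
    deps_[std::move(name)] = std::move(deps);
  }

  // Depth-first topological sort: every component appears after all of
  // its declared dependencies, so it can use them while being constructed.
  std::vector<std::string> constructionOrder() const {
    std::vector<std::string> order;
    std::set<std::string> visited, visiting;
    for (auto const& entry : deps_) {
      visit(entry.first, visited, visiting, order);
    }
    assert(order.size() == deps_.size());
    return order;
  }

private:
  void visit(
    std::string const& name, std::set<std::string>& visited,
    std::set<std::string>& visiting, std::vector<std::string>& order
  ) const {
    if (visited.count(name) != 0) return;
    auto it = deps_.find(name);
    if (it == deps_.end()) {
      throw std::runtime_error("unregistered dependency: " + name);
    }
    // A name already on the current DFS path means a cycle.
    if (!visiting.insert(name).second) {
      throw std::runtime_error("dependency cycle at: " + name);
    }
    for (auto const& dep : it->second) {
      visit(dep, visited, visiting, order);
    }
    visiting.erase(name);
    visited.insert(name);
    order.push_back(name);
  }

  std::map<std::string, std::vector<std::string>> deps_;
};
```

With a registry like this, declaring that the trace component depends on the object-group component guarantees the latter is constructed first; without that declaration the relative order is unspecified, and a constructor that dereferences the other component's pointer can hit a null.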

@cz4rs
Contributor

cz4rs commented Feb 18, 2025

@cwschilly Let me know if you need help debugging the failures. I'll try to do a proper review soon-ish.

@cwschilly cwschilly force-pushed the 2387-hashed-trace-user-events-should-do-an-all-reduce-at-the-end-instead-of-sending-pt2pts-to-rank-0-during-the-run branch from b55a195 to 9d4a193 on February 26, 2025 20:12
Collaborator

@lifflander lifflander left a comment

Ready to merge when it passes!

Contributor

@cz4rs cz4rs left a comment

LGTM

@lifflander lifflander force-pushed the 2387-hashed-trace-user-events-should-do-an-all-reduce-at-the-end-instead-of-sending-pt2pts-to-rank-0-during-the-run branch from 9d4a193 to a7343f4 on February 27, 2025 00:48

clang-format output for this changeset:

diff --git a/src/vt/pipe/callback/cb_union/cb_raw_base.fwd.h b/src/vt/pipe/callback/cb_union/cb_raw_base.fwd.h
index 416fb3403..974ac0695 100644
--- a/src/vt/pipe/callback/cb_union/cb_raw_base.fwd.h
+++ b/src/vt/pipe/callback/cb_union/cb_raw_base.fwd.h
@@ -44,9 +44,11 @@
 #if !defined INCLUDED_VT_PIPE_CALLBACK_CB_UNION_CB_RAW_BASE_FWD_H
 #define INCLUDED_VT_PIPE_CALLBACK_CB_UNION_CB_RAW_BASE_FWD_H
 
-namespace vt { namespace pipe { namespace callback { namespace cbunion {
+namespace vt {
+namespace pipe { namespace callback { namespace cbunion {
 
-template <typename... Args> struct CallbackTyped;
+template <typename... Args>
+struct CallbackTyped;
 
 struct CallbackRawBaseSingle;
 
diff --git a/src/vt/runtime/component/component_pack.cc b/src/vt/runtime/component/component_pack.cc
index 6775c83bb..00a0f8941 100644
--- a/src/vt/runtime/component/component_pack.cc
+++ b/src/vt/runtime/component/component_pack.cc
@@ -143,7 +143,9 @@ std::list<int> ComponentPack::topoSort() {
   return order;
 }
 
-void ComponentPack::topoSortImpl(int v, std::list<int>& order, bool* visited, bool* visiting) {
+void ComponentPack::topoSortImpl(
+  int v, std::list<int>& order, bool* visited, bool* visiting
+) {
   //fmt::print("impl v={}\n",v);
   vtAbortIf(visiting[v] == true, "Already visiting this node, cycle detected");
   visiting[v] = true;
diff --git a/src/vt/runtime/component/component_pack.h b/src/vt/runtime/component/component_pack.h
index 2b2fa8e41..32a49bef9 100644
--- a/src/vt/runtime/component/component_pack.h
+++ b/src/vt/runtime/component/component_pack.h
@@ -162,7 +162,8 @@ private:
    * \param[in] visited array of visited vertices
    * \param[in] visiting array of currently visiting vertices
    */
-  void topoSortImpl(int v, std::list<int>& order, bool* visited, bool* visiting);
+  void
+  topoSortImpl(int v, std::list<int>& order, bool* visited, bool* visiting);
 
   /**
    * \internal \brief Detect cycles in the dependence graph
diff --git a/src/vt/runtime/runtime.cc b/src/vt/runtime/runtime.cc
index 083468bc9..cb330adee 100644
--- a/src/vt/runtime/runtime.cc
+++ b/src/vt/runtime/runtime.cc
@@ -749,10 +749,11 @@ void Runtime::initializeComponents() {
 # if vt_check_enabled(trace_enabled)
   // The Trace and Scheduler components have a co-dependency. However,
   // the lifetime of theTrace should be longer than that of theSched.
-  p_->registerComponent<trace::Trace>(&theTrace, Deps<
-      ctx::Context,  // Everything depends on theContext
-      objgroup::ObjGroupManager
-  >{},
+  p_->registerComponent<trace::Trace>(
+    &theTrace,
+    Deps<
+      ctx::Context, // Everything depends on theContext
+      objgroup::ObjGroupManager>{},
     prog_name
   );
 # endif
@@ -768,12 +769,12 @@ void Runtime::initializeComponents() {
 #endif
 
   p_->registerComponent<objgroup::ObjGroupManager>(
-    &theObjGroup, Deps<
-      ctx::Context              // Everything depends on theContext
+    &theObjGroup,
+    Deps<ctx::Context // Everything depends on theContext
 
-      // Break this dependency for startup ordering
-      // messaging::ActiveMessenger // Depends on active messenger to send
-    >{}
+         // Break this dependency for startup ordering
+         // messaging::ActiveMessenger // Depends on active messenger to send
+         >{}
   );
 
   p_->registerComponent<messaging::ActiveMessenger>(
diff --git a/src/vt/trace/trace.cc b/src/vt/trace/trace.cc
index e98bae813..675c68e32 100644
--- a/src/vt/trace/trace.cc
+++ b/src/vt/trace/trace.cc
@@ -127,14 +127,13 @@ void Trace::setProxy(objgroup::proxy::Proxy<Trace> in_proxy) {
 }
 #endif
 
-/*static*/ std::unique_ptr<Trace> Trace::construct(std::string const& in_prog_name) {
+/*static*/ std::unique_ptr<Trace>
+Trace::construct(std::string const& in_prog_name) {
   auto ptr = std::make_unique<Trace>(in_prog_name);
-  #if !vt_check_enabled(trace_only)
-  auto proxy = theObjGroup()->makeCollective<Trace>(
-    ptr.get(), "Trace"
-  );
+#if !vt_check_enabled(trace_only)
+  auto proxy = theObjGroup()->makeCollective<Trace>(ptr.get(), "Trace");
   proxy.get()->setProxy(proxy);
-  #endif
+#endif
   return ptr;
 }
 
@@ -219,9 +218,9 @@ void Trace::setUserEvents(const UserEventRegistry& events) {
 }
 
 void Trace::gatherUserEvents() {
-  #if !vt_check_enabled(trace_only)
+#if !vt_check_enabled(trace_only)
   proxy_.reduce<&reducedEventsHan, vt::collective::PlusOp>(0, user_event_);
-  #endif
+#endif
 }
 
 UserEventIDType Trace::registerUserEventRoot(std::string const& name) {
@@ -241,10 +240,12 @@ void Trace::registerUserEventManual(
 void reducedEventsHan(
   [[maybe_unused]] const UserEventRegistry& gathered_user_events
 ) {
-  #if vt_check_enabled(trace_enabled)
-  vtAssert(theContext()->getNode() == 0, "User events must be gathered on node 0");
+#if vt_check_enabled(trace_enabled)
+  vtAssert(
+    theContext()->getNode() == 0, "User events must be gathered on node 0"
+  );
   theTrace()->setUserEvents(gathered_user_events);
-  #endif
+#endif
 }
 
 void insertNewUserEvent(
diff --git a/src/vt/trace/trace.h b/src/vt/trace/trace.h
index 88fc1ef3a..393cb1a97 100644
--- a/src/vt/trace/trace.h
+++ b/src/vt/trace/trace.h
@@ -135,9 +135,9 @@ struct Trace : runtime::component::Component<Trace>, TraceLite {
   void startup() override;
   void finalize() override;
 
-  #if !vt_check_enabled(trace_only)
+#if !vt_check_enabled(trace_only)
   void setProxy(objgroup::proxy::Proxy<Trace> in_proxy);
-  #endif
+#endif
 
   /**
    * \brief Initiate a paired processing event.
@@ -407,31 +407,15 @@ struct Trace : runtime::component::Component<Trace>, TraceLite {
 
   template <typename SerializerT>
   void serialize(SerializerT& s) {
-    s | incremental_flush_mode_
-      | traces_
-      | open_events_
-      | event_holds_
-      | cur_event_
-      | enabled_
-      | idle_begun_
-      | start_time_
-      | user_event_
-      | prog_name_
-      | trace_name_
-      | full_trace_name_
-      | full_sts_name_
-      | full_dir_name_
-      | wrote_sts_file_
-      | trace_write_count_
-      | spec_proxy_
-      #if !vt_check_enabled(trace_only)
+    s | incremental_flush_mode_ | traces_ | open_events_ | event_holds_ |
+      cur_event_ | enabled_ | idle_begun_ | start_time_ | user_event_ |
+      prog_name_ | trace_name_ | full_trace_name_ | full_sts_name_ |
+      full_dir_name_ | wrote_sts_file_ | trace_write_count_ | spec_proxy_
+#if !vt_check_enabled(trace_only)
       | proxy_
-      #endif
-      | trace_enabled_cur_phase_
-      | flush_event_
-      | between_sched_event_type_
-      | between_sched_event_
-      | inside_invoke_context_;
+#endif
+      | trace_enabled_cur_phase_ | flush_event_ | between_sched_event_type_ |
+      between_sched_event_ | inside_invoke_context_;
 
     s.skip(log_file_); // definition unavailable
   }
@@ -445,10 +429,10 @@ private:
 
   ObjGroupProxyType spec_proxy_ = vt::no_obj_group;
 
-  #if !vt_check_enabled(trace_only)
+#if !vt_check_enabled(trace_only)
   // Objgroup proxy
   objgroup::proxy::Proxy<Trace> proxy_;
-  #endif
+#endif
 
   // Processing event between top-level loops.
   TraceEntryIDType between_sched_event_type_ = no_trace_entry_id;
diff --git a/src/vt/trace/trace_user_event.cc b/src/vt/trace/trace_user_event.cc
index e0a13179d..269794027 100644
--- a/src/vt/trace/trace_user_event.cc
+++ b/src/vt/trace/trace_user_event.cc
@@ -126,15 +126,13 @@ bool UserEventRegistry::insertEvent(
       std::forward_as_tuple(name)
     );
     return true;
-  } else if (user_event_[event] != name){
+  } else if (user_event_[event] != name) {
     user_event_[event] += " COLLISION " + name;
   }
   return false;
 }
 
-UserEventRegistry operator+(
-  UserEventRegistry r1, UserEventRegistry const& r2
-) {
+UserEventRegistry operator+(UserEventRegistry r1, UserEventRegistry const& r2) {
   for (auto& [hash, event_str] : r2.getEvents()) {
     r1.insertEvent(hash, event_str);
   }
diff --git a/src/vt/trace/trace_user_event.h b/src/vt/trace/trace_user_event.h
index de4f0bad5..07f10553e 100644
--- a/src/vt/trace/trace_user_event.h
+++ b/src/vt/trace/trace_user_event.h
@@ -123,7 +123,8 @@ struct UserEventRegistry {
 
   friend void insertNewUserEvent(UserEventIDType event, std::string const& name);
 
-  friend UserEventRegistry operator+(UserEventRegistry r1, UserEventRegistry const& r2);
+  friend UserEventRegistry
+  operator+(UserEventRegistry r1, UserEventRegistry const& r2);
 
   template <typename Serializer>
   void serialize(Serializer& s) {
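
For readers following the `operator+` and `insertEvent` hunks above, here is a minimal standalone sketch of how the merge with collision marking appears to work when registries are combined by the `PlusOp` reduction. It is not the vt implementation: `MiniRegistry`, `EventID`, and `demoMerge` are hypothetical names introduced for illustration, and a plain `std::unordered_map` is assumed as storage.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for vt's UserEventIDType.
using EventID = std::uint32_t;

// Simplified, hypothetical registry: maps a hashed event ID to its name.
struct MiniRegistry {
  // Returns true only when the event was newly inserted.
  bool insertEvent(EventID event, std::string const& name) {
    auto iter = events_.find(event);
    if (iter == events_.end()) {
      events_.emplace(event, name);
      return true;
    } else if (iter->second != name) {
      // Same hash, different name: record the collision in the stored name.
      iter->second += " COLLISION " + name;
    }
    return false;
  }

  std::unordered_map<EventID, std::string> const& getEvents() const {
    return events_;
  }

private:
  std::unordered_map<EventID, std::string> events_;
};

// Combine two registries; r1 is taken by value so it can hold the result.
MiniRegistry operator+(MiniRegistry r1, MiniRegistry const& r2) {
  for (auto const& [hash, event_str] : r2.getEvents()) {
    r1.insertEvent(hash, event_str);
  }
  return r1;
}

// Build two per-rank registries (hypothetical data) and merge them.
MiniRegistry demoMerge() {
  MiniRegistry rank0, rank1;
  rank0.insertEvent(1, "solve");
  rank0.insertEvent(2, "exchange");
  rank1.insertEvent(2, "exchange"); // same hash, same name: no change
  rank1.insertEvent(3, "io");       // new hash: added to the result
  rank1.insertEvent(1, "other");    // same hash, different name: collision
  return rank0 + rank1;
}
```

With this sketch, `demoMerge()` yields three entries, and the entry for hash `1` reads `solve COLLISION other`. Duplicate registrations of the same name under the same hash leave the stored name unchanged, so repeating a merge with identical entries does not alter the result.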

Development

Successfully merging this pull request may close these issues.

Hashed trace user events should do an all-reduce at the end instead of sending pt2pts to rank 0 during the run
3 participants