Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[memprof] Dump call site matching information #125130

Merged
merged 3 commits into from
Feb 7, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 20 additions & 1 deletion llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -970,6 +970,7 @@ static void
readMemprof(Module &M, Function &F, IndexedInstrProfReader *MemProfReader,
const TargetLibraryInfo &TLI,
std::map<uint64_t, AllocMatchInfo> &FullStackIdToAllocMatchInfo,
std::set<std::vector<uint64_t>> &MatchedCallSites,
DenseMap<uint64_t, LocToLocMap> &UndriftMaps) {
auto &Ctx = M.getContext();
// Previously we used getIRPGOFuncName() here. If F is local linkage,
Expand Down Expand Up @@ -1210,6 +1211,13 @@ readMemprof(Module &M, Function &F, IndexedInstrProfReader *MemProfReader,
addCallsiteMetadata(I, InlinedCallStack, Ctx);
// Only need to find one with a matching call stack and add a single
// callsite metadata.

// Accumulate call site matching information upon request.
if (ClPrintMemProfMatchInfo) {
std::vector<uint64_t> CallStack;
append_range(CallStack, InlinedCallStack);
MatchedCallSites.insert(std::move(CallStack));
}
break;
}
}
Expand Down Expand Up @@ -1266,13 +1274,17 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
// it to an allocation in the IR.
std::map<uint64_t, AllocMatchInfo> FullStackIdToAllocMatchInfo;

// Set of the matched call sites, each expressed as a sequence of an inline
// call stack.
std::set<std::vector<uint64_t>> MatchedCallSites;

for (auto &F : M) {
if (F.isDeclaration())
continue;

const TargetLibraryInfo &TLI = FAM.getResult<TargetLibraryAnalysis>(F);
readMemprof(M, F, MemProfReader.get(), TLI, FullStackIdToAllocMatchInfo,
UndriftMaps);
MatchedCallSites, UndriftMaps);
}

if (ClPrintMemProfMatchInfo) {
Expand All @@ -1281,6 +1293,13 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
<< " context with id " << Id << " has total profiled size "
<< Info.TotalSize << (Info.Matched ? " is" : " not")
<< " matched\n";

for (const auto &CallStack : MatchedCallSites) {
errs() << "MemProf callsite match for inline call stack";
for (uint64_t StackId : CallStack)
errs() << " " << StackId;
errs() << "\n";
}
}

return PreservedAnalyses::none();
Expand Down
114 changes: 114 additions & 0 deletions llvm/test/Transforms/PGOProfile/memprof-dump-matched-call-sites.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
; Tests that the compiler dumps call site matches upon request.
;
; The test case is generated from:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use Elaborated Tests to generate the yaml and the IR?
https://llvm.org/docs/TestingGuide.html#elaborated-tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer! I wasn't aware of the gen feature, but the ability to regenerate the whole thing is very appealing. That said, I like cleaned-up IR in this particular case because I care about debugging information being human readable. I'll definitely keep "Elaborated Tested" in my tool box. Thanks again!

;
; // main
; // |
; // f1 (noinline)
; // |
; // f2
; // |
; // f3 (noinline)
; // |
; // new
;
; __attribute__((noinline)) char *f3() { return ::new char[4]; }
;
; static char *f2() { return f3(); }
;
; __attribute__((noinline)) static char *f1() { return f2(); }
;
; int main() {
; f1();
; return 0;
; }
;
; Here we expect to match two inline call stacks:
;
; - [main]
; - [f1, f2]
;
; Note that f3 is considered to be an allocation site, not a call site, because
; it directly calls new after inlining.

; REQUIRES: x86_64-linux
; RUN: split-file %s %t
; RUN: llvm-profdata merge %t/memprof-dump-matched-call-site.yaml -o %t/memprof-dump-matched-call-site.memprofdata
; RUN: opt < %t/memprof-dump-matched-call-site.ll -passes='memprof-use<profile-filename=%t/memprof-dump-matched-call-site.memprofdata>' -memprof-print-match-info -S 2>&1 | FileCheck %s

;--- memprof-dump-matched-call-site.yaml
---
HeapProfileRecords:
- GUID: main
AllocSites: []
CallSites:
- - { Function: main, LineOffset: 1, Column: 3, IsInlineFrame: false }
- GUID: _ZL2f1v
AllocSites: []
CallSites:
- - { Function: _ZL2f2v, LineOffset: 0, Column: 28, IsInlineFrame: true }
- { Function: _ZL2f1v, LineOffset: 0, Column: 54, IsInlineFrame: false }
- GUID: _ZL2f2v
AllocSites: []
CallSites:
- - { Function: _ZL2f2v, LineOffset: 0, Column: 28, IsInlineFrame: true }
- { Function: _ZL2f1v, LineOffset: 0, Column: 54, IsInlineFrame: false }
- GUID: _Z2f3v
AllocSites:
- Callstack:
- { Function: _Z2f3v, LineOffset: 0, Column: 47, IsInlineFrame: false }
- { Function: _ZL2f2v, LineOffset: 0, Column: 28, IsInlineFrame: true }
- { Function: _ZL2f1v, LineOffset: 0, Column: 54, IsInlineFrame: false }
- { Function: main, LineOffset: 1, Column: 3, IsInlineFrame: false }
MemInfoBlock:
AllocCount: 1
TotalSize: 4
TotalLifetime: 0
TotalLifetimeAccessDensity: 0
CallSites: []
...
;--- memprof-dump-matched-call-site.ll
; CHECK: MemProf notcold context with id 3894143216621363392 has total profiled size 4 is matched
; CHECK: MemProf callsite match for inline call stack 4745611964195289084 10616861955219347331
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will be impossible to compare against the allocation match messages which use the full context hash - should we (optionally?) emit the constituent stack ids of the allocations as well?

Copy link
Contributor Author

@kazutakahirata kazutakahirata Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we cannot compare these stack IDs against anything else dumped by the compiler. I am thinking about an external tool to dump the full call stack along its constituent stack IDs.

If we are interested in assessing how well we are matching call sites, we need an external tool to know the universe (i.e. the list of all full call stacks) so that we can compute the set complement -- "universe" - "all call site matches" (roughly speaking). We could ask one invocation of the compiler to dump the entire profile contents, including constituent stack IDs, but picking one compiler invocation feels a bit weird. It's cleaner to have a separate tool to dump the profile IMHO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other alternative is to have the compiler optionally dump out this correspondence for each matched allocation, however, there is already a need to deduplicate these across the compilations and that would increase the volume quite a bit. The upside is that we wouldn't need another tool and tool invocation. The other upside is that the compiler trims unneeded parts of the context during matching, which would reduce the volume and the need for redoing this analysis externally. But that can be considered for a follow on change if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@teresajohnson can you elaborate on this -- compiler trims unneeded parts of the context during matching? Are you referring to this FIXME?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I meant the trimming done in CallStackTrie::buildAndAttachMIBMetadata called here:

bool MemprofMDAttached = AllocTrie.buildAndAttachMIBMetadata(CI);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with @teresajohnson about this offline -- the fact that it trims unneeded contexts is orthogonal to how we want to evaluate the matching quality so we can continue the current approach. We can evaluate whether more precision is required if simple heuristics for prioritization (e.g. look at N frames from the leaf) don't yield good results.

; CHECK: MemProf callsite match for inline call stack 5401059281181789382

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define ptr @_Z2f3v() {
entry:
%call = call ptr @_Znam(i64 0), !dbg !3
ret ptr null
}

declare ptr @_Znam(i64)

define i32 @main() {
entry:
call void @_ZL2f1v(), !dbg !7
ret i32 0
}

define void @_ZL2f1v() {
entry:
%call.i = call ptr @_Z2f3v(), !dbg !9
ret void
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}

!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1)
!1 = !DIFile(filename: "match.cc", directory: "/")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = !DILocation(line: 11, column: 47, scope: !4)
!4 = distinct !DISubprogram(name: "f3", linkageName: "_Z2f3v", scope: !1, file: !1, line: 11, type: !5, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
!5 = !DISubroutineType(types: !6)
!6 = !{}
!7 = !DILocation(line: 18, column: 3, scope: !8)
!8 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 17, type: !5, scopeLine: 17, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
!9 = !DILocation(line: 13, column: 28, scope: !10, inlinedAt: !11)
!10 = distinct !DISubprogram(name: "f2", linkageName: "_ZL2f2v", scope: !1, file: !1, line: 13, type: !5, scopeLine: 13, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0)
!11 = distinct !DILocation(line: 15, column: 54, scope: !12)
!12 = distinct !DISubprogram(name: "f1", linkageName: "_ZL2f1v", scope: !1, file: !1, line: 15, type: !13, scopeLine: 15, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0)
!13 = !DISubroutineType(cc: DW_CC_nocall, types: !6)
10 changes: 10 additions & 0 deletions llvm/test/Transforms/PGOProfile/memprof.ll
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,16 @@
; MEMPROFMATCHINFO: MemProf cold context with id 15737101490731057601 has total profiled size 10 is matched
; MEMPROFMATCHINFO: MemProf cold context with id 16342802530253093571 has total profiled size 10 is matched
; MEMPROFMATCHINFO: MemProf cold context with id 18254812774972004394 has total profiled size 10 is matched
; MEMPROFMATCHINFO: MemProf callsite match for inline call stack 748269490701775343
; MEMPROFMATCHINFO: MemProf callsite match for inline call stack 1544787832369987002
; MEMPROFMATCHINFO: MemProf callsite match for inline call stack 2061451396820446691
; MEMPROFMATCHINFO: MemProf callsite match for inline call stack 2104812325165620841
; MEMPROFMATCHINFO: MemProf callsite match for inline call stack 6281715513834610934
; MEMPROFMATCHINFO: MemProf callsite match for inline call stack 8467819354083268568
; MEMPROFMATCHINFO: MemProf callsite match for inline call stack 8690657650969109624
; MEMPROFMATCHINFO: MemProf callsite match for inline call stack 9086428284934609951
; MEMPROFMATCHINFO: MemProf callsite match for inline call stack 12481870273128938184
; MEMPROFMATCHINFO: MemProf callsite match for inline call stack 12699492813229484831

; ModuleID = 'memprof.cc'
source_filename = "memprof.cc"
Expand Down