-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[memprof] Dump call site matching information #125130
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
@@ -0,0 +1,114 @@ | ||||
; Tests that the compiler dumps call site matches upon request. | ||||
; | ||||
; The test case is generated from: | ||||
; | ||||
; // main | ||||
; // | | ||||
; // f1 (noinline) | ||||
; // | | ||||
; // f2 | ||||
; // | | ||||
; // f3 (noinline) | ||||
; // | | ||||
; // new | ||||
; | ||||
; __attribute__((noinline)) char *f3() { return ::new char[4]; } | ||||
; | ||||
; static char *f2() { return f3(); } | ||||
; | ||||
; __attribute__((noinline)) static char *f1() { return f2(); } | ||||
; | ||||
; int main() { | ||||
; f1(); | ||||
; return 0; | ||||
; } | ||||
; | ||||
; Here we expect to match two inline call stacks: | ||||
; | ||||
; - [main] | ||||
; - [f1, f2] | ||||
; | ||||
; Note that f3 is considered to be an allocation site, not a call site, because | ||||
; it directly calls new after inlining. | ||||
|
||||
; REQUIRES: x86_64-linux | ||||
; RUN: split-file %s %t | ||||
; RUN: llvm-profdata merge %t/memprof-dump-matched-call-site.yaml -o %t/memprof-dump-matched-call-site.memprofdata | ||||
; RUN: opt < %t/memprof-dump-matched-call-site.ll -passes='memprof-use<profile-filename=%t/memprof-dump-matched-call-site.memprofdata>' -memprof-print-match-info -S 2>&1 | FileCheck %s | ||||
|
||||
;--- memprof-dump-matched-call-site.yaml | ||||
--- | ||||
HeapProfileRecords: | ||||
- GUID: main | ||||
AllocSites: [] | ||||
CallSites: | ||||
- - { Function: main, LineOffset: 1, Column: 3, IsInlineFrame: false } | ||||
- GUID: _ZL2f1v | ||||
AllocSites: [] | ||||
CallSites: | ||||
- - { Function: _ZL2f2v, LineOffset: 0, Column: 28, IsInlineFrame: true } | ||||
- { Function: _ZL2f1v, LineOffset: 0, Column: 54, IsInlineFrame: false } | ||||
- GUID: _ZL2f2v | ||||
AllocSites: [] | ||||
CallSites: | ||||
- - { Function: _ZL2f2v, LineOffset: 0, Column: 28, IsInlineFrame: true } | ||||
- { Function: _ZL2f1v, LineOffset: 0, Column: 54, IsInlineFrame: false } | ||||
- GUID: _Z2f3v | ||||
AllocSites: | ||||
- Callstack: | ||||
- { Function: _Z2f3v, LineOffset: 0, Column: 47, IsInlineFrame: false } | ||||
- { Function: _ZL2f2v, LineOffset: 0, Column: 28, IsInlineFrame: true } | ||||
- { Function: _ZL2f1v, LineOffset: 0, Column: 54, IsInlineFrame: false } | ||||
- { Function: main, LineOffset: 1, Column: 3, IsInlineFrame: false } | ||||
MemInfoBlock: | ||||
AllocCount: 1 | ||||
TotalSize: 4 | ||||
TotalLifetime: 0 | ||||
TotalLifetimeAccessDensity: 0 | ||||
CallSites: [] | ||||
... | ||||
;--- memprof-dump-matched-call-site.ll | ||||
; CHECK: MemProf notcold context with id 3894143216621363392 has total profiled size 4 is matched | ||||
; CHECK: MemProf callsite match for inline call stack 4745611964195289084 10616861955219347331 | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These will be impossible to compare against the allocation match messages which use the full context hash - should we (optionally?) emit the constituent stack ids of the allocations as well? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, we cannot compare these stack IDs against anything else dumped by the compiler. I am thinking about an external tool to dump the full call stack along its constituent stack IDs. If we are interested in assessing how well we are matching call sites, we need an external tool to know the universe (i.e. the list of all full call stacks) so that we can compute the set complement -- "universe" - "all call site matches" (roughly speaking). We could ask one invocation of the compiler to dump the entire profile contents, including constituent stack IDs, but picking one compiler invocation feels a bit weird. It's cleaner to have a separate tool to dump the profile IMHO. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The other alternative is to have the compiler optionally dump out this correspondence for each matched allocation, however, there is already a need to deduplicate these across the compilations and that would increase the volume quite a bit. The upside is that we wouldn't need another tool and tool invocation. The other upside is that the compiler trims unneeded parts of the context during matching, which would reduce the volume and the need for redoing this analysis externally. But that can be considered for a follow on change if needed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @teresajohnson can you elaborate on this -- There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No I meant the trimming done in CallStackTrie::buildAndAttachMIBMetadata called here:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Chatted with @teresajohnson about this offline -- the fact that it trims unneeded contexts is orthogonal to how we want to evaluate the matching quality so we can continue the current approach. We can evaluate whether more precision is required if simple heuristics for prioritization (e.g. look at N frames from the leaf) don't yield good results. |
||||
; CHECK: MemProf callsite match for inline call stack 5401059281181789382 | ||||
|
||||
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128" | ||||
target triple = "x86_64-unknown-linux-gnu" | ||||
|
||||
define ptr @_Z2f3v() { | ||||
entry: | ||||
%call = call ptr @_Znam(i64 0), !dbg !3 | ||||
ret ptr null | ||||
} | ||||
|
||||
declare ptr @_Znam(i64) | ||||
|
||||
define i32 @main() { | ||||
entry: | ||||
call void @_ZL2f1v(), !dbg !7 | ||||
ret i32 0 | ||||
} | ||||
|
||||
define void @_ZL2f1v() { | ||||
entry: | ||||
%call.i = call ptr @_Z2f3v(), !dbg !9 | ||||
ret void | ||||
} | ||||
|
||||
!llvm.dbg.cu = !{!0} | ||||
!llvm.module.flags = !{!2} | ||||
|
||||
!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1) | ||||
!1 = !DIFile(filename: "match.cc", directory: "/") | ||||
!2 = !{i32 2, !"Debug Info Version", i32 3} | ||||
!3 = !DILocation(line: 11, column: 47, scope: !4) | ||||
!4 = distinct !DISubprogram(name: "f3", linkageName: "_Z2f3v", scope: !1, file: !1, line: 11, type: !5, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0) | ||||
!5 = !DISubroutineType(types: !6) | ||||
!6 = !{} | ||||
!7 = !DILocation(line: 18, column: 3, scope: !8) | ||||
!8 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 17, type: !5, scopeLine: 17, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0) | ||||
!9 = !DILocation(line: 13, column: 28, scope: !10, inlinedAt: !11) | ||||
!10 = distinct !DISubprogram(name: "f2", linkageName: "_ZL2f2v", scope: !1, file: !1, line: 13, type: !5, scopeLine: 13, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0) | ||||
!11 = distinct !DILocation(line: 15, column: 54, scope: !12) | ||||
!12 = distinct !DISubprogram(name: "f1", linkageName: "_ZL2f1v", scope: !1, file: !1, line: 15, type: !13, scopeLine: 15, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0) | ||||
!13 = !DISubroutineType(cc: DW_CC_nocall, types: !6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use Elaborated Tests to generate the yaml and the IR?
https://llvm.org/docs/TestingGuide.html#elaborated-tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointer! I wasn't aware of the
gen
feature, but the ability to regenerate the whole thing is very appealing. That said, I like cleaned-up IR in this particular case because I care about debugging information being human readable. I'll definitely keep "Elaborated Tested" in my tool box. Thanks again!