Skip to content

TApplencourt/mkl-verbose-toolkit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

63 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

___  ___ _   __ _       _   _ _________________  _____ _____ _____   _____ _____  _____ _      _   _______ _____ 
|  \/  || | / /| |     | | | |  ___| ___ \ ___ \|  _  /  ___|  ___| |_   _|  _  ||  _  | |    | | / /_   _|_   _|
| .  . || |/ / | |     | | | | |__ | |_/ / |_/ /| | | \ `--.| |__     | | | | | || | | | |    | |/ /  | |   | |  
| |\/| ||    \ | |     | | | |  __||    /| ___ \| | | |`--. \  __|    | | | | | || | | | |    |    \  | |   | |  
| |  | || |\  \| |____ \ \_/ / |___| |\ \| |_/ /\ \_/ /\__/ / |___    | | \ \_/ /\ \_/ / |____| |\  \_| |_  | |  
\_|  |_/\_| \_/\_____/  \___/\____/\_| \_\____/  \___/\____/\____/    \_/  \___/  \___/\_____/\_| \_/\___/  \_/  
                                                                                                                 

When building applications that call Intel MKL functions / cuBLAS functions, it may be useful to determine:

  • which computational functions are called,
  • what parameters are passed to them, and
  • how much time is spent to execute the functions.

You can get an application to print this information to a standard output device by enabling Intel MKL Verbose (https://software.intel.com/en-us/articles/verbose-mode-supported-in-intel-mkl-112) or cuBLAS log (https://docs.nvidia.com/cuda/cublas/index.html#cublasLoggerConfigure)

We propose script to facilitate the generation and the parsing of Math Verbose log file:

  • To parse the log file and generate summary tables, use mkl_parse.py`` or cublas_parse.py`
  • To restrict the verbosity for one MPI_RANK, use mkl_hook wrapper.

{mkl,cublas}_parse

Generate a summary for "{MKL,cublas}_verbosed" log files. For LAPACK, display the cummulative time spend by function, the range of arguments of those function, and the most expensive function calls.

For FTT, display the rank and I/O tensors. Support batched calls.

Requirement / Installation:

python3 is need and the following packages:

  • tabulate
  • greenlet
  • tqdm

One can simply do:

pip install -r requirements.txt
conda install --file requirements.txt

Usage

usage: {mkl,cublas}_parse.py [-h] [-n #] [filename]

Generate a summary for "{MKL,cublas}_verbosed" log files. $MKLROOT will be used to
match MKL arguments name with respective values.

positional arguments:
  filename              If filename is not provided, std.in will be used

optional arguments:
  -h, --help            show this help message and exit
  -n #, --num_routines #
                        The number of routines to print out

Please set $MKLROOT to get the name of BLAS/LAPACK arguments in the summary.

MKL Example

>> cat log.out | wc -l
141013

>> more log.out 
MKL_VERBOSE Intel(R) MKL 2019.0 Update 2 Product build 20190118 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.40GHz intel_thread
MKL_VERBOSE FFT(dcfi30x30x30,tLim:4,desc:0x1c9c140) 452.82us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE FFT(dcbi30x30x30,tLim:4,desc:0x1cb66c0) 118.36us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE FFT(dcfi30x30x30,tLim:4,desc:0x1cc0240) 165.08us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE FFT(dcbi30x30x30,tLim:4,desc:0x1cc9680) 84.77us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE FFT(dcfi30x30x30,tLim:4,desc:0x1cc0240) 102.58us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE FFT(dcbi30x30x30,tLim:4,desc:0x1cc9680) 83.00us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE FFT(dcfi30x30x30,tLim:4,desc:0x1cc0240) 102.86us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE FFT(dcbi30x30x30,tLim:4,desc:0x1cc9680) 81.79us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE FFT(dcfi30x30x30,tLim:4,desc:0x1cc0240) 99.21us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE FFT(dcbi30x30x30,tLim:4,desc:0x1cc9680) 83.10us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE FFT(dcfi30x30x30,tLim:4,desc:0x1cc0240) 99.13us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4

MKL_VERBOSE DGESVD(S,S,45,45,0x3decc70,45,0x3d4eb30,0x3df0bc0,45,0x3ef4cf0,45,0x3ef8c40,2250,0) 1.39ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1 WDiv:HOST:+0.000
MKL_VERBOSE DGETRF(3072,3072,0x7fddf19e9010,3072,0x333da80,0) 376.01ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1 WDiv:HOST:+0.000
...

>> cat tests/mkl_verbose.raw.log | ./mkl_parse.py -n 2
~= SUMMARY ~=
                 Count (#)    Time (s)
-------------  -----------  ----------
BLAS / LAPACK          295   9.91731
FFT                     11   0.0014727

~= BLAS / LAPACK ~=

Top 2 function by execution time (accumulated by names)
Name      Count (#)    Time (s)        %
------  -----------  ----------  -------
DGETRI            8     4.08207  41.1611
DGETRF            8     3.01759  30.4275
other           279     2.81765  28.4114

Top 2 function by execution time (accumulated by arguments)
Name    Argv                                                                     Count (#)    Time (s)        %
------  ---------------------------------------------------------------------  -----------  ----------  -------
DGETRI  [('n', '3072'), ('lda', '3072'), ('lwork', '1179648'), ('info', '0')]            8     4.08207  41.1611
DGETRF  [('m', '3072'), ('n', '3072'), ('lda', '3072'), ('info', '0')]                   8     3.01759  30.4275
other                                                                                  279     2.81765  28.4114

Top 2 function by execution time
Name    Argv                                                                     Time (s)         %
------  ---------------------------------------------------------------------  ----------  --------
DGETRI  [('n', '3072'), ('lda', '3072'), ('lwork', '1179648'), ('info', '0')]     0.52748   5.31878
DGETRI  [('n', '3072'), ('lda', '3072'), ('lwork', '1179648'), ('info', '0')]     0.52185   5.26201
other                                                                             8.86798  89.4192

~= FFT ~=

Top 2 FFT call  by execution time (accumulated)
precision    domain    direction    placement    dimensions      Count (#)    Time (s)        %
-----------  --------  -----------  -----------  ------------  -----------  ----------  -------
Double       Complex   Forward      in-place     30x30x30                6  0.00102168  69.3746
Double       Complex   Backward     in-place     30x30x30                5  0.00045102  30.6254

Top 2 FFT call by execution time
precision    domain    direction    placement    dimensions      Time (s)        %
-----------  --------  -----------  -----------  ------------  ----------  -------
Double       Complex   Forward      in-place     30x30x30      0.00045282  30.7476
Double       Complex   Forward      in-place     30x30x30      0.00016508  11.2093

mkl_hook

Wrapper for your application. Assure that only one MPI_RANK set the MKL_VERBOSE enviroment.

Usage:
./mkl_hook $prog $argv

About

Tools to run and parse MKL verbose mode

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published