
MI300X (gfx942) support for broadcast operations #621

Open
joelandman opened this issue Apr 16, 2024 · 11 comments

joelandman commented Apr 16, 2024

Simple reproducer; I'm not sure whether this specific use case is supported. CPU and GPU versions are included for comparison. Setup: MI300X GPU, Ubuntu 22.04, ROCm 6.1 pre-release.

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8* (2024-03-01 10:14 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 9354 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 8 default, 0 interactive, 4 GC (on 128 virtual cores)
Environment:
  LD_LIBRARY_PATH = /home/amd/local/lib::/home/amd/local/lib:/home/amd/.npm_modules/lib

julia> using AMDGPU

julia> AMDGPU.devices()
┌────┬─────────────────────┬────────────────────────┬───────────┬─────────────┐
│ Id │                Name │               GCN arch │ Wavefront │      Memory │
├────┼─────────────────────┼────────────────────────┼───────────┼─────────────┤
│  1 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  2 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  3 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  4 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  5 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  6 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  7 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  8 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
└────┴─────────────────────┴────────────────────────┴───────────┴─────────────┘


# CPU version
a_h = rand(Float16,5,5)
z_h = a_h .- Float16(0.5)

# GPU version 1
a_d = ROCMatrix(rand(Float16,5,5))
z_d = a_d .- Float16(0.5)

# GPU version 2
b_d = AMDGPU.rand(Float16,5,5)
y_d = b_d .- Float16(0.5)

a_h and z_h are as expected.

julia> # CPU version
       a_h = rand(Float16,5,5)
5×5 Matrix{Float16}:
 0.0796  0.5674  0.3735  0.588    0.1387
 0.3408  0.747   0.1177  0.01953  0.165
 0.962   0.4517  0.1626  0.834    0.1772
 0.1313  0.248   0.0947  0.311    0.46
 0.51    0.6123  0.593   0.1958   0.356

julia> z_h = a_h .- Float16(0.5)
5×5 Matrix{Float16}:
 -0.4204     0.0674   -0.1265   0.0879  -0.3613
 -0.1592     0.2471   -0.3823  -0.4805  -0.335
  0.462     -0.04834  -0.3374   0.334   -0.3228
 -0.3687    -0.252    -0.4053  -0.189   -0.04004
  0.009766   0.1123    0.0928  -0.3042  -0.144

a_d and b_d are initialized correctly, but the subtraction fails with the following:

julia> # GPU version 1
       a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.4282  0.3154    0.796    0.391    0.6763
 0.413   0.9087    0.791    0.613    0.5547
 0.768   0.004883  0.09033  0.12305  0.9023
 0.6484  0.4707    0.827    0.9595   0.8643
 0.3164  0.2783    0.4043   0.2222   0.9355

julia> z_d = a_d .- Float16(0.5)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
warning: sramecc 'On' was requested for a processor that does not support it!
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
warning: sramecc 'On' was requested for a processor that does not support it!
ERROR: LLVM error: Cannot select: 0x55d229b85998: i32,ch = load<(dereferenceable invariant load (s8) from %ir..kernarg.offset7.cast + 33, basealign 8, addrspace 4), zext from i8> 0x55d22a1a9d88, 0x55d228b82c20, undef:i64
  0x55d228b82c20: i64 = add 0x55d22e9b76d0, Constant:i64<153>
    0x55d22e9b76d0: i64,ch = CopyFromReg 0x55d22a1a9d88, Register:i64 %0
      0x55d22e9b7390: i64 = Register %0
    0x55d228b829b0: i64 = Constant<153>
  0x55d22a33edf0: i64 = undef
In function: _Z3_3516ROCKernelContext14ROCDeviceArrayI7Float16Li2ELi1EE11BroadcastedI13ROCArrayStyleILi2E9HIPBufferE5TupleI5OneToI5Int64ES6_IS7_EE1_S5_I8ExtrudedIS0_IS1_Li2ELi1EES5_I4BoolS10_ES5_IS7_S7_EES1_EES7_
Stacktrace:
  [1] handle_error(reason::Cstring)
    @ LLVM ~/.julia/packages/LLVM/bzSzE/src/core/context.jl:168
  [2] LLVMTargetMachineEmitToMemoryBuffer(T::LLVM.TargetMachine, M::LLVM.Module, codegen::LLVM.API.LLVMCodeGenFileType, ErrorMessage::Base.RefValue{…}, OutMemBuf::Base.RefValue{…})
    @ LLVM.API ~/.julia/packages/LLVM/bzSzE/lib/15/libLLVM.jl:4241
  [3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
    @ LLVM ~/.julia/packages/LLVM/bzSzE/src/targetmachine.jl:45
  [4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/mcgen.jl:84
  [5] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:466 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [8] macro expansion
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:463 [inlined]
  [9] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:92
 [10] emit_asm
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:86 [inlined]
 [11]
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:154
 [12] codegen
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:115 [inlined]
 [13]
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:111
 [14] compile
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:103 [inlined]
 [15] #40
    @ ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:172 [inlined]
 [16] JuliaContext(f::AMDGPU.Compiler.var"#40#41"{GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
 [17] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
 [18] hipcompile(job::GPUCompiler.CompilerJob)
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:171
 [19] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(AMDGPU.Compiler.hipcompile), linker::typeof(AMDGPU.Compiler.hiplink))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:128
 [20] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:103
 [21] macro expansion
    @ ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:139 [inlined]
 [22] macro expansion
    @ ./lock.jl:267 [inlined]
 [23] hipfunction(f::GPUArrays.var"#35#37", tt::Type{Tuple{…}}; kwargs::@Kwargs{name::Nothing})
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:133
 [24] hipfunction
    @ ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:132 [inlined]
 [25] macro expansion
    @ ~/.julia/packages/AMDGPU/gtxsf/src/highlevel.jl:172 [inlined]
 [26] #gpu_call#48
    @ ~/.julia/packages/AMDGPU/gtxsf/src/gpuarrays.jl:8 [inlined]
 [27] gpu_call
    @ ~/.julia/packages/AMDGPU/gtxsf/src/gpuarrays.jl:5 [inlined]
 [28] gpu_call(::GPUArrays.var"#35#37", ::ROCArray{…}, ::Base.Broadcast.Broadcasted{…}, ::Int64; target::ROCArray{…}, elements::Nothing, threads::Int64, blocks::Int64, name::Nothing)
    @ GPUArrays ~/.julia/packages/GPUArrays/OKkAu/src/device/execution.jl:69
 [29] gpu_call
    @ ~/.julia/packages/GPUArrays/OKkAu/src/device/execution.jl:34 [inlined]
 [30] _copyto!
    @ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:82 [inlined]
 [31] copyto!
    @ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:44 [inlined]
 [32] copy
    @ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:29 [inlined]
 [33] materialize(bc::Base.Broadcast.Broadcasted{AMDGPU.ROCArrayStyle{2, AMDGPU.Runtime.Mem.HIPBuffer}, Nothing, typeof(-), Tuple{ROCArray{…}, Float16}})
    @ Base.Broadcast ./broadcast.jl:903
 [34] top-level scope
    @ REPL[77]:1
 [35] top-level scope
    @ ~/.julia/packages/AMDGPU/gtxsf/src/tls.jl:200
Some type information was truncated. Use `show(err)` to see complete types.

joelandman commented Apr 16, 2024

Worth noting that this works on an MI50 and on the integrated GPU of a 7950X.

MI50

julia> using AMDGPU

julia> AMDGPU.devices()
┌────┬────────────────────┬────────────────────────┬───────────┬────────────┐
│ Id │               Name │               GCN arch │ Wavefront │     Memory │
├────┼────────────────────┼────────────────────────┼───────────┼────────────┤
│  1 │     AMD Radeon VII │ gfx906:sramecc+:xnack- │        64 │ 15.984 GiB │
│  2 │ AMD Radeon RX 6600 │                gfx1032 │        32 │  7.984 GiB │
└────┴────────────────────┴────────────────────────┴───────────┴────────────┘


julia> # CPU version
       a_h = rand(Float16,5,5)
5×5 Matrix{Float16}:
 0.1758  0.2559  0.8525  0.0625  0.987
 0.0957  0.4429  0.949   0.593   0.4824
 0.46    0.945   0.9917  0.738   0.010254
 0.779   0.7344  0.9824  0.544   0.0332
 0.503   0.977   0.31    0.3086  0.523

julia> z_h = a_h .- Float16(0.5)
5×5 Matrix{Float16}:
 -0.3242   -0.2441    0.3525  -0.4375    0.4868
 -0.4043   -0.05713   0.4492   0.0928   -0.01758
 -0.04004   0.4448    0.4917   0.2378   -0.4897
  0.2788    0.2344    0.4824   0.04395  -0.4668
  0.00293   0.477    -0.19    -0.1914    0.02295

julia> # GPU version 1
       a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.3027  0.502   0.3276  0.0796  0.456
 0.1606  0.4282  0.1875  0.816   0.2573
 0.5347  0.8003  0.5215  0.103   0.0908
 0.7695  0.8228  0.802   0.8037  0.187
 0.475   0.1553  0.608   0.8735  0.25

julia> z_d = a_d .- Float16(0.5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 -0.1973    0.001953  -0.1724   -0.4204  -0.04395
 -0.3394   -0.0718    -0.3125    0.316   -0.2427
  0.03467   0.3003     0.02148  -0.397   -0.4092
  0.2695    0.3228     0.3018    0.3037  -0.313
 -0.0249   -0.3447     0.1079    0.3735  -0.25

julia> # GPU version 2
       b_d = AMDGPU.rand(Float16,5,5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.674    0.4595  0.624    0.0912  0.821
 0.02998  0.4895  0.02676  0.385   0.4805
 0.522    0.978   0.4788   0.684   0.8164
 0.1853   0.9688  0.39     0.3337  0.5186
 0.00983  0.3857  0.4546   0.846   0.3872

julia> y_d = b_d .- Float16(0.5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
  0.1738   -0.04053   0.124    -0.4087   0.3208
 -0.47     -0.0105   -0.4731   -0.115   -0.01953
  0.02197   0.478    -0.02124   0.1841   0.3164
 -0.3147    0.4688   -0.1101   -0.1663   0.01855
 -0.4902   -0.11426  -0.0454    0.3462  -0.1128

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8* (2024-03-01 10:14 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen Threadripper 1950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver1)
Threads: 8 default, 0 interactive, 4 GC (on 32 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/opt/rocm-6.1.0-13294/lib:/nvme/home/joe/local/lib

7950X


julia> using AMDGPU

julia> AMDGPU.devices()
┌────┬─────────────────────┬──────────┬───────────┬───────────┐
│ Id │                Name │ GCN arch │ Wavefront │    Memory │
├────┼─────────────────────┼──────────┼───────────┼───────────┤
│  1 │ AMD Radeon Graphics │  gfx1030 │        32 │ 8.000 GiB │
└────┴─────────────────────┴──────────┴───────────┴───────────┘


julia> a_h = rand(Float16,5,5)
5×5 Matrix{Float16}:
 0.2427  0.2471  0.9004  0.56     0.273
 0.5806  0.3276  0.943   0.5425   0.4692
 0.267   0.1074  0.5127  0.543    0.418
 0.708   0.8306  0.273   0.2222   0.929
 0.9204  0.5894  0.561   0.09766  0.1562

julia> z_h = a_h .- Float16(0.5)
5×5 Matrix{Float16}:
 -0.2573   -0.253     0.4004     0.06006  -0.227
  0.08057  -0.1724    0.4429     0.04248  -0.03076
 -0.2329   -0.3926    0.012695   0.04297  -0.08203
  0.208     0.3306   -0.227     -0.2778    0.4292
  0.4204    0.08936   0.06104   -0.4023   -0.3438

julia> a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.6113   0.4038  0.931   0.2935  0.8135
 0.02002  0.994   0.3389  0.249   0.508
 0.1992   0.5254  0.963   0.4     0.749
 0.844    0.709   0.1333  0.3687  0.9595
 0.1138   0.4258  0.2104  0.735   0.294

julia> z_d = a_d .- Float16(0.5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
  0.1113  -0.0962    0.4312  -0.2065   0.3135
 -0.48     0.4941   -0.1611  -0.251    0.007812
 -0.3008   0.02539   0.463   -0.1001   0.249
  0.3442   0.209    -0.3667  -0.1313   0.4595
 -0.3862  -0.0742   -0.2896   0.2349  -0.206

julia> # GPU version 2
       b_d = AMDGPU.rand(Float16,5,5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.7783  0.3125  0.989    0.4648  0.1595
 0.7236  0.7017  0.8687   0.3203  0.914
 0.962   0.72    0.03864  0.386   0.156
 0.1991  0.754   0.69     0.517   0.9272
 0.5283  0.822   0.859    0.2283  0.7993

julia> y_d = b_d .- Float16(0.5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
  0.2783   -0.1875   0.4888  -0.03516  -0.3403
  0.2236    0.2017   0.3687  -0.1797    0.414
  0.462     0.2202  -0.4614  -0.114    -0.344
 -0.3008    0.254    0.19     0.01709   0.4272
  0.02832   0.3218   0.359   -0.2717    0.2993

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8* (2024-03-01 10:14 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 8 default, 0 interactive, 4 GC (on 32 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/usr/local/cuda-12.3/lib64:/nvme/home/joe/local/lib
  JULIA_HOME = /nvme/home/joe/local


luraess commented Apr 16, 2024

Are we missing something to support gfx942, @pxl-th?

joelandman commented

Note: gfx942 is new and not widely available, so I didn't expect everything to work. I'm happy to work on this with you though.


pxl-th commented Apr 17, 2024

Probably because Julia 1.10 ships LLVM 15, while gfx942 was only officially added in LLVM 17, IIUC:
llvm/llvm-project@9d05727

You can try the Julia 1.11 early release (which has LLVM 16), but I haven't tested it with AMD GPUs at all yet.
In the worst case, we'd have to wait for LLVM 17 to arrive in Julia, which is tracked in this PR:
JuliaLang/julia#53070
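
As a quick sanity check, you can see which LLVM version a given Julia build carries via the standard Base.libllvm_version constant (output shown for the 1.10.2 build from the report above):

julia> Base.libllvm_version  # LLVM version bundled with this Julia build
v"15.0.7"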

efaulhaber commented

Julia 1.11:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.0-beta1 (2024-04-10)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using AMDGPU
Precompiling AMDGPU...
Info Given AMDGPU was explicitly requested, output will be shown live 
ERROR: LoadError: UndefVarError: `CodeCache` not defined in `GPUCompiler`
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:42
 [2] top-level scope
   @ ~/.julia/packages/AMDGPU/gtxsf/src/AMDGPU.jl:75
 [3] include
   @ ./Base.jl:558 [inlined]
 [4] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
   @ Base ./loading.jl:2721
 [5] top-level scope
   @ stdin:4
in expression starting at ~/.julia/packages/AMDGPU/gtxsf/src/AMDGPU.jl:1
in expression starting at stdin:4
  ✗ AMDGPU
  0 dependencies successfully precompiled in 5 seconds. 108 already precompiled.

ERROR: The following 1 direct dependency failed to precompile:

AMDGPU 

Failed to precompile AMDGPU [21141c5a-9bdb-4563-92ae-f87d6854732e] to "~/.julia/compiled/v1.11/AMDGPU/jl_hqPvGn".
ERROR: LoadError: UndefVarError: `CodeCache` not defined in `GPUCompiler`
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:42
 [2] top-level scope
   @ ~/.julia/packages/AMDGPU/gtxsf/src/AMDGPU.jl:75
 [3] include
   @ ./Base.jl:558 [inlined]
 [4] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
   @ Base ./loading.jl:2721
 [5] top-level scope
   @ stdin:4
in expression starting at ~/.julia/packages/AMDGPU/gtxsf/src/AMDGPU.jl:1
in expression starting at stdin:


pxl-th commented May 20, 2024

AMDGPU 0.9 now supports Julia 1.11 and possibly MI300X.
Just make sure to launch Julia with the JULIA_LLVM_ARGS="-opaque-pointers" environment variable set, so that it uses the system-wide ROCm device libraries instead of our patched ones.
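
For example, a minimal launch from a shell (adjust the julia binary path to your install). The variable must be set before Julia starts; setting ENV["JULIA_LLVM_ARGS"] inside a running session is too late, since LLVM flags are parsed at startup:

JULIA_LLVM_ARGS="-opaque-pointers" julia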

paulnovo commented Jun 8, 2024

Just hit a similar issue to the original post with Julia 1.11.0-beta2, ROCm 6.1.2, and AMDGPU 0.9.5, both with and without JULIA_LLVM_ARGS="-opaque-pointers" set.

julia> a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.644    0.2002  0.208   0.4048   0.6567
 0.774    0.4253  0.667   0.03662  0.1997
 0.7725   0.6445  0.95    0.2876   0.715
 0.2764   0.4453  0.6836  0.4277   0.1118
 0.02197  0.5454  0.3564  0.354    0.8027

julia> z_d = a_d .- Float16(0.5)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
...
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
warning: sramecc 'On' was requested for a processor that does not support it!
ERROR: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#35#37")(::AMDGPU.ROCKernelContext, ::AMDGPU.Device.ROCDeviceMatrix{…}, ::Base.Broadcast.Broadcasted{…}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to pointerset(ptr::Core.LLVMPtr{T, A}, x::T, i::I, ::Val{align}) where {T, A, I, align} @ LLVM.Interop none:0)
Stacktrace:
 [1] unsafe_store! (repeats 3 times)
   @ /workspace/packages/LLVM/6cDbl/src/interop/pointer.jl:88
 [2] malloc_hc
   @ /workspace/packages/AMDGPU/OUSjX/src/device/runtime.jl:98
 [3] malloc
   @ /workspace/packages/AMDGPU/OUSjX/src/device/gcn/memory_dynamic.jl:12
 [4] malloc
   @ /workspace/packages/GPUCompiler/nWT2N/src/runtime.jl:88
 [5] macro expansion
   @ /workspace/packages/GPUCompiler/nWT2N/src/runtime.jl:183
 [6] macro expansion
   @ ./none:0
 [7] box
   @ ./none:0
 [8] box_uint64
   @ /workspace/packages/GPUCompiler/nWT2N/src/runtime.jl:212
 [9] multiple call sites
   @ unknown:0
...

I have been testing on Runpod and built a Julia-1.11-rc AMD ROCm template you can use to deploy an MI300X. I am happy to help with any debugging as well.


pxl-th commented Jun 9, 2024

We'd then need Julia 1.12, which has LLVM 17 (1.11 has LLVM 16).
I haven't tested it yet, as 1.11 itself is still in beta, but I can take a look shortly.

paulnovo commented

I just built Julia from source and also added version 17 to the compatible versions of LLD_jll and LLVM_jll for AMDGPU (a sketch of that compat change is just below), but got the same issue in the session that follows:
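
Roughly, the widened compat entries in AMDGPU's Project.toml looked like this (the bounds shown are illustrative, not a verbatim copy):

[compat]
LLD_jll = "15, 16, 17"
LLVM_jll = "15, 16, 17"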

# ./julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.12.0-DEV.706 (2024-06-11)
 _/ |\__'_|_|_|\__'_|  |  Commit e7893a1fa4 (0 days old master)
|__/                   |

julia> versioninfo()
Julia Version 1.12.0-DEV.706
Commit e7893a1fa4 (2024-06-11 09:53 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 192 × AMD EPYC 9474F 48-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-17.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 192 virtual cores)
Environment:
  JULIA_DEPOT_PATH = /root/

julia> using AMDGPU

julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬─────────────────────────────────────────────────────────────────────────┐
│ Available │ Name             │ Version   │ Path                                                                    │
├───────────┼──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│     +     │ LLD              │ -         │ /opt/rocm/llvm/bin/ld.lld                                               │
│     +     │ Device Libraries │ -         │ /root/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a45/amdgcn/bitcode │
│     +     │ HIP              │ 6.1.40093 │ /opt/rocm/lib/libamdhip64.so                                            │
│     +     │ rocBLAS          │ 4.1.2     │ /opt/rocm/lib/librocblas.so.4                                           │
│     +     │ rocSOLVER        │ 3.25.0    │ /opt/rocm/lib/librocsolver.so.0                                         │
│     +     │ rocALUTION       │ -         │ /opt/rocm/lib/librocalution.so.1                                        │
│     +     │ rocSPARSE        │ -         │ /opt/rocm/lib/librocsparse.so.1                                         │
│     +     │ rocRAND          │ 2.10.5    │ /opt/rocm/lib/librocrand.so.1                                           │
│     +     │ rocFFT           │ 1.0.27    │ /opt/rocm/lib/librocfft.so.0                                            │
│     +     │ MIOpen           │ 3.1.0     │ /opt/rocm/lib/libMIOpen.so.1                                            │
└───────────┴──────────────────┴───────────┴─────────────────────────────────────────────────────────────────────────┘

[ Info: AMDGPU devices
┌────┬─────────────────────┬────────────────────────┬───────────┬─────────────┐
│ Id │                Name │               GCN arch │ Wavefront │      Memory │
├────┼─────────────────────┼────────────────────────┼───────────┼─────────────┤
│  1 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
└────┴─────────────────────┴────────────────────────┴───────────┴─────────────┘

julia> a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.5596  0.292   0.8354  0.3677   0.641
 0.1567  0.978   0.4614  0.2144   0.717
 0.4023  0.8706  0.9004  0.9033   0.2319
 0.3042  0.3652  0.48    0.02197  0.1309
 0.7817  0.1909  0.4595  0.3193   0.846

julia> z_d = a_d .- Float16(0.5)
ERROR: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#35#37")(::AMDGPU.ROCKernelContext, ::AMDGPU.Device.ROCDeviceMatrix{…}, ::Base.Broadcast.Broadcasted{…}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to pointerset(ptr::Core.LLVMPtr{T, A}, x::T, i::I, ::Val{align}) where {T, A, I, align} @ LLVM.Interop none:0)
Stacktrace:
 [1] unsafe_store! (repeats 3 times)
   @ ~/packages/LLVM/6cDbl/src/interop/pointer.jl:88
...

Notably, the "'gfx942' is not a recognized processor for this target (ignoring processor)" messages are gone now.
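
If it helps narrow this down, AMDGPU re-exports the GPUCompiler reflection macros, so something like the following should dump the device-side LLVM IR for the failing broadcast kernel (a sketch; I haven't verified it on this exact build):

julia> a_d = ROCMatrix(rand(Float16, 5, 5));

julia> # Dump the LLVM IR for any kernel compiled while evaluating the expression
       AMDGPU.@device_code_llvm a_d .- Float16(0.5)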


pxl-th commented Jun 14, 2024

AMDGPU.jl needs to account for changes in Julia 1.12; I haven't done that yet.

giordano commented

> AMDGPU.jl needs to account for changes in Julia 1.12; I haven't done that yet.

Can you give an indication of what needs to be done? I can't promise anything, but I may or may not have a chance to look into this (if it doesn't take too long 🥲)
