BFloat16.jl support #2441

Open
maleadt opened this issue Jul 12, 2024 · 1 comment
Labels
bug Something isn't working

Comments

maleadt commented Jul 12, 2024

Julia 1.11 introduces BFloat16 codegen support, so let's use this issue to track support for that.

Right now, it looks like we support the type, but somehow still emit conversions: the addition below is performed in float, wrapped in an fpext/fptrunc pair:

julia> BFloat16s.llvm_storage
true

julia> BFloat16s.llvm_arithmetic
true

julia> function kernel(x)
           @inbounds x[threadIdx().x] += BFloat16(1)
           return
       end

julia> x = CuArray{BFloat16}(undef, 1024);

julia> @device_code_llvm debuginfo=:none @cuda kernel(x)
; PTX CompilerJob of MethodInstance for kernel(::CuDeviceVector{BFloat16, 1}) for sm_89
define ptx_kernel void @_Z6kernel13CuDeviceArrayI8BFloat16Li1ELi1EE({ i64, i32 } %state, { i8 addrspace(1)*, i64, [1 x i64], i64 } %0) local_unnamed_addr {
conversion:
  %.fca.0.extract = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 0
  %1 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  %2 = bitcast i8 addrspace(1)* %.fca.0.extract to bfloat addrspace(1)*
  %3 = zext i32 %1 to i64
  %4 = getelementptr inbounds bfloat, bfloat addrspace(1)* %2, i64 %3
  %5 = load bfloat, bfloat addrspace(1)* %4, align 2
  %6 = fpext bfloat %5 to float
  %7 = fadd float %6, 1.000000e+00
  %8 = fptrunc float %7 to bfloat
  store bfloat %8, bfloat addrspace(1)* %4, align 2
  ret void
}
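
For reference, the fpext/fadd/fptrunc sequence above can be reproduced on the host at the bit level. This is only a sketch: the helper names are made up for illustration, it relies on a bfloat16 value being the upper 16 bits of the corresponding Float32, and it assumes round-to-nearest-even on truncation without special-casing NaN.

```julia
# Hypothetical helpers, not part of BFloat16s.jl or CUDA.jl.

# fpext bfloat -> float: place the 16 bits in the upper half of a Float32.
bf16_to_f32(bits::UInt16) = reinterpret(Float32, UInt32(bits) << 16)

# fptrunc float -> bfloat: round to nearest even, keep the upper 16 bits.
# NaN inputs are not handled specially in this sketch.
function f32_to_bf16(x::Float32)
    u = reinterpret(UInt32, x)
    u += 0x00007fff + ((u >> 16) & 0x00000001)
    return (u >> 16) % UInt16
end

# Same shape as the emitted IR: extend, add in Float32, truncate.
add_one(bits::UInt16) = f32_to_bf16(bf16_to_f32(bits) + 1.0f0)

one_bits = 0x3f80             # bit pattern of BFloat16(1)
two_bits = add_one(one_bits)  # bit pattern of the sum
```

Ideally the kernel would contain a single `fadd bfloat` in place of these three steps.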

In addition, the logic in BFloat16s.jl isn't great, as we determine support based on the host processor. It's not clear whether we can do better, though; this looks a lot like the literal Int issue (where we can't make GPU code use Int32 for integer literals when the host's Int is Int64).
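
To make the literal Int comparison concrete, here is a small host-only sketch. It shows that the type of an integer literal is fixed by the host's word size, which is the same kind of host-derived decision as the BFloat16 support flags above. The variable names are just for illustration.

```julia
# The type of an integer literal follows the host word size,
# regardless of which device the surrounding code is compiled for.
literal_type = typeof(1)

# Int is an alias for Int64 on a 64-bit host and Int32 on a 32-bit host.
host_bits = Sys.WORD_SIZE
int_bits = 8 * sizeof(Int)
```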

@maleadt maleadt added the bug Something isn't working label Jul 12, 2024
maleadt commented Sep 16, 2024

Update: looks like we now hit an instruction selection error:

julia> using CUDA, BFloat16s

julia> function foobar(C::AbstractArray, a::Number, b::Number)
           @inbounds C[] = a*b
           return
       end
foobar (generic function with 1 method)

julia> @cuda foobar(CuArray(Float64[0]), one(BFloat16), one(Int32))
ERROR: LLVM error: Cannot select: 0x22563be0: f64 = fp_extend 0x22563b70, /home/tim/.julia/packages/BFloat16s/u3WQc/src/bfloat16.jl:210 @[ number.jl:7 @[ /home/tim/Julia/pkg/CUDA/src/device/array.jl:166 @[ /home/tim/Julia/pkg/CUDA/src/device/array.jl:178 @[ REPL[3]:2 ] ] ] ]
  0x22563b70: bf16 = fmul 0x22563b00, 0x22563320, /home/tim/.julia/packages/BFloat16s/u3WQc/src/bfloat16.jl:227 @[ promotion.jl:430 @[ REPL[3]:2 ] ]
    0x22563b00: bf16 = sint_to_fp 0x22563390, /home/tim/.julia/packages/BFloat16s/u3WQc/src/bfloat16.jl:188 @[ number.jl:7 @[ promotion.jl:375 @[ promotion.jl:400 @[ promotion.jl:430 @[ REPL[3]:2 ] ] ] ] ]
      0x22563390: i32,ch = load<(dereferenceable invariant load (s32) from `i32 addrspace(101)* null`, addrspace 101)> 0x20cf8dc0, TargetExternalSymbol:i64'_Z6foobar13CuDeviceArrayI7Float64Ll1ELl1EE8BFloat165Int32_param_3', undef:i64
        0x22563940: i64 = TargetExternalSymbol'_Z6foobar13CuDeviceArrayI7Float64Ll1ELl1EE8BFloat165Int32_param_3'
        0x22563010: i64 = undef
    0x22563320: bf16,ch = load<(dereferenceable invariant load (s16) from `bfloat addrspace(101)* null`, addrspace 101)> 0x20cf8dc0, TargetExternalSymbol:i64'_Z6foobar13CuDeviceArrayI7Float64Ll1ELl1EE8BFloat165Int32_param_2', undef:i64
      0x225637f0: i64 = TargetExternalSymbol'_Z6foobar13CuDeviceArrayI7Float64Ll1ELl1EE8BFloat165Int32_param_2'
      0x22563010: i64 = undef
In function: _Z6foobar13CuDeviceArrayI7Float64Ll1ELl1EE8BFloat165Int32
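
The shape of the failing DAG (sint_to_fp, then fmul, then fp_extend to f64) follows from Julia's promotion rules. Below is a host-only sketch of that chain; note that it uses Float16 as a stand-in for BFloat16 so that it runs without extra packages, and it indexes with `C[1]` rather than `C[]`.

```julia
# Float16 is a stand-in for BFloat16 here (assumption made for illustration).
a = one(Float16)
b = one(Int32)

# Mixed float/integer multiplication promotes the integer to the float type
# (the sint_to_fp node), then multiplies in that type (the fmul node).
promoted = promote_type(typeof(a), typeof(b))
p = a * b

# Storing into a Float64 array converts the product (the fp_extend node).
C = Float64[0]
C[1] = p
```

The node that cannot be selected is that last step, the direct extension from bf16 to f64.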
