[XLA-HLO:GPU] Type errors on BERT_LARGE_FP16_JAX_* models #117

pzread · 2023-08-04T19:20:53Z

docker run --gpus all --mount="type=bind,src="${PWD}",target=/work" --workdir="/work" "gcr.io/iree-oss/openxla-benchmark/cuda11.8-cudnn8.9@sha256:c39107c4160e749b7c4bac18862c6c1b6d56e1aa60644a4fe323e315ffba0a0b" /work/xla-tools-dir/hlo_runner_main --hlo_file=/work/xla_hlo_before_optimizations.txt --device_type=gpu --num_repeats=50 --input_format=text --num_replicas=1 --num_partitions=1 --logtostderr
2023-08-04 19:15:21.721351: I xla/service/service.cc:168] XLA service 0x5640370dddd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-08-04 19:15:21.721415: I xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA A100-SXM4-40GB, Compute Capability 8.0
2023-08-04 19:15:21.721767: I xla/pjrt/gpu/se_gpu_pjrt_client.cc:633] Using BFC allocator.
2023-08-04 19:15:21.721826: I xla/pjrt/gpu/gpu_helpers.cc:105] XLA backend allocating 31753961472 bytes on device 0 for BFCAllocator.
2023-08-04 19:15:31.158463: I xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8900
2023-08-04 19:15:34.067278: I xla/stream_executor/gpu/asm_compiler.cc:328] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_295', 996 bytes spill stores, 1108 bytes spill loads

2023-08-04 19:15:36.668819: W xla/service/gpu/runtime/support.cc:58] Intercepted XLA runtime error:
INTERNAL: Unexpected GEMM dtype: f32 f32 f16
2023-08-04 19:15:36.699421: F xla/tools/multihost_hlo_runner/hlo_runner_main.cc:121] Non-OK-status: xla::FunctionalHloRunner::LoadAndRunAndDump( *client.value(), preproc_options, raw_compile_options, running_options, {hlo_file}, input_format, dump_output_literal_to, task_id) status: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.gemm' failed: Unexpected GEMM dtype: f32 f32 f16; current tracing scope: custom-call; current profiling annotation: XlaModule:#hlo_module=extracted,program_id=131#.

Reproduce:

wget -O xla_hlo_before_optimizations.txt https://storage.googleapis.com/iree-model-artifacts/jax/jax_models_0.4.13_1688607404/BERT_LARGE_FP16_JAX_384XI32_BATCH1/xla_hlo_before_optimizations.txt

docker run --gpus all --mount="type=bind,src="${PWD}",target=/work" --workdir="/work" "gcr.io/iree-oss/openxla-benchmark/cuda11.8-cudnn8.9@sha256:c39107c4160e749b7c4bac18862c6c1b6d56e1aa60644a4fe323e315ffba0a0b" /work/xla-tools-dir/hlo_runner_main --hlo_file=/work/xla_hlo_before_optimizations.txt --device_type=gpu --num_repeats=50 --input_format=text --num_replicas=1 --num_partitions=1 --logtostderr
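
For triage across the BERT_LARGE_FP16_JAX_* variants, it may help to pull the failing dtype triple out of each runner log automatically. The snippet below is a minimal sketch, not part of the benchmark tooling: the function name `find_gemm_dtype_error` is hypothetical, and it only matches the error text exactly as it appears in the log above. Reading the three dtypes as lhs, rhs, and output is an assumption, not something the log states.

```python
import re
from typing import Optional, Tuple

# Matches the runtime error text as it appears in the hlo_runner_main log above.
_GEMM_ERR = re.compile(r"Unexpected GEMM dtype: (\w+) (\w+) (\w+)")


def find_gemm_dtype_error(log_text: str) -> Optional[Tuple[str, str, str]]:
    """Return the first dtype triple from an 'Unexpected GEMM dtype' error, or None.

    Hypothetical helper for log triage; the meaning of each position in the
    triple (assumed lhs, rhs, output) is not confirmed by the log itself.
    """
    match = _GEMM_ERR.search(log_text)
    return match.groups() if match else None


if __name__ == "__main__":
    sample = "INTERNAL: Unexpected GEMM dtype: f32 f32 f16"
    print(find_gemm_dtype_error(sample))
```

Saving the container's stderr to a file and passing its contents to this function would make it easy to check whether other models in the family fail with the same dtype combination.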


pzread added the bug label Aug 8, 2023

mariecwhite mentioned this issue Aug 23, 2023

[Tracking Bug] List of disabled workloads. #125

Open
