docker run --gpus all --mount="type=bind,src="${PWD}",target=/work" --workdir="/work" "gcr.io/iree-oss/openxla-benchmark/cuda11.8-cudnn8.9@sha256:c39107c4160e749b7c4bac18862c6c1b6d56e1aa60644a4fe323e315ffba0a0b" /work/xla-tools-dir/hlo_runner_main --hlo_file=/work/xla_hlo_before_optimizations.txt --device_type=gpu --num_repeats=50 --input_format=text --num_replicas=1 --num_partitions=1 --logtostderr
2023-08-04 19:15:21.721351: I xla/service/service.cc:168] XLA service 0x5640370dddd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-08-04 19:15:21.721415: I xla/service/service.cc:176] StreamExecutor device (0): NVIDIA A100-SXM4-40GB, Compute Capability 8.0
2023-08-04 19:15:21.721767: I xla/pjrt/gpu/se_gpu_pjrt_client.cc:633] Using BFC allocator.
2023-08-04 19:15:21.721826: I xla/pjrt/gpu/gpu_helpers.cc:105] XLA backend allocating 31753961472 bytes on device 0 for BFCAllocator.
2023-08-04 19:15:31.158463: I xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8900
2023-08-04 19:15:34.067278: I xla/stream_executor/gpu/asm_compiler.cc:328] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_295', 996 bytes spill stores, 1108 bytes spill loads
2023-08-04 19:15:36.668819: W xla/service/gpu/runtime/support.cc:58] Intercepted XLA runtime error:
INTERNAL: Unexpected GEMM dtype: f32 f32 f16
2023-08-04 19:15:36.699421: F xla/tools/multihost_hlo_runner/hlo_runner_main.cc:121] Non-OK-status: xla::FunctionalHloRunner::LoadAndRunAndDump( *client.value(), preproc_options, raw_compile_options, running_options, {hlo_file}, input_format, dump_output_literal_to, task_id) status: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.gemm' failed: Unexpected GEMM dtype: f32 f32 f16; current tracing scope: custom-call; current profiling annotation: XlaModule:#hlo_module=extracted,program_id=131#.
Reproduce:
wget -O xla_hlo_before_optimizations.txt https://storage.googleapis.com/iree-model-artifacts/jax/jax_models_0.4.13_1688607404/BERT_LARGE_FP16_JAX_384XI32_BATCH1/xla_hlo_before_optimizations.txt
docker run --gpus all --mount="type=bind,src="${PWD}",target=/work" --workdir="/work" "gcr.io/iree-oss/openxla-benchmark/cuda11.8-cudnn8.9@sha256:c39107c4160e749b7c4bac18862c6c1b6d56e1aa60644a4fe323e315ffba0a0b" /work/xla-tools-dir/hlo_runner_main --hlo_file=/work/xla_hlo_before_optimizations.txt --device_type=gpu --num_repeats=50 --input_format=text --num_replicas=1 --num_partitions=1 --logtostderr