FP16 inference support in WebGPU #8360

Open
shanumante-sc opened this issue Aug 16, 2024 · 0 comments
shanumante-sc commented Aug 16, 2024

System information

  • TensorFlow.js version (you are using): 4.20
  • Are you willing to contribute it (Yes/No): Maybe :)

Describe the feature and the current behavior/state.

  • We are evaluating the WebGPU backend for inference and see a decent improvement (~5-10%) over WebGL for our models, but it is much lower than we expected.
  • One potential way to speed up inference would be to use the fp16 data type instead of fp32 for tensors. The WebGL backend already supports fp16, which we use. WebGPU also supports fp16, at least in Chrome on desktop (https://chromestatus.com/feature/5180552617656320).
  • Ideally we would like to use the F32_F16 precision mode as defined in TFLite, to get the best tradeoff between precision loss and performance.
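For context on the precision side of that tradeoff: binary16 has only a 10-bit significand, so values round noticeably compared to fp32. A self-contained sketch of the rounding behavior (the helper names below are ours for illustration, not a TensorFlow.js API; the packing uses simplified round-to-nearest, half-up, rather than full ties-to-even):

```javascript
// Scratch buffer to read the raw IEEE-754 bits of a float32 value.
const f32 = new Float32Array(1);
const u32 = new Uint32Array(f32.buffer);

// Pack a number into binary16 (fp16) bits. Simplified: rounds half-up
// and does not preserve NaN payloads.
function toFloat16Bits(x) {
  f32[0] = x;
  const bits = u32[0];
  const sign = (bits >>> 16) & 0x8000;
  let exp = ((bits >>> 23) & 0xff) - 127 + 15; // re-bias exponent
  let mant = bits & 0x7fffff;
  if (exp >= 31) return sign | 0x7c00;         // overflow -> infinity
  if (exp <= 0) {                              // fp16 subnormal or zero
    if (exp < -10) return sign;
    mant = (mant | 0x800000) >> (1 - exp);
    return sign | ((mant + 0x1000) >> 13);
  }
  mant += 0x1000;                              // round to nearest (half-up)
  if (mant & 0x800000) { mant = 0; exp += 1; } // rounding carried over
  if (exp >= 31) return sign | 0x7c00;
  return sign | (exp << 10) | (mant >> 13);
}

// Unpack binary16 bits back into a JS number.
function fromFloat16Bits(h) {
  const sign = h & 0x8000 ? -1 : 1;
  const exp = (h >>> 10) & 0x1f;
  const mant = h & 0x3ff;
  if (exp === 0) return sign * mant * 2 ** -24;        // subnormal / zero
  if (exp === 31) return mant ? NaN : sign * Infinity; // inf / NaN
  return sign * (1 + mant / 1024) * 2 ** (exp - 15);
}

const roundTripF16 = (x) => fromFloat16Bits(toFloat16Bits(x));

console.log(roundTripF16(1.0));   // 1 (exactly representable)
console.log(roundTripF16(0.1));   // 0.0999755859375 (rounding error)
console.log(roundTripF16(5001));  // 5000 (integers above 2048 are spaced out)
console.log(roundTripF16(70000)); // Infinity (exceeds the fp16 max of 65504)
```

The narrow dynamic range (max finite value 65504) is why an F32_F16-style mode, which keeps some computation in fp32, is attractive over pure fp16.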

Will this change the current API? How?

  • An environment flag to set precision (similar to WebGL) would be ideal for ease of integration.
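For reference, the WebGPU spec gates fp16 behind the optional `shader-f16` device feature, and WGSL modules must declare `enable f16;` before using the type. A sketch of how a backend might probe for support (the function name and shader are illustrative assumptions, not the actual tfjs-backend-webgpu implementation):

```javascript
// Illustrative WGSL using f16 storage and arithmetic; compiling this
// requires a device created with the 'shader-f16' feature.
const fp16Shader = /* wgsl */ `
  enable f16;
  @group(0) @binding(0) var<storage, read> input : array<f16>;
  @group(0) @binding(1) var<storage, read_write> output : array<f16>;
  @compute @workgroup_size(64)
  fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
    output[gid.x] = input[gid.x] * f16(2.0);
  }`;

// Request a device with fp16 enabled, or null if unavailable.
async function createDeviceWithF16() {
  // navigator.gpu only exists in WebGPU-capable browsers.
  if (typeof navigator === 'undefined' || !navigator.gpu) return null;
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter || !adapter.features.has('shader-f16')) return null;
  return adapter.requestDevice({requiredFeatures: ['shader-f16']});
}
```

Since the feature is optional per adapter, a precision flag like the WebGL one would presumably need to fall back to fp32 when `shader-f16` is absent.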

Who will benefit with this feature?

  • All consumers of WebGPU backend.