FP16 inference support in WebGPU #8360

Open
shanumante-sc opened this issue Aug 16, 2024 · 0 comments
shanumante-sc commented Aug 16, 2024

System information

  • TensorFlow.js version (you are using): 4.20
  • Are you willing to contribute it (Yes/No): Maybe :)

Describe the feature and the current behavior/state.

  • We are evaluating the WebGPU backend for inference and see a decent improvement (~5-10%) over WebGL for our models, but it is much lower than we expected.
  • One potential way to speed up inference would be to use the fp16 data type instead of fp32 for tensors. The WebGL backend already supports fp16, which we use. WebGPU also supports fp16, at least in Chrome on desktop (https://chromestatus.com/feature/5180552617656320).
  • Ideally we would like to use the F32_F16 precision mode as defined in TFLite, to get the best tradeoff between precision loss and performance.
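For context on the precision side of that tradeoff: binary16 has only a 10-bit significand, so values round noticeably compared to fp32. A self-contained sketch of the rounding behavior (the helper names below are ours for illustration, not a TensorFlow.js API; the packing uses simplified round-to-nearest, half-up, rather than full ties-to-even):

```javascript
// Scratch buffer to read the raw IEEE-754 bits of a float32 value.
const f32 = new Float32Array(1);
const u32 = new Uint32Array(f32.buffer);

// Pack a number into binary16 (fp16) bits. Simplified: rounds half-up
// and does not preserve NaN payloads.
function toFloat16Bits(x) {
  f32[0] = x;
  const bits = u32[0];
  const sign = (bits >>> 16) & 0x8000;
  let exp = ((bits >>> 23) & 0xff) - 127 + 15; // re-bias exponent
  let mant = bits & 0x7fffff;
  if (exp >= 31) return sign | 0x7c00;         // overflow -> infinity
  if (exp <= 0) {                              // fp16 subnormal or zero
    if (exp < -10) return sign;
    mant = (mant | 0x800000) >> (1 - exp);
    return sign | ((mant + 0x1000) >> 13);
  }
  mant += 0x1000;                              // round to nearest (half-up)
  if (mant & 0x800000) { mant = 0; exp += 1; } // rounding carried over
  if (exp >= 31) return sign | 0x7c00;
  return sign | (exp << 10) | (mant >> 13);
}

// Unpack binary16 bits back into a JS number.
function fromFloat16Bits(h) {
  const sign = h & 0x8000 ? -1 : 1;
  const exp = (h >>> 10) & 0x1f;
  const mant = h & 0x3ff;
  if (exp === 0) return sign * mant * 2 ** -24;        // subnormal / zero
  if (exp === 31) return mant ? NaN : sign * Infinity; // inf / NaN
  return sign * (1 + mant / 1024) * 2 ** (exp - 15);
}

const roundTripF16 = (x) => fromFloat16Bits(toFloat16Bits(x));

console.log(roundTripF16(1.0));   // 1 (exactly representable)
console.log(roundTripF16(0.1));   // 0.0999755859375 (rounding error)
console.log(roundTripF16(5001));  // 5000 (integers above 2048 are spaced out)
console.log(roundTripF16(70000)); // Infinity (exceeds the fp16 max of 65504)
```

The narrow dynamic range (max finite value 65504) is why an F32_F16-style mode, which keeps some computation in fp32, is attractive over pure fp16.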

Will this change the current API? How?

  • An environment flag to set precision (similar to WebGL) would be ideal for ease of integration.
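For reference, the WebGPU spec gates fp16 behind the optional `shader-f16` device feature, and WGSL modules must declare `enable f16;` before using the type. A sketch of how a backend might probe for support (the function name and shader are illustrative assumptions, not the actual tfjs-backend-webgpu implementation):

```javascript
// Illustrative WGSL using f16 storage and arithmetic; compiling this
// requires a device created with the 'shader-f16' feature.
const fp16Shader = /* wgsl */ `
  enable f16;
  @group(0) @binding(0) var<storage, read> input : array<f16>;
  @group(0) @binding(1) var<storage, read_write> output : array<f16>;
  @compute @workgroup_size(64)
  fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
    output[gid.x] = input[gid.x] * f16(2.0);
  }`;

// Request a device with fp16 enabled, or null if unavailable.
async function createDeviceWithF16() {
  // navigator.gpu only exists in WebGPU-capable browsers.
  if (typeof navigator === 'undefined' || !navigator.gpu) return null;
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter || !adapter.features.has('shader-f16')) return null;
  return adapter.requestDevice({requiredFeatures: ['shader-f16']});
}
```

Since the feature is optional per adapter, a precision flag like the WebGL one would presumably need to fall back to fp32 when `shader-f16` is absent.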

Who will benefit with this feature?

  • All consumers of WebGPU backend.