
Issue running cross-encoder onnx model exported with optimum-cli #361

Open
2 of 4 tasks
rawsh opened this issue Sep 13, 2024 · 2 comments
Comments

@rawsh
Contributor

rawsh commented Sep 13, 2024

System Info

py3.10
infinity-emb 0.0.55

Running with the optimum engine fails:

INFO     2024-09-13 15:17:02,874 datasets INFO: PyTorch version 2.4.0 available.                                                            config.py:59
INFO:     Started server process [76741]
INFO:     Waiting for application startup.
INFO     2024-09-13 15:17:03,950 infinity_emb INFO: model=`rawsh/ms-marco-TinyBERT-L-2-ONNX` selected, using engine=`optimum` and device=`cpu`     select_model.py:62
INFO     2024-09-13 15:17:04,356 infinity_emb INFO: Optimized model found at /Users/robert/.cache/huggingface/hub/infinity_onnx/CPUExecutionProvider/rawsh/ms-marco-TinyBERT-L-2-ONNX/model_optimized.onnx, skipping optimization     utils_optimum.py:120
ERROR:    Traceback (most recent call last):
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/infinity_server.py", line 63, in lifespan
    app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/engine.py", line 259, in from_args
    return cls(engines=tuple(engines))
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/engine.py", line 67, in from_args
    engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/engine.py", line 53, in __init__
    self._model, self._min_inference_t, self._max_inference_t = select_model(
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/inference/select_model.py", line 76, in select_model
    loaded_engine.warmup(batch_size=engine_args.batch_size, n_tokens=1)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/transformer/abstract.py", line 170, in warmup
    return run_warmup(self, inp)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/transformer/abstract.py", line 178, in run_warmup
    embed = model.encode_core(feat)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/transformer/crossencoder/optimum.py", line 78, in encode_core
    outputs = self.model(**features, return_dict=True)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/optimum/modeling_base.py", line 99, in __call__
    return self.forward(*args, **kwargs)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 1460, in forward
    onnx_inputs = self._prepare_onnx_inputs(use_torch, **model_inputs)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 943, in _prepare_onnx_inputs
    if onnx_inputs[input_name].dtype != self.input_dtypes[input_name]:
AttributeError: 'NoneType' object has no attribute 'dtype'
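The failing check can be reproduced in isolation. A minimal sketch (with hypothetical names; `expected_dtypes` and `prepare_onnx_inputs` are stand-ins, not optimum's actual code): if the exported graph declares an input such as `token_type_ids` that the cross-encoder wrapper never supplies, the dict lookup yields `None` and the `.dtype` access at `modeling_ort.py:943` raises exactly this error.

```python
import numpy as np

# Inputs the ONNX graph declares (assumed names for a BERT-style export)
expected_dtypes = {
    "input_ids": np.dtype("int64"),
    "attention_mask": np.dtype("int64"),
    "token_type_ids": np.dtype("int64"),  # declared by the export, missing below
}

# What the cross-encoder wrapper actually passes in
features = {
    "input_ids": np.ones((1, 4), dtype=np.int64),
    "attention_mask": np.ones((1, 4), dtype=np.int64),
}

def prepare_onnx_inputs(features: dict) -> dict:
    # Gather every declared input; missing ones come back as None
    onnx_inputs = {name: features.get(name) for name in expected_dtypes}
    for name, dtype in expected_dtypes.items():
        # Mirrors the dtype check in the traceback; None has no .dtype
        if onnx_inputs[name].dtype != dtype:
            onnx_inputs[name] = onnx_inputs[name].astype(dtype)
    return onnx_inputs

try:
    prepare_onnx_inputs(features)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'dtype'
```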

Information

  • Docker
  • The CLI directly via pip

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Converted the model to ONNX: rawsh/ms-marco-TinyBERT-L-2-ONNX
Process:

optimum-cli export onnx --model cross-encoder/ms-marco-TinyBERT-L-2-v2 ms-marco-tinybert
huggingface-cli upload rawsh/ms-marco-TinyBERT-L-2-ONNX ms-marco-tinybert .

(unrelated: I can't figure out how to run a local model)

Run with:

infinity_emb v2 --model-id rawsh/ms-marco-TinyBERT-L-2-ONNX --device cpu --engine optimum

Expected behavior

No error when running with the optimum engine.
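If the root cause is a missing `token_type_ids` input, one possible client-side workaround is to default it to all zeros (the usual single-segment value) before calling the model. This is a sketch under that assumption, not infinity's actual fix; `fill_missing_token_type_ids` is a hypothetical helper.

```python
import numpy as np

def fill_missing_token_type_ids(features: dict) -> dict:
    """Default token_type_ids to zeros when the tokenizer did not produce
    them but the exported ONNX graph expects them as an input."""
    out = dict(features)
    if out.get("token_type_ids") is None and "input_ids" in out:
        # Single-segment default: same shape and dtype as input_ids
        out["token_type_ids"] = np.zeros_like(out["input_ids"])
    return out

feats = {
    "input_ids": np.ones((1, 4), dtype=np.int64),
    "attention_mask": np.ones((1, 4), dtype=np.int64),
}
patched = fill_missing_token_type_ids(feats)
print(patched["token_type_ids"])  # [[0 0 0 0]]
```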

@michaelfeil
Owner

Has Xenova prepared this model for ONNX?

@rawsh-rubrik

@michaelfeil Yes, and it throws the same error for me.
