How to accelerate the bge-m3 sparse embedding module during inference? #294
To accelerate the bge-m3 sparse embedding module during inference, consider the following adjustments.

Example code adjustments:

```python
# engine.py
self._batch_handler = BatchHandler(
    max_batch_size=self._engine_args.batch_size * 2,  # increase batch size
    model=self._model,
    batch_delay=self._min_inference_t / 4,  # reduce batch delay
    vector_disk_cache_path=self._engine_args.vector_disk_cache_path,
    verbose=logger.level <= 10,
    lengths_via_tokenize=self._engine_args.lengths_via_tokenize,
)
```

```python
# batch_handler.py
class BatchHandler:
    def __init__(self, ...):
        ...
        self._threadpool = ThreadPoolExecutor(max_workers=8)  # increase thread pool size
        ...
```

These changes should help accelerate the sparse embedding process for the bge-m3 model.

References: /libs/infinity_emb/infinity_emb/engine.py
These methods are difficult to make work.
Feature request
How to accelerate the bge-m3 sparse embedding module during inference?
Motivation
The sparse embedding step is still too slow when running inference with bge-m3, even after accelerating the dense embedding inference.
Your contribution
If you can suggest an idea, I will learn how to make it work.