Integration with Triton inference server #955
JaimeArboleda announced in Announcements
Replies: 2 comments
- @JaimeArboleda Thank you, always great to hear from the community. Two comments:
- Thanks a lot, Peter. I will take a look at docling-serve; I did not know about it, and it looks very promising. Regarding the second point, I was thinking about something similar, but for models like TableFormer, the layout detector, EasyOCR and so on, in this case served via a Triton Inference Server. Does that make sense? Then again, if docling-serve is well optimized, it may be enough for us.
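For illustration, here is a minimal sketch of the kind of setup being described, assuming one of the heavy models (say, the layout detector) has been exported and deployed to a Triton model repository. The model name `layout_detector` and the tensor names `images`/`boxes` are hypothetical, and this is not an existing docling integration:

```python
# Sketch: calling a hypothetical layout-detection model hosted on Triton.
# Assumes the Triton server is reachable at localhost:8000 and that the
# model repository declares FP32 input "images" and output "boxes".
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def detect_layout(page_image: np.ndarray) -> np.ndarray:
    # Add a batch dimension and match the datatype declared in config.pbtxt.
    batch = page_image[np.newaxis, ...].astype(np.float32)

    infer_input = httpclient.InferInput("images", batch.shape, "FP32")
    infer_input.set_data_from_numpy(batch)

    response = client.infer(
        model_name="layout_detector",
        inputs=[infer_input],
        outputs=[httpclient.InferRequestedOutput("boxes")],
    )
    return response.as_numpy("boxes")
```

A conversion pipeline would then call a client like this instead of running the model locally, so GPU work could be batched and shared across many conversion workers.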
Question
First of all, thank you for this great, great package. Kudos to IBM for creating and open-sourcing it.
We are planning to run docling as a service for converting every document in our organization, and we need to handle this efficiently because we expect a lot of requests. I understand there is now GPU support via vanilla PyTorch. Would it be possible, for example, to serve the models that do the heavy lifting with a Triton Inference Server to increase conversion speed?
I tried searching for Triton and found nothing, so I guess the answer is no, but I would like to know whether this idea is at least on the roadmap.
Thanks in advance!
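For reference, the existing GPU path mentioned above looks roughly like this. This is a minimal sketch assuming a docling version that exposes `AcceleratorOptions`; the file name `report.pdf` is just a placeholder:

```python
# Sketch: enabling CUDA acceleration for docling's PDF pipeline.
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    PdfPipelineOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.accelerator_options = AcceleratorOptions(
    num_threads=8,
    device=AcceleratorDevice.CUDA,  # run the heavy models on the GPU
)

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)

result = converter.convert("report.pdf")
print(result.document.export_to_markdown())
```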