Integration with Triton inference server #955
JaimeArboleda announced in Announcements
Replies: 2 comments
- @JaimeArboleda Thank you, always great to hear from the community. Two comments:
- Thanks a lot, Peter. I will take a look at docling-serve; I did not know about it, and it looks very promising. Regarding the second point, I was thinking about something similar, but for models like TableFormer, the layout detector, EasyOCR and so on, in this case served via a Triton Inference Server. Does that make sense? Then again, if docling-serve is well optimized, it may be enough for us.
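For illustration, here is a minimal sketch of the kind of setup being described, assuming one of the heavy models (say, the layout detector) has been exported and deployed to a Triton model repository. The model name `layout_detector` and the tensor names `images`/`boxes` are hypothetical, and this is not an existing docling integration:

```python
# Sketch: calling a hypothetical layout-detection model hosted on Triton.
# Assumes the Triton server is reachable at localhost:8000 and that the
# model repository declares FP32 input "images" and output "boxes".
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def detect_layout(page_image: np.ndarray) -> np.ndarray:
    # Add a batch dimension and match the datatype declared in config.pbtxt.
    batch = page_image[np.newaxis, ...].astype(np.float32)

    infer_input = httpclient.InferInput("images", batch.shape, "FP32")
    infer_input.set_data_from_numpy(batch)

    response = client.infer(
        model_name="layout_detector",
        inputs=[infer_input],
        outputs=[httpclient.InferRequestedOutput("boxes")],
    )
    return response.as_numpy("boxes")
```

A conversion pipeline would then call a client like this instead of running the model locally, so GPU work could be batched and shared across many conversion workers.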
Question
First of all, thank you for this great, great package. Kudos to IBM for creating and open-sourcing it.
We are planning to run docling as a service for converting every document in our organization, and we need to handle this efficiently because we expect a lot of requests. I understand there is now GPU support via vanilla PyTorch. Would it be possible, for example, to serve the models that do the heavy lifting with a Triton Inference Server to increase conversion speed?
I tried searching for Triton and found nothing, so I guess the answer is no, but I would like to know whether this idea is at least on the roadmap.
Thanks in advance!
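For reference, the existing GPU path mentioned above looks roughly like this. This is a minimal sketch assuming a docling version that exposes `AcceleratorOptions`; the file name `report.pdf` is just a placeholder:

```python
# Sketch: enabling CUDA acceleration for docling's PDF pipeline.
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    PdfPipelineOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.accelerator_options = AcceleratorOptions(
    num_threads=8,
    device=AcceleratorDevice.CUDA,  # run the heavy models on the GPU
)

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)

result = converter.convert("report.pdf")
print(result.document.export_to_markdown())
```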