[Issue]: Agentic Loop over a large document #2542

Closed
ismailsimsek opened this issue Apr 29, 2024 · 6 comments
Labels
multimodal (language + vision, speech etc.), rag (retrieve-augmented generative agents)

Comments

ismailsimsek commented Apr 29, 2024

Describe the issue

Is it possible to read a large PDF document in chunks using agents, without a programmatic loop?

Could it be done using task decomposition? Has anyone done something similar?

Steps to reproduce

Something like the following:

  1. An agent uses a tool to read the PDF document 50 pages at a time (300 pages in total)
  2. An agent summarizes all the chunks into a 6-page output
  3. The summary is written to a file
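
One way to picture these steps is a stateful tool that hands out the document 50 pages at a time and reports when it is finished, so the agent decides when to ask for the next chunk. The sketch below is only an illustration under stated assumptions: `PdfChunkCursor`, `read_next`, and `summarize_in_chunks` are hypothetical names, not part of AutoGen; the page text is assumed to be extracted beforehand; and `summarize` stands in for an LLM call. The `while` loop in the driver only simulates what an agent would do by calling the tool repeatedly until `done` is true.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Optional


@dataclass
class PdfChunkCursor:
    """Hypothetical stateful tool: returns the document N pages at a time."""

    pages: List[str]            # one string per page, extracted beforehand
    pages_per_chunk: int = 50
    position: int = 0           # index of the next unread page

    def read_next(self) -> Dict[str, object]:
        """Return the next chunk plus a `done` flag the caller can check."""
        start = self.position
        end = min(start + self.pages_per_chunk, len(self.pages))
        self.position = end
        return {
            "start_page": start + 1,   # 1-based, inclusive
            "end_page": end,           # 1-based, inclusive
            "text": "\n".join(self.pages[start:end]),
            "done": end >= len(self.pages),
        }


def summarize_in_chunks(
    cursor: PdfChunkCursor,
    summarize: Callable[[str], str],
    out_path: Optional[str] = None,
) -> str:
    """Stand-in for the agent: read chunks, summarize each, then combine."""
    partial_summaries: List[str] = []
    while True:
        chunk = cursor.read_next()                          # step 1: read 50 pages
        partial_summaries.append(summarize(chunk["text"]))  # step 2: per-chunk summary
        if chunk["done"]:
            break
    final_summary = summarize("\n".join(partial_summaries)) # step 2: combined summary
    if out_path is not None:
        Path(out_path).write_text(final_summary, encoding="utf-8")  # step 3: write file
    return final_summary
```

With a 300-page document and the default chunk size, `summarize` would be called seven times: once per 50-page chunk and once more to combine the six partial summaries.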

Screenshots and logs

No response

Additional Information

No response

@WaelKarkoub
Contributor

@ismailsimsek are you asking for an OCR capability or a RAG capability? If OCR, I believe it's planned in the multimodality roadmap #1975

WaelKarkoub added the multimodal (language + vision, speech etc.) label on Apr 29, 2024
Author

ismailsimsek commented Apr 30, 2024

I'm currently trying to get it to work with RAG (RetrieveUserProxyAgent + GroupChatManager).

I'd appreciate it if anyone could point me to similar solutions.

Current code is here: https://github.com/ismailsimsek/aistorybooks/blob/story-book/classic_storiesv2.py
PR ismailsimsek/aistorybooks#3

For now I'm just trying to summarize the PDF; later on I plan to add image generation too.

WaelKarkoub added the rag (retrieve-augmented generative agents) label on Apr 30, 2024
ismailsimsek changed the title from "[Issue]: Looping over a large document" to "[Issue]: Agentic Loop over a large document" on May 2, 2024
Collaborator

thinkall commented May 24, 2024

> I'm currently trying to get it to work with RAG (RetrieveUserProxyAgent + GroupChatManager).
>
> I'd appreciate it if anyone could point me to similar solutions.
>
> Current code is here: https://github.com/ismailsimsek/aistorybooks/blob/story-book/classic_storiesv2.py
> PR ismailsimsek/aistorybooks#3
>
> For now I'm just trying to summarize the PDF; later on I plan to add image generation too.

The current RetrieveUserProxyAgent should support PDF files. Have you tried it?

Author

ismailsimsek commented May 24, 2024

@thinkall I will check it. What I am looking into is summarizing the PDF in small chunks, since it's too big to process at once. Is it possible to have the agents loop over the chunks and process them one by one?

Collaborator

thinkall commented May 24, 2024

> @thinkall I will check it. What I am looking into is summarizing the PDF in small chunks, since it's too big to process at once. Is it possible to have the agents loop over the chunks and process them one by one?

The agent will split the PDF into chunks and store them in a vector DB.
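
To make "split into chunks and store in a vector DB" concrete, here is a toy sketch using only the standard library. It is not how RetrieveUserProxyAgent is implemented: `split_into_chunks`, `embed`, and `ToyVectorStore` are hypothetical names, word counts stand in for a real embedding model, and an in-memory list stands in for a real vector database. It does show one thing worth keeping in mind for summarization: a query returns only the best-matching chunks, not every chunk in order.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def add(self, chunks: List[str]) -> None:
        """Embed each chunk and keep it alongside its vector."""
        for chunk in chunks:
            self.items.append((chunk, embed(chunk)))

    def query(self, question: str, n_results: int = 2) -> List[str]:
        """Return the `n_results` chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:n_results]]
```

For example, after `store.add(split_into_chunks(document_text))`, calling `store.query("some question", n_results=2)` returns the two chunks with the highest similarity to the question.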

@thinkall
Collaborator

Closing, as this has been inactive for a long time. Please reopen if the issue still persists.
