Fix the LangChain code in notebook/C3 搭建知识库/3.数据处理.ipynb #123

Closed
anarchysaiko opened this issue Jul 3, 2024 · 3 comments

Comments

@anarchysaiko

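Because LangChain has been heavily restructured, the original `langchain` package has been split into three packages: `langchain`, `langchain-core`, and `langchain-community`. As a result, the code in the "PDF" document section only runs after installing `langchain-community` and `PyMuPDF`, and it should be changed as follows: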

```python
from langchain_community.document_loaders import PyMuPDFLoader

# Create a PyMuPDFLoader instance; the argument is the path of the PDF file to load
loader = PyMuPDFLoader("../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf")

# Call the loader's load() method to load the PDF file (one Document per page)
pdf_pages = loader.load()
```
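Because the snippet above only runs once `langchain-community` and PyMuPDF are installed, a quick pre-flight check can replace a confusing traceback with a clear install hint. This is an illustrative sketch, not part of the notebook: `missing_modules` is a hypothetical helper, and it assumes PyMuPDF is importable under its long-standing module name `fitz`.

```python
import importlib.util


def missing_modules(names):
    """Hypothetical helper: return the module names that cannot be found.

    importlib.util.find_spec locates a module without executing its code.
    """
    missing = []
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except (ImportError, ValueError):
            # find_spec raises for dotted names whose parent package is absent
            missing.append(name)
    return missing


# Import name -> pip package name (assumption: PyMuPDF imports as `fitz`)
REQUIRED = {"langchain_community": "langchain-community", "fitz": "PyMuPDF"}

if __name__ == "__main__":
    absent = missing_modules(REQUIRED)
    if absent:
        print("pip install " + " ".join(REQUIRED[m] for m in absent))
    else:
        print("All dependencies for PyMuPDFLoader are installed.")
```

The mapping exists because the pip package names differ from the names used in `import` statements.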
@Halukisan

How should Markdown files be loaded? `from langchain.document_loaders.markdown import UnstructuredMarkdownLoader` won't import: I get a `PermissionError`, plus an NLTK error.

@lta155
Contributor

lta155 commented Sep 15, 2024

> How should Markdown files be loaded? `from langchain.document_loaders.markdown import UnstructuredMarkdownLoader` won't import: I get a `PermissionError`, plus an NLTK error.

Some learners ran into this problem in the first part of the course. You can try the fix described at the link below:
https://github.com/datawhalechina/llm-universe/blob/4182b48827947a4d95453f88d3b01478a9548e39/docs/C1/7.%E7%8E%AF%E5%A2%83%E9%85%8D%E7%BD%AE.md#31-%E4%B8%8B%E8%BD%BD-nltk-%E7%9B%B8%E5%85%B3%E8%B5%84%E6%BA%90
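The linked guide resolves the NLTK error by downloading NLTK's data files. A stdlib-only check like the one below can show whether those files actually sit where NLTK looks. This is a hypothetical diagnostic, not from the course: it assumes NLTK's on-disk layout (`<nltk_data>/tokenizers/punkt`, `<nltk_data>/taggers/averaged_perceptron_tagger`, each possibly stored as a `.zip`), assumes those are the resources the Markdown loader needs, and searches only a simplified subset of NLTK's locations. Adjust `RESOURCES` to whatever your actual error message names.

```python
import os
from pathlib import Path

# Assumed resource paths relative to an nltk_data directory (hypothetical list;
# replace with the resource named in your NLTK error message)
RESOURCES = ["tokenizers/punkt", "taggers/averaged_perceptron_tagger"]


def candidate_data_dirs():
    """Directories to search: $NLTK_DATA entries first, then ~/nltk_data.

    A simplified subset of the locations NLTK itself searches.
    """
    dirs = [Path(p) for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    dirs.append(Path.home() / "nltk_data")
    return dirs


def missing_resources(resources, data_dirs):
    """Return resources found neither as a directory nor as a .zip in any data dir."""
    missing = []
    for res in resources:
        found = any(
            (d / res).exists() or (d / (res + ".zip")).exists() for d in data_dirs
        )
        if not found:
            missing.append(res)
    return missing


if __name__ == "__main__":
    print("Missing NLTK resources:", missing_resources(RESOURCES, candidate_data_dirs()))
```

Anything reported as missing is a candidate for the download step in the linked guide.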

@lta155
Contributor

lta155 commented Sep 15, 2024


Although LangChain has changed a lot, once the packages in requirements.txt are installed, `PyMuPDFLoader` can be imported in either of the following two ways:

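```python
# Option 1: original path under the langchain package
from langchain.document_loaders.pdf import PyMuPDFLoader

# Option 2: path under the split-out langchain-community package
from langchain_community.document_loaders import PyMuPDFLoader
```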
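Since either path works once the requirements are installed, notebook code meant to survive LangChain upgrades can try the newer location first and fall back to the older one. A minimal sketch: `import_first` is a hypothetical helper (not a LangChain API), and the two module paths are the ones listed above.

```python
import importlib


def import_first(candidates):
    """Hypothetical helper: return the first importable attribute from
    (module, attribute) pairs, or raise ImportError listing every path tried."""
    tried = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            tried.append(f"{module_name}.{attr}")
    raise ImportError("could not import any of: " + ", ".join(tried))


# Split-package location first, then the original langchain path
PYMUPDF_LOADER_PATHS = [
    ("langchain_community.document_loaders", "PyMuPDFLoader"),
    ("langchain.document_loaders.pdf", "PyMuPDFLoader"),
]

if __name__ == "__main__":
    try:
        PyMuPDFLoader = import_first(PYMUPDF_LOADER_PATHS)
        print("Using PyMuPDFLoader from", PyMuPDFLoader.__module__)
    except ImportError as err:
        print(err)
```

Trying `langchain_community` first means the code keeps working if the older re-export is eventually removed.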

4 participants