Document AI Test

This is a proof of concept (POC) showing how to integrate with the Document AI API on GCP.

Installation

Use the pip package manager to install the dependencies listed in requirements.txt:

pip install -r requirements.txt

Next, log in to your GCP account using the gcloud CLI from the Google Cloud SDK:

gcloud auth application-default login

You also need to set up a GCP project to run in.
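
As a quick sanity check before moving on, the sketch below looks for the credentials file that `gcloud auth application-default login` writes. It assumes the default gcloud config directory and does not handle a custom `CLOUDSDK_CONFIG` location. The function names are hypothetical and are not part of this repo.

```python
import os
from pathlib import Path

ADC_FILENAME = "application_default_credentials.json"


def adc_file_path(env=None, home=None):
    """Return where application-default credentials are expected to live.

    Assumes the default gcloud config directory; a custom CLOUDSDK_CONFIG
    location is not handled here.
    """
    env = os.environ if env is None else env
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        # An explicitly configured key file takes precedence.
        return Path(explicit)
    if os.name == "nt":
        # On Windows, gcloud keeps its config under %APPDATA%.
        return Path(env.get("APPDATA", "")) / "gcloud" / ADC_FILENAME
    home = Path.home() if home is None else Path(home)
    return home / ".config" / "gcloud" / ADC_FILENAME


def has_adc(env=None, home=None):
    """True if a credentials file exists at the expected location."""
    return adc_file_path(env, home).is_file()


if __name__ == "__main__":
    path = adc_file_path()
    status = "found" if path.is_file() else "missing"
    print(f"Application-default credentials {status}: {path}")
```

If the file is reported as missing, rerunning the login command above is the likely fix.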

Usage

Once you've logged in, you need to set up two things: a Cloud Storage bucket to hold the PDFs, and the Document AI parser. Create the bucket first by going to Google Cloud Storage in the GCP console.

Next, upload a PDF to your newly created bucket. You can use the one included in this repo (wordpress-pdf-invoice-plugin-sample.pdf). Make a note of the bucket name.
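
The upload can also be done from Python instead of the console. The following is a minimal sketch, assuming the `google-cloud-storage` package is installed and application-default credentials are in place. The helper names `gcs_uri` and `upload_pdf` are made up for illustration and may not match what gcp_utils.py does.

```python
from pathlib import Path


def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI for an object in a bucket."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_pdf(bucket_name, local_path):
    """Upload a local PDF to the bucket and return its gs:// URI.

    The google-cloud-storage import is deferred so that gcs_uri() can be
    used even when the package is not installed.
    """
    from google.cloud import storage

    blob_name = Path(local_path).name  # store the object under the file's own name
    client = storage.Client()  # picks up application-default credentials
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(str(local_path), content_type="application/pdf")
    return gcs_uri(bucket_name, blob_name)
```

Calling `upload_pdf("my-doc-ai-bucket", "wordpress-pdf-invoice-plugin-sample.pdf")`, where `my-doc-ai-bucket` is a placeholder for your own bucket name, would upload the sample invoice and return its `gs://` URI.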

Next, go to Document AI and open Explore Processors. Search for 'Invoice Parser' and select Create Processor. Any name and location should work.

You should now have a default parser (shown under Processor Details). Make a note of the processor ID, which is used in the API call.

Next, fill in your bucket_name, project_id, location, and processor_id in main.py.
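
To illustrate how these four values fit together, here is a hedged sketch that groups them and derives the processor resource name and regional endpoint that the Document AI API uses. The `DocAISettings` class and all values are placeholders for illustration; main.py may simply use plain variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocAISettings:
    bucket_name: str
    project_id: str
    location: str  # e.g. "us" or "eu", matching the processor's location
    processor_id: str

    @property
    def processor_name(self):
        """Full resource name of the processor."""
        return (
            f"projects/{self.project_id}/locations/{self.location}"
            f"/processors/{self.processor_id}"
        )

    @property
    def api_endpoint(self):
        """Regional API endpoint for the processor's location."""
        return f"{self.location}-documentai.googleapis.com"


# Placeholder values for illustration only; substitute your own.
settings = DocAISettings(
    bucket_name="my-doc-ai-bucket",
    project_id="my-gcp-project",
    location="us",
    processor_id="0123456789abcdef",
)
print(settings.processor_name)
print(settings.api_endpoint)
```

The location must match the one chosen when the processor was created, since both the resource name and the endpoint depend on it.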

Then run

python main.py

This should output a table of the entities found in the document.
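
For readers curious about what such a script does, the sketch below shows one way to send a PDF to a processor and print its entities as a table. It assumes the `google-cloud-documentai` package is installed and passes the PDF as raw bytes (read from a local file or downloaded from the bucket). The function names are hypothetical, and the actual main.py and doc_ai_utils.py may work differently.

```python
def process_pdf(project_id, location, processor_id, pdf_bytes):
    """Send a PDF to a Document AI processor.

    Returns a list of (type, text, confidence) rows, one per entity.
    Imports are deferred so the table formatter works without the package.
    """
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )
    request = documentai.ProcessRequest(
        name=client.processor_path(project_id, location, processor_id),
        raw_document=documentai.RawDocument(
            content=pdf_bytes, mime_type="application/pdf"
        ),
    )
    document = client.process_document(request=request).document
    return [(e.type_, e.mention_text, e.confidence) for e in document.entities]


def format_entity_table(rows):
    """Render (type, text, confidence) rows as a fixed-width text table."""
    header = ("Type", "Text", "Confidence")
    # Collapse internal whitespace so multi-line mentions stay on one row.
    body = [(t, " ".join(text.split()), f"{conf:.2f}") for t, text, conf in rows]
    table = [header, *body]
    widths = [max(len(row[i]) for row in table) for i in range(3)]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths))
        for row in table
    ]
    return "\n".join(line.rstrip() for line in lines)
```

A typical call would be `print(format_entity_table(process_pdf(project_id, location, processor_id, pdf_bytes)))`, with the identifiers taken from the values filled in above.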

Disclaimer

This is not production-ready code. It is a toy example that shows how GCP's Document AI works. Use at your own peril :)
