johndolan29/document-ai-test

Document AI Test

This is a proof of concept (POC) showing how to integrate with the Document AI API on Google Cloud Platform (GCP).

Installation

Use the package manager pip to install the dependencies listed in requirements.txt:

pip install -r requirements.txt

Next, log in to your GCP account using the gcloud CLI from the Google Cloud SDK:

gcloud auth application-default login

You also need to set up a GCP project to run in.

Usage

Once you've logged in, you need to set up two things: a Cloud Storage bucket to hold the PDFs, and a Document AI processor. Start by creating a bucket in Google Cloud Storage.

Next, upload a PDF to your newly created bucket. You can use the one included in this repo. Make a note of the bucket name.
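If you would rather upload the PDF from code than through the console, the following is a minimal sketch using the google-cloud-storage client library. It is not part of this repo: the function names `gcs_uri` and `upload_pdf` are hypothetical, and it assumes google-cloud-storage is installed and that you have already run the gcloud login step above.

```python
from pathlib import Path


def gcs_uri(bucket_name: str, blob_name: str) -> str:
    """Build the gs:// URI for an object in a bucket."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_pdf(bucket_name: str, local_path: str) -> str:
    """Upload a local PDF to the bucket and return its gs:// URI.

    Hypothetical helper, not taken from main.py.
    """
    # Imported lazily so gcs_uri() works without the client library installed.
    from google.cloud import storage

    blob_name = Path(local_path).name
    client = storage.Client()  # picks up Application Default Credentials
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path, content_type="application/pdf")
    return gcs_uri(bucket_name, blob_name)
```

Calling `upload_pdf("my-bucket", "invoice.pdf")` with your own bucket name and file path would place the file at the top level of the bucket.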

Next, go to Document AI and open Explore Processors. Search for 'Invoice Parser' and select Create Processor. Any name and location should work.

You should now have a default processor (shown under Processor Details). Make a note of the processor ID, which is used in the API call.

Next, fill in your bucket_name, project_id, location, and processor_id in the main.py file.
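To show how these four values fit together, here is a rough sketch of a Document AI call. It is an illustration under assumptions, not necessarily what main.py does: the `Settings` class and `process_pdf` function are hypothetical names, and the `gcs_document` request field assumes a reasonably recent release of google-cloud-documentai.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """The four values you fill in (hypothetical container, not from main.py)."""
    bucket_name: str
    project_id: str
    location: str  # processor location chosen at creation, e.g. "us" or "eu"
    processor_id: str

    @property
    def api_endpoint(self) -> str:
        # Document AI serves each location from its own endpoint.
        return f"{self.location}-documentai.googleapis.com"

    @property
    def processor_name(self) -> str:
        # Full resource name of the processor, as expected by the API.
        return (
            f"projects/{self.project_id}/locations/{self.location}"
            f"/processors/{self.processor_id}"
        )


def process_pdf(settings: Settings, blob_name: str):
    """Send a PDF stored in the bucket to the processor and return the Document."""
    # Imported lazily so Settings can be used without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=settings.api_endpoint)
    )
    request = documentai.ProcessRequest(
        name=settings.processor_name,
        gcs_document=documentai.GcsDocument(
            gcs_uri=f"gs://{settings.bucket_name}/{blob_name}",
            mime_type="application/pdf",
        ),
    )
    return client.process_document(request=request).document
```

Keeping the endpoint and resource name as derived properties means a mismatch between the location and the endpoint, a common cause of "processor not found" errors, cannot happen by accident.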

Then run:

python main.py

This should output a table of the entities found in the document.
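For a sense of what such a table could look like, the sketch below turns a list of entities into aligned text columns. The exact layout main.py prints may differ. The helper names `entity_rows` and `format_table` are hypothetical; the attribute names `type_`, `mention_text`, and `confidence` are those exposed on entities by the Document AI Python client.

```python
def entity_rows(entities):
    """Turn Document AI entities into (type, text, confidence) string tuples."""
    rows = []
    for entity in entities:
        # Collapse newlines and repeated spaces inside the extracted text.
        text = " ".join(entity.mention_text.split())
        rows.append((entity.type_, text, f"{entity.confidence:.2f}"))
    return rows


def format_table(rows, headers=("Type", "Text", "Confidence")):
    """Render rows as left-aligned, space-padded columns under a header row."""
    table = [tuple(headers), *rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    ]
    return "\n".join(lines)
```

Given a processed document object named `document`, `print(format_table(entity_rows(document.entities)))` would print one line per entity.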

Disclaimer

This is not production-ready code. It is merely a toy example showing how GCP's Document AI system works. Use at your own peril :)
