naresh-bachwani/RRD_BYOB_OCR_Denoising

OCR Denoising

This is a PDF-to-DOCX converter script that uses an OCR denoising technique. The script gives very high accuracy, reproducing the PDF's exact layout in the DOCX output.
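The README does not describe how the denoising step works. Purely as an illustration, here is the kind of regex cleanup that could be applied to text extracted from a PDF before it is written to DOCX. The function name `denoise_text` and every rule in it are assumptions, not the author's implementation.

```python
import re


def denoise_text(raw: str) -> str:
    """Hypothetical cleanup of PDF-extracted text (not the repo's actual code)."""
    # Turn form feeds (page breaks in extracted text) into newlines.
    text = raw.replace("\x0c", "\n")
    # Drop remaining control characters, keeping tabs (\x09) and newlines (\x0a).
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Remove trailing spaces at the end of each line.
    text = re.sub(r" +\n", "\n", text)
    # Re-join words hyphenated across a line break: "conver-\nter" -> "converter".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    print(repr(denoise_text("Hello   world\x0c\x0cconver-\nter  \n\n\n\nend\x00")))
```

The hyphen rule is a trade-off: it also merges genuine compounds split at a line end (e.g. "well-\nknown" becomes "wellknown"), so a real pipeline might check the joined word against a dictionary first.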

Requirements

  • Python 3.6+
  • pdfminer (pip install pdfminer)
  • python-docx (pip install python-docx)
  • autocorrect (pip install autocorrect)
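  • re (part of the Python standard library, so nothing to install; pip install regex installs the separate third-party regex module)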
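Note that the pip package names above do not always match the names used in `import` statements (python-docx is imported as `docx`). A small check like the following, a hypothetical helper that is not part of the repository, can confirm the modules are importable before running the converter. The pip-to-module mapping is an assumption based on the list above.

```python
import importlib.util
import sys

# pip package name -> module name used in `import` statements (assumed mapping).
REQUIREMENTS = {
    "pdfminer": "pdfminer",
    "python-docx": "docx",
    "autocorrect": "autocorrect",
    "re (standard library)": "re",
}


def missing_requirements(requirements=REQUIREMENTS):
    """Return the package names whose top-level modules cannot be found."""
    return [pkg for pkg, module in requirements.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("Python 3.6+ is required")
    missing = missing_requirements()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All requirements found.")
```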

How to run

Open pdf2word.py and set the base path to the directory containing your PDFs. Run the script (python pdf2word.py) to convert every PDF in that folder into a Word document.
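The internals of pdf2word.py are not shown in this README. The sketch below illustrates only the batch step described above: find every PDF directly inside a base folder and pair it with a .docx output path of the same name. `BASE_PATH`, `plan_conversions` and `convert_pdf_to_docx` are hypothetical names; the real script may organise this differently.

```python
from pathlib import Path

# Hypothetical: point this at the folder holding your PDFs.
BASE_PATH = Path("path/to/pdfs")


def plan_conversions(base_path):
    """Pair each PDF directly inside base_path with a same-named .docx path."""
    base = Path(base_path)
    pdfs = sorted(p for p in base.iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")
    return [(pdf, pdf.with_suffix(".docx")) for pdf in pdfs]


if __name__ == "__main__":
    if not BASE_PATH.is_dir():
        print(f"Base path not found: {BASE_PATH}")
    else:
        for pdf, docx_path in plan_conversions(BASE_PATH):
            # convert_pdf_to_docx(pdf, docx_path)  # hypothetical conversion call
            print(f"{pdf.name} -> {docx_path.name}")
```

Matching the suffix case-insensitively means files such as `Report.PDF` are picked up too, and subdirectories are skipped even if their names end in `.pdf`.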

About

Submission to the RRD BYOB competition.