Merge pull request #329 from axa-group/dependabot/pip/demo/jupyter-notebook/bleach-3.1.1

Bump bleach from 3.1.0 to 3.1.1 in /demo/jupyter-notebook
jvalls-axa authored Feb 25, 2020
2 parents 7ca6534 + 470d7df commit bb58291
Showing 3 changed files with 84 additions and 93 deletions.
169 changes: 80 additions & 89 deletions demo/jupyter-notebook/Pipfile.lock

Some generated files are not rendered by default.

6 changes: 3 additions & 3 deletions docs/configuration.md
@@ -19,7 +19,7 @@ There is only a few required keys:
- `version` `[Number]` is the version number of the API.
- `extractor` `[Object]` is a bunch of parameters about the extraction.
- `cleaner` `[Array]` is a list of every cleaning tools that will be called.
-- `output` `[Object]` contains the list of fromats to export and some other details.
+- `output` `[Object]` contains the list of formats to export and some other details.

Cleaning tools have default parameters that work pretty well, but you can override the parameters by providing the in the config.

@@ -81,7 +81,7 @@ Different extractors are available for each input file format.
- **PDF files:** two extractors are currently available for PDF files:
- `pdfminer`, which is an advanced python based extractor capable of extracting low and high level textual structures (from characters to paragraphs),
- `pdfjs`, Mozilla's free solution for parsing documents. This is the recommended extractor to parse large documents (200+ pages).
-- **Images:** four OCR extractors are supported for images:
+- **Images:** five OCR extractors are supported for images:
- `tesseract` which is an Open Source OCR software,
- `abbyy`, that relies on ABBYY Finereader, a paid solution for OCR on documents and images,
- `google-vision`, which uses the [Google Vision](https://cloud.google.com/vision/) API to detect the contents of an image (see the [google vision documentation for more](../server/src/input/google-vision/README.md)),
@@ -182,7 +182,7 @@ The `includeMarginals: boolean` parameter allows to chose whether the output wil

```json
{
-"version": 0.5,
+"version": 0.9,
"extractor": {
"pdf": "pdfminer",
"ocr": "tesseract",
2 changes: 1 addition & 1 deletion docs/json-output.md
@@ -157,7 +157,7 @@ The text element type is of multiple levels of subtypes:

#### 1.2.2. Table type

-The following strucutre defines a table with a single row, single column containing a single cell with a paragraph of text as the cell content.
+The following structure defines a table with a single row, single column containing a single cell with a paragraph of text as the cell content.

```js
{
