Merge pull request #329 from axa-group/dependabot/pip/demo/jupyter-notebook/bleach-3.1.1

Bump bleach from 3.1.0 to 3.1.1 in /demo/jupyter-notebook
axa-group · Feb 25, 2020 · bb58291
2 parents 7ca6534 + 470d7df

Showing 3 changed files with 84 additions and 93 deletions.
diff --git a/demo/jupyter-notebook/Pipfile.lock b/demo/jupyter-notebook/Pipfile.lock
diff --git a/docs/configuration.md b/docs/configuration.md
@@ -19,7 +19,7 @@ There is only a few required keys:
 - `version` `[Number]` is the version number of the API.
 - `extractor` `[Object]` is a bunch of parameters about the extraction.
 - `cleaner` `[Array]` is a list of every cleaning tools that will be called.
-- `output` `[Object]` contains the list of fromats to export and some other details.
+- `output` `[Object]` contains the list of formats to export and some other details.
 
 Cleaning tools have default parameters that work pretty well, but you can override the parameters by providing the in the config.
 
@@ -81,7 +81,7 @@ Different extractors are available for each input file format.
 - **PDF files:** two extractors are currently available for PDF files:
   - `pdfminer`, which is an advanced python based extractor capable of extracting low and high level textual structures (from characters to paragraphs),
   - `pdfjs`, Mozilla's free solution for parsing documents. This is the recommended extractor to parse large documents (200+ pages).
-- **Images:** four OCR extractors are supported for images:
+- **Images:** five OCR extractors are supported for images:
   - `tesseract` which is an Open Source OCR software,
   - `abbyy`, that relies on ABBYY Finereader, a paid solution for OCR on documents and images,
   - `google-vision`, which uses the [Google Vision](https://cloud.google.com/vision/) API to detect the contents of an image (see the [google vision documentation for more](../server/src/input/google-vision/README.md)),
@@ -182,7 +182,7 @@ The `includeMarginals: boolean` parameter allows to chose whether the output wil
 
 ```json
 {
-  "version": 0.5,
+  "version": 0.9,
   "extractor": {
     "pdf": "pdfminer",
     "ocr": "tesseract",

diff --git a/docs/json-output.md b/docs/json-output.md
@@ -157,7 +157,7 @@ The text element type is of multiple levels of subtypes:
 
 #### 1.2.2. Table type
 
-The following strucutre defines a table with a single row, single column containing a single cell with a paragraph of text as the cell content.
+The following structure defines a table with a single row, single column containing a single cell with a paragraph of text as the cell content.
 
 ```js
 {
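The lock-file diff for `demo/jupyter-notebook/Pipfile.lock` is collapsed above. One way to confirm the pin after merging is to read the lock file directly. Pipfile.lock is JSON, and each package entry under `"default"` or `"develop"` carries a `"version"` string such as `"==3.1.1"`. The sketch below assumes bleach sits in the `"default"` section. The helper name `locked_version` is hypothetical and not part of this repository.

```python
import json


def locked_version(lock_text, package, section="default"):
    """Return the version pinned for `package` in a Pipfile.lock's JSON text.

    Hypothetical helper: `section` is "default" or "develop". Returns None
    if the package (or its "version" field) is absent from that section.
    """
    data = json.loads(lock_text)
    entry = data.get(section, {}).get(package)
    if not entry or "version" not in entry:
        return None
    # Entries are stored as "==X.Y.Z"; strip the leading "==" operator.
    return entry["version"].lstrip("=")


if __name__ == "__main__":
    from pathlib import Path

    lock = Path("demo/jupyter-notebook/Pipfile.lock")
    if lock.exists():
        print(locked_version(lock.read_text(), "bleach"))
```

After this merge, running it from the repository root should print `3.1.1` if bleach is locked in the `"default"` section. Otherwise, pass `section="develop"`.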