Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

1.6.1 (2025-03-21)

Bug Fixes

  • client: Polling would error out on httpx.ReadTimeout (#400) (aea1255)
  • core: Allow PDFs based on extension if the pages can be counted (#396) (cfbfd01)
  • core: Auto-fix clippy warnings (#393) (0605227)
  • Fixed prompts and retries for LLMs (#394) (4b31588)
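
For callers that run their own polling loop, the ReadTimeout fix above amounts to treating a timed-out poll as retryable instead of fatal. The following is a minimal sketch under stated assumptions, not the client's actual implementation: `poll_until_done` and `fetch_status` are hypothetical names, the terminal status names are assumed, and the built-in `TimeoutError` stands in for `httpx.ReadTimeout` so the example runs without third-party dependencies.

```python
import time
from typing import Callable, Optional, Tuple, Type


def poll_until_done(
    fetch_status: Callable[[], str],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    interval: float = 0.5,
    max_attempts: int = 20,
) -> str:
    """Poll until a terminal status is seen; a timeout costs one attempt only."""
    terminal = {"Succeeded", "Failed", "Cancelled"}  # assumed status names
    for _ in range(max_attempts):
        status: Optional[str]
        try:
            status = fetch_status()
        except retry_on:
            # A slow response is not a failed task: skip this poll and try again.
            status = None
        if status in terminal:
            return status
        time.sleep(interval)
    raise RuntimeError("task did not reach a terminal status in time")
```

With httpx installed, passing `retry_on=(httpx.ReadTimeout,)` would apply the same pattern to real requests.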

1.6.0 (2025-03-20)

Features

  • Added new cropped image viewing, updated upload component defaults for image VLM processing, and some bug fixes for segment highlighting + JSON viewing (#388) (6115ee0)

Bug Fixes

  • core: Auto-fix clippy warnings (#386) (ccb56f9)
  • core: Update default generation strategies for Picture and Page segments (5316485)
  • Downgraded CUDA version for doctr (36db353)

1.5.1 (2025-03-16)

Bug Fixes

  • Added imagemagick to docker images (d3ac921)
  • Added retry when finish reason is length (#383) (a8dd777)
  • Correct Rust lint workflow configuration (0b1a1eb)
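
A finish reason of `length` means the model stopped because it hit its output token limit, so the response is likely truncated. The sketch below shows one way such a retry could look; it is an illustration, not the project's implementation. The `generate` callable, the helper name, and the choice to double the token budget on each retry are all assumptions.

```python
from typing import Callable, Tuple


def generate_with_length_retry(
    generate: Callable[[int], Tuple[str, str]],
    max_tokens: int = 1024,
    max_retries: int = 2,
) -> str:
    """Call generate(max_tokens) -> (text, finish_reason), retrying on truncation."""
    for _ in range(max_retries + 1):
        text, finish_reason = generate(max_tokens)
        if finish_reason != "length":
            return text
        # Assumed policy: give the model a larger budget before retrying.
        max_tokens *= 2
    raise RuntimeError("response was still truncated after all retries")
```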

1.5.0 (2025-03-13)

Features

  • core: Added compatibility to Google AI Studio (#380) (f56b74c)

Bug Fixes

1.4.2 (2025-03-12)

Bug Fixes

  • GitHub action now removes the leading v from the version before tagging (6c77a1f)
  • Moved infrastructure from values.yaml to infrastructure.yaml (e4ba284)

1.4.1 (2025-03-12)

Bug Fixes

  • Continue on error on docker build (aca0b44)

1.4.0 (2025-03-12)

Features

  • /health returns the current version (627e8c9)

Bug Fixes

  • Updated changelog paths (d20b811)

1.3.5 (2025-03-12)

Bug Fixes

  • Added back segmentation docker with self hosted runner (0984ba2)

1.3.4 (2025-03-11)

Bug Fixes

  • Removed segmentation from docker build (5dc9e6e)

1.3.3 (2025-03-11)

Bug Fixes

  • Updated rust version for docker builds (e5a3633)

1.3.2 (2025-03-11)

Bug Fixes

  • Release-please docker build (6e1ff43)

1.3.1 (2025-03-11)

Bug Fixes

  • Docker compose updated uses pr (f45abd1)

1.3.0 (2025-03-11)

Features

Bug Fixes

  • Debugging please release (e574177)
  • Debugging please release with core changes (558a6f9)
  • Docker builds use root version (82e1768)
  • Docker compose files update separately (15328a2)
  • Image tag updates not full image (7b8791f)
  • Only trigger docker build after releases created (676c280)

1.2.0 (2025-03-11)

Features

  • Added release please for automated releases (#363) (d808d4e)

Bug Fixes

  • Await was missing in response (1ad37d8)
  • Await was missing in response (632adce)

Added

  • Added routes POST /task/parse and PATCH /task/{task_id}/parse to parse a task. They behave exactly like the POST /task and PATCH /task/{task_id} routes, but do not use a multipart request.

The old routes are deprecated but will continue to work for the foreseeable future.

  • Batch parallelization, so individual tasks can take full advantage of unused GPU resources.
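
To illustrate the non-multipart routes above, here is a hedged sketch that builds (but does not send) a request to POST /task/parse. The changelog only states that the route does not use multipart; the JSON body, the `file` and `file_name` field names, and the `Authorization` header are assumptions for illustration, so check the API documentation for the real schema.

```python
import base64
import json
import urllib.request


def build_parse_request(
    base_url: str, api_key: str, file_bytes: bytes, file_name: str
) -> urllib.request.Request:
    """Build a single-body POST /task/parse request instead of a multipart form."""
    payload = {
        # Assumed field names: the file travels as base64 text inside the body.
        "file": base64.b64encode(file_bytes).decode("ascii"),
        "file_name": file_name,
    }
    return urllib.request.Request(
        url=f"{base_url}/task/parse",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": api_key,  # assumed auth scheme
        },
        method="POST",
    )
```

The request object can be inspected or passed to `urllib.request.urlopen` once the field names are confirmed.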

Changed

  • OCR All is now the default strategy
  • Significant improvements to OCR quality

Removed

  • Removed terraform directory

Fixed

  • Fixed bug in saving output from the python client

[1.1.0] - 2025-01-29

Added

  • Added chunk_processing config to control chunking
  • Added high_resolution config to control image density
  • Added segmentation_processing config to control LLM processing on the segments
  • Added segmentation_strategy to control segmentation
  • Added expires_in to the API and self deployment config. It is the number of seconds before the task expires and is deleted
  • Concurrent OCR and segmentation
  • Concurrent page processing
  • CPU support - run with docker compose -f compose-cpu.yaml up -d
  • Python client - pip install chunkr-ai
  • PATCH /task/{task_id} - allows you to update the configuration for a task. Only the steps that are updated will be re-run.
  • DELETE /task/{task_id} - allows you to delete a task as long as its Status is not Processing
  • GET /task/{task_id}/cancel - allows you to cancel a task before Status is Processing
  • Helm chart
  • Cloudflared tunnel support for https
  • Azure support for self deployment
  • Minio support for storage
  • Optionally get base64 encoded files from the API rather than a presigned URL
  • Upload base64 encoded files and presigned URLs, when using the Python client
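
The task lifecycle rules in this list (expiry after expires_in seconds, deletion unless Status is Processing, cancellation before Status is Processing) can be summarized in a small sketch. This is illustrative only and not the server's code: the function names are hypothetical, `Starting` is an assumed name for the status that precedes Processing, and treating a missing expires_in as "never expires" is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

PROCESSING = "Processing"
STARTING = "Starting"  # assumed name of the status before Processing


def expiry_time(created_at: datetime, expires_in: Optional[int]) -> Optional[datetime]:
    """Return when the task expires; None is assumed to mean it never expires."""
    if expires_in is None:
        return None
    return created_at + timedelta(seconds=expires_in)


def can_delete(status: str) -> bool:
    """DELETE is allowed as long as the task is not currently processing."""
    return status != PROCESSING


def can_cancel(status: str) -> bool:
    """Cancel is only possible before processing has started."""
    return status == STARTING
```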

Changed

  • Combined all workers into a task worker. See #279
  • Redis is now part of the Kubernetes deployment
  • Documentation
  • Improved segmentation quality and speed
  • Dashboard has table view - search, deletion, cancellation
  • Viewer - better UX
  • Better usage tracking - includes graph
  • Landing page

Fixed

  • Incorrect heuristics for list items. See #276
  • Reading order

Removed

(All changes maintain compatibility with old configs)

  • Deprecated model config
  • Deprecated target_chunk_length, you can now use chunk_processing.target_length instead
  • Deprecated structured_extraction.json_schema.type
  • Deprecated ocr_strategy.Off
  • Deprecated expires_at in the Python client
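
Old configs keep working, but moving to the new field names is straightforward. As a minimal sketch of the target_chunk_length change only, the hypothetical helper below rewrites an old-style config dict into the new shape. The helper name and the rule that an explicit chunk_processing.target_length takes precedence over the deprecated field are assumptions, not documented behavior.

```python
from copy import deepcopy
from typing import Any, Dict


def migrate_target_chunk_length(old_config: Dict[str, Any]) -> Dict[str, Any]:
    """Move deprecated target_chunk_length to chunk_processing.target_length."""
    new_config = deepcopy(old_config)  # leave the caller's dict untouched
    if "target_chunk_length" in new_config:
        value = new_config.pop("target_chunk_length")
        chunk_processing = new_config.setdefault("chunk_processing", {})
        # Assumed precedence: a new-style value already present wins.
        chunk_processing.setdefault("target_length", value)
    return new_config
```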