All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
1.6.1 (2025-03-21)
- client: Polling would error out on httpx.ReadTimeout (#400) (aea1255)
- core: Allow PDFs based on extension if the pages can be counted (#396) (cfbfd01)
- core: Auto-fix clippy warnings (#393) (0605227)
- Fixed prompts and retries for LLMs (#394) (4b31588)
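
The polling fix in this release can be pictured with a minimal sketch. It is not the client's actual code: `poll_until_done`, its parameters, and the callables it takes are hypothetical names chosen for illustration. The idea it shows is that a read timeout while polling is treated as "try again" instead of aborting the poll.

```python
import time
from typing import Callable, Optional, Tuple, Type


def poll_until_done(
    fetch_status: Callable[[], str],
    is_done: Callable[[str], bool],
    transient_errors: Tuple[Type[BaseException], ...] = (TimeoutError,),
    interval_s: float = 0.5,
    max_attempts: int = 20,
) -> str:
    """Poll fetch_status until is_done(status) is true.

    Exceptions listed in transient_errors count as a failed attempt,
    not as a failure of the whole poll.
    """
    for _ in range(max_attempts):
        status: Optional[str]
        try:
            status = fetch_status()
        except transient_errors:
            # Assumption: a slow response is worth retrying.
            status = None
        if status is not None and is_done(status):
            return status
        time.sleep(interval_s)
    raise RuntimeError(f"task not finished after {max_attempts} attempts")
```

With httpx installed, passing `transient_errors=(httpx.ReadTimeout,)` would cover the exception named in the entry above.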
1.6.0 (2025-03-20)
- Added new cropped image viewing, updated upload component defaults for image VLM processing, and some bug fixes for segment highlighting + JSON viewing (#388) (6115ee0)
- core: Auto-fix clippy warnings (#386) (ccb56f9)
- core: Update default generation strategies for Picture and Page segments (5316485)
- Downgraded cuda version for doctr (36db353)
1.5.1 (2025-03-16)
- Added imagemagick to docker images (d3ac921)
- Added retry when finish reason is length (#383) (a8dd777)
- Correct Rust lint workflow configuration (0b1a1eb)
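
As a rough illustration of retrying when the finish reason is length, the sketch below assumes an OpenAI-style response in which a finish reason of "length" means the output was cut off by the token limit. The `generate` callable, its signature, and the choice to grow the token budget on each retry are assumptions for illustration; the actual fix may simply re-issue the request.

```python
from typing import Callable, Tuple

# Hypothetical signature: generate(max_tokens) -> (text, finish_reason)
Generate = Callable[[int], Tuple[str, str]]


def generate_with_length_retry(
    generate: Generate,
    max_tokens: int = 256,
    max_retries: int = 3,
    growth: int = 2,
) -> str:
    """Call generate, retrying with a larger budget while output is truncated."""
    budget = max_tokens
    for _ in range(max_retries + 1):
        text, finish_reason = generate(budget)
        if finish_reason != "length":
            return text
        # Assumption: a truncated answer needs more room, so grow the budget.
        budget *= growth
    raise RuntimeError("output still truncated after retries")
```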
1.5.0 (2025-03-13)
- Fix keycloak tag (df9efa5)
1.4.2 (2025-03-12)
- GitHub action now removes the leading v from the version before tagging (6c77a1f)
- Moved infrastructure from values.yaml to infrastructure.yaml (e4ba284)
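
The version normalization in this release amounts to dropping a leading "v" before the tag is created. A minimal sketch of that rule follows; `tag_from_version` is a hypothetical helper, not the workflow's actual step.

```python
def tag_from_version(version: str) -> str:
    """Drop a single leading "v", leaving other version strings unchanged."""
    return version[1:] if version.startswith("v") else version
```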
1.4.1 (2025-03-12)
- Continue on error on docker build (aca0b44)
1.4.0 (2025-03-12)
- /health now returns the current version (627e8c9)
- Updated changelog paths (d20b811)
1.3.5 (2025-03-12)
- Added back segmentation docker with self hosted runner (0984ba2)
1.3.4 (2025-03-11)
- Removed segmentation from docker build (5dc9e6e)
1.3.3 (2025-03-11)
- Updated rust version for docker builds (e5a3633)
1.3.2 (2025-03-11)
- Release-please docker build (6e1ff43)
1.3.1 (2025-03-11)
- Docker compose updated uses pr (f45abd1)
1.3.0 (2025-03-11)
- Debugging please release (e574177)
- Debugging please release with core changes (558a6f9)
- Docker builds use root version (82e1768)
- Docker compose files update separately (15328a2)
- Image tag updates not full image (7b8791f)
- Only trigger docker build after releases created (676c280)
1.2.0 (2025-03-11)
- Added routes POST /task/parse and PATCH /task/{task_id}/parse to parse a task. These routes are exactly the same as the POST /task and PATCH /task/{task_id} routes, but don't use a multipart request. The old routes are deprecated but will continue to work for the foreseeable future.
- Batch parallelization, so individual tasks can take full advantage of unused GPU resources.
- OCR All is now the default strategy
- Significant improvements to OCR quality
- Removed terraform directory
- Fixed bug in saving output from the python client
- Added chunk_processing config to control chunking
- Added high_resolution config to control image density
- Added segmentation_processing config to control LLM processing on the segments
- Added segmentation_strategy to control segmentation
- Added expires_in to API and self deployment config; it is the number of seconds before the task expires and is deleted
- Concurrent OCR and segmentation
- Concurrent page processing
- CPU support - run with docker compose -f compose-cpu.yaml up -d
- Python client - pip install chunkr-ai
- PATCH /task/{task_id} - allows you to update the configuration for a task. Only the steps that are updated will be re-run.
- DELETE /task/{task_id} - allows you to delete a task as long as its Status is not Processing
- GET /task/{task_id}/cancel - allows you to cancel a task before its Status is Processing
- Helm chart
- Cloudflared tunnel support for https
- Azure support for self deployment
- Minio support for storage
- Python client
- Optionally get base64 encoded files from the API rather than a presigned URL
- Upload base64 encoded files and presigned URLs, when using the Python client
- Combined all workers into a task worker. See 279
- Redis is now part of the kubernetes deployment
- Documentation
- Improved segmentation quality and speed
- Dashboard has table view - search, deletion, cancellation
- Viewer - better ux
- Better usage tracking - includes graph
- Landing page
- List items incorrect heuristics. See 276
- Reading order
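
The DELETE and cancel rules listed in this release depend on the task Status. The sketch below encodes those two rules as plain predicates. Only "Processing" appears in this changelog; "Starting" is a hypothetical name for the state a task is in before processing begins, and both function names are illustrative.

```python
# Only "Processing" comes from the changelog. "Starting" is a hypothetical
# name for the state a task is in before processing begins.
PROCESSING = "Processing"
STARTING = "Starting"


def can_delete(status: str) -> bool:
    """A task may be deleted as long as it is not currently Processing."""
    return status != PROCESSING


def can_cancel(status: str) -> bool:
    """A task may be cancelled only before it reaches Processing."""
    return status == STARTING
```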
(All changes maintain compatibility with old configs)
- Deprecated model config
- Deprecated target_chunk_length, you can now use chunk_processing.target_length instead
- Deprecated structured_extraction.json_schema.type
- Deprecated ocr_strategy.Off
- Deprecated expires_at in the Python client
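
For configs that still use target_chunk_length, the move to chunk_processing.target_length could be done with a small helper like the one sketched here. It assumes the configuration is held as a plain dictionary. `migrate_config` is a hypothetical name, and the rule that an existing new-style value takes precedence is an assumption, not documented behavior.

```python
import copy
from typing import Any, Dict


def migrate_config(old: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of old with target_chunk_length moved to its new key."""
    new = copy.deepcopy(old)
    if "target_chunk_length" in new:
        value = new.pop("target_chunk_length")
        chunk_processing = new.setdefault("chunk_processing", {})
        # Assumption: an explicitly set new-style value wins over the old one.
        chunk_processing.setdefault("target_length", value)
    return new
```

Since old configs remain compatible, a migration like this is optional and mainly useful for keeping configs free of deprecated keys.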