Partial sync of codebase
hauntsaninja committed Oct 3, 2024
1 parent 9f7f69d commit e060298
Showing 16 changed files with 302 additions and 144 deletions.
21 changes: 11 additions & 10 deletions .github/workflows/build_wheels.yml
@@ -15,19 +15,19 @@ jobs:
matrix:
# cibuildwheel builds linux wheels inside a manylinux container
# it also takes care of procuring the correct python version for us
os: [ubuntu-latest, windows-latest, macos-13]
python-version: [38, 39, 310, 311, 312]
os: [ubuntu-latest, windows-latest, macos-latest]
python-version: [39, 310, 311, 312, 313]

steps:
- uses: actions/checkout@v4

- uses: pypa/cibuildwheel@v2.18.0
- uses: pypa/cibuildwheel@v2.21.2
env:
CIBW_BUILD: "cp${{ matrix.python-version}}-*"

- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
with:
name: dist
name: cibw-wheels-${{ matrix.os }}-${{ strategy.job-index }}
path: ./wheelhouse/*.whl

build_wheels_aarch64:
@@ -37,7 +37,7 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-latest]
python-version: [38, 39, 310, 311, 312]
python-version: [39, 310, 311, 312, 313]

steps:
- uses: actions/checkout@v4
@@ -48,24 +48,25 @@ jobs:
platforms: arm64

- name: Build wheels
uses: pypa/cibuildwheel@v2.18.0
uses: pypa/cibuildwheel@v2.21.2
env:
CIBW_BUILD: "cp${{ matrix.python-version}}-*"
CIBW_ARCHS: aarch64
CIBW_BUILD_VERBOSITY: 3
# https://github.com/rust-lang/cargo/issues/10583
CIBW_ENVIRONMENT_LINUX: PATH="$PATH:$HOME/.cargo/bin" CARGO_NET_GIT_FETCH_WITH_CLI=true
- uses: actions/upload-artifact@v3

- uses: actions/upload-artifact@v4
with:
name: dist
name: cibw-wheels-${{ matrix.os }}-${{ strategy.job-index }}
path: ./wheelhouse/*.whl

build_sdist:
name: sdist
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
name: Install Python
with:
python-version: "3.9"
30 changes: 28 additions & 2 deletions CHANGELOG.md
@@ -2,11 +2,26 @@

This is the changelog for the open source version of tiktoken.

## [v0.8.0]

- Support for `o1-` and `chatgpt-4o-` models
- Build wheels for Python 3.13
- Add possessive quantifiers to limit backtracking in regular expressions, thanks to @l0rinc!
- Provide a better error message and type for invalid token decode
- Permit tuples in type hints
- Better error message for passing invalid input to `get_encoding`
- Better error messages during plugin loading
- Add a `__version__` attribute
- Update versions of `pyo3`, `regex`, `fancy-regex`
- Drop support for Python 3.8

## [v0.7.0]

- Support for `gpt-4o`
- Performance improvements

## [v0.6.0]

- Optimise regular expressions for a 20% performance improvement, thanks to @paplorinc!
- Add `text-embedding-3-*` models to `encoding_for_model`
- Check content hash for downloaded files
@@ -16,14 +31,17 @@ This is the changelog for the open source version of tiktoken.
Thank you to @paplorinc, @mdwelsh, @Praneet460!

## [v0.5.2]

- Build wheels for Python 3.12
- Update version of PyO3 to allow multiple imports
- Avoid permission errors when using default cache logic

## [v0.5.1]

- Add `encoding_name_for_model`, undo some renames to variables that are implementation details

## [v0.5.0]

- Add `tiktoken._educational` submodule to better document how byte pair encoding works
- Ensure `encoding_for_model` knows about several new models
- Add `decode_with_offsets`
@@ -32,23 +50,28 @@ Thank you to @paplorinc, @mdwelsh, @Praneet460!
- Update versions of dependencies

## [v0.4.0]

- Add `decode_batch` and `decode_bytes_batch`
- Improve error messages and handling

## [v0.3.3]

- `tiktoken` will now make a best effort attempt to replace surrogate pairs with the corresponding
Unicode character and will replace lone surrogates with the Unicode replacement character.
Unicode character and will replace lone surrogates with the Unicode replacement character.
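The surrogate handling described in this entry can be approximated with a UTF-16 round trip. The sketch below is an illustration under that assumption rather than a statement of the library's exact code; `replace_surrogates` is a hypothetical helper name.

```python
def replace_surrogates(text: str) -> str:
    """Combine surrogate pairs and replace lone surrogates with U+FFFD."""
    # "surrogatepass" lets surrogate code points through the encoder;
    # decoding with "replace" then merges valid high/low pairs into one
    # character and substitutes U+FFFD for any surrogate without a partner.
    return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


paired = replace_surrogates("\ud83d\udc4d")  # high + low surrogate pair
lone = replace_surrogates("a\ud83db")        # high surrogate with no partner
```

Here `paired` becomes the single character U+1F44D, while the unpaired surrogate in `lone` is swapped for the replacement character and the surrounding text is left intact.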

## [v0.3.2]

- Add encoding for GPT-4

## [v0.3.1]

- Build aarch64 wheels
- Make `blobfile` an optional dependency

Thank you to @messense for the environment variable that makes cargo not OOM under emulation!

## [v0.3.0]

- Improve performance by 5-20%; thank you to @nistath!
- Add `gpt-3.5-turbo` models to `encoding_for_model`
- Add prefix matching to `encoding_for_model` to better support future model versions
@@ -57,16 +80,19 @@ Thank you to @messense for the environment variable that makes cargo not OOM und
- Add packaging metadata
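
The prefix matching mentioned in this release can be pictured as an exact lookup followed by a prefix scan. This is a minimal sketch, assuming two small lookup tables; the names `EXACT`, `PREFIXES`, and `lookup_encoding_name` and the table contents are hypothetical and chosen only for illustration.

```python
# Hypothetical tables for illustration; the library's real mappings may differ.
EXACT = {"gpt-4": "cl100k_base", "gpt-3.5-turbo": "cl100k_base"}
PREFIXES = {"gpt-4-": "cl100k_base", "gpt-3.5-turbo-": "cl100k_base"}


def lookup_encoding_name(model: str) -> str:
    """Return an encoding name via exact match first, then prefix match."""
    if model in EXACT:
        return EXACT[model]
    # A dated or suffixed model name falls back to the first matching prefix,
    # so future versions of a known model family resolve without a new entry.
    for prefix, encoding_name in PREFIXES.items():
        if model.startswith(prefix):
            return encoding_name
    raise KeyError(f"no encoding known for model {model!r}")
```

With these tables, a name such as `gpt-4-0314` resolves through the `gpt-4-` prefix even though it has no exact entry, and an unrelated name raises `KeyError`.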

## [v0.2.0]
- Add ``tiktoken.encoding_for_model`` to get the encoding for a specific model

- Add `tiktoken.encoding_for_model` to get the encoding for a specific model
- Improve portability of caching logic

Thank you to @fritzo, @arvid220u, @khanhvu207, @henriktorget for various small corrections

## [v0.1.2]

- Avoid use of `blobfile` for public files
- Add support for Python 3.8
- Add py.typed
- Improve the public tests

## [v0.1.1]

- Initial release
4 changes: 2 additions & 2 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "tiktoken"
version = "0.7.0"
version = "0.8.0"
edition = "2021"
rust-version = "1.57.0"

@@ -9,7 +9,7 @@ name = "_tiktoken"
crate-type = ["cdylib"]

[dependencies]
pyo3 = { version = "0.20.0", features = ["extension-module"] }
pyo3 = { version = "0.22.2", default-features = false, features = ["extension-module", "macros"] }

# tiktoken dependencies
fancy-regex = "0.13.0"
1 change: 1 addition & 0 deletions README.md
@@ -128,3 +128,4 @@ setup(

Then simply `pip install ./my_tiktoken_extension` and you should be able to use your
custom encodings! Make sure **not** to use an editable install.

6 changes: 3 additions & 3 deletions pyproject.toml
@@ -1,13 +1,13 @@
[project]
name = "tiktoken"
version = "0.7.0"
version = "0.8.0"
description = "tiktoken is a fast BPE tokeniser for use with OpenAI's models"
readme = "README.md"
license = {file = "LICENSE"}
authors = [{name = "Shantanu Jain"}, {email = "[email protected]"}]
dependencies = ["regex>=2022.1.18", "requests>=2.26.0"]
optional-dependencies = {blobfile = ["blobfile>=2"]}
requires-python = ">=3.8"
requires-python = ">=3.9"

[project.urls]
homepage = "https://github.com/openai/tiktoken"
@@ -24,7 +24,7 @@ build-verbosity = 1

linux.before-all = "curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y"
linux.environment = { PATH = "$PATH:$HOME/.cargo/bin" }
macos.before-all = "rustup target add aarch64-apple-darwin"
macos.before-all = "rustup target add aarch64-apple-darwin x86_64-apple-darwin"

skip = [
"*-manylinux_i686",