Releases: MilesCranmer/PySR
v1.5.0
Backend Changes
Major Changes
- Change behavior of batching to resample only every iteration; not every eval in MilesCranmer/SymbolicRegression.jl#421
- This results in a speed improvement for code with `batching=true` (see the sketch after this list for enabling it from the Python side)
- It should also result in improved search results with batching, because comparison within a single population is more stable during evolution. In other words, there is no lucky batch phenomenon.
- This also refactors the batching interface to be cleaner. There is now a `SubDataset <: Dataset` rather than passing around an `idx` array explicitly.
- Note that other than the slight behaviour change, this is otherwise backwards compatible: the old way of writing custom loss functions that take an `idx` will still be handled.
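For reference, a minimal sketch of enabling batching from the Python side. The `batching` and `batch_size` options are existing `PySRRegressor` parameters; the data here is purely illustrative:

```python
import numpy as np
from pysr import PySRRegressor

# Illustrative data: many rows, which is where batching helps most.
X = np.random.randn(10_000, 3)
y = X[:, 0] ** 2 - 2 * X[:, 1] + np.cos(X[:, 2])

model = PySRRegressor(
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["cos"],
    batching=True,   # evaluate candidates on mini-batches of the data
    batch_size=50,   # rows per batch; with this change, resampled once per iteration
)
model.fit(X, y)
```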
Other changes
- feat: better error for mismatched eltypes by @MilesCranmer in MilesCranmer/SymbolicRegression.jl#414
- CompatHelper: bump compat for Optim to 1, (keep existing compat) by @github-actions in MilesCranmer/SymbolicRegression.jl#403
- feat: explicitly monitor errors in workers by @MilesCranmer in MilesCranmer/SymbolicRegression.jl#417
- feat: allow recording crossovers by @MilesCranmer in MilesCranmer/SymbolicRegression.jl#415
- add script for converting record to graphml by @MilesCranmer in MilesCranmer/SymbolicRegression.jl#416
- ci: redistribute part 1 of test suite by @MilesCranmer in MilesCranmer/SymbolicRegression.jl#424
- refactor: rename to `.cost` by @MilesCranmer in MilesCranmer/SymbolicRegression.jl#423
- fix: batched dataset for optimisation by @MilesCranmer in MilesCranmer/SymbolicRegression.jl#426
- refactor: task local storage instead of thread local by @MilesCranmer in MilesCranmer/SymbolicRegression.jl#427
Frontend Changes
- Update backend to v1.8.0 by @MilesCranmer in #833
- test: update deprecated sklearn test syntax by @MilesCranmer in #834
- chore(deps): bump juliacall from 0.9.23 to 0.9.24 by @dependabot in #815
- use standard library logging by @MilesCranmer in #835
- Remove warning about many features, as not really relevant anymore by @MilesCranmer in #837
- chore(deps): update beartype requirement from <0.20,>=0.19 to >=0.19,<0.21 by @dependabot in #838
- chore(deps): update jax[cpu] requirement from <0.5,>=0.4 to >=0.4,<0.6 by @dependabot in #810
Full Changelog: v1.4.0...v1.5.0
v1.4.0
What's Changed
#823 adds support for parameters in template expressions, allowing you to learn expressions under a template that have custom coefficients which can be optimized.
Along with this, the `TemplateExpressionSpec` API has changed. (The old API will continue to function, but will not have parametric expressions available.)
spec = TemplateExpressionSpec(
    "fx = f(x); p[1] + p[2] * fx + p[3] * fx^2",
    expressions=["f"],
    variable_names=["x"],
    parameters={"p": 3},
)
This would learn three parameters p[1], p[2], p[3] for the expression p[1] + p[2] * f(x) + p[3] * f(x)^2, where f is itself a learned expression.
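As a quick usage sketch (the data and operator choices are illustrative; only the `TemplateExpressionSpec` call above comes from this release):

```python
import numpy as np
from pysr import PySRRegressor, TemplateExpressionSpec

# Toy data where the true inner function is f(x) = cos(2x),
# wrapped in a quadratic with coefficients 0.5, 1.5, and 0.3.
X = np.random.uniform(-3, 3, (500, 1))
fx_true = np.cos(2 * X[:, 0])
y = 0.5 + 1.5 * fx_true + 0.3 * fx_true**2

spec = TemplateExpressionSpec(
    "fx = f(x); p[1] + p[2] * fx + p[3] * fx^2",
    expressions=["f"],
    variable_names=["x"],
    parameters={"p": 3},
)
model = PySRRegressor(
    expression_spec=spec,
    binary_operators=["+", "*"],
    unary_operators=["cos"],
)
model.fit(X, y)
```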
You can have multiple parameter vectors, and these parameter vectors can also be indexed by categorical features. For example:
# Learn different parameters for each class:
spec = TemplateExpressionSpec(
    "p1[category] * f(x1, x2) + p2[1] * g(x1^2)",
    expressions=["f", "g"],
    variable_names=["x1", "x2", "category"],
    parameters={"p1": 3, "p2": 1},
)
This will learn an equation of the form p1[category] * f(x1, x2) + p2[1] * g(x1^2), where `category` is passed as a variable in `X` (as floating-point versions of the categories) rather than via the `category` keyword. This difference means that in a `TemplateExpressionSpec`, you can actually have multiple categories!
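A sketch of how this might be used end to end (synthetic data; the key point is that the category column is passed as an ordinary float-valued feature in `X`, matching `variable_names` above):

```python
import numpy as np
from pysr import PySRRegressor, TemplateExpressionSpec

n = 500
x1, x2 = np.random.randn(n), np.random.randn(n)
category = np.random.randint(1, 4, n)          # three classes -> parameters={"p1": 3}
X = np.column_stack([x1, x2, category.astype(float)])

# Toy target with a per-class coefficient on the first term.
coeffs = np.array([0.5, 1.0, 2.0])
y = coeffs[category - 1] * np.cos(x1 - x2) + 0.25 * np.sin(x1**2)

spec = TemplateExpressionSpec(
    "p1[category] * f(x1, x2) + p2[1] * g(x1^2)",
    expressions=["f", "g"],
    variable_names=["x1", "x2", "category"],
    parameters={"p1": 3, "p2": 1},
)
model = PySRRegressor(
    expression_spec=spec,
    binary_operators=["+", "-", "*"],
    unary_operators=["cos", "sin"],
)
model.fit(X, y)
```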
- Added support for expression-level loss functions via `loss_function_expression`, which allows you to specify custom loss functions that operate on the full expression object rather than just its evaluated output. This is particularly useful when working with template expressions (a sketch is given after the syntax example below).
- Note that the old template expression syntax using function-style definitions is deprecated. Use the new, cleaner syntax instead:
# Old:
# spec = TemplateExpressionSpec(
#     function_symbols=["f", "g"],
#     combine="((; f, g), (x1, x2, x3)) -> sin(f(x1, x2)) + g(x3)"
# )

# New:
spec = TemplateExpressionSpec(
    "sin(f(x1, x2)) + g(x3)",
    expressions=["f", "g"],
    variable_names=["x1", "x2", "x3"],
)
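For the expression-level losses mentioned above, here is a rough sketch of what a `loss_function_expression` might look like. The Julia function body is an assumption based on the backend's custom-loss interface (an expression object, the dataset, and the options), not the exact documented signature:

```python
from pysr import PySRRegressor

# Hypothetical example: a plain mean-squared error written against the
# full expression object rather than pre-evaluated predictions.
expression_loss = """
function my_loss(ex, dataset::Dataset{T,L}, options) where {T,L}
    prediction, complete = eval_tree_array(ex, dataset.X, options)
    !complete && return L(Inf)
    return sum(abs2, prediction .- dataset.y) / dataset.n
end
"""

model = PySRRegressor(
    loss_function_expression=expression_loss,
    binary_operators=["+", "-", "*", "/"],
)
```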
Full Changelog: v1.3.1...v1.4.0
v1.3.1
What's Changed
- Automated update to backend: v1.5.1 by @github-actions in #790
Full Changelog: v1.3.0...v1.3.1
v1.3.0
What's Changed
- Expanded support for differential operators via backend 1.5.0 by @MilesCranmer in #782
e.g., say we wish to integrate 1 / (x^2 * sqrt(x^2 - 1)):
import numpy as np
from pysr import PySRRegressor, TemplateExpressionSpec
x = np.random.uniform(1, 10, (1000,)) # Integrand sampling points
y = 1 / (x**2 * np.sqrt(x**2 - 1)) # Evaluation of the integrand
expression_spec = TemplateExpressionSpec(
    ["f"], "((; f), (x,)) -> D(f, 1)(x)"
)
model = PySRRegressor(
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["sqrt"],
    expression_spec=expression_spec,
    maxsize=20,
)
model.fit(x[:, np.newaxis], y)
which should correctly find an expression equivalent to sqrt(x^2 - 1) / x, the antiderivative of the integrand above.
Full Changelog: v1.2.0...v1.3.0
v1.2.0
What's Changed
- Compatibility with new scikit-learn API and test suite by @MilesCranmer in #776
- Add differential operators and input stream specification by @MilesCranmer in #780
- (Note: the differential operators aren't yet in a stable state, and are not yet documented. However, they do work!)
- This PR also adds various GC allocation improvements in the backend.
Frontend Changelog: v1.1.0...v1.2.0
Backend Changelog: MilesCranmer/SymbolicRegression.jl@v1.2.0...v1.4.0
v1.1.0
What's Changed
- Automated update to backend: v1.2.0 by @github-actions in #770
Full Changelog: v1.0.2...v1.1.0
v1.0.2
What's Changed
- logger fixes: close streams and persist during warm start by @BrotherHa in #763
- Let sympy use log2(x) instead of log(x)/log(2) by @nerai in #712
New Contributors
- @BrotherHa made their first contribution in #763
- @nerai made their first contribution in #712
Full Changelog: v1.0.1...v1.0.2
v1.0.1
What's Changed
- Automated update to backend: v1.1.0 by @github-actions in #762
- Fall back to `eager` registry when needed by @DilumAluthge in #765
New Contributors
- @DilumAluthge made their first contribution in #765
Full Changelog: v1.0.0...v1.0.1
v1.0.0
PySR v1.0.0 Release Notes
PySR 1.0.0 introduces new features for imposing specific functional forms and finding parametric expressions. It also includes TensorBoard support, along with significant updates to the core algorithm, including some important bug fixes. The default hyperparameters have also been updated based on extensive tuning, with a maxsize of 30 rather than 20.
Major New Features
Expression Specifications
PySR 1.0.0 introduces new ways to specify the structure of equations through "Expression Specifications", that expose the new backend feature of AbstractExpression
:
Template Expressions
`TemplateExpressionSpec` allows you to define a specific structure for your equations. For example:
expression_spec = TemplateExpressionSpec(["f", "g"], "((; f, g), (x1, x2, x3)) -> sin(f(x1, x2)) + g(x3)")
Parametric Expressions
`ParametricExpressionSpec` enables fitting expressions that can adapt to different categories of data with per-category parameters:
expression_spec = ParametricExpressionSpec(max_parameters=2)
model = PySRRegressor(
    expression_spec=expression_spec,
    binary_operators=["+", "*", "-", "/"],
)
model.fit(X, y, category=category) # Pass category labels
Improved Logging with TensorBoard
The new `TensorBoardLoggerSpec` enables logging of the search process, as well as hyperparameter recording, which exposes the `AbstractSRLogger` feature of the backend:
logger_spec = TensorBoardLoggerSpec(
log_dir="logs/run",
log_interval=10, # Log every 10 iterations
)
model = PySRRegressor(logger_spec=logger_spec)
Features logged include:
- Loss curves over time at each complexity level
- Population statistics
- Pareto "volume" logging (measures performance over all complexities with a single scalar)
- The min loss over time
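To browse these metrics, point the standard TensorBoard viewer at the directory configured above, e.g. `tensorboard --logdir logs/run` (assuming the `tensorboard` package is installed).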
Algorithm Improvements
Updated Default Parameters
The default hyperparameters have been significantly revised based on testing:
- Increased default `maxsize` from 20 to 30, as I noticed that many people use the defaults, and this maxsize allows for more accurate expressions.
- New mutation operator weights optimized for better performance, along with the new "rotate tree" mutation.
- Improved search parameters tuned using Pareto front volume calculations.
- Default `niterations` increased from 40 to 100, also to support better accuracy (at the expense of slightly longer default search times).
Core Changes
- New output organization: results are now stored in `outputs/<run_id>/` rather than in the directory of execution.
- Improved performance with better parallelism handling
- Support for Python 3.10+
- Updated Julia backend to version 1.10+
- Fix for aliasing issues in crossover operations
Breaking Changes
- Minimum Python version is now 3.10, and minimum Julia version is 1.10
- Output file structure has changed to use directories
- Parameter name updates: `equation_file` → `output_directory` + `run_id`
- Added clearer naming for parallelism options, such as `parallelism="serial"` rather than the old `multithreading=False, procs=0`, which was unclear (see the sketch after this list)
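A small before/after sketch of the renamed options (the values here are illustrative):

```python
from pysr import PySRRegressor

# Pre-1.0 style (no longer recommended):
# model = PySRRegressor(equation_file="hall_of_fame.csv", multithreading=False, procs=0)

# 1.0+ style: results are written to outputs/<run_id>/, and parallelism is a single option.
model = PySRRegressor(
    output_directory="outputs",
    run_id="my_run",
    parallelism="serial",
)
```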
Documentation
The documentation has a new home at https://ai.damtp.cam.ac.uk/pysr/
v0.19.4
What's Changed
- Create `load_all_packages` to install Julia extensions by @MilesCranmer in #688
- Apptainer definition file for PySR by @wkharold in #687
- JuliaCall 0.9.23 by @MilesCranmer in #703
- build(deps): bump juliacall from 0.9.21 to 0.9.22 by @dependabot in #695
New Contributors
- @wkharold made their first contribution in #687
Full Changelog: v0.19.3...v0.19.4