Add support for Berkeley DeepScaleR-1.5B-Preview
Reason: published benchmarks show this 1.5B model performing at o1-preview level on several math benchmarks, and at this size it is well suited to run on-device.
Regarding the model's weights:
Total parameters: 1.77B, stored in FP32, with a file size of about 7 GB
Quantization: we quantized to FP16, bringing the weight files down to about 3.3 GB (a conversion sketch follows)
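As a minimal sketch of that conversion, assuming the checkpoint lives on the Hugging Face Hub (the repo id below is an assumption, and transformers/torch are the assumed tooling, not something this issue specifies):

```python
# Hedged sketch: load the FP32 checkpoint, cast to FP16, and re-save.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "agentica-org/DeepScaleR-1.5B-Preview"  # assumed Hub repo id

# torch_dtype=torch.float16 casts the weights to FP16 while loading.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP16 shards come out at roughly half the FP32 size (~7 GB -> ~3.3 GB).
model.save_pretrained("deepscaler-1.5b-fp16")
tokenizer.save_pretrained("deepscaler-1.5b-fp16")
```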
In more detail:
Base Model: Fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B
Dataset: approximately 40,000 unique problem-answer pairs compiled from the AIME, AMC, Omni-MATH, and Still datasets, with data-processing steps applied
Reward function: binary (1 for correct answers, 0 for incorrect or improperly formatted answers; see the sketch after this list)
Training: RL with GRPO, run in three phases of increasing context length (an illustrative schedule config also follows this list):
Phase 1: 8K context length (8 samples per prompt)
Phase 2: 16K context length (16 samples per prompt)
Phase 3: 24K context length
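The binary reward described above is simple enough to sketch. The snippet below is only an illustration that assumes answers are reported in a \boxed{...} span; the actual extraction logic DeepScaleR uses is not shown in this issue:

```python
import re

# Return the last \boxed{...} answer in a completion, or None if absent.
def extract_boxed_answer(completion: str):
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1] if matches else None

# Binary reward: 1 for a correct, properly formatted answer, 0 otherwise.
def binary_reward(completion: str, gold_answer: str) -> int:
    answer = extract_boxed_answer(completion)
    if answer is None:  # missing or improperly formatted answer
        return 0
    return int(answer.strip() == gold_answer.strip())

# Example: binary_reward(r"... so the answer is \boxed{42}", "42") -> 1
```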
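And one purely illustrative way to express the three-phase schedule as a config (the field names are hypothetical, and the per-prompt sample count for the 24K phase is not stated above):

```python
# Hypothetical config capturing the phase schedule listed above.
GRPO_PHASES = [
    {"phase": 1, "max_context_len": 8_192,  "samples_per_prompt": 8},
    {"phase": 2, "max_context_len": 16_384, "samples_per_prompt": 16},
    {"phase": 3, "max_context_len": 24_576, "samples_per_prompt": None},  # not stated above
]
```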