# PaD

This repository contains the code for the paper *PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning* (NAACL 2024 long paper).

*(Main figure)*

## Prerequisites

- torch >= 2.0
- transformers
- Pre-trained CodeT5 checkpoints (small/base/large), downloaded from Hugging Face (a loading sketch is shown below)
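If you want to fetch the checkpoints programmatically, the snippet below is a minimal sketch using `transformers`; the model identifiers are the public Hugging Face names (`Salesforce/codet5-small` / `-base` / `-large`), not files shipped with this repo.

```python
# Minimal sketch: loading a pre-trained CodeT5 checkpoint from Hugging Face.
# CodeT5 is a T5-style encoder-decoder, so T5ForConditionalGeneration applies.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "Salesforce/codet5-base"  # or Salesforce/codet5-small / Salesforce/codet5-large
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
```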

## Data

```
Data
├── GSM8K
│   ├── train-enhanced.json   # PaD-augmented GSM8K training data generated by gpt-3.5-turbo
│   └── test_add_code.json    # test data with PaD-augmented label code
├── MultiArith                # test data with PaD-augmented label code
├── SVAMP                     # test data with PaD-augmented label code
└── ASDiv                     # test data with PaD-augmented label code
```
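The augmented files are plain JSON and can be inspected before training. In the sketch below the path follows the tree above, but the field names are assumptions about the data format, not something the repo guarantees.

```python
# Illustrative only: peek at the PaD-augmented GSM8K training data.
# Field names such as "question" or "code" are assumptions; print examples[0]
# to see the actual keys used by this repo's JSON.
import json

with open("Data/GSM8K/train-enhanced.json", "r", encoding="utf-8") as f:
    examples = json.load(f)  # if the file is JSON Lines, read it line by line instead

print(len(examples))
print(examples[0])
```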

The data for the self-refine task is available here.

## Quick Start

### 1. Training

Execute the following command to reproduce our models:

```bash
sh run_seq2seq.sh
```
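`run_seq2seq.sh` wraps the repo's own training script and hyperparameters. For orientation, the sketch below shows the general shape of such a fine-tuning step using `Seq2SeqTrainer`; it is an assumed equivalent, with placeholder hyperparameters and assumed field names (`question`, `code`), not the paper's exact settings.

```python
# Sketch of a seq2seq fine-tuning run over the PaD-augmented data.
# Hyperparameters are placeholders; field names are assumptions about the JSON format.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
)

model_name = "Salesforce/codet5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

raw = load_dataset("json", data_files="Data/GSM8K/train-enhanced.json")["train"]

def preprocess(batch):
    # Encode the question as input and the program-style rationale as the target.
    model_inputs = tokenizer(batch["question"], max_length=256, truncation=True)
    labels = tokenizer(text_target=batch["code"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_ds = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="outputs/pad-codet5",
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=10,
    logging_steps=50,
    save_strategy="epoch",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```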

### 2. Eval

Run the following script to generate your results:

```bash
sh run_seq2seq_test.sh
```
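Because PaD distills reasoning into programs, a prediction is naturally scored by executing the generated program and comparing its result with the gold answer. The sketch below is illustrative, not the repo's evaluation code; in particular, the convention that the generated program stores its result in a variable named `answer` is an assumption.

```python
# Illustrative scoring helpers, not the repo's evaluation script.
# Assumption: the generated program puts its final result in a variable `answer`.

def run_program(program_text: str):
    """Execute a model-generated Python program and return its final answer."""
    namespace: dict = {}
    try:
        exec(program_text, namespace)  # only run model-generated code you trust
    except Exception:
        return None  # programs that crash count as wrong
    return namespace.get("answer")

def is_correct(prediction, gold) -> bool:
    """Compare the executed result with the gold answer, numerically if possible."""
    if prediction is None:
        return False
    try:
        return abs(float(prediction) - float(gold)) < 1e-4
    except (TypeError, ValueError):
        return str(prediction).strip() == str(gold).strip()

# Usage sketch: `model` and `tokenizer` are the fine-tuned CodeT5 objects.
# inputs = tokenizer(question, return_tensors="pt")
# program = tokenizer.decode(model.generate(**inputs, max_new_tokens=256)[0],
#                            skip_special_tokens=True)
# print(is_correct(run_program(program), gold_answer))
```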