[Feat] Updating the evaluation script #188

cbhua · 2024-06-03T20:56:08Z

Description

Updates rl4co/tasks/eval.py to work with the latest version. I opened this quick-merge PR mainly to document how to use the script.

Motivation and Context

Fix the first-node selection problem in the sampling method.
Add an argument parser so that evaluations can be launched directly from the command line.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)

Tutorial for the evaluation

Step 1. Prepare your pre-trained model checkpoint and the test-instance data file, and put them wherever you prefer. For example, to test the AttentionModel on TSP50:

.
├── rl4co/
│   └── ...
├── checkpoints/
│   └── am-tsp50.ckpt
└── data/
    └── tsp/
        └── tsp50_test_seed1234.npz

Step 2. Run eval.py with your own settings. For example, to use the sampling method with top_p=0.95:

python rl4co/tasks/eval.py --problem tsp --data_path data/tsp/tsp50_test_seed1234.npz --model AttentionModel --ckpt_path checkpoints/am-tsp50.ckpt --method sampling --top_p 0.95
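For readers unfamiliar with top-p (nucleus) sampling, here is a minimal pure-Python sketch of the general idea: keep the smallest set of most likely choices whose cumulative probability reaches top_p, then renormalize. The function name top_p_filter and the example probabilities are illustrative assumptions; this is not the code used inside rl4co, whose decoding strategy may differ in details.

```python
def top_p_filter(probs, top_p):
    """Zero out the low-probability tail and renormalize the rest.

    Generic nucleus filtering, written for illustration only.
    """
    # Indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # the nucleus is complete

    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


# Hypothetical next-node probabilities for four candidate nodes.
probs = [0.5, 0.3, 0.15, 0.05]
filtered = top_p_filter(probs, top_p=0.9)
print(filtered)
```

With top_p=0.9 the least likely node is dropped, because the first three already cover at least 90% of the probability mass.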

See rl4co/tasks/eval.py for the full list of supported parameters and their help text. A few notes:

Seven evaluation methods are currently supported: greedy, sampling, multistart_greedy, augment_dihedral_8, augment, multistart_greedy_augment_dihedral_8, and multistart_greedy_augment.
The --model parameter takes the model class name, for example AttentionModel, POMO, or SymNCO.
By default, the evaluation results are saved as a .pkl file under --save_path. The file contains actions, rewards, inference_time, and avg_reward, which you can load for further processing.
Some rarely changed parameters are not exposed through the parser, for example select_best=True for sampling evaluation. In the current version, you need to edit them directly in the code.
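As a rough illustration of the post-processing mentioned in the notes above, the sketch below loads a results file and prints a short summary. It assumes the .pkl file is a pickled dict with the keys actions, rewards, inference_time, and avg_reward, and that rewards can be iterated as scalars; the actual layout written by eval.py may differ (for example, the values may be tensors, in which case PyTorch must be installed to unpickle them). The helper summarize_results and the stand-in data are hypothetical.

```python
import os
import pickle
import tempfile


def summarize_results(path):
    """Load a results .pkl file and compute a few simple statistics."""
    with open(path, "rb") as f:
        results = pickle.load(f)

    # Assumption: "rewards" holds one scalar reward per test instance.
    rewards = [float(r) for r in results["rewards"]]
    return {
        "num_instances": len(rewards),
        "mean_reward": sum(rewards) / len(rewards),
        "best_reward": max(rewards),
        "inference_time": results["inference_time"],
    }


# Stand-in file so the sketch runs without a real evaluation output.
fake_results = {
    "actions": [[0, 2, 1], [0, 1, 2]],
    "rewards": [-5.8, -6.2],
    "inference_time": 1.5,
    "avg_reward": -6.0,
}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.pkl")
    with open(path, "wb") as f:
        pickle.dump(fake_results, f)
    summary = summarize_results(path)

print(summary)
```

To use it on a real run, point summarize_results at the file written under your --save_path.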

Step 3. To launch several evaluations over a range of parameters, you can adapt the following examples:

Evaluate POMO on TSP50 using sampling with different top-p and temperature values:

  #!/bin/bash

  top_p_list=(0.5 0.6 0.7 0.8 0.9 0.95 0.98 0.99 0.995 1.0)
  temp_list=(0.1 0.3 0.5 0.7 0.8 0.9 1.0 1.1 1.2 1.5 1.8 2.0 2.2 2.5 2.8 3.0)

  problem=tsp
  model=POMO
  ckpt_path=checkpoints/pomo-tsp50.ckpt
  data_path=data/tsp/tsp50_test_seed1234.npz

  for top_p in "${top_p_list[@]}"; do
      for temp in "${temp_list[@]}"; do
          python rl4co/tasks/eval.py --problem ${problem} --model ${model} --ckpt_path ${ckpt_path} --data_path ${data_path} --method sampling --temperature=${temp} --top_p=${top_p} --top_k=0
      done
  done
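If you prefer to drive the sweep from Python, the same grid can be sketched with itertools.product. The flags are the ones used in the shell script above; the helper build_commands, the shortened value lists, and the DRY_RUN switch are assumptions made for illustration. With DRY_RUN left at True the commands are only printed, not executed.

```python
import itertools
import subprocess


def build_commands(problem, model, ckpt_path, data_path, top_p_list, temp_list):
    """Build one eval.py command per (top_p, temperature) pair."""
    commands = []
    for top_p, temp in itertools.product(top_p_list, temp_list):
        commands.append([
            "python", "rl4co/tasks/eval.py",
            "--problem", problem,
            "--model", model,
            "--ckpt_path", ckpt_path,
            "--data_path", data_path,
            "--method", "sampling",
            "--temperature", str(temp),
            "--top_p", str(top_p),
            "--top_k", "0",
        ])
    return commands


# Shortened lists to keep the example small; extend them as needed.
commands = build_commands(
    problem="tsp",
    model="POMO",
    ckpt_path="checkpoints/pomo-tsp50.ckpt",
    data_path="data/tsp/tsp50_test_seed1234.npz",
    top_p_list=[0.9, 0.95],
    temp_list=[0.5, 1.0, 1.5],
)

DRY_RUN = True  # set to False to actually launch the evaluations
for cmd in commands:
    if DRY_RUN:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
```

Passing each command as a list avoids shell quoting issues, and check=True stops the sweep at the first failing run.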

Evaluate POMO on CVRP50 using sampling with different top-k and temperature values:

  #!/bin/bash

  top_k_list=(5 10 15 20 25)
  temp_list=(0.1 0.3 0.5 0.7 0.8 0.9 1.0 1.1 1.2 1.5 1.8 2.0 2.2 2.5 2.8 3.0)

  problem=cvrp
  model=POMO
  ckpt_path=checkpoints/pomo-cvrp50.ckpt
  data_path=data/vrp/vrp50_test_seed1234.npz

  for top_k in "${top_k_list[@]}"; do
      for temp in "${temp_list[@]}"; do
          python rl4co/tasks/eval.py --problem ${problem} --model ${model} --ckpt_path ${ckpt_path} --data_path ${data_path} --method sampling --temperature=${temp} --top_p=0.0 --top_k=${top_k}
      done
  done
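To give some intuition for what the two swept parameters do, here is a small pure-Python sketch of temperature scaling followed by top-k filtering. It shows the generic technique only; the function names and example logits are hypothetical, and the actual decoding code in rl4co may be implemented differently.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature.

    Lower temperatures sharpen the distribution, higher ones flatten it.
    """
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep only the k most likely entries and renormalize them."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    filtered = [0.0] * len(probs)
    for i in keep:
        filtered[i] = probs[i] / total
    return filtered


# Hypothetical logits for four candidate nodes.
logits = [2.0, 1.0, 0.5, -1.0]
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
top2 = top_k_filter(flat, k=2)

print(sharp)
print(flat)
print(top2)
```

In this example the low temperature concentrates probability on the first node, while top-k with k=2 removes the two least likely nodes regardless of temperature.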

🙌 I will add a notebook for loading the results and computing some statistics soon.

cbhua added 5 commits June 4, 2024 04:54

[Feat] Updating TSP context to support sampling evaluation

c15743d

[Feat, BugFix] Updating the decoding strategy for sampling evaluation

bedc564

[BugFix] Fixing the sampling evaluation problem and better call script

086f239

[Feat] Adding several eval methods parameter to the parser

5f09104

Merge branch 'main' into dev-eval

3bd1597

cbhua merged commit b1ced3c into main Jun 3, 2024
28 checks passed
