[Feat] Updating the evaluation script #188

Merged · merged 5 commits into main on Jun 3, 2024

Conversation

@cbhua (Member) commented on Jun 3, 2024

Description

Updates rl4co/tasks/eval.py for the latest version. I created this quick PR to write down the usage tutorial.

Motivation and Context

  • Fixes the first-node selection problem for the sampling method;
  • Adds an argument parser to launch evaluations efficiently.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)

Tutorial for the evaluation

Step 1. Prepare your pre-trained model checkpoint and test instance data file, and put them in your preferred location. For example, to test the AttentionModel on TSP50:

.
├── rl4co/
│   └── ...
├── checkpoints/
│   └── am-tsp50.ckpt
└── data/
    └── tsp/
        └── tsp50_test_seed1234.npz

Step 2. Run eval.py with your customized settings. For example, let's use the sampling method with a top_p=0.95 sampling strategy:

python rl4co/tasks/eval.py --problem tsp --data_path data/tsp/tsp50_test_seed1234.npz --model AttentionModel --ckpt_path checkpoints/am-tsp50.ckpt --method sampling --top_p 0.95

You can check rl4co/tasks/eval.py for the full list of supported parameters and their hints. Here are some notes:

  • We currently support 7 evaluation methods: greedy, sampling, multistart_greedy, augment_dihedral_8, augment, multistart_greedy_augment_dihedral_8, and multistart_greedy_augment.
  • The --model parameter is the model class name, e.g., AttentionModel, POMO, SymNCO, etc.
  • By default, the evaluation results will be saved as a .pkl file under the --save_path. This file includes actions, rewards, inference_time, and avg_reward. You can collect them for downstream processing (see the loading sketch after this list).
  • Some parameters are not commonly modified and are therefore not exposed by the parser, e.g., select_best=True for sampling evaluation. In the current version, you may need to modify them directly in the code.
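For reference, here is a minimal sketch for loading one saved result file, assuming the results are stored as a dict with the keys listed above (the file path is illustrative; use whatever you passed to --save_path):

    import pickle

    # Illustrative path: use the value you passed to --save_path.
    with open("results/am-tsp50-sampling.pkl", "rb") as f:
        results = pickle.load(f)

    # The saved file includes actions, rewards, inference_time, and avg_reward.
    print(f"Average reward: {results['avg_reward']:.4f}")
    print(f"Inference time: {results['inference_time']:.2f}s")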

Step 3. If you want to launch several evaluations with various parameters, you may refer to the following examples (a sketch for aggregating the sweep results follows them):

  • Evaluate POMO on TSP50 with sampling over different top-p and temperature values:

      #!/bin/bash
    
      top_p_list=(0.5 0.6 0.7 0.8 0.9 0.95 0.98 0.99 0.995 1.0)
      temp_list=(0.1 0.3 0.5 0.7 0.8 0.9 1.0 1.1 1.2 1.5 1.8 2.0 2.2 2.5 2.8 3.0)
    
      problem=tsp
      model=POMO
      ckpt_path=checkpoints/pomo-tsp50.ckpt
      data_path=data/tsp/tsp50_test_seed1234.npz
    
      for top_p in "${top_p_list[@]}"; do
          for temp in "${temp_list[@]}"; do
              python rl4co/tasks/eval.py --problem ${problem} --model ${model} --ckpt_path ${ckpt_path} --data_path ${data_path} --method sampling --temperature=${temp} --top_p=${top_p} --top_k=0
          done
      done
  • Evaluate POMO on CVRP50 with sampling over different top-k and temperature values:

      #!/bin/bash
    
      top_k_list=(5 10 15 20 25)
      temp_list=(0.1 0.3 0.5 0.7 0.8 0.9 1.0 1.1 1.2 1.5 1.8 2.0 2.2 2.5 2.8 3.0)
    
      problem=cvrp
      model=POMO
      ckpt_path=checkpoints/pomo-cvrp50.ckpt
      data_path=data/vrp/vrp50_test_seed1234.npz
    
      for top_k in "${top_k_list[@]}"; do
          for temp in "${temp_list[@]}"; do
              python rl4co/tasks/eval.py --problem ${problem} --model ${model} --ckpt_path ${ckpt_path} --data_path ${data_path} --method sampling --temperature=${temp} --top_p=0.0 --top_k=${top_k}
          done
      done
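To compare runs from sweeps like the ones above, here is a rough aggregation sketch. It assumes each run was given a distinct output file via --save_path, following a hypothetical naming scheme results/tsp_p{top_p}_t{temp}.pkl; neither the directory nor the naming is prescribed by the script:

    import glob
    import pickle
    import re

    # Hypothetical naming scheme, set via --save_path in the sweep script,
    # e.g., results/tsp_p0.95_t1.0.pkl.
    pattern = re.compile(r"tsp_p(?P<top_p>[\d.]+)_t(?P<temp>[\d.]+)\.pkl")

    rows = []
    for path in sorted(glob.glob("results/*.pkl")):
        match = pattern.search(path)
        if match is None:
            continue
        with open(path, "rb") as f:
            results = pickle.load(f)
        rows.append((float(match["top_p"]), float(match["temp"]), results["avg_reward"]))

    # Show the best settings (highest average reward) first.
    for top_p, temp, avg_reward in sorted(rows, key=lambda r: -r[2]):
        print(f"top_p={top_p:<5} temp={temp:<4} avg_reward={avg_reward:.4f}")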

🙌 I will add a notebook for loading the results and doing some statistics soon.

@cbhua merged commit b1ced3c into main on Jun 3, 2024
28 checks passed