[Feat] Updating the evaluation script #188
Merged
Description
Updating `rl4co/tasks/eval.py` to the latest version. I created this quick-merge PR to write down the usage tutorial.
Motivation and Context
Types of changes
Tutorial for the evaluation
Step 1. Prepare your pre-trained model checkpoint and the test instances data file, and put them wherever you prefer. For example, we will test the `AttentionModel` on TSP50.
Step 2. Run `eval.py` with your customized settings. For example, use the `sampling` method with a `top_p=0.95` sampling strategy.
You can check `rl4co/tasks/eval.py` to see the other supported parameters, each with a hint. Here are some notes:
- The available methods are `greedy`, `sampling`, `multistart_greedy`, `augment_dihedral_8`, `augment`, `multistart_greedy_augment_dihedral_8`, and `multistart_greedy_augment`.
- `--model` is the class name, for example `AttentionModel`, `POMO`, `SymNCO`, etc.
- The results are saved to a `.pkl` file under the `--save_path`. This file includes `actions`, `rewards`, `inference_time`, and `avg_reward`. You can collect them for further processing.
- `select_best=True` is set for sampling evaluation. In the current version, you may need to edit the hardcoded value to change it.
Step 3. If you want to launch several evaluations with various parameters, you may refer to the following examples:
- Evaluate POMO on TSP50 using sampling with different top-p and temperature values
- Evaluate POMO on CVRP50 using sampling with different top-k and temperature values
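A sweep like the ones listed above can be scripted. The sketch below only builds the command lines and prints them; it does not run anything. The flag names `--method`, `--top_p`, and `--temperature`, as well as the value grids, are assumptions for illustration. Check `rl4co/tasks/eval.py` for the actual argument names before using it.

```python
import itertools
import shlex


def build_eval_commands(model, top_ps, temperatures, extra_args=()):
    """Return one command (a list of arguments) per (top_p, temperature) pair."""
    commands = []
    for top_p, temperature in itertools.product(top_ps, temperatures):
        cmd = [
            "python", "rl4co/tasks/eval.py",
            "--model", model,                    # class name, as described in the notes
            "--method", "sampling",              # assumed flag name
            "--top_p", str(top_p),               # assumed flag name
            "--temperature", str(temperature),   # assumed flag name
            *extra_args,                         # e.g. checkpoint and data arguments
        ]
        commands.append(cmd)
    return commands


# Hypothetical grid: 3 top-p values x 2 temperatures = 6 runs.
commands = build_eval_commands("POMO", top_ps=[0.9, 0.95, 1.0], temperatures=[0.5, 1.0])
for cmd in commands:
    print(shlex.join(cmd))
```

To actually launch the runs, each list could be passed to `subprocess.run(cmd, check=True)` once the flag names have been verified against the script.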
🙌 I will add a notebook for loading the results and computing some statistics soon.
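In the meantime, here is a minimal sketch of loading one result file. It assumes the `.pkl` is a pickled dict with the keys listed in the notes (`actions`, `rewards`, `inference_time`, `avg_reward`) and that `rewards` is a flat sequence of numbers. The file name and the dummy values are made up so that the example runs on its own.

```python
import os
import pickle
import statistics
import tempfile


def summarize_results(path):
    """Load a results file and compute a few basic statistics."""
    with open(path, "rb") as f:
        results = pickle.load(f)
    # Assumption: rewards is a flat sequence of numbers (or values float() accepts).
    rewards = [float(r) for r in results["rewards"]]
    return {
        "num_instances": len(rewards),
        "mean_reward": statistics.mean(rewards),
        "best_reward": max(rewards),
        "inference_time": results["inference_time"],
    }


# Dummy data standing in for a real results file (values are illustrative only).
dummy = {
    "actions": [[0, 2, 1], [0, 1, 2]],
    "rewards": [-5.5, -6.5],
    "inference_time": 1.25,
    "avg_reward": -6.0,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dummy_results.pkl")
    with open(path, "wb") as f:
        pickle.dump(dummy, f)
    summary = summarize_results(path)

print(summary)
```

If the real file stores tensors rather than plain lists, the library that defines them has to be importable when unpickling, and the values may need converting before the statistics are computed.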