I believe that the Pearson coefficient does not adequately represent the correlation between the two evaluation methods. Looking at the scatter plot, there are strong discrepancies both when gpt-4 assigns a score of 0 and when it assigns a score of 100: llama3 covers a much wider range of scores in both cases. A correlation based on ranks would probably describe the relationship better. For reference, scipy.stats.pearsonr reports PearsonRResult(statistic=0.8048901206421856, pvalue=6.122429296730889e-24).
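As a minimal sketch of the suggestion, rank-based coefficients (Spearman's rho, Kendall's tau) could be computed alongside Pearson's r with scipy.stats. The `gpt4_scores` and `llama3_scores` arrays below are hypothetical placeholders standing in for the actual score lists from the notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical placeholder scores -- in the notebook these would be the
# gpt-4 and llama3 scores loaded from the evaluation results.
gpt4_scores = np.array([0, 0, 100, 55, 100, 80, 0, 100])
llama3_scores = np.array([20, 65, 40, 50, 95, 75, 10, 85])

# Pearson measures linear association between the raw scores.
pearson_r, pearson_p = stats.pearsonr(gpt4_scores, llama3_scores)

# Spearman and Kendall operate on ranks, so they are less sensitive to the
# wide spread of llama3 scores where gpt-4 assigns the extremes (0 and 100).
spearman_rho, spearman_p = stats.spearmanr(gpt4_scores, llama3_scores)
kendall_tau, kendall_p = stats.kendalltau(gpt4_scores, llama3_scores)

print(f"Pearson r:    {pearson_r:.3f} (p={pearson_p:.2e})")
print(f"Spearman rho: {spearman_rho:.3f} (p={spearman_p:.2e})")
print(f"Kendall tau:  {kendall_tau:.3f} (p={kendall_p:.2e})")
```

If the rank-based coefficients diverge noticeably from Pearson's r, that would support the point that the linear correlation is inflated or deflated by the behavior at the score extremes.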
Replies: 1 comment 1 reply
I updated it now here: https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/03_model-evaluation/scores/correlation-analysis.ipynb