Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add fairness metrics #434
add fairness metrics #434
Changes from 14 commits
b98b2ac
92c232d
44ef50b
c260656
31e4a5f
5922ae2
7236a14
60973ba
6998108
a9fc2e0
27f53e9
7fe7fb8
bdb7c5a
ca6d7b8
f1d663f
45f57ff
43037d9
ef7f5a1
556f2c5
6fad6a4
a9231a9
82717fa
a2dfe36
375269b
76f2d4f
587ff32
05086ea
25a893f
4436785
12ff56a
251f45b
6c0b76f
219562c
690e738
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to move forward with a fairness-oriented name, it'd be great if we could use some example data here that has a plausibly "sensitive" attribute. yardstick doesn't
Suggests
modeldata at the moment, which has some options.infer::gss
would also work well here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second and third issues linked in the PR description may be related here—there's nothing specific to only fairness about this function for now besides naming choices. This might be a solution for grouped metrics generally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would lean toward changing the naming here (and the
direction = "minimize"
default?) so that it is more clearlygroupwise_metric()
or similar. I'd change the section in the pkgdown site to something like "Fairness and Group Metrics". I think this is the right way to go both because folks have non-fairness group metric needs, and because then the name helps users understand how fairness metrics work. I think it's better for learning/using, not worse.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree, switching this file over to talk about them as "group-wise" metrics is the right move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm game! Thank you.
A difficult bit here is that all yardstick metrics know about groups, so I want to make sure we don't imply that non-fairness-metrics aren't group-aware, there just isn't an intermediate grouped operation happening under the hood. I do think that
groupwise_metric()
could be a good way to phrase that (accompanied by strong docs), but also very much open to other ideas, esp. if there's some dplyr-ish concept that already speaks to this.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notes from the group meeting:
create_
or some other eliciting verb to indicate that this is a function factory, and others agreeddisparity_metric
as a descriptor for this type of metric that doesn't have as strong of a social connotation—seems like "disparity" could describe differences across groups regardless of whether that group is regarded as a sensitive featureThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose moving forward with
create_disparity_metric()
. Any thoughts?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of using
create_*
here, but it is a departure compared to the other function factories in yardstick (of which there are a lot, likemetric_tweak()
,metric_set()
, and so forth). Do you think it's better to stay more similar to the naming conventions of yardstick, or to use something likecreate_*
?I have a mild preference for something like
create_groupwise_metric()
because I think there is more ML community vocabulary around what "groupwise" means. The word "disparity" makes me think about the specific metric disparate impact. That being said, my opinion is not super strong here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points--if we want to look to other function factories in the package, maybe the parallel we might want to draw is with
new_metric()
? Something likenew_groupwise_metric()
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that a lot,
new_groupwise_metric()
👍There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add
na.rm
to both of these.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly? Should test interactions with
na_rm
here.