Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support when-then-otherwise #2258

Open
wants to merge 26 commits into
base: main
Choose a base branch
from
Open

feat: Support when-then-otherwise #2258

wants to merge 26 commits into from

Conversation

mccalluc
Copy link
Contributor

@mccalluc mccalluc commented Jan 30, 2025

Tests pass, but it was a lot of copy and paste, mostly: I need to understand better what's going on.

Things to try:

  • what happens with different types on input and output?
  • what happens with different types between the then and else?
  • what happens if the methods aren't given in the right order, or otherwise is missing?

@mccalluc mccalluc changed the title feat: when-then-else feat: Support when-then-otherwise Jan 30, 2025
@mccalluc mccalluc requested a review from Shoeboxam January 30, 2025 22:46
Copy link
Member

@Shoeboxam Shoeboxam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! The TODOs seem to be in the right direction. Let me know if it would help to hop on a call and work on this together.

The condition column should always be bool.

Polars does implicit fallible casting in some cases, so checks to ensure the dtypes are the same (or at least the casts are infallible) are needed.

truthy: Arc::new(truthy.expr),
falsy: Arc::new(falsy.expr),
},
fill: None, // TODO
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually right: this is run before aggregation, so there's no empty group that needs a default.

@mccalluc mccalluc requested a review from Shoeboxam February 11, 2025 14:58
@mccalluc mccalluc marked this pull request as ready for review February 11, 2025 14:58
Copy link
Member

@Shoeboxam Shoeboxam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, some comments! Thanks.

lf_domain,
dp.symmetric_distance(),
lf.select(
pl.when(pl.col("A") == 1).then(1).alias('fifty'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can allow the computation to run, but the output domain may now contain null.

Comment on lines 54 to 55
let (truthy_domain, _truthy_metric) = t_truthy.output_space();
let (falsy_domain, _falsy_metric) = t_falsy.output_space();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to check that the metrics match too!

let (truthy_domain, _truthy_metric) = t_truthy.output_space();
let (falsy_domain, _falsy_metric) = t_falsy.output_space();

if truthy_domain != falsy_domain {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only need to check that dtypes match. It's ok if the names of the columns in the branch arms are different, and similarly if nullability differs between them, and so on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to prevent polars fallible casting!


let mut output_domain = truthy_domain.clone();
output_domain.column.drop_bounds().ok();
output_domain.column.nullable = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
output_domain.column.nullable = false;
output_domain.column.nullable |= falsey_domain.column.nullable;

}

let mut output_domain = truthy_domain.clone();
output_domain.column.drop_bounds().ok();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, if the truthy domain does not have nans and falsey domain has nans, then the resulting output domain would have made the claim that the data does not have nans. Instead, lets clear those descriptors:

This is a shorthand to completely replace the element domain with the loosest descriptors.

Suggested change
output_domain.column.drop_bounds().ok();
output_domain.column.set_dtype(output_domain.column.dtype())?;

@@ -0,0 +1 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests would be good to add!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

Polars: when/then/else
2 participants