Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregator dependency injection, min/max, and skipNaN #1108

Merged
merged 23 commits into from
Apr 4, 2025
Merged

Conversation

Jolanrensen
Copy link
Collaborator

@Jolanrensen Jolanrensen commented Mar 25, 2025

Helps #961

  • Refactored Aggregator, splitting it into 3 parts:
    • Input, can be supplied by a class implementing AggregatorInputHandler, has 2 implementations
    • Aggregation, can be supplied by a class implementing AggregatorAggregationHandler, has 2 implementations
    • Handling multiple columns, can be supplied by a class implementing AggregatorMultipleColumnsHandler, has 3 implementations

Together they can instantiate Aggregator in any combination you like :)

  • Added Aggregator.aggregateByOrNull() functions for selecting aggregators, like min/max, but also median/percentile in the future, with the Sequence.bestBy family of internal functions
  • added skipNaN option to sum, mean, min, max
  • Rewrote min/max with correct types (T : Comparable<T & Any>?)
    • Decided against allowing mixed number types. Allowing this requires each function to get 3 overloads to avoid ambiguity:
      • <T : Comparable<T & Any>?> for normal comparables
      • <T : Number?> for mixed number types
      • <T> ... where T : Number?, T : Comparable<T & Any?> for normal numbers
    • Plus a slight rethink of the selecting aggregator because each min with numbers would then act like a minBy { it.toUnifiedNumber() } as it needs to return unconverted minimal value
    • But if it's valueble enough, we can add it
  • tests

…egation of specific abilities of the aggregator to whichever combination the client chooses. And instead of having to create a new class for each combination, the same function can be used.
@Jolanrensen Jolanrensen mentioned this pull request Mar 25, 2025
9 tasks
@Jolanrensen Jolanrensen changed the title Another Aggregator refactor and support for min/max Aggregator dependency injection, min/max, and skipNaN Mar 28, 2025
@Jolanrensen Jolanrensen changed the title Aggregator dependency injection, min/max, and skipNaN Aggregator dependency injection, min/max, and skipNaN Mar 31, 2025
@Jolanrensen
Copy link
Collaborator Author

@Jolanrensen Jolanrensen marked this pull request as ready for review April 1, 2025 14:27
@JvmName("isNaWithContract")
@Suppress("NOTHING_TO_INLINE")
@OptIn(ExperimentalContracts::class)
internal inline fun <T : Any?> T.isNA(): Boolean {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isNa -> isMissing() for example values.filterNot { it.isMissing() }, bad previous naming for String for example
"abc".isNA()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe... but "NA" is a well documented DataFrame concept by now. It even has a documentation page https://kotlin.github.io/dataframe/nanandna.html. So I think it's reasonable to keep using the name, meaning "Not Available", so "null or NaN". If you have doubts about it, we could create a separate issue though

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not big doubts, I starting to think about NA as about NaN - the problem with closest definitions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a small KDoc about what NA is 👍 .

@Jolanrensen
Copy link
Collaborator Author

Mixed number type proposal #1113

Copilot

This comment was marked as resolved.

@Jolanrensen
Copy link
Collaborator Author

Notebooks failing issue: #1116

@Jolanrensen
Copy link
Collaborator Author

Seems I've caused some NoSuchMethodErrors in combination with kandy... Will need to compare the api dump for minuses and add hidden overloads where possible

Copy link
Collaborator

@AndreiKingsley AndreiKingsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Regarding min/max mixed types: it seems to me that in such functions it is better to tell the user "please cast to a common type T: Comparable." Adding x3 overloads is a huge amount of initial work and it will be even harder to maintain this in the future. I'd keep std lib logic.

@JvmName("isNaWithContract")
@Suppress("NOTHING_TO_INLINE")
@OptIn(ExperimentalContracts::class)
internal inline fun <T : Any?> T.isNA(): Boolean {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a small KDoc about what NA is 👍 .

@@ -34,242 +34,245 @@ import kotlin.reflect.typeOf

// region DataColumn

public fun DataColumn<Number?>.mean(skipNA: Boolean = skipNA_default): Double = Aggregators.mean(skipNA).aggregate(this)
public fun DataColumn<Number?>.mean(skipNaN: Boolean = skipNaN_default): Double =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing I don't like about skipNaN_default - the user doesn't see the actual default value of the arguments. It's a bit annoying or something. And also global variables are the evil 👿 . Do we keep it as a variable on purpose? I'd really like to remove it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they will :) https://youtrack.jetbrains.com/issue/KTIJ-31662/Parameter-info-Show-value-of-const-default-argument-when-calling-a-function
and yes, we keep it on purpose; it's really good to have a single source of truth for a default value like this. Without this constant it's really easy to create an overload with a different default accidently, so it promotes consistency and predictability of the API.

@Jolanrensen Jolanrensen merged commit 7c63a80 into master Apr 4, 2025
4 of 6 checks passed
@Jolanrensen Jolanrensen added this to the 1.0.0-Beta1 (0.16) milestone Apr 4, 2025
@Jolanrensen Jolanrensen added the enhancement New feature or request label Apr 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants