-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aggregator
dependency injection, min
/max
, and skipNaN
#1108
Conversation
…egation of specific abilities of the aggregator to whichever combination the client chooses. And instead of having to create a new class for each combination, the same function can be used.
…ingAggregationHandler for min/max etc. Added SequenceBestBy.kt helpers
Aggregator dependency injection
, min/max, and skipNaN
Aggregator dependency injection
, min/max, and skipNaN
Aggregator
dependency injection, min
/max
, and skipNaN
Seems our notebook tests now fail because of https://youtrack.jetbrains.com/issue/KTNB-967/IllegalStateException-null-DefinitelyNotNullType-for-T-exception-while-analyzing-expression |
core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/statistics/min.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/math/min.kt
Outdated
Show resolved
Hide resolved
...aframe/impl/aggregation/aggregators/multipleColumnsHandlers/TwoStepMultipleColumnsHandler.kt
Show resolved
Hide resolved
...rg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/inputHandlers/AnyInputHandler.kt
Show resolved
Hide resolved
...otlin/org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorInputHandler.kt
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/DataColumnType.kt
Show resolved
Hide resolved
@JvmName("isNaWithContract") | ||
@Suppress("NOTHING_TO_INLINE") | ||
@OptIn(ExperimentalContracts::class) | ||
internal inline fun <T : Any?> T.isNA(): Boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isNa -> isMissing() for example values.filterNot { it.isMissing() }, bad previous naming for String for example
"abc".isNA()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe... but "NA" is a well documented DataFrame concept by now. It even has a documentation page https://kotlin.github.io/dataframe/nanandna.html. So I think it's reasonable to keep using the name, meaning "Not Available", so "null or NaN". If you have doubts about it, we could create a separate issue though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not big doubts, I starting to think about NA as about NaN - the problem with closest definitions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have a small KDoc about what NA is 👍 .
Mixed number type proposal #1113 |
Notebooks failing issue: #1116 |
Seems I've caused some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Regarding min/max mixed types: it seems to me that in such functions it is better to tell the user "please cast to a common type T: Comparable." Adding x3 overloads is a huge amount of initial work and it will be even harder to maintain this in the future. I'd keep std lib logic.
@JvmName("isNaWithContract") | ||
@Suppress("NOTHING_TO_INLINE") | ||
@OptIn(ExperimentalContracts::class) | ||
internal inline fun <T : Any?> T.isNA(): Boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have a small KDoc about what NA is 👍 .
@@ -34,242 +34,245 @@ import kotlin.reflect.typeOf | |||
|
|||
// region DataColumn | |||
|
|||
public fun DataColumn<Number?>.mean(skipNA: Boolean = skipNA_default): Double = Aggregators.mean(skipNA).aggregate(this) | |||
public fun DataColumn<Number?>.mean(skipNaN: Boolean = skipNaN_default): Double = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing I don't like about skipNaN_default
- the user doesn't see the actual default value of the arguments. It's a bit annoying or something. And also global variables are the evil 👿 . Do we keep it as a variable on purpose? I'd really like to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they will :) https://youtrack.jetbrains.com/issue/KTIJ-31662/Parameter-info-Show-value-of-const-default-argument-when-calling-a-function
and yes, we keep it on purpose; it's really good to have a single source of truth for a default value like this. Without this constant it's really easy to create an overload with a different default accidently, so it promotes consistency and predictability of the API.
Helps #961
AggregatorInputHandler
, has 2 implementationsAggregatorAggregationHandler
, has 2 implementationsAggregatorMultipleColumnsHandler
, has 3 implementationsTogether they can instantiate
Aggregator
in any combination you like :)Aggregator.aggregateByOrNull()
functions for selecting aggregators, like min/max, but also median/percentile in the future, with the Sequence.bestBy family of internal functionsT : Comparable<T & Any>?
)<T : Comparable<T & Any>?>
for normal comparables<T : Number?>
for mixed number types<T> ... where T : Number?, T : Comparable<T & Any?>
for normal numbersmin
with numbers would then act like aminBy { it.toUnifiedNumber() }
as it needs to return unconverted minimal value