-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explicit polymorphism discriminator #2903
Comments
Hey, KtMongo author here. I'll give a bit more context. KtMongo relies on KotlinX.Serialization to convert Kotlin types to and from MongoDB BSON. Through this, we support KtMongo provides an extensive DSL over MongoDB operators. This DSL is implemented through extension functions and generics, allowing us to refer to fields when writing requests: @Serializable
class User(
val name: String,
val age: Int,
)
users.find {
User::age gt 18
} So far, so good. The problem comes because of the imaginary field KotlinX.Serialization creates on sealed types: @Serializable
sealed class Foo
@Serializable
class FooA(val a: Int) : Foo()
@Serializable
class FooB(val b: String) : Foo() A instance of {
"type": "FooB",
"b": "foo"
} We would like to be able to write a query for specifically instances of users.find {
User::foo / Foo::type eq "foo" // ⚠ Foo::type does not exist
} This could be worked around by creating an extension function and hard-coding the name By using explicit enums, this issue removes the magic around the discriminant, so it can be accessed through regular Kotlin code. |
@Tmpod The information you are looking for is where @CLOVIS-AI You can get this information as text from the descriptor: |
@CLOVIS-AI Thank you for adding more context!
What I'm proposing would avoid all that hardcoding. By explicitly marking a field as the discriminant, you remove the unintuitive behaviour (imo) of virtual fields. The serialization library knows which field it should encode/decode first and how to map values to serializers. I just don't know how feasible it is to get the discriminant value from each implementor in the context of an annotation processor or compiler plugin. To me, it should be possible, the value is static, but I really am not familiar with the capabilities exposed to processors/plugins (though I've already noticed limitations, such as annotations being incapable of taking values I'd consider compile-time constants, such as enum variants).
I'm not sure I follow. I used |
I guess what you want here is #1664, so it's a duplicate. I also would like to point out that kotlinx.serialization by default always performs polymorphic serialization in array format: |
Do you mean you want to read the value of the type without using runtime type information (using the fact that the instance is of the specific type). As to what "field" is used to decide how to deserialize, the name is always the same (except when explicitly overridden). In the specific case of Json this can be provided either globally in the format configuration, or per type (not attribute) using As to "not having to resort to @Serializable
abstract class Base {
abstract val type: String
}
@Serializable
class Child(val myProp:String) {
override val type: String get() = Child.serializer().descriptor.serialName // or a hardcoded/cached version
} |
I suppose it is related, yes, but what I was suggesting was more versatile, as I see it. The referenced issue is also in part about nested hierarchies and
I had read about that, but I'm not grasping the reasoning behind why kx.ser doesn't support field-based discriminants at the core library level. Why does it need to be a format-specific thing? Either way, an annotation that could tell the format implementors which field should be used for discriminating would be nice, instead of having each format re-implement the same annotations (kbson has a similar |
Can you clarify what the value will return? Also, that is a bit verbose to add to each class. Could there be a way to automatically generate this somehow? |
Yeah. Since that field exists in the serialized document, why not have it in the deserialized object too? Makes some things easier, and in the case of KMongo/KtMongo's and similar DSLs, it's important to have it.
I am aware of that annotation, however it's format specific (and string-based, but that's a minor nitpick and a limitation of annotations). I think an annotation on the field itself would be clearer to developers — this field will be used to resolve polymorphic serializers; no virtual fields at play, no using Again, not entirely sure if this is truly implementable, but perhaps this could work: generate and expose a getter for the discriminant field (its SerialDescriptor, maybe?), that the implementing formats and serializers could use to know which field to encode first and how to match the first decoded value to select the deserializer. It sounds to me like it could even be implemented a separate thing, though it would be more useful to have that built into the core library itself. If this sounds plausible, I could try implement it to help illustrate what I mean by all this 😅 |
This returns
You can always use I had a bit more of a look. Basically you want to do typesafe queries. That makes sense. However, rather than: User::foo / Foo::type eq "foo" Why not have something like: User::foo / isInstance<Foo>() Given that KtMongo wants to be typesafe it should use these type annotations transparently and have a primacy of types. Of course it is also valid to also allow a string parameter. As a bit of format based perspective, if you look at Given the application area, it may actually make a lot of sense to have a custom format for MongoKt. Using such it would be possible to do "shape matching", work directly in Bson, and have your type attributes (formats on decoding can provide virtual attributes that don't exist in the serial form, as long as a valid value can be produced). Note that formats are ultimately in control over what happens |
In theory, that would be the best option, yes. But it cannot be implemented due to KotlinX.Serialization being magic:
The third blocking issue was finding out the class's serial name, but your previous comment solves that one.
There is one, but it's maintained by MongoDB Inc and I cannot easily extend it. Aside the governance issue, changing the format now would break all existing production deployments, which is not acceptable :/ |
Actually, if you have the type of If you want to know the name of the type parameter, this is actually part of the serial descriptor of the polymorphic serializer (with 2 parts: type, name) and may carry annotations as well. For element annotations you need to use the descriptor of the container object (for example if that overrides the default name in the case of Json). You can also use that annotation to get the static element type (base class). A key challenge will be translating your type to the expected value of the type annotation. We can know whether it is polymorphic. We can extract the type tree from the serializer/serial descriptor of the container, the defaults can be retrieved from the format (or your own configuration that creates the format) and thus retrieve both annotations and serial names. Note that the json format doesn't actually fallback to alternative type discriminator names. What I suspect is that you have existing data where the discriminator is somewhat unpredictable, and potentially requiring different configurations for the format as well. You will have to somehow collect that information, be it in code, or annotations. This can then be picked up my the type safe accessors. |
Sorry, this explanation is quite abstract and I'm not sure I'm following everything. To make it a bit more concrete, how would you implement the following? @Serializable
class User(
val name: String,
val role: UserRole,
)
@Serializable
sealed class UserRole {
@Serializable
@SerialName("Guest")
data object Guest : UserRole()
@Serializable
@SerialName("Employee")
data class Employee(val id: String) : UserRole()
}
data class TypeInfo(
val discriminatorFieldName: String,
val discriminatorValue: String,
)
inline fun <reified T, reified F, reified F2 : F> KProperty1<T, F>.typeInfo(): TypeInfo {
TODO("what should be here?")
}
// Expected usage:
User::role.typeInfo<_, _, UserRole.Employee>() // TypeInfo(…fieldName: "type", …value: "Employee") |
@CLOVIS-AI I've created an example here: inline fun <reified T, reified F, reified F2 : F> KProperty1<T, F>.typeInfo(): TypeInfo {
val outerDesc = serializer<T>().descriptor
val attrName = this.name
val innerDesc = serializer<F>().descriptor
val concreteDesc = serializer<F2>().descriptor
val typeDesc = (innerDesc.annotations.filterIsInstance<JsonClassDiscriminator>().firstOrNull()
?: outerDesc.annotations.filterIsInstance<JsonClassDiscriminator>().firstOrNull())
?.let { it.discriminator }
?: "type"
return TypeInfo(typeDesc, concreteDesc.serialName)
} This example is perhaps overdoing it, but it should help. Btw. you should note that the attribute name is not the same as the serialized name in the case that |
Thank you for the input @pdvrieze. I think your example helps to show the problem I was talking about — it's awkward to work with the current closed polymorphism implementation, and trying to do something else is unreliable (in part because of The idea I'm proposing would make it significantly easier to play with closed polymorphism and go beyond just string discriminators. If you could explicitly say which field should be used as the discriminator and have that be part of the serial descriptor, the aforementioned use-cases would be much easier. In short, it feels weird that this core part of closed polymorphism leaves so much for the format to implement and makes you deal with unintuitive configurations if you don't want to use FQNs as discriminators. In my opinion, the serialization library could (and should) facilitate this, while leaving the finer implementation details to the formats themselves. PS: Apologies for the delayed response. Also, heads up @pdvrieze, your example link is broken! It seems to be referencing the upstream repo instead of your fork. |
@Tmpod The discriminator value can be easily determined. The default depends on the json configuration used. The fundamental issue is that property/element mapping is not straightforward (implementation defined) and As in compatibility, you are asking to introduce an additional attribute in each value instance (that is effectively constant). That is far from trivial, nor valid. It also doesn't fit the fact that is specific to formats like json. In terms of writing code I don't see much difference between using an annotation on the base class and an attribute on that base class. In code the type discriminator attribute is in reality a virtual property that reflects the class pointer. In json it is merely a convention to have this attribute work at a meta-level. There is a sort-of compatible way to allow for binary compatible specification of the discriminator name in a format agnostic way, and that would mean that Overall I see the challenge you have in that you want to blend code representation and data representation, something for which you fundamentally need some degree of introspection/reflection. The serialization library/plugin/framework is not designed with that in mind, but provides only that which is needed to interact with a (de)serializer where that (de)serializer implements (internally, hidden) the interaction between the logical data level representation (read the thing that defines the structure of your Json) and the Kotlin model (your actual class, attributes etc.). Formats should not explicitly access links (this is what serializers do) as this would break custom serializers, and therefore the links are not generated as metadata (at least one of the reasons). As to the application in interaction with mongo. It may very well be that the default Json format is not the most appropriate one for your case, but that also depends on whether you can influence the mongo format, or there are other systems that change the data. As to using FQNs as type names, that is something a custom format can adjust. But doing so comes at an overhead for all users that do not need that functionality for json. btw. the example link is fixed (it's not a fork though). |
What is your use-case and why do you need this feature?
I've been experimenting with MongoDB through KMongo and KtMongo, specifically around polymorphic documents. Both kbson (used by KMongo) and the official Kotlin Mongo driver (used by KtMongo) support polymorphic serialization, but always through a virtual field (whose name is dependent on the format).
This isn't ideal. You have to resort to clunky workarounds to use the type-safe query API, you have use
SerialName
everywhere and in general it isn't intuitive at all. Implicit fields make code harder to reason about, and often require diving into library sources to really understand what's happening.Here's an example with current implicit fields
Describe the solution you'd like
Let's consider a new
@PolymorphicDiscriminator
annotation that tells kx.ser and implementing formats which field should be used for polymorphic (de)serialization. One could write the following instead:The same example but using this suggested annotation
I'm not well versed in compiler plugins or annotation processors, so I'm not entirely sure on the implementation details, but all the semantic information required for such an annotation exist. When you're processing a class that is part of a sealed hierarchy, you check if the value it defines is always known at compile-time and if it is unique (e.g. you can't have two implementors use the same discriminator value). I'd even say the value could be anything directly matchable in
when
expressions.If the values are all consistent, you can generate the polymorphic deserialization function which selects the correct implementor serializer based on the decoded discriminator, akin to what already happens on the current implementation based on an implicit field.
Please correct me if anything I've said is wrong or if my suggestion is unattainable at the moment. Thank you!
The text was updated successfully, but these errors were encountered: