Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Compare module to the standard library #13755

Open
wants to merge 2 commits into
base: trunk
Choose a base branch
from

Conversation

kit-ty-kate
Copy link
Member

First proposed in #13753 (comment)

As previously extensively discussed in #9928, #9080 and in many tickets or forum posts, both the physical equality operator == and the polymorphic comparison functions compare and = are known footguns for OCaml developers of all levels.

Similarly to #9080 / #13753, this PR hopes to be a first step towards a world where OCaml does not have this footgun anymore by encouraging people to use functions from this new module (which is more explicit than using it directly from Stdlib and thus less prone to footgunning), then deprecating ==, compare, … when they are less used.

Note for reviewers: i'll add documentation to the functions once/if the interface is agreed

external ( > ) : 'a -> 'a -> bool = "%greaterthan"
external ( <= ) : 'a -> 'a -> bool = "%lessequal"
external ( >= ) : 'a -> 'a -> bool = "%greaterequal"
end
Copy link
Contributor

@nojb nojb Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in two minds about these submodules.

The main use of them is with numeric types (Int, Int64, Char, maybe String, etc). If anything, monomorphic versions of these operators should be added to those modules. I don't think the operators <, >, <=, >= are very useful in contexts where one is explicitly using a polymorphic compare (since the ordering is implementation-defined). As for = and <>, they are useful, but once you have put them inside a submodule, I don't see myself using a local open to save a few character with respect to Compare.Poly.equal.

In other words, perhaps it would be simpler not to include the Syntax submodules at all (which would also mean that we do not have to discuss what to name the submodule).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, perhaps it would be simpler not to include the Syntax submodules at all (which would also mean that we do not have to discuss what to name the submodule).

Yes, that would also be my preference to reduce decision fatigue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough, that works for me. Done.

Copy link
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Docstrings need to be added.

And a second official approval is needed to move forward with this PR.

@kit-ty-kate
Copy link
Member Author

Docstrings need to be added.

what do you think makes the most sense for this module? Should we copy the docstrings from stdlib.mli or should we simply say something like Alias of {!Stdlib.compare}, or even Alias of {!Compare.Poly.compare} in stdlib.mli if we want to move the full description to the new module?

@gasche
Copy link
Member

gasche commented Feb 8, 2025

Actually it may be nice to do things the other way around, that is, move the stdlib docstrings here and point to this new module in the Stdlib documentation. This paves the way for eventually deprecating Stdlib.compare if we decide to.

@gasche
Copy link
Member

gasche commented Feb 8, 2025

I am not sure about the proposed API: Compare.Poly.compare is a mouthful, maybe there is a way to organize this module that makes the name nicer?

For example, I thought of the following "flat" interface (I'm not saying I strongly prefer it, but I think that throwing some ideas around may help decide how we feel about the API):

val phys_equal : 'a -> 'a -> bool
val equal : 'a -> 'a -> bool
val compare : 'a -> 'a -> int
val min, max: 'a -> 'a -> 'a

We loose the information that min, max depend on structural ("poly") comparison, but then it is unclear to me that they could rely on anything else.

Compare.compare still reads a bit weird, and maybe this interface is not as good in terms of showing warning signs about which functions rely on ad-hoc polymorphism.

@zapashcanon
Copy link
Contributor

Compare.Poly.compare is a mouthful, maybe there is a way to organize this module that makes the name nicer?

We could simply have Poly.compare ?

@nojb
Copy link
Contributor

nojb commented Feb 8, 2025

I thought of the following "flat" interface

In that case, better call the module "Poly".

@yallop
Copy link
Member

yallop commented Feb 8, 2025

I think it'd be good to have a name than emphasizes the fact that the behaviour is defined on representations, i.e. that the functions are not parametrically polymorphic, despite their types.

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 8, 2025

Generic.compare ?

@nojb
Copy link
Contributor

nojb commented Feb 8, 2025

I think it'd be good to have a name than emphasizes the fact that the behaviour is defined on representations, i.e. that the functions are not parametrically polymorphic, despite their types.

Repr?

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 8, 2025

A twist on @nojb's proposal could be:

Compare.repr : 'a -> 'a -> int
Compare.repr_equal : 'a -> 'a -> bool
Compare.phys_equal : 'a -> 'a -> bool
Compare.repr_min : 'a -> 'a -> 'a
Compare.repr_max : 'a -> 'a -> 'a

@kit-ty-kate
Copy link
Member Author

I find the Compare.repr* alternative proposal quite elegant (I'm also ok with the Repr proposal). The only "negative" comment i would have is that for beginners it might feel a bit obscure, but much like Obj maybe this is actually something positive and might invite people to read the documentation instead of reading/using it by default without thinking much about it.

I'm happy to update the PR with the proposal that makes consensus.

@gasche
Copy link
Member

gasche commented Feb 8, 2025

I think that there is a tension between having ominous-looking names that make people think twice about using polymorphic comparison, and having pleasant-looking names that encourage people to use them and gradually organize a transition out of Stdlib.compare.

In this context I personally find that Generic.compare is better than Compare.repr: it is clear that this comparison is doing something weird, but the name itself reads well -- someone can easily guess what the function does, even without looking at the type information. On the other hand, I have ideas for future extensions of a Compare module (for example a helper for lexicographic ordering with a thunked unit -> int second parameter; or maybe a type t = Lt | Eq | Gt to eventually graduate out of magical numbers), while I suppose Generic would be restricted to weird runtime support -- I guess we could move input_value and output_value there as well, for example.

@kit-ty-kate
Copy link
Member Author

I've pushed two commits with the updated docstrings to help visualize the two alternative proposals.
I have some additional thoughts:

  • Compare.repr*:
    • Pros: introduces a Compare module which i might see people wanting to add more features to it in the future (e.g. Compare.S, operators, maybe things about implicits why not)
    • Cons: longer names. Maybe a bit awkward to use repr without its module name as opposed to compare
  • Repr.compare:
    • Pros: naming is concise, straightforward and enjoyable to use
    • Cons: i'm having trouble describing the module in more general terms that doesn't sound like "Obj but meant to be used by non-advanced users" or "Comparison function" which describes the current state of the module but feels detached from its name. I also can't think of other things to add to that module in the future that would fit its name.

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 8, 2025

  • Cons: i'm having trouble describing the module in more general terms that doesn't sound like "Obj but meant to be used by non-advanced users"

Well it could be for operations on the representation of values "that you are allowed to use". For example Obj.reachable_words could be moved there.

So a synopsis could be:

(** Operations on the representation of values. *)
module Repr : sig 
…
end

@yallop
Copy link
Member

yallop commented Feb 8, 2025

I also can't think of other things to add to that module in the future that would fit its name.

One often-requested example: Repr.print. Similar functions that are already in the standard library (e.g. Marshal.to_string, Obj.size) would also fit well, if they weren't already situated elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants