Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] [R] Add methods $ and [[ for attributes #11233

Closed
wants to merge 3 commits into from

Conversation

david-cortes
Copy link
Contributor

ref #9810
ref #9734 (comment)

I just realized that it should be possible to make booster objects work with operators like $ and [[, thereby making them easier to inspect and even showing up under IDE autocompletes, albeit with some caveats. It might lead to some reverse dependencies not experiencing failed checks after a CRAN update.

However, I'm not 100% sure that this is a good idea - what this does it to simply translate calls to list fields towards calls to R attributes, which are not exactly the same things and don't have the same semantics (e.g. attributes must have strings as keys, whereas lists can have numbers and unnamed entries; fields are preserved when subsetting an object whereas attributes are not preserved), and I'm pretty sure they must differ in other subtle ways too; so I guess this could potentially break expected R semantics somewhere (don't yet have any example).

Then there's also the issue that we're already using [...] for slicing boosters by rounds (which incidentally makes them drop all attributes in the result, thereby also breaking R semantics for using them like list fields).

It also introduces some odd behaviors as there's both R-level and C-level attributes in booster objects, which don't work the same way.

Perhaps in order to reduce odd behaviors, these methods could be defined only for the xgboost class (not the xgb.Booster produced by xgb.train()).

Waiting for comments from @mayer79 @jameslamb @trivialfis here (ideally @bastistician too if you could take a look).

@bastistician
Copy link
Contributor

I'm no xgboost user, so cannot judge if specific changes to achieve back-compatibility with existing usage patterns are worth the complications, nor do I know the background for the changes in the "xgb.Booster" structure returned by xgb.train(), which no longer seems to be a simple list: AFAICS, only model$ptr exists (but I suspect users don't actually need (easy) access to that component), whereas the stored function call was moved from model$call (used by, e.g., update(), when there is no specific getCall() method) to attr(model, "call"), as were the "params". Of course, the internal structure doesn't really matter for users as long as suitable post-processing methods are provided, but again, I don't know xgboost and its established R-level workflows.

By providing the reverse dependency checks, I was merely trying to help you judge the effect of the update on your R user base and "ecosystem", so a smooth release can happen soon. :)

@mayer79
Copy link
Contributor

mayer79 commented Feb 11, 2025

I am quite undecided here. Thx @bastistician to do that revdep check. This is really helpful.

@trivialfis
Copy link
Member

I'm running a rev check using this PR along with #11232 . If the number of broken packages reduces to a small number then I'm happy to have this function, maybe with a warning that this function only exists for compatibility.

On the other hand, if there's no hope of updating all reverse dependencies, then let's not merge this. We will go ahead with a xgboost3 package.

@david-cortes
Copy link
Contributor Author

Pushed a small fix that should fix the segfault in #9734 (comment)

But I'm having second thoughts about this PR, even if it manages to unbreak many packages.

@trivialfis
Copy link
Member

But I'm having second thoughts about this PR, even if it manages to unbreak many packages.

I think we can make a separate package xgboost3 to allow a gradual transition, which is better than having lots of workarounds only to remove them later (which is also really hard to do).

@trivialfis
Copy link
Member

That said, like others, I would love to keep the original package name. But looking at some of the errors and the last update time of some of the repos, I don't have the confidence that we can push an update without upsetting lots of people.

@bastistician
Copy link
Contributor

I think it would be better to do a proactive update of the existing xgboost package, informing maintainers of revdeps months in advance as @david-cortes suggested (and similar to what the R-spatial folks did when 'sp' migrated from 'rgdal' to 'sf'), than to leave xgboost unmaintained on CRAN, risking removal and thus upsetting users (e.g., it still fails to install on Alpine Linux, blocking ~100 other packages) and confronting reverse dependencies with a potentially shorter deadline. Furthermore, just today one of the CRAN maintainers shared the following on the R-package-devel mailing list:

Note that CRAN does not like the idea of retiring a package and replacing it by another one. This should only be a very exceptional action and CRAN will decide case based. Ideally a new version of the old package should do.

@hcho3
Copy link
Collaborator

hcho3 commented Feb 15, 2025

@bastistician @david-cortes

informing maintainers of revdeps months in advance as @david-cortes suggested ...

What's the exact timeline look like? Does it look like the following sequence?

  1. Contact maintainers of rev deps that would break under XGBoost 3.0.
  2. Wait a few months. Some rev deps would be fixed; others won't.
  3. Publish XGBoost 3.0 on CRAN.

Normally, failing rev dev checks will cause XGBoost to be pulled from CRAN, but it sounds like we can keep the package as long as we try to contact rev dev maintainers beforehand? Did I understand it correctly?

Also, thanks for pointing out one crucial point: leaving {xgboost} unmaintained on CRAN will also inconvenience users, perhaps more so than if we were to bite the bullet and publish XGBoost 3.0 (and let some rev deps to break).

@hcho3
Copy link
Collaborator

hcho3 commented Feb 15, 2025

Discussion continued in #9734 (comment)

@david-cortes
Copy link
Contributor Author

Closing as it leads to more breakage in reverse dependencies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants