[RFC] [R] Add methods `$` and `[[` for attributes #11233

david-cortes · 2025-02-10T19:57:59Z

I just realized that it should be possible to make booster objects work with operators like $ and [[, thereby making them easier to inspect and even showing up under IDE autocompletes, albeit with some caveats. It might lead to some reverse dependencies not experiencing failed checks after a CRAN update.

However, I'm not 100% sure that this is a good idea - what this does it to simply translate calls to list fields towards calls to R attributes, which are not exactly the same things and don't have the same semantics (e.g. attributes must have strings as keys, whereas lists can have numbers and unnamed entries; fields are preserved when subsetting an object whereas attributes are not preserved), and I'm pretty sure they must differ in other subtle ways too; so I guess this could potentially break expected R semantics somewhere (don't yet have any example).

Then there's also the issue that we're already using [...] for slicing boosters by rounds (which incidentally makes them drop all attributes in the result, thereby also breaking R semantics for using them like list fields).

It also introduces some odd behaviors as there's both R-level and C-level attributes in booster objects, which don't work the same way.

Perhaps in order to reduce odd behaviors, these methods could be defined only for the xgboost class (not the xgb.Booster produced by xgb.train()).

Waiting for comments from @mayer79 @jameslamb @trivialfis here (ideally @bastistician too if you could take a look).

bastistician · 2025-02-11T09:10:48Z

I'm no xgboost user, so cannot judge if specific changes to achieve back-compatibility with existing usage patterns are worth the complications, nor do I know the background for the changes in the "xgb.Booster" structure returned by xgb.train(), which no longer seems to be a simple list: AFAICS, only model$ptr exists (but I suspect users don't actually need (easy) access to that component), whereas the stored function call was moved from model$call (used by, e.g., update(), when there is no specific getCall() method) to attr(model, "call"), as were the "params". Of course, the internal structure doesn't really matter for users as long as suitable post-processing methods are provided, but again, I don't know xgboost and its established R-level workflows.

By providing the reverse dependency checks, I was merely trying to help you judge the effect of the update on your R user base and "ecosystem", so a smooth release can happen soon. :)

mayer79 · 2025-02-11T16:47:14Z

I am quite undecided here. Thx @bastistician to do that revdep check. This is really helpful.

trivialfis · 2025-02-13T14:03:39Z

I'm running a rev check using this PR along with #11232 . If the number of broken packages reduces to a small number then I'm happy to have this function, maybe with a warning that this function only exists for compatibility.

On the other hand, if there's no hope of updating all reverse dependencies, then let's not merge this. We will go ahead with a xgboost3 package.

david-cortes · 2025-02-13T17:23:54Z

Pushed a small fix that should fix the segfault in #9734 (comment)

But I'm having second thoughts about this PR, even if it manages to unbreak many packages.

trivialfis · 2025-02-13T19:53:52Z

But I'm having second thoughts about this PR, even if it manages to unbreak many packages.

I think we can make a separate package xgboost3 to allow a gradual transition, which is better than having lots of workarounds only to remove them later (which is also really hard to do).

trivialfis · 2025-02-13T21:29:29Z

That said, like others, I would love to keep the original package name. But looking at some of the errors and the last update time of some of the repos, I don't have the confidence that we can push an update without upsetting lots of people.

bastistician · 2025-02-14T16:36:01Z

I think it would be better to do a proactive update of the existing xgboost package, informing maintainers of revdeps months in advance as @david-cortes suggested (and similar to what the R-spatial folks did when 'sp' migrated from 'rgdal' to 'sf'), than to leave xgboost unmaintained on CRAN, risking removal and thus upsetting users (e.g., it still fails to install on Alpine Linux, blocking ~100 other packages) and confronting reverse dependencies with a potentially shorter deadline. Furthermore, just today one of the CRAN maintainers shared the following on the R-package-devel mailing list:

Note that CRAN does not like the idea of retiring a package and replacing it by another one. This should only be a very exceptional action and CRAN will decide case based. Ideally a new version of the old package should do.

hcho3 · 2025-02-15T01:40:26Z

@bastistician @david-cortes

informing maintainers of revdeps months in advance as @david-cortes suggested ...

What's the exact timeline look like? Does it look like the following sequence?

Contact maintainers of rev deps that would break under XGBoost 3.0.
Wait a few months. Some rev deps would be fixed; others won't.
Publish XGBoost 3.0 on CRAN.

Normally, failing rev dev checks will cause XGBoost to be pulled from CRAN, but it sounds like we can keep the package as long as we try to contact rev dev maintainers beforehand? Did I understand it correctly?

Also, thanks for pointing out one crucial point: leaving {xgboost} unmaintained on CRAN will also inconvenience users, perhaps more so than if we were to bite the bullet and publish XGBoost 3.0 (and let some rev deps to break).

hcho3 · 2025-02-15T10:54:28Z

Discussion continued in #9734 (comment)

david-cortes · 2025-02-17T17:21:37Z

Closing as it leads to more breakage in reverse dependencies.

david-cortes added 2 commits February 10, 2025 20:49

add getters and setters for attributes that would work like lists

1be8beb

missing docs

4883eba

trivialfis mentioned this pull request Feb 13, 2025

Proposal for new R interface (discussion thread) #9734

Open

fix segfault when model doesn't have pointer

b2daee0

david-cortes closed this Feb 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] [R] Add methods `$` and `[[` for attributes #11233

[RFC] [R] Add methods `$` and `[[` for attributes #11233

david-cortes commented Feb 10, 2025

bastistician commented Feb 11, 2025

mayer79 commented Feb 11, 2025

trivialfis commented Feb 13, 2025

david-cortes commented Feb 13, 2025

trivialfis commented Feb 13, 2025

trivialfis commented Feb 13, 2025

bastistician commented Feb 14, 2025

hcho3 commented Feb 15, 2025 •

edited

Loading

hcho3 commented Feb 15, 2025

david-cortes commented Feb 17, 2025

[RFC] [R] Add methods $ and [[ for attributes #11233

[RFC] [R] Add methods $ and [[ for attributes #11233

Conversation

david-cortes commented Feb 10, 2025

bastistician commented Feb 11, 2025

mayer79 commented Feb 11, 2025

trivialfis commented Feb 13, 2025

david-cortes commented Feb 13, 2025

trivialfis commented Feb 13, 2025

trivialfis commented Feb 13, 2025

bastistician commented Feb 14, 2025

hcho3 commented Feb 15, 2025 • edited Loading

hcho3 commented Feb 15, 2025

david-cortes commented Feb 17, 2025

[RFC] [R] Add methods `$` and `[[` for attributes #11233

[RFC] [R] Add methods `$` and `[[` for attributes #11233

hcho3 commented Feb 15, 2025 •

edited

Loading