Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable CPU execution of IncrementalPCA #6254

Open
wants to merge 3 commits into
base: branch-25.02
Choose a base branch
from

Conversation

wphicks
Copy link
Contributor

@wphicks wphicks commented Jan 23, 2025

Allow IncrementalPCA to run on CPU.

This PR depends on #6253 and should not be merged until after 6253 is merged. It is otherwise ready for review.

@wphicks wphicks added feature request New feature or request non-breaking Non-breaking change cuml-cpu labels Jan 23, 2025
@wphicks wphicks requested a review from a team as a code owner January 23, 2025 22:20
@wphicks wphicks requested review from betatim and vyasr January 23, 2025 22:20
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Jan 23, 2025
from cuml.internals.safe_imports import gpu_only_import

cp = gpu_only_import("cupy")


def _create_rs_generator(random_state):
"""
This is a utility function that returns an instance of CuPy RandomState
This is a utility function that returns an instance of CuPy/numpy
RandomState depending on the current globally-selected device type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RandomState depending on the current globally-selected device type
RandomState depending on the current globally-selected device type

return random_state
else:
raise ValueError("random_state type must be int or CuPy RandomState")
raise ValueError("random_state type must be int or RandomState")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it useful to output the actual type of the thing that is of the wrong type in the error message. Makes it simpler to figure out what is wrong as a user (because presumably they think they are passing the right thing, but it turns out they aren't).

return cp.random.RandomState(seed=random_state)
elif isinstance(random_state, cp.random.RandomState):
return GlobalSettings().xpy.random.RandomState(seed=random_state)
elif isinstance(random_state, GlobalSettings().xpy.random.RandomState):
return random_state
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the global settings thing well enough (and this is an internal function), but I think it can happen that xpy is cupy and the caller passes in a numpy random state. Maybe useful to have an explicit case for this to tell the caller that "yes you passed a random state, but the wrong kind"

@@ -195,6 +200,9 @@ class IncrementalPCA(PCA):
0.0037122774558343763
"""

_cpu_estimator_import_path = "sklearn.decomposition.IncrementalPCA"

@device_interop_preparation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education, why do we need this decorator? All it seems to do is process the constructor arguments to remove parameters that aren't arguments to the constructor. But why do we need this? What is the use-case where we want to allow someone to call the constructor with invalid keyword arguments and not error?

Comment on lines +60 to +64
if sparse_format == "csc":
pytest.skip(
"cupyx.scipy.sparse.csc.csc_matrix does not support"
" indexing as of cupy 7.6.0"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we remove the cases where sparse_format == "csc"from the parametrisation? And then also remove this skipping. One less bit of noise in the test output log :D

At least it sounds like that indexing has been remove in an old cupy version -> it ain't coming back any time soon

reco_true = cp.dot(u_true * s, v_true)
u_false, v_false = _svd_flip(u, v, u_based_decision=False)
reco_false = cp.dot(u_false * s, v_false)
@pytest.mark.parametrize("device", ["gpu", "cpu"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make a fixture that we can use in all tests that does this. That way we only have one place to change if we want to add a new device or select a subset of the devices, etc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuml-cpu Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants