Right now, there is some confusion around DataFrames being passed into DocumentDataset. For now, we expect them to be Dask or Dask-cuDF DataFrames, so we should add stronger type checking for this.
Let's also investigate automatically converting Pandas and cuDF DataFrames to Dask and Dask-cuDF DataFrames, respectively. When a user tries to create a DocumentDataset with a Pandas or cuDF DataFrame, we have two options: throw an error, or automatically convert it to its Dask equivalent.
We should at least do the former for now.
If we decide to do the latter, this could involve using get_current_client, is_cudf_type, from_pandas, and creating a from_cudf function. If we go this route, it would probably be a good idea to tell the user that the conversion is happening, to avoid confusion if they later inspect DocumentDataset.df and find a different DataFrame type from the one they passed in.
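To make the two options concrete, here is a rough sketch of what the check could look like. Everything in it is an assumption for illustration, not existing code: `validate_dataset_df` and its `convert` flag are hypothetical names, and the type detection works by looking at the root module of the object's class (so that cudf and dask_cudf are never imported on CPU-only machines), which is a heuristic rather than a proper `isinstance` check. It does not use get_current_client or is_cudf_type; it calls `dask.dataframe.from_pandas` and `dask_cudf.from_cudf` directly.

```python
import warnings

# Assumption: top-level modules that the relevant DataFrame classes live in.
# Matching on names avoids importing cudf / dask_cudf when no GPU is present.
DISTRIBUTED_ROOTS = {"dask", "dask_expr", "dask_cudf"}
LOCAL_ROOTS = {"pandas", "cudf"}


def _root_module(obj):
    """Return the top-level module of obj's class, e.g. 'pandas'."""
    return type(obj).__module__.split(".")[0]


def validate_dataset_df(df, convert=False, npartitions=1):
    """Hypothetical helper: return a Dask-backed DataFrame or raise TypeError.

    With convert=False (the strict option), Pandas and cuDF DataFrames are
    rejected. With convert=True, they are wrapped in a Dask or Dask-cuDF
    DataFrame and a warning tells the user that this happened.
    """
    root = _root_module(df)
    is_dataframe = type(df).__name__ == "DataFrame"

    if is_dataframe and root in DISTRIBUTED_ROOTS:
        return df  # already Dask or Dask-cuDF

    if is_dataframe and root in LOCAL_ROOTS:
        if not convert:
            raise TypeError(
                f"DocumentDataset expects a Dask or Dask-cuDF DataFrame, "
                f"got a {root} DataFrame; convert it first."
            )
        warnings.warn(
            f"Converting {root} DataFrame to a Dask-backed DataFrame; "
            f"DocumentDataset.df will not be the object you passed in."
        )
        if root == "cudf":
            import dask_cudf  # imported lazily, GPU environments only

            return dask_cudf.from_cudf(df, npartitions=npartitions)
        import dask.dataframe as dd

        return dd.from_pandas(df, npartitions=npartitions)

    raise TypeError(
        f"DocumentDataset expects a Dask or Dask-cuDF DataFrame, "
        f"got {type(df).__module__}.{type(df).__name__}."
    )
```

Calling `validate_dataset_df(df)` in the DocumentDataset constructor would cover the "throw an error" option; passing `convert=True` would cover the automatic conversion, with the warning making the change visible.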
Somewhat related to #79.
cc @ayushdg @ryantwolf @VibhuJawa