SNOW-1897976: Consider adding a new DataFrame method to allow applying transformations in the fluent-style #2939

Open
padhia opened this issue Jan 25, 2025 · 3 comments
Assignees
Labels
feature New feature or request status-triage_done Initial triage done, will be further handled by the driver team

Comments

@padhia

padhia commented Jan 25, 2025

What is the current behavior?

Currently, to chain custom transformation functions on a DataFrame, one has to store each intermediate result in a named variable and pass it to the next function.

E.g. consider transformation functions such as

from snowflake.snowpark import DataFrame

def dedup(df: DataFrame) -> DataFrame:
    ...

def add_audit_cols(df: DataFrame) -> DataFrame:
    ...

my_df: DataFrame = ...

# Each step needs its own intermediate variable
t1 = dedup(my_df)
transformed_df = add_audit_cols(t1)

What is the desired behavior?

Define a new method that allows fluent-style coding:

from typing import Callable

class DataFrame:
    # Proposed method to allow a fluent-style API
    def pipe(self, f: Callable[["DataFrame"], "DataFrame"]) -> "DataFrame":
        return f(self)

# revised code

transformed_df = my_df.pipe(dedup).pipe(add_audit_cols) 
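As a rough illustration of how the proposed method would behave, the sketch below uses a hypothetical stand-in class, `Frame`, rather than Snowpark's real `DataFrame`. The `*args`/`**kwargs` forwarding and the `source` parameter are assumptions modeled on pandas' `DataFrame.pipe`; they are not part of the proposal above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for a DataFrame; holds rows as a tuple of tuples."""

    rows: tuple

    def pipe(self, f: Callable[..., Frame], *args, **kwargs) -> Frame:
        # Call f with this frame as its first argument, forwarding any extras.
        return f(self, *args, **kwargs)


def dedup(df: Frame) -> Frame:
    # dict.fromkeys keeps the first occurrence of each row and preserves order.
    return Frame(tuple(dict.fromkeys(df.rows)))


def add_audit_cols(df: Frame, source: str = "unknown") -> Frame:
    # Append a hypothetical audit value to every row.
    return Frame(tuple(row + (source,) for row in df.rows))


my_df = Frame(((1, "a"), (1, "a"), (2, "b")))

# Reads top to bottom in the order the steps are applied.
transformed_df = my_df.pipe(dedup).pipe(add_audit_cols, source="etl")
```

Forwarding extra arguments lets a parameterized transformation stay in the chain without wrapping it in a lambda.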

How would this improve snowflake-snowpark-python?

Fluent-style code feels more ergonomic and matches the style of most built-in DataFrame methods, which already return a DataFrame and can be chained.

References, Other Background

@padhia padhia added the feature New feature or request label Jan 25, 2025
@github-actions github-actions bot changed the title Consider adding a new DataFrame method to allow applying transformations in the fluent-style SNOW-1897976: Consider adding a new DataFrame method to allow applying transformations in the fluent-style Jan 25, 2025
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Feb 11, 2025
@sfc-gh-sghosh

Hi @padhia ,

Thanks for raising this issue. We are looking into it and will update you.

Regards,
Sujan

@sfc-gh-sghosh

Hello @padhia ,

We do support fluent-style DataFrame transformations. Could you let us know what you are trying to achieve and what is not working?

For example, the following transformations can be chained without creating any temporary DataFrame variable:

import pandas as pd  # assumption: the original comment did not show its imports

df = (
    pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
    .query("A > 1")  # Filter
    .assign(C=lambda x: x['B'] * 2)
    .rename(columns={'A': 'Alpha'})
)

print(df)

@sfc-gh-sghosh sfc-gh-sghosh added status-in_progress Issue is worked on by the driver team status-triage Issue is under initial triage and removed status-in_progress Issue is worked on by the driver team labels Feb 11, 2025
@padhia
Author

padhia commented Feb 11, 2025

Hello Sujan,

Thank you for getting back. I was referring to supporting custom transformations in fluent-style chains, in addition to the built-in DataFrame class methods.

In the example I mentioned above, I have abstracted complex logic into a couple of functions. It'd be nice if I could use those functions in a chain without having to nest the calls or name intermediate results.

transformed_df = my_df.pipe(dedup).pipe(add_audit_cols)

Note that in the above code, the (proposed) DataFrame method pipe takes each function as an argument and calls it with the DataFrame (thus allowing fluent style), versus the code below, where the function calls are nested and each is called with a DataFrame as its argument.

transformed_df = add_audit_cols(dedup(my_df))
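Until a method like this exists, one possible workaround is a small free function that applies transformations left to right. The sketch below is only an illustration: `pipe_all` is a hypothetical helper name, and plain lists of tuples stand in for DataFrames so the example runs without a Snowflake session.

```python
from functools import reduce
from typing import Callable, TypeVar

T = TypeVar("T")


def pipe_all(value: T, *funcs: Callable[[T], T]) -> T:
    """Apply funcs to value from left to right (hypothetical helper)."""
    return reduce(lambda acc, f: f(acc), funcs, value)


def dedup(rows: list) -> list:
    # Drop duplicate rows, keeping the first occurrence and the original order.
    return list(dict.fromkeys(rows))


def add_audit_cols(rows: list) -> list:
    # Append a hypothetical audit value to every row.
    return [row + ("etl",) for row in rows]


# Steps are listed in execution order, unlike add_audit_cols(dedup(...)).
result = pipe_all([(1,), (1,), (2,)], dedup, add_audit_cols)
```

This keeps the steps in reading order, but it is still a function call around the DataFrame rather than a method on it, so it cannot be mixed into a chain of built-in methods the way `pipe` could.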

@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage labels Feb 12, 2025
3 participants