row_to_names improvement #1379

Merged (19 commits) on Jul 13, 2024

samukweku
Collaborator

@samukweku samukweku commented Jun 20, 2024

PR Description

Please describe the changes proposed in the pull request:

  • minor speed improvement for row_to_names
  • polars LazyFrame is not supported, because the collect method cannot be avoided here
  • minor changes to pandas' row_to_names
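
For readers unfamiliar with the function, here is a minimal pure-pandas sketch of the behaviour being discussed: promote one row to the column names, optionally drop that row, and optionally reset the index. `promote_row_to_names` is a hypothetical helper written for illustration only; it is not pyjanitor's implementation and it ignores options the real function may have.

```python
import pandas as pd


def promote_row_to_names(df, row_number=0, remove_rows=False, reset_index=False):
    """Hypothetical illustration, NOT pyjanitor's row_to_names."""
    # The new column labels are the values stored in the chosen row.
    new_columns = df.iloc[row_number].tolist()
    out = df.copy()
    out.columns = new_columns
    if remove_rows:
        # Drop the promoted row by position, so duplicate index labels are safe.
        keep = [i != row_number for i in range(len(out))]
        out = out.loc[keep]
    if reset_index:
        out = out.reset_index(drop=True)
    return out


df = pd.DataFrame({"a": ["nums", 6, 9], "b": ["chars", "x", "y"]})
result = promote_row_to_names(df, 0, remove_rows=True, reset_index=True)
print(result)
```

The sketch also hints at why a lazy frame is awkward here: the new column names are data values, so they are presumably not known until the frame has been materialised.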

Speed improvement for pandas (YMMV):

import pandas as pd
import janitor as jn  # importing janitor registers row_to_names on DataFrame
df = pd.DataFrame({
    "a": ["nums", 6, 9],
    "b": ["chars", "x", "y"],
})
df = pd.concat([df]*100_000, ignore_index=True)

# this PR
%timeit df.row_to_names(0, remove_rows=True, reset_index=True)
2.41 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.row_to_names(0)
27.3 µs ± 340 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


# dev
%timeit df.row_to_names(0, remove_rows=True, reset_index=True)
13.2 ms ± 72.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.row_to_names(0)
2.81 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
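
To put the timings above in relative terms, the short sketch below divides the dev mean by the mean for this PR for each call. The numbers are copied from the `%timeit` output above; the dictionary layout and variable names are assumptions made for illustration.

```python
# Mean timings from the %timeit output above, converted to seconds.
timings = {
    "row_to_names(0, remove_rows=True, reset_index=True)": {"pr": 2.41e-3, "dev": 13.2e-3},
    "row_to_names(0)": {"pr": 27.3e-6, "dev": 2.81e-3},
}

# Speedup = time on dev / time on this PR (larger means the PR is faster).
speedups = {call: t["dev"] / t["pr"] for call, t in timings.items()}

for call, ratio in speedups.items():
    print(f"{call}: about {ratio:.1f}x faster")
```

On these figures the first call is roughly 5.5x faster and the second roughly 103x faster, for this one 300,000-row frame on one machine.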

This PR relates to #1352.

@samukweku samukweku self-assigned this Jun 20, 2024
@ericmjl
Member

ericmjl commented Jun 20, 2024

samuel.oranyeli added 2 commits June 21, 2024 09:23

codecov bot commented Jun 20, 2024

Codecov Report

Attention: Patch coverage is 94.33962% with 3 lines in your changes missing coverage. Please review.

Project coverage is 87.36%. Comparing base (62c57c6) to head (9010b06).
Report is 25 commits behind head on dev.

Current head 9010b06 differs from pull request most recent head ff82eba

Please upload reports for the commit ff82eba to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1379      +/-   ##
==========================================
- Coverage   94.48%   87.36%   -7.12%     
==========================================
  Files          80       86       +6     
  Lines        4367     5067     +700     
==========================================
+ Hits         4126     4427     +301     
- Misses        241      640     +399     

@samukweku samukweku marked this pull request as draft June 20, 2024 23:54
@samukweku samukweku force-pushed the samukweku/polars_row_names_improve branch from 07254a7 to 84f9d25 Compare July 7, 2024 06:04
@samukweku samukweku marked this pull request as ready for review July 7, 2024 06:54
Member

@ericmjl ericmjl left a comment

Looks great, @samukweku! Not much that I can see needing nitpicking right now.

@ericmjl
Member

ericmjl commented Jul 13, 2024

I am going to merge!

@ericmjl ericmjl merged commit a14061c into dev Jul 13, 2024
4 checks passed
@ericmjl ericmjl deleted the samukweku/polars_row_names_improve branch July 13, 2024 22:03

2 participants