PERF: strftime is slow #44764
After reading the source code, I think I know where the bottleneck comes from. Internally Lines 152 to 166 in 193ca73
Using this feature, method_c became the fastest:
@timer
def method_c(index):
    return index.strftime(None)
I suggest changing the following line to
basic_format = (format is None or format == "%Y-%m-%d %H:%M:%S") and tz is None
Line 134 in 193ca73
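A short note on the suggested condition: in Python, `and` binds more tightly than `or`, so a condition of this shape needs parentheses if the `tz is None` check is meant to apply to both format cases. The sketch below is only an illustration of that precedence rule. The function names and the `DEFAULT_FORMAT` constant are hypothetical and not taken from pandas, and it assumes the intent is that the fast path should only be used when no timezone is set.

```python
DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"


def basic_format_unparenthesized(format, tz):
    # Python parses this as:
    #   format is None or (format == DEFAULT_FORMAT and tz is None)
    # so the tz check is skipped whenever format is None.
    return format is None or format == DEFAULT_FORMAT and tz is None


def basic_format_parenthesized(format, tz):
    # Here the tz check applies to both the None and the default-format case.
    return (format is None or format == DEFAULT_FORMAT) and tz is None


if __name__ == "__main__":
    for fmt in (None, DEFAULT_FORMAT, "%d/%m/%Y"):
        for tz in (None, "UTC"):
            print(
                repr(fmt),
                repr(tz),
                basic_format_unparenthesized(fmt, tz),
                basic_format_parenthesized(fmt, tz),
            )
```

The two versions only disagree when `format` is None and a timezone is set.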
I confirm that we identified the same performance issue on our side, with custom formats such as It would be great to improve this in a future version! Would you like us to propose a PR? If so, some guidance would be appreciated.
PRs are how things get fixed; core can provide review.
I opened a draft PR. It seems to me that we could have some kind of format string processor run beforehand, in order to transform all strftime patterns i.e. I'll give it a try in the coming days.
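For illustration, a format string pre-processor of the kind mentioned above could look roughly like the sketch below. It is not the code from the draft PR: the function names, the directive table, and the choice of `%`-style named fields as the target are all assumptions made for this example. It only handles a few numeric directives and raises on anything else, so a caller could fall back to the regular strftime path.

```python
from datetime import datetime

# Hypothetical mapping from a few numeric strftime directives to
# %-style named fields. Locale-dependent directives (%a, %b, %p, ...)
# are deliberately left out of this sketch.
_DIRECTIVE_TO_FIELD = {
    "%Y": "%(year)04d",
    "%m": "%(month)02d",
    "%d": "%(day)02d",
    "%H": "%(hour)02d",
    "%M": "%(minute)02d",
    "%S": "%(second)02d",
    "%f": "%(microsecond)06d",
}


def convert_strftime_format(fmt):
    """Translate fmt into a %-style template, or raise ValueError."""
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] != "%":
            out.append(fmt[i])
            i += 1
            continue
        directive = fmt[i:i + 2]
        if directive == "%%":
            # A literal percent sign stays escaped in the template.
            out.append("%%")
        elif directive in _DIRECTIVE_TO_FIELD:
            out.append(_DIRECTIVE_TO_FIELD[directive])
        else:
            raise ValueError(f"unsupported directive: {directive!r}")
        i += 2
    return "".join(out)


def fast_format(dt, template):
    """Fill a converted template from the datetime's integer fields."""
    return template % {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
        "second": dt.second,
        "microsecond": dt.microsecond,
    }


if __name__ == "__main__":
    template = convert_strftime_format("%Y-%m-%d %H:%M:%S.%f")
    print(fast_format(datetime(2021, 11, 5, 9, 3, 7, 1234), template))
```

The conversion runs once per format string, so its cost does not grow with the number of timestamps being formatted.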
@auderson, just out of curiosity: did you try running your benchmark on Windows? My first benchmark results suggest that it is even slower there (blue curve): #46116 (comment)
@smarie I ran this on a Linux Jupyter notebook.
This is my result on Windows 10, a bit faster than yours #46116 (comment) but still way slower than Linux. EDIT: @smarie Looks like Windows strftime is slower than Linux!
Thanks @auderson for this confirmation!
…imes faster ! Related to pandas-dev#44764
…eIndex`: string formatting is now up to 80% faster (as fast as default) when one of the default strftime formats ``"%Y-%m-%d %H:%M:%S"`` or ``"%Y-%m-%d %H:%M:%S.%f"`` is used. See pandas-dev#44764
…QLiteTable`): processing time arrays can be up to 65% faster ! (related to pandas-dev#44764)
I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the master branch of pandas.
Reproducible Example
I found that pd.DatetimeIndex.strftime is pretty slow when the data is large.
In the following I made a simple benchmark.
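The benchmark code itself is not preserved in this copy of the issue. As a rough sketch of the kind of comparison described, the snippet below times the built-in call against an f-string approach. It is a reconstruction under assumptions, not the original code: `method_a`, `time_call`, `FMT`, the index size, and the one-second spacing are made up for this example, and `method_b` only follows the textual description.

```python
import time

import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"


def method_a(index):
    # Baseline: DatetimeIndex.strftime with an explicit format string.
    return index.strftime(FMT)


def method_b(index):
    # Extract each component once, then build the strings with an f-string.
    parts = zip(index.year, index.month, index.day,
                index.hour, index.minute, index.second)
    return [f"{y:04d}-{m:02d}-{d:02d} {H:02d}:{M:02d}:{S:02d}"
            for y, m, d, H, M, S in parts]


def time_call(func, index):
    # Wall-clock time of a single call, in seconds.
    start = time.perf_counter()
    func(index)
    return time.perf_counter() - start


if __name__ == "__main__":
    index = pd.date_range("2021-01-01", periods=100_000,
                          freq=pd.Timedelta(seconds=1))
    for func in (method_a, method_b):
        print(f"{func.__name__}: {time_call(func, index):.3f} s")
```

Timings will depend on the pandas version and the platform, so the numbers from such a script are only meaningful relative to each other on the same machine.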
method_b first stores 'year', 'month', 'day', 'hour', 'minute', 'second', then converts them to strings with an f-string. Although it's written in Python, the time spent is significantly lower.
Installed Versions
INSTALLED VERSIONS
commit : 945c9ed
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-63-generic
Version : #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.4
numpy : 1.20.0
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : 0.9.3
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.29.0
pandas_datareader: 0.9.0
bs4 : None
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.22
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : 0.54.1
Prior Performance
No response