Replies: 4 comments
-
This is an interesting issue, but it is most likely environmental. For anyone to be able to help, I think you need to provide far more detail: ideally some analysis of what the Polars processes are doing, what your environment is, and maybe some logging that shows what is going on (attaching a debugger, etc.). There is absolutely no indication here that your problem is related to Airflow rather than to your environment, and generally it is very unlikely to have anything to do with Airflow. But if you can dig deeper and provide more information and analysis, then maybe someone will be able to help you.
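One low-effort way to collect that kind of evidence is Python's standard `faulthandler` module, which can dump the stack of every thread when a process is still running after a timeout or when it receives a signal. This is only a minimal sketch: the helper name `enable_hang_diagnostics` and the 300-second default are hypothetical, and it assumes you can call the helper at the start of the task that hangs.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(timeout_s=300.0, file=None):
    """Arm stack dumps for a process that may hang (hypothetical helper).

    Dumps the Python stack of every thread to `file` each time `timeout_s`
    seconds pass, and also whenever the process receives SIGUSR1.
    `file` must be a real file object with a file descriptor and must stay
    open while the dumps are armed.
    """
    # Default to the interpreter's original stderr, in case sys.stderr has
    # been replaced by a logging wrapper that has no file descriptor.
    target = file if file is not None else sys.__stderr__

    # Periodic dump: repeats every timeout_s seconds until cancelled.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)

    # On-demand dump: `kill -USR1 <pid>` from a shell on the worker.
    # SIGUSR1 does not exist on Windows, hence the guard.
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=target, all_threads=True)
```

Note that `faulthandler` only shows Python-level frames. If the process is blocked inside native Polars code, the dump will end at the Python line that called into Polars, which is still useful for showing which call never returns.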
-
Converted to a discussion until there is some evidence that this is an Airflow issue.
-
BTW, some similar issues turned up by a Google search mention that Polars might hang when it needs to open many sockets/files and their number is limited. Maybe that is the case on the machine where your workers run (i.e. an environmental issue), but that's a totally wild guess.
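If you want to check that guess, a small sketch like the following reports the open-file limit and the current number of open descriptors for the process it runs in. The function name `fd_usage` is hypothetical; the `resource` module is Unix-only, and the `/proc/self/fd` count is only available on Linux, so it is returned as `None` elsewhere.

```python
import os
import resource


def fd_usage():
    """Return the open-file limits and current descriptor count (hypothetical helper)."""
    # Soft limit is what the process is actually held to; hard limit is the ceiling
    # the soft limit may be raised to.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # On Linux, each entry in /proc/self/fd is one open descriptor of this process.
    fd_dir = "/proc/self/fd"
    open_fds = len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None

    return {"soft_limit": soft, "hard_limit": hard, "open_fds": open_fds}
```

Logging the result of `fd_usage()` from inside the stuck task, just before the Polars call, would show whether the process is anywhere near its soft limit.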
-
Also, another interesting thing: see https://docs.pola.rs/user-guide/misc/multiprocessing/ and consider starting your Polars context with spawn. Airflow makes heavy use of multiprocessing and forking, and Polars has specific guidelines for this, including a suggestion to use spawn when forking is involved. This is Polars documentation to follow, not Airflow's.
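To make the spawn suggestion concrete, here is a rough sketch of running a Polars computation in a child process created with the `spawn` start method instead of `fork`. It is an illustration under assumptions, not something Airflow or Polars prescribes: the names `SPAWN_CTX`, `count_unique_rows` and `run_in_spawned_process` are hypothetical, and the worker function has to live in a normally importable module so that the spawned child can locate it.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

# Explicit "spawn" context: the child starts from a fresh interpreter instead of
# inheriting the memory and lock state of a forked parent.
SPAWN_CTX = multiprocessing.get_context("spawn")


def count_unique_rows(columns):
    """Hypothetical worker: build a DataFrame and count its distinct rows."""
    # Imported here so that, in this sketch, Polars is only loaded in the child.
    import polars as pl

    return pl.DataFrame(columns).unique().height


def run_in_spawned_process(func, *args):
    """Run a picklable, module-level function in one spawned child and return its result."""
    with ProcessPoolExecutor(max_workers=1, mp_context=SPAWN_CTX) as pool:
        return pool.submit(func, *args).result()
```

Inside a task this could be called as `run_in_spawned_process(count_unique_rows, {"a": [1, 1, 2]})`. Whether it resolves the hang reported here is untested; it only shows the shape of the approach the Polars guide describes.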
-
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.10.0
What happened?
When I define a constant outside of the tasks using a Polars calculation such as df.unique or df.with_columns, any Polars calculation inside the tasks gets stuck. The tasks that contain it show as running and never finish.
My Polars version is 0.20.19.
What you think should happen instead?
The task should run as expected
How to reproduce
Operating System
Ubuntu 20.04.6 LTS (Focal Fossa)
Versions of Apache Airflow Providers
Deployment
Official Apache Airflow Helm Chart
Deployment details
apache-airflow-providers-cncf-kubernetes 8.3.0
apache-airflow-providers-common-compat 1.2.0
apache-airflow-providers-common-io 1.3.1
apache-airflow-providers-common-sql 1.13.0
apache-airflow-providers-fab 1.1.0
apache-airflow-providers-ftp 3.9.0
apache-airflow-providers-http 4.11.0
apache-airflow-providers-imap 3.6.0
apache-airflow-providers-mysql 5.6.1
apache-airflow-providers-smtp 1.7.0
apache-airflow-providers-sqlite 3.8.0
Anything else?
No response
Are you willing to submit PR?
Code of Conduct