PySpark and Ozone integration #6299
Assuming you're not in a Kerberized environment, and you have a csv file uploaded to ofs://ozone1708496417/vol1/bucket1/abc.csv that looks like this:
Here's a script abc.py to read that csv from PySpark:
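A minimal sketch of what such a script might look like (not the original from this reply). It assumes the Ozone filesystem jar and `ozone-site.xml` are already visible to Spark, and that the CSV has a header row. The `ofs_path` helper and the `header`/`inferSchema` options are illustrative choices, not something the thread confirms.

```python
# abc.py: a hedged sketch, not the original script from this reply.

# Authority part of the ofs:// URI, copied from the path quoted above.
OM_AUTHORITY = "ozone1708496417"


def ofs_path(volume: str, bucket: str, key: str, authority: str = OM_AUTHORITY) -> str:
    """Build a URI of the form ofs://<authority>/<volume>/<bucket>/<key>."""
    return f"ofs://{authority}/{volume}/{bucket}/{key.lstrip('/')}"


def main() -> None:
    # Imported here so the helper above stays usable without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ozone-csv-read").getOrCreate()

    # header/inferSchema are assumptions about the file layout.
    df = spark.read.csv(
        ofs_path("vol1", "bucket1", "abc.csv"),
        header=True,
        inferSchema=True,
    )
    df.printSchema()
    df.show()

    spark.stop()


if __name__ == "__main__":
    main()
```

Writing back should work the same way through `df.write`, for example `df.write.csv(ofs_path("vol1", "bucket1", "out"))`, where the output key is hypothetical.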
And you can execute it with:
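One way the submission could look, sketched as a small Python helper that assembles a `spark-submit` command line. The jar path is a placeholder, `local[*]` is an assumed master, and whether `--jars` plus the `fs.ofs.impl` setting is enough depends on the deployment (for example, `ozone-site.xml` still has to be on the Hadoop configuration path).

```python
import shlex


def spark_submit_command(script: str, ozone_fs_jar: str, master: str = "local[*]") -> list[str]:
    """Assemble a spark-submit argv that puts the Ozone filesystem jar on the classpath."""
    return [
        "spark-submit",
        "--master", master,
        # Ship the shaded Ozone filesystem jar to the driver and executors.
        "--jars", ozone_fs_jar,
        # Map the ofs:// scheme to Ozone's rooted filesystem implementation.
        "--conf", "spark.hadoop.fs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem",
        script,
    ]


# Placeholder jar path: substitute the shaded filesystem jar from your Ozone install.
cmd = spark_submit_command("abc.py", "/path/to/ozone-filesystem-hadoop3.jar")

# Print a shell-safe version that can be pasted into a terminal.
print(shlex.join(cmd))
```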
To use Ozone in any application that uses HDFS, add the shaded filesystem jar to the classpath, provide an ozone-site.xml updated for your Ozone deployment, and then change the file system URL to point at Ozone.
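As a rough illustration of the URL change, the sketch below rewrites an `hdfs://` URL into the `ofs://<authority>/<volume>/<bucket>/<key>` layout used earlier in this thread. The function name, the NameNode address, and the idea of keeping the old HDFS path as the key are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def hdfs_to_ofs(url: str, om_authority: str, volume: str, bucket: str) -> str:
    """Map hdfs://<namenode>/<path> to ofs://<om_authority>/<volume>/<bucket>/<path>."""
    parts = urlsplit(url)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URL, got: {url}")
    # Keep the original HDFS path as the key inside the chosen bucket.
    key = parts.path.lstrip("/")
    return f"ofs://{om_authority}/{volume}/{bucket}/{key}"


# Hypothetical NameNode address; the Ozone names are the ones from this thread.
new_url = hdfs_to_ofs(
    "hdfs://namenode:8020/data/abc.csv",
    om_authority="ozone1708496417",
    volume="vol1",
    bucket="bucket1",
)
print(new_url)
```

With the jar and configuration in place, the application code itself should not need to change beyond the path it is given.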
Dear @kerneltime and @jojochuang, could you please help me with this error? I set up the Ozone environment by following this discussion.
I am trying to use Apache Ozone with PySpark. The documentation on ofs and o3fs states: "Frameworks like Apache Spark, YARN and Hive work against Ozone without needing any change".
Could you please share some setup information and code samples for reading and writing data in Ozone using PySpark?