Sharp ETL is an ETL framework that simplifies writing and executing ETL jobs: you simply write SQL workflow files. The SQL workflow file format combines your favorite SQL dialect with just a little bit of configuration.

To run the quick-start example, first start a MySQL instance for Sharp ETL:
```shell
docker run --name sharp_etl_db -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=root -e MYSQL_DATABASE=sharp_etl mysql:5.7
```
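You can verify the database is up with the MySQL client bundled in the container (an optional sanity check; the container name and credentials come from the command above):

```shell
# connect to the container's MySQL server and confirm the sharp_etl database exists
docker exec -it sharp_etl_db mysql -uroot -proot -e "SHOW DATABASES LIKE 'sharp_etl';"
```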
Build from source or download the jar from releases:
```shell
./gradlew buildJars -PscalaVersion=2.12 -PsparkVersion=3.3.0 -PscalaCompt=2.12.15
```
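If the build succeeds, the standalone jar should appear under spark/build/libs/ (a quick check based on the jar path used in the spark-submit command below):

```shell
ls spark/build/libs/
# expect something like sharp-etl-spark-standalone-3.3.0_2.12-0.1.0.jar
```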
Take a look at the example workflow file:

```shell
cat spark/src/main/resources/tasks/hello_world.sql
```
You will see the following contents:
```sql
-- workflow=hello_world
-- loadType=incremental
-- logDrivenType=timewindow

-- step=define variable
-- source=temp
-- target=variables
SELECT 'RESULT' AS `OUTPUT_COL`;

-- step=print SUCCESS to console
-- source=temp
-- target=console
SELECT 'SUCCESS' AS `${OUTPUT_COL}`;
```
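In this workflow, the first step writes its single-row result to the built-in variables target, which makes OUTPUT_COL available as ${OUTPUT_COL} in subsequent steps. After substitution, the second step effectively runs the following query (shown only for illustration; Sharp ETL performs the substitution at runtime):

```sql
-- the ${OUTPUT_COL} placeholder resolves to RESULT, the value produced by the first step
SELECT 'SUCCESS' AS `RESULT`;
```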
Now run the workflow with spark-submit:

```shell
spark-submit --master local --class com.github.sharpdata.sharpetl.spark.Entrypoint spark/build/libs/sharp-etl-spark-standalone-3.3.0_2.12-0.1.0.jar single-job --name=hello_world --period=1440 --default-start-time="2022-07-01 00:00:00" --once --local
```
You will see output like the following:
```
== Physical Plan ==
*(1) Project [SUCCESS AS RESULT#17167]
+- Scan OneRowRelation[]

root
 |-- RESULT: string (nullable = false)

+-------+
|RESULT |
+-------+
|SUCCESS|
+-------+
```
The compatible Spark and Scala versions are as follows:
| Spark | Scala |
|---|---|
| 2.3.x | 2.11 |
| 2.4.x | 2.11 / 2.12 |
| 3.0.x | 2.12 |
| 3.1.x | 2.12 |
| 3.2.x | 2.12 / 2.13 |
| 3.3.x | 2.12 / 2.13 |
| 3.4.x | 2.12 / 2.13 |
| 3.5.x | 2.13 |
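To build for one of these combinations, pass the matching versions to the Gradle build, as in the earlier build command. For example, a build against Spark 3.5.x with Scala 2.13 would look roughly like this (the exact patch versions, such as 2.13.8, are assumptions; adjust them to your environment):

```shell
./gradlew buildJars -PscalaVersion=2.13 -PsparkVersion=3.5.0 -PscalaCompt=2.13.8
```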