Dumping Machine is an application that dumps Avro-encoded Kafka topics to S3 or HDFS as Parquet files.
- Clone this repo to your local machine from https://github.com/grupozap/dumping-machine
- JDK 8
Before running, edit config/application.yml to match your environment, then start the application:
$ ./gradlew clean run
Output files are partitioned by date and hour. The date and hour directories carry dt= and hr= prefixes, as in the example below:
{TOPIC_NAME}/{DATE}/{HOUR}/{PARQUET_FILE}
Example:
prod-dataplatform-events/dt=2019-08-30/hr=22/1_78465.parquet
prod-dataplatform-events/dt=2019-08-30/hr=23/3_78977.parquet
prod-dataplatform-events/dt=2019-08-31/hr=00/8_77567.parquet
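To make the layout concrete, here is a minimal Python sketch that builds a path in the same shape as the examples above. The helper name `partition_path` is hypothetical and not part of Dumping Machine, and it assumes the record timestamp is in epoch milliseconds and that partitions are computed in UTC; the README does not state either. The file name is passed through unchanged because its naming scheme is not documented here.

```python
from datetime import datetime, timezone


def partition_path(topic: str, timestamp_ms: int, file_name: str) -> str:
    """Build a {TOPIC_NAME}/dt=.../hr=.../{PARQUET_FILE} path (hypothetical helper)."""
    # Assumption: timestamps are epoch milliseconds and partitions use UTC.
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    # Zero-padded date and hour, matching dt=2019-08-31/hr=00 in the examples.
    return f"{topic}/dt={ts:%Y-%m-%d}/hr={ts:%H}/{file_name}"
```

Calling it with a timestamp that falls at 22:15 UTC on 2019-08-30 and the file name `1_78465.parquet` would yield the first example path listed above.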
Dumping Machine supports Hive Metastore for the following operations:
- Create database
- Create table
- Update table
- Add partition
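As an illustration of what "Add partition" amounts to, the sketch below turns a partition directory into the roughly equivalent HiveQL statement. This is only an approximation for readers: Dumping Machine may talk to the Metastore through its client API rather than HiveQL, and the function name, table name, and bucket used here are hypothetical.

```python
def add_partition_statement(table: str, base_location: str, partition_dir: str) -> str:
    """Render a HiveQL ADD PARTITION statement for a directory like 'dt=2019-08-30/hr=22'."""
    parts = partition_dir.strip("/").split("/")
    # Each path segment "key=value" becomes key='value' in the partition spec.
    spec = ", ".join("{}='{}'".format(*part.split("=", 1)) for part in parts)
    location = f"{base_location.rstrip('/')}/{partition_dir.strip('/')}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )


# Hypothetical table name and bucket, for illustration only.
print(add_partition_statement(
    "events",
    "s3://my-bucket/prod-dataplatform-events",
    "dt=2019-08-30/hr=22",
))
```

Registering each dt/hr directory this way is what lets Hive-compatible query engines prune partitions by date and hour instead of scanning the whole topic directory.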
Made with ❤️ by the Grupo ZAP engineering team