This repository has been archived by the owner on Dec 29, 2023. It is now read-only.

Dumping Machine is an application which dumps Kafka Avro topics to S3 or HDFS as Parquet


escaletech/dumping-machine

 
 


Dumping Machine

Dumping Machine is an application which dumps Kafka Avro topics to S3 or HDFS as Parquet.




Installation

  • Clone this repo to your local machine using https://github.com/grupozap/dumping-machine

Build requirements

  • JDK 8

Setup

Before running, edit config/application.yml to match your environment, then start the application:

$ ./gradlew clean run

Compatibility


Partition

Output is partitioned by date and hour, using Hive-style dt= and hr= directory names:

{TOPIC_NAME}/dt={DATE}/hr={HOUR}/{PARQUET_FILE}

Example:

prod-dataplatform-events/dt=2019-08-30/hr=22/1_78465.parquet
prod-dataplatform-events/dt=2019-08-30/hr=23/3_78977.parquet
prod-dataplatform-events/dt=2019-08-31/hr=00/8_77567.parquet
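
The layout above can be illustrated with a minimal Java sketch that derives the output path from a topic name and a record timestamp. This is not taken from the Dumping Machine source: the class name PartitionPath, the build method, the use of UTC, and passing the file name in as a parameter are all assumptions made for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Dumping Machine itself.
public class PartitionPath {
    // Assumption: partitions are computed in UTC; the real application may use another zone.
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("HH").withZone(ZoneOffset.UTC);

    // Builds {TOPIC_NAME}/dt={DATE}/hr={HOUR}/{PARQUET_FILE} for a timestamp in epoch milliseconds.
    public static String build(String topic, long timestampMillis, String fileName) {
        Instant instant = Instant.ofEpochMilli(timestampMillis);
        return topic
                + "/dt=" + DATE.format(instant)
                + "/hr=" + HOUR.format(instant)
                + "/" + fileName;
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2019-08-30T22:15:00Z").toEpochMilli();
        // Prints: prod-dataplatform-events/dt=2019-08-30/hr=22/1_78465.parquet
        System.out.println(build("prod-dataplatform-events", ts, "1_78465.parquet"));
    }
}
```

Records whose timestamps fall in the same hour land in the same directory, which is what lets query engines prune by dt and hr.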

Hive Metastore

Dumping Machine supports Hive Metastore for the following operations:

  • Create database
  • Create table
  • Update table
  • Add partition
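
To make the "Add partition" operation concrete, the sketch below builds the HiveQL statement that would register one dt/hr directory as a partition. It only shows what the operation amounts to in Hive terms. How Dumping Machine actually talks to the metastore is not documented here, and the class name HivePartitionDdl, the database name, and the bucket in the location are hypothetical.

```java
// Hypothetical illustration; Dumping Machine may use the metastore client API instead of HiveQL.
public class HivePartitionDdl {
    // Returns an ALTER TABLE ... ADD PARTITION statement for one date/hour partition.
    // Identifiers are backtick-quoted because topic-derived names may contain hyphens.
    public static String addPartition(String database, String table,
                                      String date, String hour, String location) {
        return String.format(
                "ALTER TABLE `%s`.`%s` ADD IF NOT EXISTS PARTITION (dt='%s', hr='%s') LOCATION '%s'",
                database, table, date, hour, location);
    }

    public static void main(String[] args) {
        // "example_db" and "example-bucket" are placeholders, not values from this project.
        System.out.println(addPartition(
                "example_db",
                "prod-dataplatform-events",
                "2019-08-30",
                "22",
                "s3a://example-bucket/prod-dataplatform-events/dt=2019-08-30/hr=22"));
    }
}
```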

Team

Made with ❤️ by the Grupo ZAP engineering team
