A source connector for reading data from a microphone and streaming it into Apache Kafka, and a sink connector for streaming data from Apache Kafka straight to your computer's speakers, both implemented on top of the Kafka Connect framework
These connectors use the Java sound API available in the standard library to record and play headerless audio with various formats.
Note that earlier versions of these connectors required SoX to run; current versions do not.
The connector can be built locally by running the following command:
mvn package
Once it's built, copy the JAR file for the connector located in the target
directory
(it should be named something like kafka-connect-sound-${version}.jar
, where
${version}
is the current project version) onto the plugin path or classpath for your
Kafka Connect worker(s).
You can view the list of available versions for the connector by browsing its Maven Central artifacts. Once you've picked a version, download the JAR file for that release and copy it onto the plugin path or classpath for your Kafka Connect worker(s).
For example, to install version 1.0.0 of the connector, you can download the artifact at https://repo.maven.apache.org/maven2/io/github/c0urante/kafka-connect-sound/1.0.0/kafka-connect-sound-1.0.0.jar
Note that this project currently has no extra dependencies (i.e., every dependency is already provided by the Kafka Connect runtime), so it's enough to just download and deploy the connector JAR. This is also why no fat JAR or Confluent Hub archive is published to Maven Central.
Releases to Confluent Hub have been discontinued.
Assumptions:
- Maven 3+ is installed
- Zookeeper is running and listening on localhost:2181
- Kafka is running and listening on localhost:9092
- Current directory is the root of the repository
# Build the project
mvn clean package
# Create the topic that the connectors will write to and read from (important: only need one partition)
kafka-topics --zookeeper localhost:2181 --create --topic voice-wav --partitions 1 --replication-factor 1
# Run the microphone source connector, and make some noise! Close down the worker with ctrl+C when you're finished
connect-standalone config/connect-standalone.properties config/kafka-connect-microphone.properties
# Run the speakers sink connector, and listen as the noise you made is played back out your speakers!
connect-standalone config/connect-standalone.properties config/kafka-connect-speakers.properties
Additional assumptions:
- Kafka Tools is installed and available on the current
$PATH
# Build the project
mvn clean package
# Create the topic that the connector will read from (important: only need one partition)
kafka-topics --bootstrap-server localhost:9092 --create --topic queen --partitions 1 --replication-factor 1
# Read an audio file into Kafka
# Recorded with SoX, using the default audio format (sample size, channels, etc.) for the connectors:
# sox --buffer 2048 -d -t raw -b 16 -L -e signed-integer -r 44100 -c 1 src/test/resources/audio/queen.raw
# To play this with SoX for debugging purposes:
# sox --buffer 2048 -t raw -b 16 -L -e signed-integer -r 44100 -c 1 src/test/resources/audio/queen.raw -d
kafka-binary-producer --topic queen --broker-list localhost:9092 < src/test/resources/audio/queen.raw
# Run the speakers sink connector, and listen to some music!
connect-standalone config/connect-standalone.properties config/kafka-connect-speakers-music.properties
The microphone source produces data with a fixed byte array schema.
The speakers sink connector can consume data with a type of either byte or byte array. Anything else will cause the connector to fail.
Because these connectors are hardcoded to work exclusively with bytes and byte arrays, it's
recommended that the ByteArrayConverter
be used with them unless there are upstream or downstream
limitations that require data to be serialized in a specific format.
These connectors do not perform offsets tracking.
For the source connector, this means that its tasks will only record audio produced in real time when they are running. Rebalances, worker restarts, and task failures will all run the risk of dropping audio.
For the sink connector, this means that its tasks will always resume reading from Kafka based on the value for their consumers' auto.offset.reset property. For Kafka Connect, the default behavior is to read from the beginning of topics. To override this for all sink connectors on a cluster and cause them to read from the ends of topics, add this property to your Kafka Connect worker config file:
consumer.auto.offset.reset = latest
And, for Kafka Connect clusters version 3.0.0 and beyond, you can override this for a single sink connector at a time by adding this property to the connector's config file:
consumer.override.auto.offset.reset = latest
(for standalone Connect workers)
{
"consumer.override.auto.offset.reset": "latest"
}
(for distributed Connect clusters)
For the source connector, this is a matter of feasibility: we can't go back in time and re-record audio that we failed to record while a task wasn't running.
For the sink connector, we could track the offsets of records that we were able to successfully write to the operating system, and commit those periodically. However, the implementation complexity of this would be non-trivial since a single large record may be broken down into several buffers that are written individually to the OS, and several smaller records may be grouped together into a single buffer. Additionally, there's no guarantee that audio that has been dispatched to the OS has actually been played out of the speakers on the machine. We could possibly look into tracking this more precisely, but it doesn't seem worth the bother.
Issues are tracked on GitHub. If there's a problem you're running into with the connector or a feature missing that you'd like to see, please open an issue.
If there's a small bug or typo that you'd like to fix, feel free to open a PR without filing an issue first and tag @C0urante for review.
- Sink connector that writes to speakers
- Source connector that reads from a microphone
- Easy way to read static audio files into Kafka (Kafka Tools)
- Publish to Maven Central
PRs welcome and encouraged!