The benchmark set contains 9 workloads, which fall into two categories. The first category, "simple resource benchmarks", tests how Storm performs under pressure on a particular resource. The second category measures how Storm performs in typical real-life use cases.
Simple resource benchmarks:
- wordcount, CPU-sensitive
- sol, network-sensitive
- rollingsort, memory-sensitive
Typical use-case benchmarks:
- rollingcount
- trident
- uniquevisitor
- pageview
- grep
- dataclean
- drpc
In real-life use cases, Kafka is often used for data ingestion. To account for that, most use-case benchmarks read data from Kafka, and they can be grouped by the corresponding data generators:
Data generated by `FileReadKafkaProducer`:
- dataclean
- drpc
- pageview
- uniquevisitor

Data generated by `PageViewKafkaProducer`:
- grep
- trident
The data generators are already provided and are themselves Storm applications.
We assume a Storm cluster is already set up locally.
- Build.
First, build storm-benchmark.
git clone https://github.com/manuzhang/storm-benchmark.git
cd storm-benchmark
mvn package
- Run. We use SOL as an example.
bin/stormbench -storm ${STORM_HOME}/bin/storm -jar ./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar -conf ./conf/sol.yaml -c topology.workers=2 storm.benchmark.tools.Runner storm.benchmark.benchmarks.SOL
- `-storm` directs stormbench to look for the storm command.
- `-jar` sets the benchmark jar with all the dependencies.
- `-conf` is for the user to provide a yaml conf file like `storm/conf/storm.yaml`. Check the `storm-benchmark/conf` folder, where conf files are already provided for existing benchmarks.
- `-c` allows the user to set conf through the command line without modifying conf files every time.
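For reference, the file passed via `-conf` is a plain YAML map of config keys. The sketch below is hypothetical: only `topology.workers` appears in this document, and the other keys are standard Storm configs shown purely as illustrations; the actual `conf/sol.yaml` shipped with storm-benchmark may contain different settings.

```yaml
# Hypothetical -conf file sketch. Only topology.workers comes from the
# example command above; the remaining keys are ordinary Storm configs
# included for illustration and may not match the shipped conf/sol.yaml.
topology.workers: 2
topology.acker.executors: 2
topology.max.spout.pending: 200
```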
- Check. The benchmark results will be stored under the path set by the METRICS_PATH config (default: reports). They contain throughput and latency data for the whole cluster.
The result of SOL contains two files:
1. `SOL_metrics_1402148415021.csv`. Performance data.
2. `SOL_metrics_1402148415021.yaml`. The config used to run this test.
We assume Storm and Kafka have been set up locally. (There is no need to create the Kafka topic beforehand; it can be auto-created when the producer sends messages to Kafka.) Also, we assume Storm Benchmark has been built successfully.
Here's how we run uniquevisitor, for instance.
- Launch `PageViewKafkaProducer`.
bin/stormbench -storm ${STORM_HOME}/bin/storm -jar ./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar -conf ./conf/pageview_producer.yaml storm.benchmark.tools.Runner storm.benchmark.tools.producer.kafka.PageViewKafkaProducer
- Launch `UniqueVisitor`.
bin/stormbench -storm ${STORM_HOME}/bin/storm -jar ./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar -conf ./conf/uniquevisitor.yaml storm.benchmark.tools.Runner storm.benchmark.benchmarks.UniqueVisitor
Then, we can check the metrics data as in the previous section.
For questions, please contact:
- Manu Zhang: [email protected]
- Sean Zhong: [email protected]
We use the SOL benchmark code (https://github.com/yahoo/storm-perf-test) from Yahoo. Thanks.