The GraphSense Transformation Pipeline reads raw block and transaction data, which is ingested into Apache Cassandra by the graphsense-lib component. The transformation pipeline computes an address graph and de-normalized views using Apache Spark, which are stored back in Cassandra to support efficient queries.
The views computed by this component are subsequently served by the GraphSense REST interface, which is the main data source for graphsense-dashboard, the graphical user interface of the GraphSense stack.
This component is implemented in Scala using Apache Spark.
Make sure Java 11 and sbt >= 1.0 are installed:
java -version
sbt about
Download, install, and run Apache Spark (version 3.2.4) in $SPARK_HOME:
$SPARK_HOME/sbin/start-master.sh
Download, install, and run Apache Cassandra (version >= 3.11) in $CASSANDRA_HOME:
$CASSANDRA_HOME/bin/cassandra -f
For testing purposes, it is easiest to run a dockerized instance of Cassandra:
docker run --rm --name cassandra_dev -p 9042:9042 -d cassandra:4.0
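To verify that the dockerized instance is up and accepting connections, you can, for example, check its status with nodetool (the container name cassandra_dev matches the command above):

docker exec -it cassandra_dev nodetool status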
Use graphsense-lib to ingest data into the raw keyspace (tables). Before you can run the data import, please create a valid .graphsense.yaml config file in your home directory. For more details, see graphsense-lib.
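The authoritative schema of .graphsense.yaml is documented in graphsense-lib; the snippet below is only a minimal sketch, assuming a single local Cassandra node and the dev environment referenced by -e dev in the commands below. The field names are best-effort assumptions and should be double-checked against the graphsense-lib documentation.

# Illustrative sketch only -- verify field names against graphsense-lib.
cat > ~/.graphsense.yaml <<'EOF'
environments:
  dev:
    cassandra_nodes:
      - localhost              # contact point of the dockerized Cassandra above
    keyspaces:
      btc:
        raw_keyspace_name: btc_raw
        transformed_keyspace_name: btc_transformed_dev
        schema_type: utxo      # utxo for btc/zec, account for eth
EOF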
graphsense-cli -v ingest from-node -e dev -c {NETWORK} --batch-size 10 --end-block 1000 --version 2 --create-schema
graphsense-cli -v exchange-rates coinmarketcap ingest -e dev -c {NETWORK}
Note: replace {NETWORK} with the three-letter code of the currency you want to import (e.g. eth, btc, zec).
This should create a keyspace {NETWORK}_raw (tables exchange_rates, transaction, block, etc.).
Check as follows:
cqlsh localhost
cqlsh> USE eth_raw;
cqlsh:eth_raw> DESCRIBE tables;
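As an additional sanity check, you can query one of the raw tables, e.g.:

cqlsh:eth_raw> SELECT * FROM block LIMIT 1;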
Create the target keyspace for transformed data:
graphsense-cli -v schema create-new-transformed -e dev -c {NETWORK} --no-date --suffix dev
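You can again verify the result in cqlsh; with --suffix dev, the new keyspace should appear in the keyspace listing (presumably named {NETWORK}_transformed_dev, but check the actual output):

cqlsh> DESCRIBE keyspaces;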
Compile and test the implementation:
make test && make build
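These make targets presumably wrap the corresponding sbt tasks; since the project is built with sbt, the direct equivalents would be:

sbt test
sbt package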
Run the dockerized pipeline on localhost:
make run-docker-{NETWORK}-transform-local
Check the running job using the local Spark UI at http://localhost:4040/jobs.
For detailed information on how to submit a job and how to set up the necessary infrastructure, we refer you to the Dockerfile and the submit script in docker/submit.sh.
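For orientation only, a manual (non-dockerized) submission against the local Spark master would look roughly like the sketch below; the main class name, jar path, and job arguments are assumptions for illustration, and the authoritative invocation is in docker/submit.sh:

# Sketch only: class name, jar path, and job arguments are assumptions;
# see docker/submit.sh for the actual parameters.
$SPARK_HOME/bin/spark-submit \
  --class info.graphsense.TransformationJob \
  --master spark://localhost:7077 \
  --conf spark.cassandra.connection.host=localhost \
  target/scala-2.12/graphsense-transformation.jar \
  --raw-keyspace eth_raw \
  --target-keyspace eth_transformed_dev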
Community contributions, e.g. new features and bug fixes, are very welcome. For both, please create a pull request with the proposed changes; we will review it as soon as possible. To avoid frustration and wasted work, please contact us to discuss changes before you implement them. This is best done via an issue or our discussion board.
Please make sure that the submitted code is always tested and properly formatted.
Do not forget to format and test your code using
make format && make test
before committing.