Let's use Kafka Streams to detect fraudulent IPs in a clickstream
30 mins
Here is a sample clickstream record in JSON format:
{"timestamp": 1642840429496, "ip": "4.4.8.2", "user": "user-70", "action": "clicked", "domain": "twitter.com", "campaign": "campaign-99", "cost": 26}
We are going to scan the records and filter out those that come from fraudulent IP addresses, that is, addresses marked as spam or bot originators.
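To make the filtering step concrete, here is a minimal plain-Java sketch that needs no Kafka. Everything in it is an assumption made for illustration: the class name `FraudFilterSketch`, the regex-based extraction of the `ip` field, the made-up second record, and the stand-in predicate are hypothetical. The lab's real logic lives in FraudDetectionApp.java and CacheIPLookup.java and may differ, for instance by using a JSON library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FraudFilterSketch {
    // Captures the value of the "ip" field from a flat JSON record.
    // A regex is used only to keep the sketch dependency-free.
    private static final Pattern IP_FIELD =
            Pattern.compile("\"ip\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the IP found in the record, or null if the field is missing.
    public static String extractIp(String json) {
        Matcher m = IP_FIELD.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Keeps only the records whose IP is flagged by the given lookup.
    public static List<String> flaggedRecords(List<String> records,
                                              Predicate<String> isFraud) {
        return records.stream()
                .filter(r -> {
                    String ip = extractIp(r);
                    return ip != null && isFraud.test(ip);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // First record is abridged from the sample above; the second is made up.
        List<String> records = Arrays.asList(
            "{\"timestamp\": 1642840429496, \"ip\": \"4.4.8.2\", \"user\": \"user-70\"}",
            "{\"timestamp\": 1642840429497, \"ip\": \"9.9.9.9\", \"user\": \"user-12\"}");

        // Hypothetical lookup: flag anything in the 4.4.x.x range.
        Predicate<String> lookup = ip -> ip.startsWith("4.4.");

        flaggedRecords(records, lookup).forEach(System.out::println);
    }
}
```

In a Kafka Streams application the same predicate would typically sit inside a filter on the stream of record values. Negating it drops the flagged records instead of selecting them.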
- Producer
  - Python: python/producer-clickstream.py
  - Java: src/main/java/x/utils/ClickstreamProducer.java
- Consumer
  - Consumer: src/main/java/x/practice_labs/FraudDetectionApp.java
  - Fraud IP lookup: src/main/java/x/practice_labs/CacheIPLookup.java
Inspect the consumer file: src/main/java/x/practice_labs/FraudDetectionApp.java
Fix the TODO items.
You can run it from Eclipse,
or from the command line as follows:
$ cd ~/kafka-labs
$ mvn exec:java -Dexec.mainClass=x.practice_labs.FraudDetectionApp
To run the Java producer, src/main/java/x/utils/ClickstreamProducer.java:
- run it from Eclipse
- or from the command line as follows:
$ cd ~/kafka-labs
$ mvn exec:java -Dexec.mainClass=x.utils.ClickstreamProducer
Alternatively, you can run the Python clickstream producer:
$ cd ~/kafka-labs/python
$ python producer-clickstream.py
Watch the consumer output.
Add a few more IPs to src/main/java/x/practice_labs/CacheIPLookup.java:
public CacheIPLookup() {
    fraudIPs.add("1.1"); // new
    fraudIPs.add("3.3");
    fraudIPs.add("4.4");
}
See if the new IPs are getting flagged.
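For intuition on why a short entry such as "4.4" could flag the sample address 4.4.8.2, here is a minimal sketch of a prefix-based lookup. It rests on assumptions: the class `PrefixIPLookupSketch` and its `isFraudIP` method are hypothetical, and matching on the first two octets is a guess based on the entries shown above. The actual CacheIPLookup.java may match differently, so check its source.

```java
import java.util.HashSet;
import java.util.Set;

public class PrefixIPLookupSketch {
    // Flagged two-octet prefixes, mirroring the entries shown above.
    private final Set<String> fraudPrefixes = new HashSet<>();

    public PrefixIPLookupSketch() {
        fraudPrefixes.add("1.1");
        fraudPrefixes.add("3.3");
        fraudPrefixes.add("4.4");
    }

    // Assumption: an IP is flagged when its first two octets match an entry.
    public boolean isFraudIP(String ip) {
        if (ip == null) {
            return false;
        }
        int firstDot = ip.indexOf('.');
        if (firstDot < 0) {
            return false;
        }
        int secondDot = ip.indexOf('.', firstDot + 1);
        if (secondDot < 0) {
            return false;
        }
        // Compare whole octets, so "14.4.8.2" does not match the "4.4" entry.
        return fraudPrefixes.contains(ip.substring(0, secondDot));
    }

    public static void main(String[] args) {
        PrefixIPLookupSketch lookup = new PrefixIPLookupSketch();
        System.out.println("4.4.8.2  -> " + lookup.isFraudIP("4.4.8.2"));
        System.out.println("1.1.0.7  -> " + lookup.isFraudIP("1.1.0.7"));
        System.out.println("14.4.8.2 -> " + lookup.isFraudIP("14.4.8.2"));
    }
}
```

Comparing whole octets instead of calling `startsWith("4.4")` avoids accidental matches such as 4.45.0.1.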