I have often needed to implement tedious classification logic in data processing projects. The requirements are often ambiguous, with aspects such as fallback and overlap between rules, to the extent that they would be difficult to implement even in SQL. This logic usually ends up as large blocks of nested if statements, which are hard to read, hard to modify, and perform poorly. This small project aims to make such classification logic easier to write and faster to evaluate.
Build a generic classification engine
Classifier<Product, String> classifier = Classifier.<String, Product, String>builder(
        // Schema: the named attributes that rules can refer to, each read from a Product
        Schema.<String, Product, String>create()
                .withAttribute("productType", Product::getProductType)
                // issueDate is compared using the reverse of its natural ordering
                .withAttribute("issueDate", Product::getIssueDate, Comparator.naturalOrder().reversed())
                .withAttribute("productName", Product::getProductName)
                .withAttribute("availability", Product::getAvailability)
                // derived attribute, computed from the price rather than read directly
                .withAttribute("discountedPrice", value -> 0.2 * value.getPrice())
).build(Arrays.asList(
        // Rules: constraints on attributes, plus a priority and the resulting classification
        MatchingConstraint.<String, String>named("rule1")
                .eq("productType", "silk")
                .startsWith("productName", "luxury")
                .gt("discountedPrice", 1000)
                .priority(0)
                .classification("EXPENSIVE_LUXURY_PRODUCTS")
                .build(),
        MatchingConstraint.<String, String>named("rule2")
                .eq("productType", "caviar")
                .gt("discountedPrice", 100)
                .priority(1)
                .classification("EXPENSIVE_LUXURY_PRODUCTS")
                .build(),
        MatchingConstraint.<String, String>anonymous()
                .eq("productName", "baked beans")
                .priority(2)
                .classification("CHEAP_FOOD")
                .build()
        )
);
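The example above relies on a Product type that is not shown. A minimal sketch of what it might look like follows; only the getter names come from the example, while the field types (for instance LocalDate for the issue date and double for the price) and the constructor are assumptions made for illustration.

```java
import java.time.LocalDate;

// Hypothetical Product type: the getter names match the example above,
// but the field types and the constructor are assumptions.
final class Product {
    private final String productType;
    private final LocalDate issueDate;
    private final String productName;
    private final String availability;
    private final double price;

    Product(String productType, LocalDate issueDate, String productName,
            String availability, double price) {
        this.productType = productType;
        this.issueDate = issueDate;
        this.productName = productName;
        this.availability = availability;
        this.price = price;
    }

    String getProductType() { return productType; }

    LocalDate getIssueDate() { return issueDate; }

    String getProductName() { return productName; }

    String getAvailability() { return availability; }

    double getPrice() { return price; }
}
```

Any type exposing these accessors would do; the schema only needs a function from the input object to each attribute value.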
Classify
Product p = getProduct();
String classification = classifier.classification(p).orElse("UNCLASSIFIED");
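For intuition, the rules can be read as predicates that are tried in priority order, with a fallback when nothing matches. The sketch below shows that reading in plain Java. It is a naive linear scan, not how this project evaluates rules; NaiveClassifier and Rule are hypothetical names, and treating a lower priority number as the winner is an assumption that may not match the project's convention.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Naive baseline for illustration only; hypothetical, not part of the project.
final class NaiveClassifier<T, C> {

    // A rule pairs a condition with a priority and the classification it yields.
    static final class Rule<T, C> {
        final Predicate<T> condition;
        final int priority;
        final C classification;

        Rule(Predicate<T> condition, int priority, C classification) {
            this.condition = condition;
            this.priority = priority;
            this.classification = classification;
        }
    }

    private final List<Rule<T, C>> rules;

    NaiveClassifier(List<Rule<T, C>> rules) {
        // Assumption: a lower priority number wins when rules overlap.
        this.rules = new ArrayList<>(rules);
        this.rules.sort(Comparator.comparingInt((Rule<T, C> rule) -> rule.priority));
    }

    // Returns the classification of the first matching rule in priority order,
    // or an empty Optional so the caller can choose a fallback.
    Optional<C> classification(T value) {
        for (Rule<T, C> rule : rules) {
            if (rule.condition.test(value)) {
                return Optional.of(rule.classification);
            }
        }
        return Optional.empty();
    }
}
```

Every lookup here costs time proportional to the number of rules, which is the kind of overhead the project sets out to reduce.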