AndreeaAlexandrescuDS/MarketBasketAnalysis

MarketBasketAnalysis

Market basket analysis (MBA) with PySpark

Theory of Apriori Algorithm


There are three major components of the Apriori algorithm:

  • Support
  • Confidence
  • Lift

1) Support


Support refers to the popularity of an item. It is calculated as the number of transactions containing that item divided by the total number of transactions.
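
As a minimal sketch of the support calculation, assuming each transaction is represented as a Python set of item names (the `transactions` list and the `support` helper below are made up for illustration and are not part of this repository):

```python
# Hypothetical toy data: each basket is the set of items in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(transactions, itemset):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


# Bread appears in 4 of the 5 baskets.
bread_support = support(transactions, {"bread"})
print(bread_support)
```

The same helper works for itemsets of more than one item, for example `support(transactions, {"bread", "milk"})`.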

2) Confidence


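Confidence refers to the likelihood that item B is also bought when item A is bought. It is calculated as the number of transactions containing both A and B divided by the number of transactions containing A.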
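
Confidence can be sketched in the same spirit. The example below assumes transactions are Python sets; the toy data and the `confidence` helper are hypothetical names chosen for illustration:

```python
# Hypothetical toy data: each basket is the set of items in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def confidence(transactions, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    a = set(antecedent)
    both = a | set(consequent)
    count_a = sum(1 for basket in transactions if a <= basket)
    count_both = sum(1 for basket in transactions if both <= basket)
    if count_a == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    return count_both / count_a


# Bread is in 4 baskets, and 3 of those also contain milk.
bread_to_milk = confidence(transactions, {"bread"}, {"milk"})
print(bread_to_milk)
```

Note that confidence is directional: the value for A → B generally differs from the value for B → A.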

3) Lift


Lift refers to the increase in the ratio of the sale of B when A is sold. It is calculated as the confidence of A → B divided by the support of B.

Association rule by Lift

  • lift = 1 → There is no association between A and B.
  • lift < 1 → A and B are unlikely to be bought together.
  • lift > 1 → The greater the lift, the greater the likelihood of buying both products together.
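
A small sketch of how lift and its interpretation could be computed, assuming transactions are Python sets. The toy data and the `lift` and `interpret_lift` helpers are hypothetical; lift is computed from raw counts, which is algebraically the same as confidence(A → B) divided by support(B):

```python
import math

# Hypothetical toy data: each basket is the set of items in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def lift(transactions, a, b):
    """Lift of the rule a -> b, computed as n * count(a and b) / (count(a) * count(b))."""
    a, b = set(a), set(b)
    n = len(transactions)
    count_a = sum(1 for basket in transactions if a <= basket)
    count_b = sum(1 for basket in transactions if b <= basket)
    count_both = sum(1 for basket in transactions if (a | b) <= basket)
    if count_a == 0 or count_b == 0:
        raise ValueError("both sides of the rule must occur at least once")
    return n * count_both / (count_a * count_b)


def interpret_lift(value):
    """Map a lift value onto the three cases listed above."""
    if math.isclose(value, 1.0):
        return "no association"
    return "bought together more often than expected" if value > 1 else "unlikely to be bought together"


for a, b in [({"butter"}, {"bread"}), ({"bread"}, {"milk"})]:
    value = lift(transactions, a, b)
    print(sorted(a), "->", sorted(b), value, interpret_lift(value))
```

Unlike confidence, lift is symmetric: the rule A → B and the rule B → A have the same lift.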

Steps Involved in Apriori Algorithm


The Apriori algorithm tries to extract rules for each possible combination of items.
For larger datasets, this computation can make the process extremely slow.
To speed up the process, we need to perform the following steps:
  • Set a minimum value for support and confidence. This means that we are only interested in rules for items that appear in at least a certain share of transactions (support) and that co-occur with other items at least a certain share of the time (confidence).
  • Extract all the subsets (itemsets) whose support is higher than the minimum threshold.
  • Select all the rules from those subsets whose confidence is higher than the minimum threshold.
  • Sort the rules in descending order of lift.
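
The steps above can be sketched in plain Python as follows. This is an illustrative, unoptimized version under stated assumptions, not the code used in this repository: the toy `transactions`, the `apriori_rules` function, and the default thresholds are all hypothetical, and thresholds are treated as inclusive (`>=`).

```python
from itertools import combinations

# Hypothetical toy data: each basket is the set of items in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def apriori_rules(transactions, min_support=0.4, min_confidence=0.6):
    n = len(transactions)

    def support(itemset):
        return sum(1 for basket in transactions if itemset <= basket) / n

    # Step 2: grow frequent itemsets level by level (size 1, then 2, ...).
    items = sorted({item for basket in transactions for item in basket})
    candidates = [frozenset([item]) for item in items]
    frequent = {}  # frozenset -> support
    size = 1
    while candidates:
        survivors = {}
        for candidate in candidates:
            s = support(candidate)
            if s >= min_support:  # assumption: threshold is inclusive
                survivors[candidate] = s
        frequent.update(survivors)
        size += 1
        # Next candidates: unions of two surviving itemsets with exactly `size` items.
        candidates = list(
            {a | b for a, b in combinations(survivors, 2) if len(a | b) == size}
        )

    # Step 3: split each frequent itemset into antecedent -> consequent rules.
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is itself frequent,
                # so both lookups below are guaranteed to succeed.
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    lift = conf / frequent[consequent]
                    rules.append((set(antecedent), set(consequent), s, conf, lift))

    # Step 4: strongest associations first.
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules


# Step 1: choose the minimum support and confidence.
rules = apriori_rules(transactions, min_support=0.4, min_confidence=0.6)
for antecedent, consequent, s, conf, lift in rules:
    print(sorted(antecedent), "->", sorted(consequent), s, conf, lift)
```

The pruning in step 2 is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so candidates are built only from itemsets that already passed the support threshold.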
