Market Basket Analysis Using ‘R’ and ‘Python’

Did you know that almonds are related to burgers? Yes, you read that right. In a retail business, almonds can be associated with burgers in customers' baskets. The real question is how your business can benefit from this association. The answer lies in using the relationship to promote cross-selling of items. Some ways to increase profit:

  • Creating a halo effect, i.e. increasing the sales of complementary products by promoting one product
  • Promoting products by bundling them together
  • Planning the layout of supermarkets, etc.

All these association rules can be found using Market Basket Analysis. It creates “If… then…” scenarios in which an antecedent (the item bought) leads to a consequent (the item bought together with the antecedent). Market Basket Analysis relies mainly on three measures.

  • SUPPORT: The probability that an item is bought, i.e. Support{A} = Number of transactions containing item A / Total number of transactions
  • CONFIDENCE: The probability that the consequent is bought given that the antecedent is bought, i.e. Confidence{A=>B} = Number of transactions containing both A and B / Number of transactions containing A
  • LIFT: The ratio of the joint probability of antecedent and consequent to the product of their individual probabilities, i.e. Lift{A=>B} = Support(A,B) / (Support(A)*Support(B))
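
As a worked example of the three measures, here is a minimal Python sketch on five made-up transactions. The data and the helper names `support`, `confidence` and `lift` are illustrative assumptions, not part of the ‘XYZ’ dataset.

```python
# Toy transactions, invented purely for illustration.
transactions = [
    {"burger", "almonds", "cola"},
    {"burger", "almonds"},
    {"burger", "cola"},
    {"almonds", "milk"},
    {"milk", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support(A and B) / Support(A), where A is the 'if' side of the rule."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Support(A and B) / (Support(A) * Support(B))."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / (
        support(antecedent, transactions) * support(consequent, transactions)
    )

s = support({"burger", "almonds"}, transactions)
c = confidence({"burger"}, {"almonds"}, transactions)
l = lift({"burger"}, {"almonds"}, transactions)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

In this toy data burgers and almonds each appear in 3 of 5 baskets and together in 2, so the lift is slightly above 1. A lift above 1 suggests the two items occur together more often than they would if they were bought independently.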

Let’s see how to mine rules from data using the ‘Apriori’ algorithm for Market Basket Analysis (association rule mining) in R and Python.

DATA:

Transaction data from the ‘XYZ’ supermarket is used. Each row of the CSV file is one transaction, listing the items bought together.

STEPS IN R:

Loading Data into R

The data is in CSV format. It can be loaded with the ‘read.csv’ function, but that function places each item in a separate column.

MB = read.csv('MBD.csv', header = FALSE)

To avoid placing each item in a separate column, ‘read.transactions’ from the ‘arules’ package is used instead. It splits the items on ‘,’ and removes duplicate items within a transaction. The summary gives a descriptive analysis of the data.

library(arules)
MBD = read.transactions('MBD.csv', sep = ',', rm.duplicates = TRUE)
summary(MBD)

> summary(MBD)
  transactions as itemMatrix in sparse format with
  7501 rows (elements/itemsets/transactions) and
  119 columns (items) and a density of 0.03288973 
most frequent items:
mineral water          eggs     spaghetti  french fries     chocolate       (Other) 
         1788          1348          1306          1282          1229         22405

element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   18   19   20
1754 1358 1044  816  667  493  391  324  259  139  102   67   40   22   17    4    1    2    1

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   3.000   3.914   5.000  20.000
includes extended item information - examples:

labels
1 almonds
2 antioxydant juice
3 asparagus
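
The figures in this summary can be cross-checked by hand. The following Python sketch, using only numbers copied from the output above, recomputes the number of transactions, the mean transaction length and the density (the share of non-empty cells in the transaction-by-item matrix). The variable names are illustrative.

```python
# Length distribution copied from the summary output: size -> number of transactions.
length_counts = {
    1: 1754, 2: 1358, 3: 1044, 4: 816, 5: 667, 6: 493, 7: 391, 8: 324,
    9: 259, 10: 139, 11: 102, 12: 67, 13: 40, 14: 22, 15: 17, 16: 4,
    18: 1, 19: 2, 20: 1,
}
n_items = 119  # number of columns (distinct items) reported by the summary

n_transactions = sum(length_counts.values())
total_purchases = sum(size * count for size, count in length_counts.items())

mean_length = total_purchases / n_transactions
density = total_purchases / (n_transactions * n_items)
mineral_water_support = 1788 / n_transactions  # count of the most frequent item

print(n_transactions, total_purchases)
print(round(mean_length, 3), round(density, 8), round(mineral_water_support, 4))
```

The totals agree with the summary: 7501 transactions, a mean length of about 3.914 and a density of about 0.0329. The last value shows that mineral water appears in roughly 24% of all baskets.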

Plotting the frequency:

Here the top fifty items are plotted by frequency of occurrence. The graph shows that mineral water is the most frequent item.

itemFrequencyPlot(MBD, topN = 50,col=c("black","navyblue","orangered3","goldenrod2","green3"))

Mining Association rules:

A minimum support of 0.004 and a minimum confidence of 0.2 are used to filter out rules that involve infrequent items. The ‘inspect’ function is used to display the rules, here the top ten sorted by lift.

rules = apriori(data = MBD, parameter = list(support = 0.004, confidence = 0.2))
inspect(sort(rules, by = 'lift')[1:10])

>inspect(sort(rules, by = 'lift')[1:10])
       lhs                                            rhs             support     confidence lift     count
  [1]  {light cream}                               => {chicken}       0.004532729 0.2905983  4.843951 34   
  [2]  {pasta}                                     => {escalope}      0.005865885 0.3728814  4.700812 44   
  [3]  {pasta}                                     => {shrimp}        0.005065991 0.3220339  4.506672 38   
  [4]  {eggs,ground beef}                          => {herb & pepper} 0.004132782 0.2066667  4.178455 31   
  [5]  {whole wheat pasta}                         => {olive oil}     0.007998933 0.2714932  4.122410 60   
  [6]  {herb & pepper,spaghetti}                   => {ground beef}   0.006399147 0.3934426  4.004360 48   
  [7]  {herb & pepper,mineral water}               => {ground beef}   0.006665778 0.3906250  3.975683 50   
  [8]  {tomato sauce}                              => {ground beef}   0.005332622 0.3773585  3.840659 40   
  [9]  {mushroom cream sauce}                      => {escalope}      0.005732569 0.3006993  3.790833 43   
  [10] {frozen vegetables,mineral water,spaghetti} => {ground beef}   0.004399413 0.3666667  3.731841 33   
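
The columns of this table are tied together by the definitions of support, confidence and lift. As a rough check, the Python sketch below takes rule [1] and the total of 7501 transactions from the summary, then recovers the support and the approximate number of baskets containing each side of the rule. The variable names are illustrative.

```python
n_transactions = 7501  # from summary(MBD)

# Rule [1] {light cream} => {chicken}, values copied from the inspect output.
count, confidence, lift = 34, 0.2905983, 4.843951

support = count / n_transactions        # share of baskets containing both items
antecedent_count = count / confidence   # baskets containing light cream
consequent_support = confidence / lift  # share of baskets containing chicken

print(round(support, 9))
print(round(antecedent_count), round(consequent_support * n_transactions))
```

The recovered support matches the table, and the other two values imply about 117 baskets with light cream and about 450 with chicken. In other words, a basket with light cream is roughly 4.8 times as likely to contain chicken as an average basket.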

Visualization:

The rules are mapped using the ‘graph’ method of ‘plot’ (from the ‘arulesViz’ package) to show the associations between items. This gives an overview of the rules that can support joint planning of promotions and layout design.

plot(rules[1:25], method = "graph",control = list(type = "items"), nodeCol = c("black","navyblue","orangered3","goldenrod2","green3"), edgeCol = "black")

An interactive scatter plot is created with ‘plotly’, in which the rules are plotted by ‘support’, ‘confidence’ and ‘lift’.

plotly_arules(rules, col=c("black","navyblue","orangered3","goldenrod2","green3"))

STEPS IN PYTHON:

Association rules can be mined in the same way using the following Python code.

First, the required libraries are imported. Here ‘numpy’ is used for scientific computing, ‘matplotlib’ for plotting and ‘pandas’ for reading files.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Loading Data

Here an empty list ‘transactions’ is created, and then each transaction is added to it as a list of item names using a ‘for’ loop.

MBD = pd.read_csv('MBD.csv', header = None)
transactions = []
# 7501 rows, up to 20 items per row; missing cells become the string 'nan'
for i in range(0, 7501):
    transactions.append([str(MBD.values[i, j]) for j in range(0, 20)])
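
One side effect of converting every cell with `str` is that cells pandas reads as missing end up in the transactions as the item 'nan'. A possible alternative, sketched below under the assumption that the file is plain comma-separated text with one transaction per line, reads the rows with the standard `csv` module and drops empty cells. The helper name `load_transactions` is hypothetical, and an in-memory sample stands in for 'MBD.csv'.

```python
import csv
import io

def load_transactions(lines):
    """Return one list of item names per row, skipping empty cells."""
    transactions = []
    for row in csv.reader(lines):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:
            transactions.append(items)
    return transactions

# In-memory stand-in for the real file; with a file on disk one would pass
# open('MBD.csv', newline='') instead.
sample = io.StringIO("burger,almonds,cola\nmilk\neggs,,spaghetti\n")
transactions = load_transactions(sample)
print(transactions)
```

Because each transaction keeps only the items actually bought, no placeholder item can show up in the mined rules.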

Mining Association rules:

Here the ‘apriori’ function from the ‘apyori’ package is used to generate the rules.

from apyori import apriori
rules = apriori(transactions, min_support = 0.004, min_confidence = 0.2, min_lift = 3, min_length = 2)
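
To show what a minimum support threshold does inside an Apriori-style search, the following is a minimal pure-Python sketch of level-wise frequent itemset mining. It is a simplified illustration, not the implementation used by ‘apyori’ or ‘arules’; the function name `frequent_itemsets` and the toy data are assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates are built only from frequent size-(k-1) sets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: keep single items that meet the support threshold.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = {}
    k = 1
    while current:
        result.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-sets that form a k-set.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        scored = {c: support(c) for c in candidates}
        current = {c: s for c, s in scored.items() if s >= min_support}
    return result

# Toy transactions, invented purely for illustration.
toy = [
    ["burger", "almonds", "cola"],
    ["burger", "almonds"],
    ["burger", "cola"],
    ["almonds", "milk"],
    ["milk", "cola"],
]
found = frequent_itemsets(toy, min_support=0.4)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most combinations never need to be counted. Rules are then formed from the surviving itemsets and filtered by confidence and lift.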

Rules

results = list(rules)
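
Each element of `results` is a nested record, so it helps to flatten it into rows that resemble the ‘inspect’ table from R. The sketch below assumes the record layout used by ‘apyori’ (fields `items`, `support` and `ordered_statistics`, with each statistic carrying `items_base`, `items_add`, `confidence` and `lift`). The namedtuples are stand-ins with those field names so that the example runs without the package, and `flatten_rules` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins mirroring the field names assumed for apyori's result records.
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")

def flatten_rules(results):
    """Turn nested records into (lhs, rhs, support, confidence, lift) rows, highest lift first."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append((
                sorted(stat.items_base),
                sorted(stat.items_add),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return sorted(rows, key=lambda row: row[4], reverse=True)

# Demo record built from rule [1] of the R output; with the real package one
# would call flatten_rules(results) instead.
demo = [
    RelationRecord(
        items=frozenset({"light cream", "chicken"}),
        support=0.004532729,
        ordered_statistics=[
            OrderedStatistic(
                frozenset({"light cream"}), frozenset({"chicken"}), 0.2905983, 4.843951
            ),
        ],
    )
]
rows = flatten_rules(demo)
for lhs, rhs, support, confidence, lift in rows:
    print(lhs, "=>", rhs, support, confidence, lift)
```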

Conclusion:

Many associations such as {light cream} => {chicken} and {pasta} => {escalope} have been identified for the ‘XYZ’ supermarket. Promotional activities should therefore be developed using these rules in order to boost sales and revenue. Some unexpected rules such as {chocolate,shrimp} => {frozen vegetables} and {cooking oil,eggs} => {chocolate} have also been identified using Market Basket Analysis. These methods can also be applied to other businesses such as e-commerce and movies.

 

Avinash Reddy
