Apriori Itemset Generation

Apriori Algorithm

Apriori Algorithm: (by Agrawal et al at IBM Almaden Research Centre) can be used to generate all frequent itemset
 
Pass 1
  1. Generate the candidate itemsets in C1
  2. Save the frequent itemsets in L1
Pass k
  1. Generate the candidate itemsets in Ck from the frequent

  2. itemsets in Lk-1
    1. Join Lk-1 p with Lk-1q, as follows:

    2. insert into Ck
      select p.item1, q.item1, . . . , p.itemk-1, q.itemk-1
      from Lk-1 p, Lk-1q
      where p.item1 = q.item1, . . . p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
    3. Generate all (k-1)-subsets from the candidate itemsets in Ck
    4. Prune all candidate itemsets from Ck where some (k-1)-subset of the candidate itemset is not in the frequent itemset Lk-1
  3. Scan the transaction database to determine the support for each candidate itemset in Ck
  4. Save the frequent itemsets in Lk

Implementation: A working Apriori Itemset Generation program can be found on the Itemset Implementation page.

Example 1: Assume the user-specified minimum support is 50%

TID A B C D E F
T1 1 0 1 1 0 0
T2 0 1 0 1 0 0
T3 1 1 1 0 1 0
T4 0 1 0 1 0 1
Itemset X supp(X)
{A,B} 25%
{A,C} 50%
{A,D} 25%
{B,C} 25%
{B,D} 50%
{C,D} 25%
Itemset X supp(X)
{A,C} 50%
{B,D} 50%

Example 2: Assume the user-specified minimum support is 40%, then generate all frequent itemsets.

Given: The transaction database shown below

TID A B C D E
T1 1 1 1 0 0
T2 1 1 1 1 1
T3 1 0 1 1 0
T4 1 0 1 1 1
T5 1 1 1 1 0

Pass 1
C1
Itemset X supp(X)
A ?
B ?
C ?
D ?
E ?
L1
Itemset X supp(X)
A 100%
B 60%
C 100%
D 80%
E 40%

Pass 2

C2
Itemset X supp(X)
A,B ?
A,C ?
A,D ?
A,E ?
B,C ?
B,D ?
B,E ?
C,D ?
C,E ?
D,E ?
L2
Itemset X supp(X)
A,B 60%
A,C 100%
A,D 80%
A,E 40%
B,C 60%
B,D 40%
B,E 20%
C,D 80%
C,E 40%
D,E 40%
L2 after saving only the frequent itemsets
Itemset X supp(X)
A,B 60%
A,C 100%
A,D 80%
A,E 40%
B,C 60%
B,D 40%
C,D 80%
C,E 40%
D,E 40%

Pass 3

C3
Itemset X supp(X)
join AB with AC A,B,C ?
join AB with AD A,B,D ?
join AB with AE A,B,E ?
join AC with AD A,C,D ?
join AC with AE A,C,E ?
join AD with AE A,D,E ?
join BC with BD B,C,D ?
join CD with CE C,D,E ?
C3 after pruning
Itemset X supp(X)
A,B,C ?
A,B,D ?
A,C,D ?
A,C,E ?
A,D,E ?
B,C,D ?
C,D,E ?
L3
Itemset X supp(X)
A,B,C 60%
A,B,D 40%
A,C,D 80%
A,C,E 40%
A,D,E 40%
B,C,D 40%
C,D,E 40%

Pass 4

C4
Itemset X supp(X)
combine ABC with ABD A,B,C,D ?
combine ACD with ACE A,C,D,E ?
L4
Itemset X supp(X)
A,B,C,D 40%
A,C,D,E 40%
Pass 5: For pass 5 we can't form any candidates because there aren't two frequent 4-itemsets beginning with the same 3 items.