Posted to dev@mahout.apache.org by "Vipul Pandey (JIRA)" <ji...@apache.org> on 2011/03/07 04:11:06 UTC

[jira] Created: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

FPGrowth/PFPGrowth giving out wrong results. 
---------------------------------------------

                 Key: MAHOUT-617
                 URL: https://issues.apache.org/jira/browse/MAHOUT-617
             Project: Mahout
          Issue Type: Bug
          Components: Frequent Itemset/Association Rule Mining
    Affects Versions: 0.4
         Environment: Mac OS X, Linux
            Reporter: Vipul Pandey


PFPGrowth with my data is giving out wrong results. Attached are : 
- The input data
- The output (sequence file) generated by FPGrowth (PFPGrowth gives the same results)
- Output as text


$ cat part-r-00000 | grep 1678807047
12      1678807047
38      1678807047 3159925415

This says that the support (12) for the item (1678807047) is less than the support (38) of a pair containing that item.


Another example:
$ cat part-r-00000  | grep 1441690161
12              1441690161 3910019844
18              1604285941 1441690161 3910019844
75              1441690161
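
Under the usual definition of support (the number of transactions that contain every item of the itemset), a superset can never have higher support than any of its subsets. The sketch below checks reported lines for violations of that rule. It assumes each line has the form "support item item ...", as in the grep output above; the function names are illustrative and are not part of Mahout.

```python
def parse_line(line):
    """Split 'support item item ...' into (support, frozenset of items)."""
    fields = line.split()
    return int(fields[0]), frozenset(fields[1:])


def find_violations(lines):
    """Return tuples (subset, subset_support, superset, superset_support)
    for every pair where the superset is reported with HIGHER support
    than one of its proper subsets."""
    patterns = [parse_line(line) for line in lines if line.strip()]
    violations = []
    for sub_support, sub_items in patterns:
        for super_support, super_items in patterns:
            # '<' on frozensets tests for a proper subset
            if sub_items < super_items and super_support > sub_support:
                violations.append(
                    (sorted(sub_items), sub_support,
                     sorted(super_items), super_support)
                )
    return violations


# The five lines quoted in this report
reported = [
    "12      1678807047",
    "38      1678807047 3159925415",
    "12              1441690161 3910019844",
    "18              1604285941 1441690161 3910019844",
    "75              1441690161",
]

violations = find_violations(reported)
for violation in violations:
    print(violation)
```

On these five lines the check flags two pairs: the single item reported with 12 against its pair with 38, and the pair reported with 12 against its triple with 18. The line with support 75 is consistent with both of its supersets.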


Runtime parameters : 
-i baskets/part-r-00000 -o patterns -k 50 -method sequential -g 10 -regex '[\t]' -s 10






--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] Commented: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by Robin Anil <ro...@gmail.com>.
Only closed itemsets are mined. You can generate all the sub-combinations yourself.

sent from handheld device excuse typos
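
One way to generate the combinations from closed itemsets can be sketched as follows. It relies on a standard property: the support of any frequent itemset equals the largest support among the closed itemsets that contain it. The input values are the four distinct patterns reported elsewhere in this thread for the XYZ sample; `expand_closed` is a hypothetical helper, not a Mahout API, and the approach is exponential in the pattern length, so it only suits short patterns.

```python
from itertools import combinations


def expand_closed(closed):
    """closed: dict mapping frozenset(items) -> support.
    Returns a dict with every non-empty subset of every closed itemset,
    each mapped to the maximum support of a closed superset."""
    supports = {}
    for items, support in closed.items():
        for size in range(1, len(items) + 1):
            for subset in combinations(sorted(items), size):
                key = frozenset(subset)
                supports[key] = max(supports.get(key, 0), support)
    return supports


# Closed itemsets for the XYZ sample, as reported in this thread
closed = {
    frozenset(["X"]): 33,
    frozenset(["Y"]): 25,
    frozenset(["X", "Y"]): 21,
    frozenset(["X", "Y", "Z"]): 11,
}

all_frequent = expand_closed(closed)
for items, support in sorted(all_frequent.items(),
                             key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(support, " ".join(sorted(items)))
```

This yields seven itemsets: X (33), Y (25), X Y (21), and Z, X Z, Y Z, X Y Z (11 each), which is the list the reporter expected.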
On Mar 18, 2011 7:08 AM, "Vipul Pandey (JIRA)" <ji...@apache.org> wrote:
>
> [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008276#comment-13008276 ]
>
> Vipul Pandey commented on MAHOUT-617:
> -------------------------------------
>
> It looks like FPGrowth reports only the closed sets, is that right?
> In that case, will I have to mine all the frequent subsets manually?
>
>> FPGrowth/PFPGrowth giving out wrong results.
>> ---------------------------------------------
>>
>> Key: MAHOUT-617
>> URL: https://issues.apache.org/jira/browse/MAHOUT-617
>> Project: Mahout
>> Issue Type: Bug
>> Components: Frequent Itemset/Association Rule Mining
>> Affects Versions: 0.4
>> Environment: Mac OS X, Linux
>> Reporter: Vipul Pandey
>> Assignee: Robin Anil
>> Labels: AssociationMining, FPGrowth, FrequentItemsets
>> Attachments: XY, XYZ
>>
>>
>> FPGrowth reports the support of itemsets individually - in that - if Item X appears "individually" 12 times and appears with item Y 10 times (a total of 22 times) AND item Y appears "individually" 4 times (a total of 14 times) then this is what the output will be (say for min-support 2)
>> 12 X
>> 10 XY
>> 4 Y
>> Instead of
>> 22 X
>> 10 XY
>> 14 Y
>> Also, because of this If the minimum support is 5 then the output will look like :
>> 12 X
>> 10 X Y
>> Thus totally Ignoring Y
>> if the minimum support is 11 then the output will look like
>> 12 X
>> again Ignoring Y
>> if the minimum support is 13 then there will be NO output. even though all the way along Xs support was 22 and Y's was 14
>> Even if we want to show just the maximal itemsets (although i would like to see ALL the frequent itemsets - maximal or not) this output is wrong as with a support of 13 we should still have seen X(22) and Y(14)
>> Now Say you add XYZ 11 times
>> for support 1 you'd see
>> 12 X
>> 10 X Y
>> 11 X Y Z
>> 4 Y
>> And for support 11 you'd see
>> 12 X
>> 11 X Y Z
>> Although I'd expect the output (for both s=1 & s=11) to be
>> 33 X
>> 25 Y
>> 21 XY
>> 11 Z
>> 11 XZ
>> 11 YZ
>> 11 XYZ
>> attached are the sample inputs:
>

[jira] Commented: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Vipul Pandey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006263#comment-13006263 ] 

Vipul Pandey commented on MAHOUT-617:
-------------------------------------

yes, exactly. But the output that I get upon running FPGrowth on the input file for support 11 is as below : 

12 X 
11 X Y Z 

isn't that what you are getting too?


[jira] Commented: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Vipul Pandey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008244#comment-13008244 ] 

Vipul Pandey commented on MAHOUT-617:
-------------------------------------

Robin, 

The output that i'm getting is  :

11	X Y Z 
11	X Y Z 
11	X Y Z 
21	X Y 
21	X Y 
25	Y 
33	X 


That's the same output that you expect according to your test case : 
    assertEquals(
      "[(Z,([X, Y, Z],11)), (Y,([Y],25), ([X, Y],21), ([X, Y, Z],11)), (X,([X],33), ([X, Y],21), ([X, Y, Z],11))]",



But the output we expect is : 
11 Z Y X 
11 Z Y 
11 Z X 
11 Z 
21 Y X 
25 Y 
33 X 

I don't see the subsets ZY, ZX and Z in the output, although they all have to be frequent. Instead, XYZ is reported 3 times (I assume that's once for each of X, Y and Z) and XY is reported twice.

Am I missing something?
If not, then how do I get to the actual output?  
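
The repeated lines are consistent with the output being grouped per item, as in the test-case string above, where each of Z, Y and X carries its own list of patterns. A minimal sketch of collapsing such per-item lists into distinct patterns is shown below. The nested structure is transcribed by hand from that assertion string; the helper name is illustrative and not part of Mahout.

```python
# Per-item pattern lists, transcribed from the expected string in the test case
per_item = {
    "Z": [(("X", "Y", "Z"), 11)],
    "Y": [(("Y",), 25), (("X", "Y"), 21), (("X", "Y", "Z"), 11)],
    "X": [(("X",), 33), (("X", "Y"), 21), (("X", "Y", "Z"), 11)],
}


def distinct_patterns(per_item):
    """Collapse per-item pattern lists into one list of distinct
    (support, items) pairs, sorted by descending support."""
    seen = {}
    for patterns in per_item.values():
        for items, support in patterns:
            # The same pattern listed under several items maps to one key
            seen[frozenset(items)] = support
    return sorted(
        ((support, sorted(items)) for items, support in seen.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )


total_rows = sum(len(patterns) for patterns in per_item.values())
unique = distinct_patterns(per_item)
print(total_rows, "rows,", len(unique), "distinct patterns")
for support, items in unique:
    print(support, " ".join(items))
```

The seven rows collapse to four distinct patterns: X (33), Y (25), X Y (21) and X Y Z (11). The subsets containing Z are not among them, which matches what is observed in the output.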





[jira] Commented: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Vipul Pandey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006252#comment-13006252 ] 

Vipul Pandey commented on MAHOUT-617:
-------------------------------------

I didn't quite get it.
So for support 11 - you'd expect the output to be 

Z Y X 11 
Z Y 11
Z X 11
Z 11
Y X 21
Y 25
X 33

is that what you are suggesting?


[jira] Updated: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Vipul Pandey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vipul Pandey updated MAHOUT-617:
--------------------------------

    Description: 
PFPGrowth with my data is giving out wrong results. Attached are : 
- The input data
- The output (sequence file) generated by FPGrowth (PFPGrowth gives the same results)
- Output as text


$ cat part-r-00000 | grep 1678807047
12      1678807047
38      1678807047 3159925415

This says that the support (12) for the item (1678807047) is less than the support (38) of a pair containing that item.


another example
$ cat part-r-00000  | grep 1441690161
12              1441690161 3910019844
18              1604285941 1441690161 3910019844
75              1441690161


Runtime parameters : 
-i baskets/part-r-00000 -o patterns -k 50 -method sequential -g 10 -regex '[\t]' -s 10


NOTE : Unable to attach files to JIRA. Here's the bundle of files (Input, SequenceOutput & TextOutput) https://files.me.com/vpandey/glsovt




  was:
PFPGrowth with my data is giving out wrong results. Attached are : 
- The input data
- The output (sequence file) generated by FPGrowth (PFPGrowth gives the same results)
- Output as text


$ cat part-r-00000 | grep 1678807047
12      1678807047
38      1678807047 3159925415

which says that the support (12) for the item (1678807047) is lesser than the support (38) of a pair containing that item. 


another example
$ cat part-r-00000  | grep 1441690161
12              1441690161 3910019844
18              1604285941 1441690161 3910019844
75              1441690161


Runtime parameters : 
-i baskets/part-r-00000 -o patterns -k 50 -method sequential -g 10 -regex '[\t]' -s 10








[jira] Commented: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006281#comment-13006281 ] 

Hudson commented on MAHOUT-617:
-------------------------------

Integrated in Mahout-Quality #669 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/669/])
    Adding simple tests to illustrate FPGrowth MAHOUT-617



[jira] Issue Comment Edited: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Vipul Pandey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006263#comment-13006263 ] 

Vipul Pandey edited comment on MAHOUT-617 at 3/13/11 7:35 PM:
--------------------------------------------------------------

yes, exactly. But the output that I get upon running FPGrowth on the input file for support 11 is as below : 

12 X 
11 X Y Z 

isn't that what you are getting too?

and this is for support = 1

12 X 
10 X Y 
11 X Y Z 
4 Y 

      was (Author: vipandey):
    yes, exactly. But the output that I get upon running FPGrowth on the input file for support 11 is as below : 

12 X 
11 X Y Z 

isn't that what you are getting too?
  

[jira] Updated: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Vipul Pandey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vipul Pandey updated MAHOUT-617:
--------------------------------

    Attachment: XY


[jira] Assigned: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Anil reassigned MAHOUT-617:
---------------------------------

    Assignee: Robin Anil


[jira] Issue Comment Edited: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Vipul Pandey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006263#comment-13006263 ] 

Vipul Pandey edited comment on MAHOUT-617 at 3/13/11 7:36 PM:
--------------------------------------------------------------

yes, exactly. But the output that I get upon running FPGrowth on the input file for support 11 is as below : 

12 X 
11 X Y Z 

isn't that what you are getting too?

and this is what I get for for support = 1

12 X 
10 X Y 
11 X Y Z 
4 Y 

      was (Author: vipandey):
    yes, exactly. But the output that I get upon running FPGrowth on the input file for support 11 is as below : 

12 X 
11 X Y Z 

isn't that what you are getting too?

and this is for support = 1

12 X 
10 X Y 
11 X Y Z 
4 Y 
  

[jira] Commented: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006249#comment-13006249 ] 

Robin Anil commented on MAHOUT-617:
-----------------------------------

After some closer inspection I have to say this is working as intended. Let me 


bq. for min support 2
No. In fact, FPGrowth is not about counting whole transactions; it's about counting sub-patterns in the set:
X happens (12 + 10) times
Y happens (4 + 10 ) times
X+Y happens (10) times

Same holds true for others. 

bq. For second dataset
Again, the counts of sub-patterns (itemsets) in the dataset are:

  Z Y X 11
  Z Y 11
  Z X 11
  Z 11
  Y X 21
  Y 25
  X 33

I am marking this as working as intended. This is however independent of the bug reported in MAHOUT-625
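
The sub-pattern counts listed above can be reproduced by brute force on a toy dataset. The sketch below assumes the XYZ sample consists of 12 transactions containing only X, 10 containing X and Y, 4 containing only Y, and 11 containing X, Y and Z, as stated in the issue description; the attachment itself is not shown in this thread. This is plain enumeration for illustration, not the FP-tree algorithm, and it is exponential in transaction length.

```python
from collections import Counter
from itertools import combinations


def count_supports(transactions):
    """Count, for every non-empty itemset, how many transactions contain it."""
    supports = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                supports[subset] += 1
    return supports


# Assumed contents of the XYZ sample input (see the issue description)
transactions = (
    [["X"]] * 12
    + [["X", "Y"]] * 10
    + [["Y"]] * 4
    + [["X", "Y", "Z"]] * 11
)

supports = count_supports(transactions)
for itemset, support in sorted(supports.items(),
                               key=lambda kv: (-kv[1], kv[0])):
    print(" ".join(itemset), support)
```

The counts come out as X 33, Y 25, X Y 21 and 11 for each of Z, X Z, Y Z and X Y Z, in agreement with the table above.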



[jira] Updated: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Vipul Pandey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vipul Pandey updated MAHOUT-617:
--------------------------------

    Attachment: XYZ

> FPGrowth/PFPGrowth giving out wrong results. 
> ---------------------------------------------
>
>                 Key: MAHOUT-617
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-617
>             Project: Mahout
>          Issue Type: Bug
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.4
>         Environment: Mac OS X, Linux
>            Reporter: Vipul Pandey
>            Assignee: Robin Anil
>              Labels: AssociationMining, FPGrowth, FrequentItemsets
>         Attachments: XYZ
>
>
> FPGrowth reports the support of each itemset in isolation. For example, suppose item X appears on its own 12 times and together with item Y 10 times (22 occurrences in total), and item Y appears on its own 4 times (14 occurrences in total). With a minimum support of 2, the output is:
> 12 X
> 10 X Y
> 4  Y
> instead of:
> 22 X
> 10 X Y
> 14 Y
> As a result, with a minimum support of 5 the output is:
> 12 X
> 10 X Y
> so Y is ignored entirely.
> With a minimum support of 11 the output is:
> 12 X
> again ignoring Y.
> With a minimum support of 13 there is no output at all, even though the support of X was 22 and the support of Y was 14 throughout.
> Even if only the maximal itemsets are meant to be shown (although I would like to see ALL the frequent itemsets, maximal or not), this output is wrong: with a minimum support of 13 we should still see X (22) and Y (14).
> Now say you add the basket X Y Z 11 times.
> For support 1 you'd see:
> 12 X
> 10 X Y
> 11 X Y Z
> 4  Y
> And for support 11 you'd see:
> 12 X
> 11 X Y Z
> I'd expect the output (for both s=1 and s=11) to be:
> 33 X
> 25 Y
> 21 X Y
> 11 Z
> 11 X Z
> 11 Y Z
> 11 X Y Z
> The sample inputs are attached.
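
For reference, the supports expected in the description can be reproduced by brute-force counting over the toy baskets. This is only a sketch to make the arithmetic concrete: `BASKETS` and `support_counts` are names made up for this illustration, not part of Mahout, and the exhaustive enumeration is not how FPGrowth works internally.

```python
from itertools import combinations

# Toy baskets from the description: (items in the basket, times it occurs).
BASKETS = [
    (frozenset("X"), 12),
    (frozenset("XY"), 10),
    (frozenset("Y"), 4),
    (frozenset("XYZ"), 11),
]


def support_counts(baskets, min_support=1):
    """Return {itemset: support} for every itemset meeting min_support.

    The support of an itemset is the number of baskets that contain
    all of its items, whatever else those baskets also contain.
    """
    items = sorted(set().union(*(basket for basket, _ in baskets)))
    counts = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            total = sum(n for basket, n in baskets if candidate <= basket)
            if total >= min_support:
                counts[candidate] = total
    return counts


if __name__ == "__main__":
    for itemset, support in sorted(
        support_counts(BASKETS, min_support=11).items(),
        key=lambda kv: (-kv[1], sorted(kv[0])),
    ):
        print(support, " ".join(sorted(itemset)))
```

With these baskets, X is counted in the X, X Y and X Y Z baskets (12 + 10 + 11), which is where the 33 in the expected output comes from.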

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Vipul Pandey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vipul Pandey updated MAHOUT-617:
--------------------------------

    Description: 
FPGrowth reports the support of each itemset in isolation. For example, suppose item X appears on its own 12 times and together with item Y 10 times (22 occurrences in total), and item Y appears on its own 4 times (14 occurrences in total). With a minimum support of 2, the output is:

12 X
10 X Y
4  Y

instead of:
22 X
10 X Y
14 Y

As a result, with a minimum support of 5 the output is:
12 X
10 X Y
so Y is ignored entirely.

With a minimum support of 11 the output is:
12 X
again ignoring Y.

With a minimum support of 13 there is no output at all, even though the support of X was 22 and the support of Y was 14 throughout.

Even if only the maximal itemsets are meant to be shown (although I would like to see ALL the frequent itemsets, maximal or not), this output is wrong: with a minimum support of 13 we should still see X (22) and Y (14).

Now say you add the basket X Y Z 11 times.

For support 1 you'd see:
12 X
10 X Y
11 X Y Z
4  Y

And for support 11 you'd see:
12 X
11 X Y Z

I'd expect the output (for both s=1 and s=11) to be:
33 X
25 Y
21 X Y
11 Z
11 X Z
11 Y Z
11 X Y Z

The sample inputs are attached.

  was:
PFPGrowth with my data is giving out wrong results. Attached are : 
- The input data
- The output (sequence file) generated by FPGrowth (PFPGrowth gives the same results)
- Output as text


$ cat part-r-00000 | grep 1678807047
12      1678807047
38      1678807047 3159925415

This says that the support (12) of the item 1678807047 is less than the support (38) of a pair containing that item.


Another example:
$ cat part-r-00000  | grep 1441690161
12              1441690161 3910019844
18              1604285941 1441690161 3910019844
75              1441690161
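
The inconsistency in the two grep results above can also be checked mechanically: the support of an itemset can never be lower than the support of any of its supersets. The following is a minimal sketch that assumes the text output has one pattern per line, with the support first and the items after it separated by whitespace (as in the lines shown above). `parse_line` and `find_violations` are hypothetical helper names, not Mahout functions.

```python
def parse_line(line):
    """Split 'support item item ...' into (support, frozenset of items)."""
    parts = line.split()
    return int(parts[0]), frozenset(parts[1:])


def find_violations(lines):
    """Return (subset, subset_support, superset, superset_support) tuples
    where a proper subset is reported with LOWER support than its superset."""
    patterns = [parse_line(line) for line in lines if line.strip()]
    violations = []
    for sup_a, items_a in patterns:
        for sup_b, items_b in patterns:
            # items_a < items_b is the proper-subset test on frozensets.
            if items_a < items_b and sup_a < sup_b:
                violations.append((items_a, sup_a, items_b, sup_b))
    return violations


if __name__ == "__main__":
    sample = [
        "12      1678807047",
        "38      1678807047 3159925415",
    ]
    for subset, s_sub, superset, s_sup in find_violations(sample):
        print(sorted(subset), s_sub, "<", s_sup, sorted(superset))
```

The pairwise comparison is quadratic in the number of patterns, which is fine for spot checks on a grep result but would need an index for a full output file.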


Runtime parameters : 
-i baskets/part-r-00000 -o patterns -k 50 -method sequential -g 10 -regex '[\t]' -s 10


NOTE : Unable to attach files to JIRA. Here's the bundle of files (Input, SequenceOutput & TextOutput) https://files.me.com/vpandey/glsovt






[jira] Commented: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006260#comment-13006260 ] 

Robin Anil commented on MAHOUT-617:
-----------------------------------

Yes, for anything less than or equal to 11.

> FPGrowth/PFPGrowth giving out wrong results. 
> ---------------------------------------------
>
>                 Key: MAHOUT-617
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-617
>             Project: Mahout
>          Issue Type: Bug
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.4
>         Environment: Mac OS X, Linux
>            Reporter: Vipul Pandey
>            Assignee: Robin Anil
>              Labels: AssociationMining, FPGrowth, FrequentItemsets
>         Attachments: XY, XYZ

[jira] Commented: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Vipul Pandey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008276#comment-13008276 ] 

Vipul Pandey commented on MAHOUT-617:
-------------------------------------

It looks like FPGrowth reports only the closed itemsets. Is that right?
In that case, will I have to mine all the frequent subsets manually?
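
If only closed itemsets are available, the remaining frequent itemsets can be derived from them, because the support of any frequent itemset equals the largest support among the closed itemsets that contain it. The sketch below illustrates that expansion. `CLOSED` holds the closed itemsets of the X/Y/Z toy data under the standard definition, computed by hand for this illustration; it is an assumption, not output produced by Mahout, and `expand_closed` is a hypothetical helper name.

```python
from itertools import combinations

# Assumed closed itemsets (standard definition) for the toy baskets
# X x12, X Y x10, Y x4, X Y Z x11. Not taken from Mahout output.
CLOSED = {
    frozenset("X"): 33,
    frozenset("Y"): 25,
    frozenset("XY"): 21,
    frozenset("XYZ"): 11,
}


def expand_closed(closed, min_support=1):
    """Derive every frequent itemset and its support from closed itemsets.

    Each non-empty subset of a closed itemset inherits that itemset's
    support unless a larger support was already found via another
    closed superset.
    """
    frequent = {}
    for itemset, support in closed.items():
        if support < min_support:
            continue
        for size in range(1, len(itemset) + 1):
            for combo in combinations(sorted(itemset), size):
                subset = frozenset(combo)
                if support > frequent.get(subset, 0):
                    frequent[subset] = support
    return frequent


if __name__ == "__main__":
    for itemset, support in sorted(
        expand_closed(CLOSED).items(),
        key=lambda kv: (-kv[1], sorted(kv[0])),
    ):
        print(support, " ".join(sorted(itemset)))
```

Enumerating all subsets is exponential in the size of each closed itemset, so this is only practical when the closed itemsets are short.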


[jira] Resolved: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Anil resolved MAHOUT-617.
-------------------------------

    Resolution: Not A Problem


[jira] Commented: (MAHOUT-617) FPGrowth/PFPGrowth giving out wrong results.

Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006265#comment-13006265 ] 

Robin Anil commented on MAHOUT-617:
-----------------------------------

No. See the test case I just checked in, and try running it with different support sizes.
