Posted to dev@mahout.apache.org by "Yarco Hayduk (JIRA)" <ji...@apache.org> on 2011/05/22 07:12:47 UTC

[jira] [Created] (MAHOUT-709) FP-Growth Redundant patterns

FP-Growth Redundant patterns
----------------------------

                 Key: MAHOUT-709
                 URL: https://issues.apache.org/jira/browse/MAHOUT-709
             Project: Mahout
          Issue Type: Bug
          Components: Frequent Itemset/Association Rule Mining
    Affects Versions: 0.4
            Reporter: Yarco Hayduk
             Fix For: 0.5


The algorithm outputs more patterns than needed.

I have compared Mahout's PFP-Growth algorithm against the FP-Growth implementation at http://www.borgelt.net/fpgrowth.html. That implementation also has an option to generate closed patterns.

When I filtered the sub-patterns out of the Parallel FP-Growth output, I arrived at the same result as http://www.borgelt.net/fpgrowth.html.

Succinctly, the output is not limited to closed itemsets.

I am attaching the dummy DB along with the output of both algorithms.
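
The filtering step described in this report can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the report and not Mahout's API: the class `ClosedPatternFilter`, the `Pattern` type, and the sample patterns in `main` are all hypothetical. It drops every pattern for which a proper superset with the same support exists in the list, which yields the closed itemsets provided the input list contains all frequent itemsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ClosedPatternFilter {

    /** A mined pattern: a set of item ids plus its support count (hypothetical type). */
    public static final class Pattern {
        public final Set<Integer> items;
        public final long support;

        public Pattern(long support, Integer... items) {
            this.items = new HashSet<>(Arrays.asList(items));
            this.support = support;
        }

        @Override
        public String toString() {
            return items + ":" + support;
        }
    }

    /**
     * Keeps only the closed patterns: a pattern is dropped when another
     * pattern in the list is a proper superset with the same support.
     */
    public static List<Pattern> keepClosed(List<Pattern> mined) {
        List<Pattern> closed = new ArrayList<>();
        for (Pattern p : mined) {
            boolean redundant = false;
            for (Pattern q : mined) {
                if (q.support == p.support
                        && q.items.size() > p.items.size()
                        && q.items.containsAll(p.items)) {
                    redundant = true;
                    break;
                }
            }
            if (!redundant) {
                closed.add(p);
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        // Made-up patterns for illustration; not taken from the attachments.
        List<Pattern> mined = Arrays.asList(
                new Pattern(3, 1),
                new Pattern(3, 1, 2),
                new Pattern(2, 1, 2, 3),
                new Pattern(4, 2));
        System.out.println(keepClosed(mined));
    }
}
```

The pairwise comparison is quadratic in the number of patterns, which is fine for a small test database but would need an index for large outputs.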

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Yarco Hayduk (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040526#comment-13040526 ] 

Yarco Hayduk commented on MAHOUT-709:
-------------------------------------

http://www.dataminingarticles.com/closed-maximal-itemsets.html has a nice explanation too

> FP-Growth Redundant patterns
> ----------------------------
>
>                 Key: MAHOUT-709
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-709
>             Project: Mahout
>          Issue Type: Bug
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.4, 0.5
>            Reporter: Yarco Hayduk
>            Assignee: Robin Anil
>              Labels: fp-growth, frequent, parallel, pattern
>             Fix For: 0.6
>
>         Attachments: SixTransactions.dat, bresult-new.txt, dumpedPatterns, patterns-converted.txt
>
>
> The algorithm outputs more patterns than needed.
> I have compared Mahout's PFP-Growth algorithm against the FP-Growth implementation at http://www.borgelt.net/fpgrowth.html. That implementation also has an option to generate closed patterns.
> When I filtered the sub-patterns out of the Parallel FP-Growth output, I arrived at the same result as http://www.borgelt.net/fpgrowth.html.
> Succinctly, the output is not limited to closed itemsets.
> I am attaching the dummy DB along with the output of both algorithms.

[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042987#comment-13042987 ] 

Robin Anil commented on MAHOUT-709:
-----------------------------------

That step was added as an optimization to fetch the top-K patterns faster, and it is an order of magnitude faster on some datasets. If you are refactoring, can you make it possible to disable it behind a flag, say a strict mode?

[jira] [Resolved] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-709.
------------------------------

    Resolution: Won't Fix

We can reopen if a new patch is ever available.
                
[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Yarco Hayduk (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044975#comment-13044975 ] 

Yarco Hayduk commented on MAHOUT-709:
-------------------------------------

Thank you for the suggestion. Frankly, I have not run into a definition of "not strictly closed frequent item sets". The patterns are either closed, free or maximal (there are more definitions; these are only the most common ones). Hence, the implementation is flawed if its javadoc states that it outputs closed itemsets when in reality it does not.
I will verify that the decision to traverse the tree top-down (at a certain level) leads to better performance, and will investigate how to eliminate the redundant itemsets in that scenario.

Can you please elaborate on this line of code, as I am having a hard time understanding it:
"minSupportValue = Math.max(minSupportValue,minSupport.longValue() / 2);"
Why would we ever want to change the minsup? I assume that this idea comes from the http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.9324&rep=rep1&type=pdf paper?
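
For readers following along, here is a small sketch of what the quoted expression evaluates to in isolation. It does not answer why the line exists; it only shows the arithmetic. The type of `minSupport` is not shown in this thread, so a boxed `Long` is assumed here, and the class and method names are hypothetical.

```java
public class MinSupportClamp {

    /**
     * Mirrors the quoted expression: the current threshold is never lowered,
     * and it is raised to at least half of minSupport (integer division).
     * The Long parameter type is an assumption made for this sketch.
     */
    public static long clamp(long minSupportValue, Long minSupport) {
        return Math.max(minSupportValue, minSupport.longValue() / 2);
    }

    public static void main(String[] args) {
        System.out.println(clamp(2, 10L)); // prints 5: half of 10 exceeds 2
        System.out.println(clamp(7, 10L)); // prints 7: already above half of 10
        System.out.println(clamp(2, 5L));  // prints 2: 5 / 2 truncates to 2
    }
}
```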

[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Yarco Hayduk (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078749#comment-13078749 ] 

Yarco Hayduk commented on MAHOUT-709:
-------------------------------------

I can't give you the source yet; I'm still in the process of performance testing.
I found an important bug recently. In the MapReduce version of the program, we don't need to encode the transactions a second time, as it distorts the results: the tree gets restructured and our conditionals get moved almost to the root instead of staying at the bottom of the tree. Does anyone care to try the MapReduce version with the number of groups == 1? You will likely see this problem too.

[jira] [Issue Comment Edited] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Yarco Hayduk (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040516#comment-13040516 ] 

Yarco Hayduk edited comment on MAHOUT-709 at 5/28/11 12:07 AM:
---------------------------------------------------------------

This is the original paper which introduces closed patterns:

http://www.cs.rpi.edu/research/pdf/99-12.pdf 

      was (Author: yarco):
    This is the original paper which introduces closed patterns:

www.cs.rpi.edu/research/pdf/99-12.pdf 
  
[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Yarco Hayduk (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040516#comment-13040516 ] 

Yarco Hayduk commented on MAHOUT-709:
-------------------------------------

This is the original paper which introduces closed patterns:

www.cs.rpi.edu/research/pdf/99-12.pdf 

[jira] [Updated] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-709:
-----------------------------

    Fix Version/s:     (was: 0.6)

Removing from 0.6 unless a patch is available.

[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Yarco Hayduk (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040238#comment-13040238 ] 

Yarco Hayduk commented on MAHOUT-709:
-------------------------------------

"Obviously,([1],2),([3],2) is missed"

PFP-Growth is outputting closed patterns. 
(definition here http://ftp1.de.freebsd.org/Publications/CEUR-WS/Vol-90/liu.pdf )

[jira] [Updated] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-709:
-----------------------------

    Affects Version/s: 0.5
        Fix Version/s:     (was: 0.5)
                       0.6
             Assignee: Robin Anil

[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "niu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052393#comment-13052393 ] 

niu commented on MAHOUT-709:
----------------------------

Where is the source code of this patch?

[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "jinyongbo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041017#comment-13041017 ] 

jinyongbo commented on MAHOUT-709:
----------------------------------

Thanks all.

[jira] [Updated] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Yarco Hayduk (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yarco Hayduk updated MAHOUT-709:
--------------------------------

    Attachment: patterns-converted.txt
                bresult-new.txt
                dumpedPatterns
                SixTransactions.dat

SixTransactions.dat - the dummy DB
dumpedPatterns - original PFP-Growth output
patterns-converted.txt - converted dumpedPatterns to a more readable format 
bresult-new.txt - the output of Borgelt's FP-Growth implementation 

[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "jinyongbo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040140#comment-13040140 ] 

jinyongbo commented on MAHOUT-709:
----------------------------------

It looks like my problem is not the same as Hayduk's.
My test data is as follows (Mahout version 0.4, minSupport = 2):
1 2 5
2 3
4 5
1 2 3
and the output is as follows:
1 ([2, 1],2)
2 ([2],3), ([2, 3],2), ([2, 1],2)
3 ([2, 3],2)
5 ([5],2)
Obviously,([1],2),([3],2) is missed
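
As a hedged illustration of the closed-itemset definition referred to elsewhere in this thread, the sketch below computes the closure of an itemset (the intersection of all transactions containing it) on the four transactions listed in this comment. An itemset is closed when it equals its own closure. The class and method names are hypothetical and this is not Mahout code. On this data the closure of [1] is [1, 2] and the closure of [3] is [2, 3], each with the same support of 2, so [1] and [3] are not closed, which is consistent with their absence from the output shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ClosureCheck {

    /** Number of transactions that contain every item of the itemset. */
    public static int support(List<Set<Integer>> db, Set<Integer> itemset) {
        int count = 0;
        for (Set<Integer> t : db) {
            if (t.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Closure = intersection of all transactions that contain the itemset.
     * If no transaction contains it, the itemset itself is returned
     * (a convention chosen for this sketch).
     */
    public static Set<Integer> closure(List<Set<Integer>> db, Set<Integer> itemset) {
        Set<Integer> result = null;
        for (Set<Integer> t : db) {
            if (t.containsAll(itemset)) {
                if (result == null) {
                    result = new TreeSet<>(t);
                } else {
                    result.retainAll(t);
                }
            }
        }
        return result == null ? new TreeSet<>(itemset) : result;
    }

    /** An itemset is closed when it equals its own closure. */
    public static boolean isClosed(List<Set<Integer>> db, Set<Integer> itemset) {
        return closure(db, itemset).equals(itemset);
    }

    public static Set<Integer> setOf(Integer... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        // The four transactions from the comment.
        List<Set<Integer>> db = Arrays.asList(
                setOf(1, 2, 5), setOf(2, 3), setOf(4, 5), setOf(1, 2, 3));
        List<Set<Integer>> candidates = Arrays.asList(
                setOf(1), setOf(3), setOf(2), setOf(5), setOf(1, 2), setOf(2, 3));
        for (Set<Integer> c : candidates) {
            System.out.println(c + " support=" + support(db, c)
                    + " closure=" + closure(db, c)
                    + " closed=" + isClosed(db, c));
        }
    }
}
```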



[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "jinyongbo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040143#comment-13040143 ] 

jinyongbo commented on MAHOUT-709:
----------------------------------

I have another question: is the output file in the frequentpatterns folder a SequenceFile or not?
I ask because I want to save it as a table in Hive.
 

[jira] [Issue Comment Edited] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Yarco Hayduk (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040516#comment-13040516 ] 

Yarco Hayduk edited comment on MAHOUT-709 at 5/28/11 1:19 AM:
--------------------------------------------------------------

This is the original paper which introduces closed patterns:

http://cchen1.csie.ntust.edu.tw:8080/students/2009/Discovering%20Frequent%20Closed%20Itemsets%20for%20Association%20Rules.pdf

      was (Author: yarco):
    This is the original paper which introduces closed patterns:

http://www.cs.rpi.edu/research/pdf/99-12.pdf 
  
[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040483#comment-13040483 ] 

Lance Norskog commented on MAHOUT-709:
--------------------------------------

> (definition here http://ftp1.de.freebsd.org/Publications/CEUR-WS/Vol-90/liu.pdf )


The link does not work; the site is up but does not like the path.

[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

Posted by "Yarco Hayduk (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042935#comment-13042935 ] 

Yarco Hayduk commented on MAHOUT-709:
-------------------------------------

OK, I found an issue: you need to remove the top-down mining completely to avoid generating redundant patterns. Now I get a 56 KB pattern file instead of a 1499 KB one ;)


The pattern mining on any level needs to be done in a bottom-up fashion only. The current implementation mines the patterns top-down, but only at a certain level. I will submit a patch once I am done. Further, I will try to refactor the code into logical pieces (FP-Bonsai pruning, counting, etc.).
