Posted to dev@mahout.apache.org by "Yarco Hayduk (JIRA)" <ji...@apache.org> on 2011/08/03 15:48:27 UTC

[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

    [ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078749#comment-13078749 ] 

Yarco Hayduk commented on MAHOUT-709:
-------------------------------------

I can't give you the source yet; I'm still in the process of performance testing.
I recently found an important bug. In the MapReduce version of the program, we don't need to encode the transactions a second time, because doing so distorts the results: the tree gets restructured, and our conditionals end up almost at the root instead of at the bottom of the tree. Would anyone care to try the MapReduce version with the number of groups == 1? You will likely see this problem too.

> FP-Growth Redundant patterns
> ----------------------------
>
>                 Key: MAHOUT-709
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-709
>             Project: Mahout
>          Issue Type: Bug
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.4, 0.5
>            Reporter: Yarco Hayduk
>            Assignee: Robin Anil
>              Labels: fp-growth, frequent, parallel, pattern
>             Fix For: 0.6
>
>         Attachments: SixTransactions.dat, bresult-new.txt, dumpedPatterns, patterns-converted.txt
>
>
> The algorithm outputs more patterns than needed.
> I compared Mahout's PFP-Growth algorithm with the FP-Growth implementation at http://www.borgelt.net/fpgrowth.html. That implementation also has an option to generate closed patterns.
> When I filtered the sub-patterns out of the Parallel FP-Growth output, I arrived at the same result as http://www.borgelt.net/fpgrowth.html.
> Succinctly, you are not outputting closed patterns.
> I am attaching the dummy DB along with the output of both algorithms.
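
For readers following the description above, here is a minimal sketch of the kind of sub-pattern filtering it mentions. It assumes the usual definition of a closed pattern: an itemset is closed if no proper superset has the same support. The function name `closed_patterns` and the toy data are hypothetical. They are not taken from Mahout, from the Borgelt implementation, or from the files attached to this issue.

```python
from typing import Dict, FrozenSet


def closed_patterns(patterns: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Keep only closed patterns from a {itemset: support} mapping.

    A pattern is dropped when some proper superset in the same mapping
    has exactly the same support (assumed definition of "closed").
    """
    closed = {}
    for items, support in patterns.items():
        # `items < other` is the proper-subset test on frozensets.
        redundant = any(
            items < other and support == other_support
            for other, other_support in patterns.items()
        )
        if not redundant:
            closed[items] = support
    return closed


# Hypothetical toy output of a frequent-pattern miner (not the attached data).
example = {
    frozenset({"a"}): 3,
    frozenset({"b"}): 2,
    frozenset({"a", "b"}): 2,
}

result = closed_patterns(example)
```

In this toy input, `{b}` is dropped because its superset `{a, b}` has the same support, while `{a}` is kept because its support is higher than that of `{a, b}`. The pairwise comparison is quadratic in the number of patterns, which is fine for a small check like the one described but not for large outputs.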

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira