Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2010/01/03 20:11:54 UTC

[jira] Updated: (MAHOUT-221) Implementation of FP-Bonsai Pruning for fast pattern mining

     [ https://issues.apache.org/jira/browse/MAHOUT-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-221:
-----------------------------

        Fix Version/s: 0.3
    Affects Version/s:     (was: 0.3)
                       0.2

> Implementation of FP-Bonsai Pruning for fast pattern mining
> -----------------------------------------------------------
>
>                 Key: MAHOUT-221
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-221
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-FPGROWTH.patch, MAHOUT-FPGROWTH.patch
>
>
> FP-Bonsai is a method for pruning long-chained FP-trees so that FP-growth runs faster.
> http://win.ua.ac.be/~adrem/bibrem/pubs/fpbonsai.pdf
> This implementation also adds a transaction preprocessing map/reduce job that converts a list of transactions such as {1, 2, 4, 5}, {1, 2, 3}, {1, 2} into a tree structure, which saves space during the fpgrowth map/reduce job. For typical data this greatly reduces storage space and therefore saves time during shuffle and sort.
> The tree formed from the transactions above, with each node written as (item, count), is:
> (1,3) - (2,3) -+- (4,1) - (5,1)
>                +- (3,1)
> A reducer has also been added to PFPgrowth (not part of the original paper); it performs this compression and saves space.
> This patch also adds an example transaction data set generator for the Flickr and Delicious data sets: https://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Research/DataSets/PINTSExperimentsDataSets/
> Both are gigabyte-sized tag data sets in which each record has the form "date userid itemid tag". The example maker creates one transaction from all the unique tags a user has applied to an item.
>          
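As an illustration of the transaction-tree compression described in the issue, here is a minimal Python sketch. It is not the code from the attached patch (which targets Mahout's Java map/reduce jobs); the names Node, build_transaction_tree and render are hypothetical, and it assumes the items in every transaction already follow one fixed global order.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One prefix-tree node: an item and the number of transactions passing through it."""
    item: int
    count: int = 0
    children: dict = field(default_factory=dict)


def build_transaction_tree(transactions):
    """Merge transactions that share a prefix into a single counted path."""
    root = Node(item=-1)  # sentinel root; it holds no real item
    for transaction in transactions:
        node = root
        for item in transaction:  # assumed: items arrive in a fixed global order
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1
            node = child
    return root


def render(node):
    """Return the subtree below `node` as nested (item, count, children) tuples."""
    return [(c.item, c.count, render(c)) for c in node.children.values()]


tree = build_transaction_tree([[1, 2, 4, 5], [1, 2, 3], [1, 2]])
print(render(tree))
# [(1, 3, [(2, 3, [(4, 1, [(5, 1, [])]), (3, 1, [])])])]
```

The three transactions hold nine items in total, while the tree stores only five nodes, which is where the space saving during shuffle and sort would come from.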

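The grouping done by the example maker can be sketched in the same way. This is an assumption-laden illustration, not the generator from the patch: the function name make_transactions is hypothetical, the records are assumed to be whitespace-separated, and the sample lines are invented purely for demonstration.

```python
from collections import defaultdict


def make_transactions(lines):
    """Group tags by (userid, itemid); each group's unique tags form one transaction."""
    groups = defaultdict(set)
    for line in lines:
        # Assumed layout: date, userid, itemid, tag (the tag may contain spaces).
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue  # skip malformed records
        _date, user_id, item_id, tag = parts
        groups[(user_id, item_id)].add(tag.strip())
    return {key: sorted(tags) for key, tags in groups.items()}


# Invented sample records in the "date userid itemid tag" form.
sample = [
    "2006-01-01 u1 i1 sunset",
    "2006-01-01 u1 i1 beach",
    "2006-01-02 u1 i1 sunset",
    "2006-01-03 u2 i1 beach",
]
transactions = make_transactions(sample)
print(transactions)
# {('u1', 'i1'): ['beach', 'sunset'], ('u2', 'i1'): ['beach']}
```

Using a set per (userid, itemid) key drops repeated tags, so each transaction contains every tag at most once, as the description requires.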