Posted to dev@mahout.apache.org by "Robin Anil (JIRA)" <ji...@apache.org> on 2010/01/05 03:52:54 UTC
[jira] Commented: (MAHOUT-221) Implementation of FP-Bonsai Pruning for fast pattern mining
[ https://issues.apache.org/jira/browse/MAHOUT-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796504#action_12796504 ]
Robin Anil commented on MAHOUT-221:
-----------------------------------
I am going to commit this. It is a major change, and I need it in before doing minor tweaks.
> Implementation of FP-Bonsai Pruning for fast pattern mining
> -----------------------------------------------------------
>
> Key: MAHOUT-221
> URL: https://issues.apache.org/jira/browse/MAHOUT-221
> Project: Mahout
> Issue Type: New Feature
> Components: Frequent Itemset/Association Rule Mining
> Affects Versions: 0.2
> Reporter: Robin Anil
> Assignee: Robin Anil
> Fix For: 0.3
>
> Attachments: MAHOUT-FPGROWTH.patch, MAHOUT-FPGROWTH.patch
>
>
> FP Bonsai is a method to prune long chained FP-Trees for faster growth.
> http://win.ua.ac.be/~adrem/bibrem/pubs/fpbonsai.pdf
> This implementation also adds a transaction preprocessing map/reduce job, which converts a list of transactions such as {1, 2, 4, 5}, {1, 2, 3}, {1, 2} into a tree structure and thus saves space during the fpgrowth map/reduce.
> The tree formed from the above, with each node written as (item, count), is:
> (1,3) -> (2,3) |- (4,1) - (5,1)
>                |- (3,1)
> For typical data this greatly reduces the storage space and thus saves time during shuffle and sort.
> Also added a reducer to PFPgrowth (not part of the original paper) which does this compression and saves on space.
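
For illustration only, the transaction-to-tree compression described above can be sketched in a few lines of Python. This is not the code in the patch: the names Node, build_tree, render and count_nodes are hypothetical, and the sketch assumes the items in every transaction already arrive in one consistent order, so that shared prefixes line up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node: an item, its count, and child nodes keyed by item."""
    item: object = None
    count: int = 0
    children: dict = field(default_factory=dict)


def build_tree(transactions):
    """Merge transactions into a prefix tree so shared prefixes are stored once."""
    root = Node()
    for transaction in transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1  # one more transaction passes through this node
            node = child
    return root


def render(node, depth=0):
    """Return one '(item,count)' string per node, indented by tree depth."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"({child.item},{child.count})")
        lines.extend(render(child, depth + 1))
    return lines


def count_nodes(node):
    """Count the nodes below the (empty) root."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# The three transactions from the issue description.
transactions = [[1, 2, 4, 5], [1, 2, 3], [1, 2]]
tree = build_tree(transactions)
print("\n".join(render(tree)))
```

On the three example transactions this stores 5 nodes in place of 9 item occurrences, which is the kind of saving the description refers to.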
> This patch also adds an example transaction dataset generator for the Flickr and Delicious data sets: https://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Research/DataSets/PINTSExperimentsDataSets/
> Both are gigabyte-scale collections of tag data in which each record is given as "date userid itemid tag". The example maker creates a transaction from all the unique tags a user has applied to an item.
>
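
As a rough sketch of what the example maker described above does, assuming each record is one whitespace-separated line of the form "date userid itemid tag" and that tags contain no whitespace (neither is confirmed by the issue text). The function name make_transactions and the sample records are made up for illustration and are not taken from the patch or from the real data sets.

```python
from collections import defaultdict


def make_transactions(lines):
    """Group the unique tags by (userid, itemid); each group is one transaction."""
    groups = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip records that do not match the assumed 4-field layout
        _date, user_id, item_id, tag = parts
        groups[(user_id, item_id)].add(tag)  # the set drops repeated tags
    # Sort the tags so the output is deterministic.
    return {key: sorted(tags) for key, tags in groups.items()}


# Invented sample records in the assumed "date userid itemid tag" layout.
sample = [
    "2009-01-01 u1 i1 sunset",
    "2009-01-02 u1 i1 beach",
    "2009-01-02 u1 i1 sunset",
    "2009-01-03 u2 i1 beach",
]
result = make_transactions(sample)
for key, tags in result.items():
    print(key, tags)
```

Each (userid, itemid) pair yields one transaction, so a tag repeated by the same user on the same item is counted once.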