You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Ted Dunning (JIRA)" <ji...@apache.org> on 2009/09/05 22:18:57 UTC

[jira] Commented: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

    [ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751788#action_12751788 ] 

Ted Dunning commented on MAHOUT-157:
------------------------------------


.bq Need a faster implementation of a bounded SortedSet. Currently use an encapsulated TreeSet where least value is removed after every insert call  if the max capacity is found to be exceeded. Another frequent operation is: merge two bounded SortedSets

This sounds more like a priority queue and a sorted set.  Since the priority queue doesn't need keep all elments in sorted order, it may be cheaper than a SortedSet.

On a separate note, it is common that the most impressive optimization for keeping top-n elements is to simply avoid inserting into the priority queue if you know that it will just get kicked out.  This can be done by looking at the score on the current lowest element.  The next step is to cache that lowest score.  This can often decrease the number of heap insertions by several orders of magnitude.

> Frequent Pattern Mining using Parallel FP-Growth
> ------------------------------------------------
>
>                 Key: MAHOUT-157
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-157
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>             Fix For: 0.2
>
>         Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, MAHOUT-157-Combinations-BSD-License.patch, MAHOUT-157-Combinations-BSD-License.patch, MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-September-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.