Posted to dev@mahout.apache.org by "Robin Anil (JIRA)" <ji...@apache.org> on 2009/08/04 23:29:10 UTC

[jira] Updated: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

     [ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Anil updated MAHOUT-157:
------------------------------

    Attachment: MAHOUT-157-inProgress-August-5.patch

Added class FPGrowth<T>, where T can be any data type that denotes a feature:
{noformat}
FPGrowth<String> fp = new FPGrowth<String>();
Collection<List<String>> transactions = new ArrayList<List<String>>();

transactions.add(Arrays.asList("E", "A", "D", "B"));
transactions.add(Arrays.asList("D", "A", "C", "E", "B"));
transactions.add(Arrays.asList("C", "A", "B", "E"));
transactions.add(Arrays.asList("B", "A", "D"));
transactions.add(Arrays.asList("D"));
transactions.add(Arrays.asList("D", "B"));
transactions.add(Arrays.asList("A", "D", "E"));
transactions.add(Arrays.asList("B", "C"));

Map<List<Attribute<String>>, Integer> frequentPatterns = fp.generateFrequentPatterns(transactions);
{noformat}
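
For readers who want to sanity-check results on the eight transactions above, here is a brute-force support counter. It is only a sketch and is not the FP-Growth code in the patch: the class name SupportCount and its methods are made up for illustration, and it assumes that the Integer in the result map is the number of transactions containing the pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper for illustration only; not part of the attached patch.
public class SupportCount {

    /** The eight example transactions from the snippet above. */
    static Collection<List<String>> exampleTransactions() {
        Collection<List<String>> transactions = new ArrayList<>();
        transactions.add(Arrays.asList("E", "A", "D", "B"));
        transactions.add(Arrays.asList("D", "A", "C", "E", "B"));
        transactions.add(Arrays.asList("C", "A", "B", "E"));
        transactions.add(Arrays.asList("B", "A", "D"));
        transactions.add(Arrays.asList("D"));
        transactions.add(Arrays.asList("D", "B"));
        transactions.add(Arrays.asList("A", "D", "E"));
        transactions.add(Arrays.asList("B", "C"));
        return transactions;
    }

    /** Counts the transactions that contain every item of the pattern. */
    static int support(Collection<List<String>> transactions, List<String> pattern) {
        int count = 0;
        for (List<String> transaction : transactions) {
            if (transaction.containsAll(pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Collection<List<String>> transactions = exampleTransactions();
        System.out.println("support(A, B) = " + support(transactions, Arrays.asList("A", "B")));
        System.out.println("support(D)    = " + support(transactions, Arrays.asList("D")));
    }
}
```

This scans every transaction once per pattern, so it is only useful for checking small examples by hand; avoiding exactly this kind of repeated scanning is the point of building an FP-tree.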


Implemented 3 stages of the PFPGrowth algorithm.

FIXME: if transactions are long, i.e. there are many features, then the conditional FP subtree will have a single path with many nodes. This can cause a combinatorial explosion while going through all possible combinations. Pattern length is currently limited to 6-grams (a frequent pattern of length 6), so for a single path of n nodes it generates at most nC1 + nC2 + nC3 + nC4 + nC5 + nC6 patterns instead of Sigma [k=1 to n] (nCk) = 2^n - 1 patterns.
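
To make the size of the problem concrete, here is a small sketch that enumerates the item combinations of a single path with a length cap and compares the capped and uncapped counts. It is not the code from the patch: the class SinglePathPatterns and all of its method names are hypothetical, and the path is modelled as a plain list of item names rather than as tree nodes.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the length cap; not part of the attached patch.
public class SinglePathPatterns {

    /** nCk via the multiplicative formula; every intermediate division is exact. */
    static BigInteger choose(int n, int k) {
        if (k < 0 || k > n) {
            return BigInteger.ZERO;
        }
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            result = result.multiply(BigInteger.valueOf(n - k + i)).divide(BigInteger.valueOf(i));
        }
        return result;
    }

    /** nC1 + nC2 + ... + nC(maxLength): patterns generated with the cap. */
    static BigInteger cappedCount(int n, int maxLength) {
        BigInteger total = BigInteger.ZERO;
        for (int k = 1; k <= Math.min(n, maxLength); k++) {
            total = total.add(choose(n, k));
        }
        return total;
    }

    /** Sigma [k=1 to n] (nCk) = 2^n - 1: patterns generated without the cap. */
    static BigInteger uncappedCount(int n) {
        return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE);
    }

    /** Every non-empty combination of path items with at most maxLength items. */
    static List<List<String>> combinations(List<String> path, int maxLength) {
        List<List<String>> out = new ArrayList<>();
        extend(path, 0, maxLength, new ArrayList<>(), out);
        return out;
    }

    private static void extend(List<String> path, int start, int maxLength,
                               List<String> current, List<List<String>> out) {
        if (!current.isEmpty()) {
            out.add(new ArrayList<>(current));
        }
        if (current.size() == maxLength) {
            return; // the cap: do not grow this pattern any further
        }
        for (int i = start; i < path.size(); i++) {
            current.add(path.get(i));
            extend(path, i + 1, maxLength, current, out);
            current.remove(current.size() - 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(combinations(Arrays.asList("A", "B", "C", "D"), 2));
        for (int n : new int[] {10, 20, 30}) {
            System.out.println("n=" + n + " capped at 6: " + cappedCount(n, 6)
                    + ", uncapped: " + uncappedCount(n));
        }
    }
}
```

For a path of 20 nodes the cap of 6 gives 60459 patterns against 1048575 without it. The capped count grows polynomially in n (on the order of n^6) while the uncapped count doubles with every extra node.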



> Frequent Pattern Mining using Parallel FP-Growth
> ------------------------------------------------
>
>                 Key: MAHOUT-157
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-157
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>             Fix For: 0.2
>
>         Attachments: MAHOUT-157-inProgress-August-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.