Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 03:59:36 UTC

[jira] [Updated] (SPARK-6143) Improve FP-Growth for mining closed-forms of frequent patterns

     [ https://issues.apache.org/jira/browse/SPARK-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-6143:
--------------------------------
    Labels: bulk-closed  (was: )

> Improve FP-Growth for mining closed-forms of frequent patterns
> --------------------------------------------------------------
>
>                 Key: SPARK-6143
>                 URL: https://issues.apache.org/jira/browse/SPARK-6143
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Denis Dus
>            Priority: Minor
>              Labels: bulk-closed
>
> It is more convenient for a person to analyze closed forms of frequent itemsets (and of patterns in general).
> An itemset X is closed in a data set D if there exists no proper super-itemset Y such that Y has the same support as X in D. So, the closed frequent itemsets are simply a lossless compression of all frequent itemsets.
> 1) A naive approach is to find all frequent itemsets and then remove each one that is a proper subset of another frequent itemset with the same support (a sketch of this post-processing appears after this description). But this can be very costly, since all frequent itemsets still have to be generated.
> 2) A more powerful idea is to do this kind of merging during the mining process itself. I've heard about the FPClose algorithm, which is based on FP-Growth:
> [http://users.encs.concordia.ca/~grahne/papers/fimi03.pdf] (Section 4 of the paper)
> I think this could be very useful for MLlib users who are interested in frequent itemset analysis.
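A minimal sketch of the naive approach (1) above, using the existing MLlib FP-Growth API: mine all frequent itemsets, then keep only the closed ones by dropping every itemset that has a proper superset with the same support. The toy transactions and object name are hypothetical, and this is illustrative post-processing on the driver, not the FPClose algorithm described in the linked paper.

    // Sketch: post-filter MLlib FP-Growth output to keep only closed itemsets.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.fpm.FPGrowth

    object ClosedItemsetsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("ClosedItemsets").setMaster("local[*]"))

        // Hypothetical toy transactions; real data would come from the user.
        val transactions = sc.parallelize(Seq(
          Array("a", "b", "c"),
          Array("a", "b"),
          Array("a", "c"),
          Array("b", "c")))

        val model = new FPGrowth().setMinSupport(0.5).run(transactions)

        // Collect (itemset, frequency) pairs to the driver; this is the costly
        // step the naive approach cannot avoid, since all frequent itemsets
        // must still be generated.
        val frequent = model.freqItemsets
          .map(fi => (fi.items.toSet, fi.freq))
          .collect()

        // An itemset X is closed if no proper superset Y has the same support.
        val closed = frequent.filter { case (items, freq) =>
          !frequent.exists { case (other, otherFreq) =>
            otherFreq == freq && items.subsetOf(other) && other.size > items.size
          }
        }

        closed.foreach { case (items, freq) =>
          println(items.mkString("{", ",", "}") + " -> " + freq)
        }

        sc.stop()
      }
    }

The pairwise filter is quadratic in the number of frequent itemsets, which is exactly why mining closed itemsets directly (as in FPClose) is attractive.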



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org