Posted to issues@spark.apache.org by "Tomas Kliegr (JIRA)" <ji...@apache.org> on 2016/01/26 13:38:40 UTC

[jira] [Created] (SPARK-12999) Guidance on adding a stopping criterion (maximum itemset length or itemset count) for FPGrowth

Tomas Kliegr created SPARK-12999:
------------------------------------

             Summary: Guidance on adding a stopping criterion (maximum itemset length or itemset count) for FPGrowth
                 Key: SPARK-12999
                 URL: https://issues.apache.org/jira/browse/SPARK-12999
             Project: Spark
          Issue Type: Question
    Affects Versions: 1.6.0
            Reporter: Tomas Kliegr


The absence of stopping criteria results in combinatorial explosion, and hence excessive run time, even on small UCI datasets. Since our workflow makes it difficult to terminate an FPGrowth job that has been running for too long and to iteratively increase the support threshold, we would like to extend the Spark FPGrowth implementation with either of the following stopping criteria:
- maximum number of generated itemsets,
- maximum length of generated itemsets (i.e. number of items).

We would welcome any suggestions that could help us modify the implementation. To make the ask concrete, we include a rough, self-contained sketch below.
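
The sketch illustrates the second criterion (maximum itemset length). It is not Spark code: SimpleFPTree, maxLength and minCount are illustrative names, and the classes are deliberately simplified. What it demonstrates is that a length bound can be threaded through FP-growth's recursive pattern growth so that longer itemsets are pruned before they are ever enumerated, rather than filtered out afterwards. As far as we can tell from the 1.6 sources, the analogous place in Spark would be the recursive extract method of the (private) FPTree class in org.apache.spark.mllib.fpm, whose suffix-by-suffix recursion has the same shape as the sketch.

{code}
import scala.collection.mutable

class SimpleFPTree[T] {

  private class Node(val parent: Node, val item: Option[T]) {
    var count: Long = 0L
    val children: mutable.Map[T, Node] = mutable.Map.empty
  }

  private class Summary {
    var count: Long = 0L
    val nodes: mutable.ListBuffer[Node] = mutable.ListBuffer.empty
  }

  private val root = new Node(null, None)
  private val summaries: mutable.Map[T, Summary] = mutable.Map.empty

  // Insert one transaction; items are assumed deduplicated and
  // consistently ordered across transactions.
  def add(t: Iterable[T], count: Long = 1L): this.type = {
    var curr = root
    t.foreach { item =>
      val summary = summaries.getOrElseUpdate(item, new Summary)
      summary.count += count
      val child = curr.children.getOrElseUpdate(item, {
        val n = new Node(curr, Some(item))
        summary.nodes += n
        n
      })
      child.count += count
      curr = child
    }
    this
  }

  // Build the conditional tree for `suffix` from the prefix paths
  // of all nodes carrying that item.
  private def project(suffix: T): SimpleFPTree[T] = {
    val tree = new SimpleFPTree[T]
    summaries.get(suffix).foreach { summary =>
      summary.nodes.foreach { node =>
        var path = List.empty[T]
        var curr = node.parent
        while (curr.item.isDefined) {
          path = curr.item.get :: path
          curr = curr.parent
        }
        tree.add(path, node.count)
      }
    }
    tree
  }

  // Pattern growth with the proposed stopping criterion: the recursion
  // is cut off as soon as maxLength items have been accumulated, so
  // longer itemsets are never generated at all.
  def extract(minCount: Long, maxLength: Int): Iterator[(List[T], Long)] = {
    summaries.iterator.flatMap { case (item, summary) =>
      if (maxLength >= 1 && summary.count >= minCount) {
        Iterator.single((item :: Nil, summary.count)) ++
          project(item).extract(minCount, maxLength - 1).map {
            case (t, c) => (item :: t, c)
          }
      } else {
        Iterator.empty
      }
    }
  }
}

object BoundedFPGrowthDemo extends App {
  val tree = new SimpleFPTree[String]
  Seq(Seq("a", "b", "c"), Seq("a", "b"), Seq("a", "c")).foreach(tree.add(_))
  // maxLength = 2: no 3-itemsets are ever enumerated.
  tree.extract(minCount = 2, maxLength = 2).foreach(println)
}
{code}

The first criterion (maximum number of generated itemsets) looks harder to enforce exactly, because in Spark the pattern growth runs in parallel over groups of suffixes. Since extract above is a lazy Iterator, a per-partition cap could at least be approximated with something like .take(n) on it, at the cost of a nondeterministic choice of which itemsets survive.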

A workaround for this problem would not only make a difference to our use case; by making it possible to process more datasets without painful support-threshold tweaking, it would hopefully also benefit the wider Spark community.

This question is related to the following issue: https://issues.apache.org/jira/browse/SPARK-12163




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org