Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/12/06 14:48:10 UTC

[jira] [Updated] (SPARK-12163) FPGrowth unusable on some datasets without extensive tweaking of the support threshold

     [ https://issues.apache.org/jira/browse/SPARK-12163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12163:
------------------------------
      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)

> FPGrowth unusable on some datasets without extensive tweaking of the support threshold
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-12163
>                 URL: https://issues.apache.org/jira/browse/SPARK-12163
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Jaroslav Kuchar
>            Priority: Minor
>
> This problem occurs on standard UCI machine learning datasets.
> Details for the "audiology" dataset follow: it contains only 226 transactions and 70 attributes. Mining frequent itemsets with a support threshold of 0.95 produces 73,162,705 itemsets; with a support threshold of 0.94, it produces 366,880,771 itemsets.
> More details about the experiment: https://gist.github.com/jaroslav-kuchar/edbcbe72c5a884136db1
> The number of generated itemsets grows rapidly with the number of unique items in the transactions. Because of this combinatorial explosion, many settings of the support threshold lead to CPU-intensive, long-running tasks. The extensive tweaking of the support threshold that this requires makes the FPGrowth implementation practically unusable even for a small dataset.
> It would be useful to implement additional stopping criteria to control the explosion in the number of itemsets in FPGrowth. We propose an optional limit on the maximum number of generated itemsets or on the maximum number of items per itemset.
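
To illustrate the two proposed stopping criteria, here is a minimal sketch in plain Python. It is not Spark's FPGrowth and not the FP-growth algorithm: it is a brute-force, level-wise (Apriori-style) miner, chosen only because it is short. The names `count_subsets`, `mine_frequent_itemsets`, `max_len` and `max_itemsets` are hypothetical and do not correspond to any existing MLlib parameter. The helper `count_subsets` shows the arithmetic behind the explosion: if k items co-occur in enough transactions, every non-empty subset of them is frequent, giving 2^k - 1 itemsets, while a cap of m items per itemset leaves only the sum of C(k, i) for i = 1..m.

```python
from itertools import combinations
from math import comb


def count_subsets(k, max_len=None):
    """Count non-empty itemsets over k items that always co-occur,
    optionally keeping only itemsets with at most max_len items."""
    limit = k if max_len is None else min(k, max_len)
    return sum(comb(k, size) for size in range(1, limit + 1))


def mine_frequent_itemsets(transactions, min_support,
                           max_len=None, max_itemsets=None):
    """Brute-force level-wise miner with two optional stopping criteria.

    max_len      -- hypothetical cap on the number of items per itemset
    max_itemsets -- hypothetical cap on the number of itemsets returned

    Returns (found, truncated): found maps frozenset -> absolute count,
    truncated is True only if the max_itemsets cap cut the search short.
    """
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)
    found = {}

    # Level 1 candidates: every distinct item on its own.
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    size = 1

    while level and (max_len is None or size <= max_len):
        frequent = []
        for candidate in level:
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count:
                # Stop as soon as one more itemset would exceed the cap.
                if max_itemsets is not None and len(found) >= max_itemsets:
                    return found, True
                found[candidate] = count
                frequent.append(candidate)

        # Next level: unions of two frequent itemsets that are exactly
        # one item larger. Infrequent candidates are filtered by counting.
        size += 1
        next_level = {a | b for a, b in combinations(frequent, 2)
                      if len(a | b) == size}
        level = sorted(next_level, key=sorted)

    return found, False


if __name__ == "__main__":
    print(count_subsets(20))     # all non-empty subsets of 20 items
    print(count_subsets(20, 3))  # same items, at most 3 per itemset

    data = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["a", "c"]]
    itemsets, truncated = mine_frequent_itemsets(data, 0.5, max_len=2)
    print(len(itemsets), truncated)
```

With either cap in place, the amount of output is bounded regardless of how low the support threshold is set, so a poorly chosen threshold yields a truncated result instead of a runaway job. How such limits would interact with the distributed, per-partition mining in MLlib is left open here.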
