Posted to issues@spark.apache.org by "RJ (Jira)" <ji...@apache.org> on 2022/01/26 14:53:00 UTC
[jira] [Created] (SPARK-38037) Spark MLlib FPGrowth not working with 40+ items in Frequent Item set
RJ created SPARK-38037:
--------------------------
Summary: Spark MLlib FPGrowth not working with 40+ items in Frequent Item set
Key: SPARK-38037
URL: https://issues.apache.org/jira/browse/SPARK-38037
Project: Spark
Issue Type: Bug
Components: ML
Affects Versions: 3.2.0
Environment: Standalone Linux server
32 GB RAM
4 cores
Reporter: RJ
We have been using Spark FPGrowth, and it works well with millions of transactions (records) when a frequent itemset contains fewer than 25 items. Beyond 25 items it runs into a computational limit, and with 40 or more items in a frequent itemset the process never returns.
To reproduce, create a simple data set of 3 transactions, each containing the same 40 items, and run FPGrowth with a minimum support of 0.9; the process never completes. Below is the sample data I used to narrow down the problem:
|I1|I2|I3|I4|I5|I6|I7|I8|I9|I10|I11|I12|I13|I14|I15|I16|I17|I18|I19|I20|I21|I22|I23|I24|I25|I26|I27|I28|I29|I30|I31|I32|I33|I34|I35|I36|I37|I38|I39|I40|
|I1|I2|I3|I4|I5|I6|I7|I8|I9|I10|I11|I12|I13|I14|I15|I16|I17|I18|I19|I20|I21|I22|I23|I24|I25|I26|I27|I28|I29|I30|I31|I32|I33|I34|I35|I36|I37|I38|I39|I40|
|I1|I2|I3|I4|I5|I6|I7|I8|I9|I10|I11|I12|I13|I14|I15|I16|I17|I18|I19|I20|I21|I22|I23|I24|I25|I26|I27|I28|I29|I30|I31|I32|I33|I34|I35|I36|I37|I38|I39|I40|
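The reproduction described above could be sketched roughly as follows. This is a minimal sketch, not the reporter's actual code: it assumes the DataFrame-based PySpark API (pyspark.ml.fpm.FPGrowth) and a local Spark session, and the helper names build_transactions and run_fpgrowth are hypothetical.

```python
from typing import List, Tuple


def build_transactions(n_items: int, n_rows: int = 3) -> List[Tuple[int, List[str]]]:
    """Build n_rows identical transactions, each holding items I1..In."""
    items = [f"I{i}" for i in range(1, n_items + 1)]
    return [(row_id, list(items)) for row_id in range(n_rows)]


def run_fpgrowth(n_items: int, min_support: float = 0.9) -> int:
    """Fit FPGrowth on the synthetic data and return the number of frequent itemsets.

    Assumption: the DataFrame-based API is used; the report does not say
    which API (RDD-based or DataFrame-based) the reporter ran.
    """
    # pyspark is imported here so that build_transactions can be used
    # without a Spark installation.
    from pyspark.ml.fpm import FPGrowth
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("SPARK-38037-repro")  # hypothetical application name
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(build_transactions(n_items), ["id", "items"])
        model = FPGrowth(itemsCol="items", minSupport=min_support).fit(df)
        return model.freqItemsets.count()
    finally:
        spark.stop()
```

Calling run_fpgrowth with a small item count first, then raising it toward 40, is one way to observe where the run time becomes impractical on a given machine.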
While the number of itemsets to compute grows exponentially (2^n - 1 for n items in a frequent itemset), it surely should be able to handle 40 or more items in a frequent itemset.
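To put the 2^n - 1 figure in numbers, here is a small illustrative calculation. It assumes that every non-empty subset of the n shared items is frequent, which holds for the sample data because all three transactions are identical; the function name frequent_itemset_count is hypothetical.

```python
def frequent_itemset_count(n_items: int) -> int:
    """Number of non-empty subsets of n_items items, i.e. 2^n - 1."""
    return 2 ** n_items - 1


for n in (10, 25, 40):
    print(f"{n} items -> {frequent_itemset_count(n):,} frequent itemsets")
# 10 items -> 1,023 frequent itemsets
# 25 items -> 33,554,431 frequent itemsets
# 40 items -> 1,099,511,627,775 frequent itemsets
```

At 40 items the result set alone has roughly 1.1 trillion entries, compared with about 33.5 million at 25 items.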
--
This message was sent by Atlassian Jira
(v8.20.1#820001)