Posted to issues@spark.apache.org by "Arseniy Tashoyan (JIRA)" <ji...@apache.org> on 2018/02/02 10:57:00 UTC
[jira] [Created] (SPARK-23318) FP-growth: WARN FPGrowth: Input data is not cached
Arseniy Tashoyan created SPARK-23318:
----------------------------------------
Summary: FP-growth: WARN FPGrowth: Input data is not cached
Key: SPARK-23318
URL: https://issues.apache.org/jira/browse/SPARK-23318
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 2.2.1
Reporter: Arseniy Tashoyan
When running FPGrowth.fit() from the _ml_ package, one can see a warning:
WARN FPGrowth: Input data is not cached.
This warning occurs even when the dataset of transactions is cached.
Actually, this warning comes from the FPGrowth implementation in the old _mllib_ package. The new FPGrowth performs some transformations on the input dataset of transactions and then passes the result to the old FPGrowth without caching it. Hence the warning.
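For illustration, here is a minimal sketch of how the situation described above might be reproduced. The object name, the sample transactions and the parameter values are assumptions made for this example, not part of the report. It only uses the public ml.FPGrowth API on an explicitly cached DataFrame.

```scala
import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, introduced only for this illustration.
object FPGrowthWarningRepro {

  // Builds a tiny DataFrame of transactions (made-up data) and caches it explicitly.
  def cachedTransactions(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq("1 2 5", "1 2 3 5", "1 2")
      .map(_.split(" "))
      .toDF("items")
      .cache()
    df.count() // materialize the cache before fitting
    df
  }

  // Fits ml.FPGrowth on the given transactions.
  // Per the report, the mllib warning is logged during this call on 2.2.1,
  // even though the input DataFrame is cached.
  def fit(transactions: DataFrame): FPGrowthModel =
    new FPGrowth()
      .setItemsCol("items")
      .setMinSupport(0.5)
      .setMinConfidence(0.6)
      .fit(transactions)
}
```

Calling FPGrowthWarningRepro.fit(FPGrowthWarningRepro.cachedTransactions(spark)) with a local SparkSession and WARN-level logging enabled should be enough to check whether the message appears.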
The problem looks similar to SPARK-18356.
If you don't mind, I can push a similar fix:
{code}
// ml.FPGrowth
val handlePersistence = dataset.storageLevel == StorageLevel.NONE
if (handlePersistence) {
  // cache the data
}
// then call mllib.FPGrowth
// finally unpersist the data
{code}
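To make the outline above more concrete, here is a rough sketch of the same pattern written against public Spark APIs. It is not the actual patch: the object and method names are hypothetical, the MEMORY_AND_DISK storage level is an assumption, and the derived RDD of transactions is passed in explicitly rather than being built inside ml.FPGrowth.

```scala
import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth, FPGrowthModel}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Dataset
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object, introduced only for this illustration.
object FPGrowthPersistenceSketch {

  // dataset: the original input as supplied by the caller
  // items:   the RDD of transactions derived from it, as handed to mllib.FPGrowth
  def runWithPersistence(
      dataset: Dataset[_],
      items: RDD[Array[String]],
      minSupport: Double): FPGrowthModel[String] = {
    // Handle persistence only if the caller has not cached the input already.
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    if (handlePersistence) {
      // Assumed storage level; the issue text does not specify one.
      items.persist(StorageLevel.MEMORY_AND_DISK)
    }
    try {
      // Then call mllib.FPGrowth on the derived RDD.
      new MLlibFPGrowth().setMinSupport(minSupport).run(items)
    } finally {
      // Finally release the data, but only if it was persisted here.
      if (handlePersistence) {
        items.unpersist()
      }
    }
  }
}
```

The try/finally is a design choice of this sketch so that the derived RDD is released even if the mllib call fails. Note that when the caller has already cached the dataset, the sketch leaves the derived RDD unpersisted, exactly as in the outline.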