Posted to issues@spark.apache.org by "Dong Wang (Jira)" <ji...@apache.org> on 2019/11/11 02:52:00 UTC
[jira] [Created] (SPARK-29832) Unnecessary persist on instances in ml.regression.IsotonicRegression.fit
Dong Wang created SPARK-29832:
---------------------------------
Summary: Unnecessary persist on instances in ml.regression.IsotonicRegression.fit
Key: SPARK-29832
URL: https://issues.apache.org/jira/browse/SPARK-29832
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 3.0.0
Reporter: Dong Wang
Persisting instances in ml.regression.IsotonicRegression.fit() is unnecessary, because instances is used only once, in run(instances). Caching only pays off when the same data is read by more than one action.
{code:scala}
override def fit(dataset: Dataset[_]): IsotonicRegressionModel = instrumented { instr =>
  transformSchema(dataset.schema, logging = true)
  // Extract columns from data. If dataset is persisted, do not persist oldDataset.
  val instances = extractWeightedLabeledPoints(dataset)
  val handlePersistence = dataset.storageLevel == StorageLevel.NONE
  // Unnecessary persist
  if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)

  instr.logPipelineStage(this)
  instr.logDataset(dataset)
  instr.logParams(this, labelCol, featuresCol, weightCol, predictionCol, featureIndex, isotonic)
  instr.logNumFeatures(1)

  val isotonicRegression = new MLlibIsotonicRegression().setIsotonic($(isotonic))
  val oldModel = isotonicRegression.run(instances) // instances is used only once, here
  if (handlePersistence) instances.unpersist()
{code}
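The reasoning above can be illustrated with a small standalone sketch. It is not taken from the Spark code base: the object name PersistSketch, the helper evaluations, and the toy data are hypothetical. It assumes a local Spark installation and uses an accumulator to count how often an upstream map function runs. With a single action, one would expect the same count with or without persist(), so the cache only adds storage overhead. A difference should appear only when a second action reuses the data.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical illustration, not part of Spark itself.
object PersistSketch {

  /** Returns how many times the map function ran across `actions` count() calls. */
  def evaluations(spark: SparkSession, persist: Boolean, actions: Int): Long = {
    val sc = spark.sparkContext
    val calls = sc.longAccumulator("mapCalls")

    // Stand-in for extractWeightedLabeledPoints: a lazily evaluated RDD.
    val instances = sc.parallelize(1 to 100, numSlices = 4).map { x =>
      calls.add(1L) // one increment per element evaluation
      x.toDouble
    }

    if (persist) instances.persist(StorageLevel.MEMORY_AND_DISK)
    // Each count() is one action; only the second and later ones can hit the cache.
    (1 to actions).foreach(_ => instances.count())
    if (persist) instances.unpersist()

    calls.value
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("PersistSketch")
      .getOrCreate()
    try {
      for (persist <- Seq(false, true); actions <- Seq(1, 2)) {
        val n = evaluations(spark, persist, actions)
        println(s"persist=$persist actions=$actions mapCalls=$n")
      }
    } finally {
      spark.stop()
    }
  }
}
{code}
Accumulator updates made inside transformations can be counted more than once if tasks are retried, so the counts are meant as an illustration in local mode rather than an exact measurement.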
This issue was reported by our tool CacheCheck, which dynamically detects persist()/unpersist() API misuses.