You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Sean R. Owen (Jira)" <ji...@apache.org> on 2019/11/16 20:41:00 UTC
[jira] [Commented] (SPARK-29832) Unnecessary persist on instances
in ml.regression.IsotonicRegression.fit
[ https://issues.apache.org/jira/browse/SPARK-29832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975799#comment-16975799 ]
Sean R. Owen commented on SPARK-29832:
--------------------------------------
[~spark_cachecheck] some of these may be valid, but a lot of them don't appear to be. Let's tackle a few before opening the 30 JIRAs you did -- that's very noisy.
run() is going to do use (a transform of) this input many times in a loop. I don't think this analysis is accurate. You could say that it's more optimal to persist stuff closer to where the loop is; sometimes it's better, sometimes, not, depends on how big and expensive the result is.
> Unnecessary persist on instances in ml.regression.IsotonicRegression.fit
> ------------------------------------------------------------------------
>
> Key: SPARK-29832
> URL: https://issues.apache.org/jira/browse/SPARK-29832
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.0.0
> Reporter: Dong Wang
> Priority: Major
>
> Persist on instances in ml.regression.IsotonicRegression.fit() is unnecessary, because it is only used once in run(instances).
> {code:scala}
> override def fit(dataset: Dataset[_]): IsotonicRegressionModel = instrumented { instr =>
> transformSchema(dataset.schema, logging = true)
> // Extract columns from data. If dataset is persisted, do not persist oldDataset.
> val instances = extractWeightedLabeledPoints(dataset)
> val handlePersistence = dataset.storageLevel == StorageLevel.NONE
> // Unnecessary persist
> if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
> instr.logPipelineStage(this)
> instr.logDataset(dataset)
> instr.logParams(this, labelCol, featuresCol, weightCol, predictionCol, featureIndex, isotonic)
> instr.logNumFeatures(1)
> val isotonicRegression = new MLlibIsotonicRegression().setIsotonic($(isotonic))
> val oldModel = isotonicRegression.run(instances) // Only use once here
> if (handlePersistence) instances.unpersist()
> {code}
> This issue is reported by our tool CacheCheck, which is used to dynamically detecting persist()/unpersist() api misuses.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org