Posted to issues@spark.apache.org by "zhengruifeng (JIRA)" <ji...@apache.org> on 2018/04/02 04:07:00 UTC
[jira] [Created] (SPARK-23841) NodeIdCache should unpersist the last cached nodeIdsForInstances
zhengruifeng created SPARK-23841:
------------------------------------
Summary: NodeIdCache should unpersist the last cached nodeIdsForInstances
Key: SPARK-23841
URL: https://issues.apache.org/jira/browse/SPARK-23841
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 2.4.0
Reporter: zhengruifeng
{{NodeIdCache}} forgets to unpersist the last cached intermediate dataset:
{code:java}
scala> import org.apache.spark.ml.classification._
import org.apache.spark.ml.classification._
scala> val df = spark.read.format("libsvm").load("/Users/zrf/Dev/OpenSource/spark/data/mllib/sample_libsvm_data.txt")
2018-04-02 11:48:25 WARN LibSVMFileFormat:66 - 'numFeatures' option not specified, determining the number of features by going though the input. If you know the number in advance, please specify it via 'numFeatures' option to avoid the extra scan.
2018-04-02 11:48:31 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
df: org.apache.spark.sql.DataFrame = [label: double, features: vector]
scala> val rf = new RandomForestClassifier().setCacheNodeIds(true)
rf: org.apache.spark.ml.classification.RandomForestClassifier = rfc_aab2b672546b
scala> val rfm = rf.fit(df)
rfm: org.apache.spark.ml.classification.RandomForestClassificationModel = RandomForestClassificationModel (uid=rfc_aab2b672546b) with 20 trees
scala> sc.getPersistentRDDs
res0: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(56 -> MapPartitionsRDD[56] at map at NodeIdCache.scala:102){code}
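One way to address this is to unpersist {{nodeIdsForInstances}} in the cleanup path that runs when training finishes. A sketch of such a fix follows; it assumes the field name {{nodeIdsForInstances}} and a cleanup method on {{NodeIdCache}} (the actual method name and surrounding class structure in NodeIdCache.scala may differ):

{code:java}
// Sketch only: cleanup hook inside NodeIdCache, called once training completes.
// Existing logic removes old checkpoint files; the addition is the final
// unpersist of nodeIdsForInstances, which otherwise stays in
// sc.getPersistentRDDs after fit() returns (as shown in the REPL output above).
def deleteAllCheckpoints(): Unit = {
  while (checkpointQueue.nonEmpty) {
    val old = checkpointQueue.dequeue()
    // ... delete the checkpoint files for `old`, as before ...
  }
  // New: release the last cached intermediate RDD.
  if (nodeIdsForInstances != null) {
    nodeIdsForInstances.unpersist()
  }
}
{code}

With this in place, {{sc.getPersistentRDDs}} should no longer report the leftover MapPartitionsRDD after the model is fitted.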
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org