Posted to reviews@spark.apache.org by rajeshbalamohan <gi...@git.apache.org> on 2016/03/26 10:01:15 UTC

[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

GitHub user rajeshbalamohan opened a pull request:

    https://github.com/apache/spark/pull/11978

    SPARK-14113. Consider marking JobConf closure-cleaning in HadoopRDD a…

    ## What changes were proposed in this pull request?
    
    In HadoopRDD, the following code was introduced as a part of SPARK-6943.
    
    ```
      if (initLocalJobConfFuncOpt.isDefined) {
        sparkContext.clean(initLocalJobConfFuncOpt.get)
      }
    ```
    
    Passing initLocalJobConfFuncOpt to HadoopRDD incurs a significant performance penalty (due to closure cleaning) when a large number of RDDs are created. The cleaning is invoked on every HadoopRDD initialization, which makes it a bottleneck.
    
    An example thread stack is given below:
    
    ```
            at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
            at org.apache.xbean.asm5.ClassReader.readUTF8(Unknown Source)
            at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
            at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
            at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
            at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:402)
            at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:390)
            at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
            at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:102)
            at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:102)
            at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
            at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
            at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:102)
            at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
            at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:390)
            at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
            at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
            at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
            at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
            at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$15.apply(ClosureCleaner.scala:224)
            at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$15.apply(ClosureCleaner.scala:223)
            at scala.collection.immutable.List.foreach(List.scala:318)
            at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:223)
            at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
            at org.apache.spark.SparkContext.clean(SparkContext.scala:2079)
            at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:112)
    ```
    
    This PR does the following:
    
    1. Removes the closure cleaning in HadoopRDD init, which was mainly added to check whether HadoopRDD can be made serializable.
    2. Directly instantiates HadoopRDD in OrcRelation instead of going via SparkContext.hadoopRDD (which internally invokes a thread dump in "withScope"). This minor change is bundled here rather than filed as a separate ticket.
    
    
    ## How was this patch tested?
    
    No new tests have been added. The following code was used to measure the overhead of the HadoopRDD init code path. With the patch it took 340 ms, as opposed to 4815 ms without the patch.
    
    Also tested with a number of queries from TPC-DS in a multi-node environment. In addition, ran the following unit tests: org.apache.spark.sql.hive.execution.HiveCompatibilitySuite, org.apache.spark.sql.hive.execution.HiveQuerySuite, org.apache.spark.sql.hive.execution.PruningSuite, org.apache.spark.sql.hive.CachedTableSuite, org.apache.spark.rdd.RDDOperationScopeSuite, org.apache.spark.ui.jobs.JobProgressListenerSuite
    
    ```
      test("Check timing for HadoopRDD init") {
        val start: Long = System.currentTimeMillis()
    
        // Partially applied: the remaining JobConf argument is supplied later by HadoopRDD
        val initializeJobConfFunc = HadoopTableReader.initializeLocalJobConfFunc("", null) _
        Utils.withDummyCallSite(sqlContext.sparkContext) {
          // Large tables end up creating 5500 RDDs
          for (i <- 1 to 5500) {
            // Nulls are passed on purpose: this only measures the cost of RDD creation
            val testRDD = new HadoopRDD(sqlContext.sparkContext, null, Some(initializeJobConfFunc),
              null, classOf[NullWritable], classOf[Writable], 10)
          }
        }
        val end: Long = System.currentTimeMillis()
        println("Time taken : " + (end - start))
      }
    ```
    
    Without Patch: (time taken to init 5000 HadoopRDD)
    Time taken : 4815
    
    With Patch: (time taken to init 5000 HadoopRDD)
    Time taken : 340
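    
    The two timings reported above (4815 ms and 340 ms) can be compared with a small helper. This is a minimal sketch that does not depend on Spark; `TimingSketch`, `timed`, and `speedup` are hypothetical names introduced for illustration and are not part of the patch.
    
    ```scala
    // Hypothetical helper, not part of Spark or of this PR.
    object TimingSketch {
      // Runs `body` once and returns its result together with the elapsed milliseconds.
      def timed[A](body: => A): (A, Long) = {
        val start = System.nanoTime()
        val result = body
        val elapsedMs = (System.nanoTime() - start) / 1000000L
        (result, elapsedMs)
      }
    
      // Ratio of the slower timing to the faster one.
      def speedup(beforeMs: Long, afterMs: Long): Double =
        beforeMs.toDouble / afterMs.toDouble
    }
    ```
    
    Under these assumptions, `TimingSketch.speedup(4815L, 340L)` comes out at roughly 14, which is the order of improvement the reported numbers imply.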
    
    
    …s optional

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rajeshbalamohan/spark SPARK-14113

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11978.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11978
    
----
commit dfb6b03c5061dd8514fe09804c30c9281af50ab9
Author: Rajesh Balamohan <rb...@apache.org>
Date:   2016-03-26T08:58:17Z

    SPARK-14113. Consider marking JobConf closure-cleaning in HadoopRDD as optional

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-204300804
  
    **[Test build #54687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54687/consoleFull)** for PR 11978 at commit [`0c53ed2`](https://github.com/apache/spark/commit/0c53ed23e3fb24cc5d882272ddca629843005629).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11978#discussion_r57609134
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -979,6 +979,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
         // A Hadoop configuration can be about 10 KB, which is pretty big, so broadcast it.
         val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
         val setInputPathsFunc = (jobConf: JobConf) => FileInputFormat.setInputPaths(jobConf, path)
    +    clean(setInputPathsFunc)
    --- End diff --
    
    yeah, I think we need to add the cleaning for `hadoopRDD` too.


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-202619019
  
    **[Test build #54375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54375/consoleFull)** for PR 11978 at commit [`d4e75d2`](https://github.com/apache/spark/commit/d4e75d2b306918be131a2d6ef70160f5a3353fe2).


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-204301039
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54687/


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by rajeshbalamohan <gi...@git.apache.org>.
Github user rajeshbalamohan commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-211175800
  
    @srowen - As per Andrew's comment, I thought it was fine to make the change given that HadoopRDD is marked as DeveloperApi. Please let me know if any additional changes are needed.
    
    Additional info: SPARK-13664 brought in a large number of changes for FileSourceStrategy, which is now the default code path. So, ideally, OrcRelation would no longer go through this code path by default. Given that, this PR would only have an impact if someone invokes HadoopRDD directly and has done closure cleaning up front.


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-201783935
  
    Jenkins test this please


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-204517168
  
    @rajeshbalamohan the point of cleaning a closure is that it might be passed in by the user. If we provide the closure internally then we don't have to clean it. Before this patch we cleaned the user's closure in `sc.hadoopRDD`, but after this patch we no longer do. That regresses behavior.
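    
    The distinction drawn here (clean closures that arrive from user code, skip cleaning for closures built internally) can be sketched without Spark. Everything below is hypothetical: `Ctx`, `FakeRdd`, `userFacingRdd`, and `internalRdd` are illustrative names, and `clean` only counts calls rather than performing the bytecode analysis that the real closure cleaner does.
    
    ```scala
    // Hypothetical stand-ins; none of these names exist in Spark.
    object ClosureBoundarySketch {
      // RDD-like class whose constructor does not clean its closure.
      final class FakeRdd(val initFunc: Option[String => Unit])
    
      final class Ctx {
        private var cleanCalls = 0
        def cleanCount: Int = cleanCalls
    
        // Placeholder for an expensive closure-cleaning step: it only counts invocations.
        def clean[F <: AnyRef](f: F): F = { cleanCalls += 1; f }
    
        // Public entry point: the closure may come from user code, so clean it here.
        def userFacingRdd(init: String => Unit): FakeRdd =
          new FakeRdd(Some(clean(init)))
    
        // Internal entry point: the closure is built by the framework, so cleaning is skipped.
        def internalRdd(init: String => Unit): FakeRdd =
          new FakeRdd(Some(init))
      }
    }
    ```
    
    In this sketch the cost of `clean` is paid once per user-facing call and never on the internal path, which is the behavior the review asks to preserve for `sc.hadoopRDD` while still letting internal callers construct the RDD directly.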


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11978#discussion_r57608918
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala ---
    @@ -317,12 +317,11 @@ private[orc] case class OrcTableScan(
           classOf[OrcInputFormat]
             .asInstanceOf[Class[_ <: MapRedInputFormat[NullWritable, Writable]]]
     
    -    val rdd = sqlContext.sparkContext.hadoopRDD(
    +    val rdd = new HadoopRDD(sqlContext.sparkContext,
    --- End diff --
    
    Can you add a comment to say we're creating a HadoopRDD here directly to bypass closure cleaning?


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-202617742
  
    retest this please


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by rajeshbalamohan <gi...@git.apache.org>.
Github user rajeshbalamohan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11978#discussion_r57537799
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -979,6 +979,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
         // A Hadoop configuration can be about 10 KB, which is pretty big, so broadcast it.
         val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
         val setInputPathsFunc = (jobConf: JobConf) => FileInputFormat.setInputPaths(jobConf, path)
    +    clean(setInputPathsFunc)
    --- End diff --
    
    Thanks @srowen. Yes, for invocations via sc.textFile. Adding an additional method like the following, and passing initLocalJobConfFuncOpt to it, can help avoid closure cleaning in this scenario. However, this would require changes everywhere else sc.textFile is invoked. The intention was to let users make use of HadoopRDD directly (if needed) without incurring the cost of closure cleaning (e.g. in the SQL modules). Hence I did not make those additional changes.
    
    ```
      def newTextFile(
          path: String,
          initLocalJobConfFuncOpt: Option[JobConf => Unit],
          minPartitions: Int = defaultMinPartitions): RDD[String] = withScope {
        assertNotStopped()
        hadoopFile(path, classOf[TextInputFormat], initLocalJobConfFuncOpt,
          classOf[LongWritable], classOf[Text],
          minPartitions).map(pair => pair._2.toString).setName(path)
      }
    
    
      def hadoopFile[K, V](
          path: String,
          inputFormatClass: Class[_ <: InputFormat[K, V]],
          initLocalJobConfFuncOpt: Option[JobConf => Unit],
          keyClass: Class[K],
          valueClass: Class[V],
          minPartitions: Int = defaultMinPartitions): RDD[(K, V)] = withScope {
        assertNotStopped()
        // A Hadoop configuration can be about 10 KB, which is pretty big, so broadcast it.
        val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
        new HadoopRDD(
          this,
          confBroadcast,
          initLocalJobConfFuncOpt,
          inputFormatClass,
          keyClass,
          valueClass,
          minPartitions).setName(path)
      }
    
      // e.g.
      sc.newTextFile(tmpFilePath, Some(setInputPathsFunc), 4).count()
    ```



[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11978#discussion_r58714243
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -979,6 +979,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
         // A Hadoop configuration can be about 10 KB, which is pretty big, so broadcast it.
         val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
         val setInputPathsFunc = (jobConf: JobConf) => FileInputFormat.setInputPaths(jobConf, path)
    +    clean(setInputPathsFunc)
    --- End diff --
    
    @rajeshbalamohan catching up here: I think the remaining TODO is that cleaning still needs to be restored for `hadoopRDD`, right? Then this is ready.


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-212182468
  
    (This is trivial but might be better if the title follows `[SPARK-XXXXX][SQL]` format as described in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark.)


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by rajeshbalamohan <gi...@git.apache.org>.
Github user rajeshbalamohan closed the pull request at:

    https://github.com/apache/spark/pull/11978


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-210440963
  
    ping @rajeshbalamohan ?


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by rajeshbalamohan <gi...@git.apache.org>.
Github user rajeshbalamohan commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-204200260
  
    Thanks @andrewor14 . Addressed your review comments in latest commit.


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-214677094
  
    @rajeshbalamohan looks like this needs a rebase now, and I think Andrew's comment still needs to be addressed. Are you suggesting this change is no longer needed or did I misunderstand your last comment? Let's resolve this one way or the other.


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by rajeshbalamohan <gi...@git.apache.org>.
Github user rajeshbalamohan commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-202048734
  
    Tested with the following suites, along with the earlier SQL suites:
    
    org.apache.spark.FileSuite
    org.apache.spark.SparkContextSuite
    org.apache.spark.graphx.GraphLoaderSuite
    org.apache.spark.graphx.lib.SVDPlusPlusSuite
    org.apache.spark.metrics.InputOutputMetricsSuite
    org.apache.spark.ml.PipelineSuite
    org.apache.spark.ml.classification.DecisionTreeClassifierSuite
    org.apache.spark.ml.classification.LogisticRegressionSuite
    org.apache.spark.ml.classification.MultilayerPerceptronClassifierSuite
    org.apache.spark.ml.classification.NaiveBayesSuite
    org.apache.spark.ml.clustering.KMeansSuite
    org.apache.spark.ml.clustering.LDASuite
    org.apache.spark.ml.evaluation.BinaryClassificationEvaluatorSuite
    org.apache.spark.ml.evaluation.MulticlassClassificationEvaluatorSuite
    org.apache.spark.ml.evaluation.RegressionEvaluatorSuite
    org.apache.spark.ml.feature.BinarizerSuite
    org.apache.spark.ml.feature.BucketizerSuite
    org.apache.spark.ml.feature.ChiSqSelectorSuite
    org.apache.spark.ml.feature.CountVectorizerSuite
    org.apache.spark.ml.feature.DCTSuite
    org.apache.spark.ml.feature.ElementwiseProductSuite
    org.apache.spark.ml.feature.HashingTFSuite
    org.apache.spark.ml.feature.IDFSuite
    org.apache.spark.ml.feature.InteractionSuite
    org.apache.spark.ml.feature.MaxAbsScalerSuite
    org.apache.spark.ml.feature.MinMaxScalerSuite
    org.apache.spark.ml.feature.NGramSuite
    org.apache.spark.ml.feature.NormalizerSuite
    org.apache.spark.ml.feature.OneHotEncoderSuite
    org.apache.spark.ml.feature.PCASuite
    org.apache.spark.ml.feature.PolynomialExpansionSuite
    org.apache.spark.ml.feature.QuantileDiscretizerSuite
    org.apache.spark.ml.feature.RFormulaSuite
    org.apache.spark.ml.feature.RegexTokenizerSuite
    org.apache.spark.ml.feature.SQLTransformerSuite
    org.apache.spark.ml.feature.StandardScalerSuite
    org.apache.spark.ml.feature.StopWordsRemoverSuite
    org.apache.spark.ml.feature.StringIndexerSuite
    org.apache.spark.ml.feature.TokenizerSuite
    org.apache.spark.ml.feature.VectorAssemblerSuite
    org.apache.spark.ml.feature.VectorIndexerSuite
    org.apache.spark.ml.feature.VectorSlicerSuite
    org.apache.spark.ml.feature.Word2VecSuite
    org.apache.spark.ml.recommendation.ALSSuite
    org.apache.spark.ml.regression.AFTSurvivalRegressionSuite
    org.apache.spark.ml.regression.DecisionTreeRegressorSuite
    org.apache.spark.ml.regression.GeneralizedLinearRegressionSuite
    org.apache.spark.ml.regression.IsotonicRegressionSuite
    org.apache.spark.ml.regression.LinearRegressionSuite
    org.apache.spark.ml.source.libsvm.LibSVMRelationSuite
    org.apache.spark.ml.tuning.CrossValidatorSuite
    org.apache.spark.ml.util.DefaultReadWriteSuite
    org.apache.spark.mllib.classification.LogisticRegressionSuite
    org.apache.spark.mllib.classification.NaiveBayesSuite
    org.apache.spark.mllib.classification.SVMSuite
    org.apache.spark.mllib.clustering.GaussianMixtureSuite
    org.apache.spark.mllib.clustering.KMeansSuite
    org.apache.spark.mllib.clustering.LDASuite
    org.apache.spark.mllib.clustering.PowerIterationClusteringSuite
    org.apache.spark.mllib.feature.ChiSqSelectorSuite
    org.apache.spark.mllib.feature.Word2VecSuite
    org.apache.spark.mllib.fpm.FPGrowthSuite
    org.apache.spark.mllib.recommendation.MatrixFactorizationModelSuite
    org.apache.spark.mllib.regression.IsotonicRegressionSuite
    org.apache.spark.mllib.regression.LassoSuite
    org.apache.spark.mllib.regression.LinearRegressionSuite
    org.apache.spark.mllib.regression.RidgeRegressionSuite
    org.apache.spark.mllib.tree.DecisionTreeSuite
    org.apache.spark.mllib.tree.GradientBoostedTreesSuite
    org.apache.spark.mllib.tree.RandomForestSuite
    org.apache.spark.mllib.util.MLUtilsSuite
    org.apache.spark.rdd.HadoopRDD
    org.apache.spark.rdd.MapPartitionsRDD
    org.apache.spark.rdd.PairRDDFunctionsSuite
    org.apache.spark.repl.ReplSuite
    org.apache.spark.sql.execution.datasources.csv.CSVSuite
    org.apache.spark.sql.execution.datasources.json.JsonSuite


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-204255888
  
    **[Test build #54687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54687/consoleFull)** for PR 11978 at commit [`0c53ed2`](https://github.com/apache/spark/commit/0c53ed23e3fb24cc5d882272ddca629843005629).


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-204255875
  
    @rajeshbalamohan you need to clean `sc.hadoopRDD` too.


[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-204255624
  
    retest this please




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by rajeshbalamohan <gi...@git.apache.org>.
Github user rajeshbalamohan commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-214705705
  
    @srowen - With the master code base and the changes that went in (FileSourceStrategy, to be specific), this PR is no longer very relevant to master. It would be more relevant to the 1.6.x line, but I am not sure whether we need to backport it. I intend to mark it as closed; please let me know and I can close this PR.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by rajeshbalamohan <gi...@git.apache.org>.
Github user rajeshbalamohan commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-204307805
  
    @andrewor14 - I am not sure I understood your last comment. Currently, Spark makes no direct invocation of HadoopRDD (with initLocalJobConfFuncOpt). If, at a later point, a change is needed to invoke HadoopRDD (with initLocalJobConfFuncOpt) via SparkContext, the following method could be added, which cleans the function.
    
    ```
    def hadoopRDD[K, V](
        broadcastedConf: Broadcast[SerializableConfiguration],
        initLocalJobConfFuncOpt: Option[JobConf => Unit],
        inputFormatClass: Class[_ <: InputFormat[K, V]],
        keyClass: Class[K],
        valueClass: Class[V],
        minPartitions: Int = defaultMinPartitions): RDD[(K, V)] = withScope {
      assertNotStopped()
      // Clean the wrapped function itself; an Option is not a closure, so
      // passing it to clean() directly would leave the function uncleaned.
      initLocalJobConfFuncOpt.foreach(f => clean(f))
      new HadoopRDD(this, broadcastedConf, initLocalJobConfFuncOpt,
        inputFormatClass, keyClass, valueClass, minPartitions)
    }
    ```
    
    But, I am not sure whether we need to clean sc.hadoopRDD in this patch. Please let me know.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11978#discussion_r57529953
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -979,6 +979,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
         // A Hadoop configuration can be about 10 KB, which is pretty big, so broadcast it.
         val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
         val setInputPathsFunc = (jobConf: JobConf) => FileInputFormat.setInputPaths(jobConf, path)
    +    clean(setInputPathsFunc)
    --- End diff --
    
    Doesn't this still cause the closure cleaning to happen once per `HadoopRDD`?




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-204301037
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-201828591
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54261/




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-202658203
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54375/




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-215924087
  
    @rajeshbalamohan go ahead and close this. Thanks!





[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-202657649
  
    **[Test build #54375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54375/consoleFull)** for PR 11978 at commit [`d4e75d2`](https://github.com/apache/spark/commit/d4e75d2b306918be131a2d6ef70160f5a3353fe2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11978#discussion_r57559696
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -979,6 +979,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
         // A Hadoop configuration can be about 10 KB, which is pretty big, so broadcast it.
         val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
         val setInputPathsFunc = (jobConf: JobConf) => FileInputFormat.setInputPaths(jobConf, path)
    +    clean(setInputPathsFunc)
    --- End diff --
    
    OK, yeah, it's added back for `hadoopFile` calls but not for others, like `hadoopRDD` or direct use. I don't know enough to evaluate this change with authority. If it's there for correctness it has to be there, and this does look like a function that needs cleaning, right? It may not happen to need cleaning in the case you're exercising, but that doesn't mean it is safe to skip cleaning it.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-201828050
  
    **[Test build #54261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54261/consoleFull)** for PR 11978 at commit [`dfb6b03`](https://github.com/apache/spark/commit/dfb6b03c5061dd8514fe09804c30c9281af50ab9).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-211263001
  
    I'm referring to https://github.com/apache/spark/pull/11978#issuecomment-204517168 which suggests `hadoopRDD` calls need to be cleaned.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-201828588
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-202515698
  
    Looks good.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-201738816
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-202658198
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11978#discussion_r57608860
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala ---
    @@ -317,12 +317,11 @@ private[orc] case class OrcTableScan(
           classOf[OrcInputFormat]
             .asInstanceOf[Class[_ <: MapRedInputFormat[NullWritable, Writable]]]
     
    -    val rdd = sqlContext.sparkContext.hadoopRDD(
    +    val rdd = new HadoopRDD(sqlContext.sparkContext,
           conf.asInstanceOf[JobConf],
           inputFormatClass,
           classOf[NullWritable],
    -      classOf[Writable]
    -    ).asInstanceOf[HadoopRDD[NullWritable, Writable]]
    +      classOf[Writable], sqlContext.sparkContext.defaultMinPartitions)
    --- End diff --
    
    split this line




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-201783910
  
    CC @andrewor14 possibly to comment on whether this closure cleaning can be removed.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11978#discussion_r58250294
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
    @@ -109,10 +109,6 @@ class HadoopRDD[K, V](
         minPartitions: Int)
       extends RDD[(K, V)](sc, Nil) with Logging {
     
    -  if (initLocalJobConfFuncOpt.isDefined) {
    -    sparkContext.clean(initLocalJobConfFuncOpt.get)
    --- End diff --
    
    Technically this might also regress behavior, since `HadoopRDD` is not private, but it's probably OK if this change improves performance, since this class is a `DeveloperApi`.




[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11978#issuecomment-201786327
  
    **[Test build #54261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54261/consoleFull)** for PR 11978 at commit [`dfb6b03`](https://github.com/apache/spark/commit/dfb6b03c5061dd8514fe09804c30c9281af50ab9).

