You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2016/03/24 09:46:25 UTC

[jira] [Created] (SPARK-14113) Consider marking JobConf closure-cleaning in HadoopRDD as optional

Rajesh Balamohan created SPARK-14113:
----------------------------------------

             Summary: Consider marking JobConf closure-cleaning in HadoopRDD as optional
                 Key: SPARK-14113
                 URL: https://issues.apache.org/jira/browse/SPARK-14113
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
            Reporter: Rajesh Balamohan


In HadoopRDD, the following code was introduced as a part of SPARK-6943.

{noformat}
  if (initLocalJobConfFuncOpt.isDefined) {
    sparkContext.clean(initLocalJobConfFuncOpt.get)
  }
{noformat}

When working on one of the changes in OrcRelation, I tried passing initLocalJobConfFuncOpt to HadoopRDD and that incurred good performance penalty (due to closure cleaning) with large RDDs. This would be invoked for every HadoopRDD initialization causing the bottleneck.

example threadstack is given below

{noformat}
        at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
        at org.apache.xbean.asm5.ClassReader.readUTF8(Unknown Source)
        at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
        at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
        at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
        at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:402)
        at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:390)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:102)
        at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:102)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:102)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:390)
        at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
        at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
        at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
        at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
        at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$15.apply(ClosureCleaner.scala:224)
        at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$15.apply(ClosureCleaner.scala:223)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:223)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2079)
        at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:112){noformat}

Creating this JIRA to explore the possibility of removing it or mark it optional.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org