Posted to issues@spark.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2016/03/24 09:46:25 UTC
[jira] [Created] (SPARK-14113) Consider marking JobConf closure-cleaning in HadoopRDD as optional
Rajesh Balamohan created SPARK-14113:
----------------------------------------
Summary: Consider marking JobConf closure-cleaning in HadoopRDD as optional
Key: SPARK-14113
URL: https://issues.apache.org/jira/browse/SPARK-14113
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: Rajesh Balamohan
In HadoopRDD, the following code was introduced as a part of SPARK-6943.
{noformat}
if (initLocalJobConfFuncOpt.isDefined) {
sparkContext.clean(initLocalJobConfFuncOpt.get)
}
{noformat}
When working on one of the changes in OrcRelation, I tried passing initLocalJobConfFuncOpt to HadoopRDD, and that incurred a significant performance penalty (due to closure cleaning) with large RDDs. The cleaning is invoked for every HadoopRDD initialization, making it a bottleneck.
An example thread stack is given below:
{noformat}
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.readUTF8(Unknown Source)
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:402)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:390)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:102)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:102)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:102)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:390)
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$15.apply(ClosureCleaner.scala:224)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$15.apply(ClosureCleaner.scala:223)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:223)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2079)
at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:112){noformat}
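The cost above comes from SparkContext.clean walking the closure's bytecode with ASM on every HadoopRDD construction. One way to make that step optional is to guard the clean call behind a flag. The sketch below is illustrative only, not the actual Spark change: maybeClean and the cleanClosures parameter are hypothetical names, and clean here is a trivial stand-in for SparkContext.clean.

```scala
// Minimal sketch of gating closure cleaning behind a flag.
// `maybeClean` and `cleanClosures` are hypothetical names, not real Spark API.
object OptionalClean {
  // Stand-in for SparkContext.clean; the real method does an expensive
  // ASM bytecode walk to null out unused outer references.
  def clean[F <: AnyRef](f: F): F = f

  // Only pay the cleaning cost when the caller asks for it, mirroring the
  // `if (initLocalJobConfFuncOpt.isDefined) sparkContext.clean(...)` pattern
  // quoted earlier but with an explicit opt-out.
  def maybeClean[F <: AnyRef](fOpt: Option[F], cleanClosures: Boolean): Option[F] =
    if (cleanClosures) fOpt.map(clean) else fOpt
}
```

With cleanClosures = false, the closure is passed through untouched, so repeated HadoopRDD construction would skip the ASM traversal shown in the stack above.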
Creating this JIRA to explore the possibility of removing this cleaning step or making it optional.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)