Posted to dev@pig.apache.org by "Adam Szita (JIRA)" <ji...@apache.org> on 2017/03/29 10:35:41 UTC

[jira] [Commented] (PIG-5200) Orc_1 and Orc_Pushdown_* tests fail on Spark

    [ https://issues.apache.org/jira/browse/PIG-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946904#comment-15946904 ] 

Adam Szita commented on PIG-5200:
---------------------------------

So it looks like Pig code is calling Hive code (for ORC), which then calls some Kryo methods that eventually try to {{Class.forName}} the same Hive class it was originally called from. Surprisingly, this fails.

The problem is rooted in the fact that we use spark-assembly.jar. Here is why:

# spark-assembly.jar becomes the first entry in the {color:red}system classloader{color} (which is a URLClassLoader) on the Spark executor JVM.
# The Java implementation of URLClassLoader creates a {{jarLoader}} object for this path and puts it in the first position of its {{loaders}} list.
#* This means that whenever we try to load a class, the JVM will look for it in this jar first.
# During execution, PigRecordReader and OrcNewInputFormat are loaded using Spark's {color:green}MutableClassLoader{color}, whose classpath contains the jars for which we called {{sparkContext.addJar}} on the frontend (e.g. hive-exec, hive-serde, kryo, etc.).
# As we can see in the stack trace, Hive code calls the {{fromKryo()}} method, which creates a Kryo object. The problem is that the {color:red}system classloader{color} already has a loader for Kryo classes, because Kryo is shaded in spark-assembly.jar (see #2). *So Kryo will be loaded by the {color:red}system classloader{color}, not by Spark's {color:green}MutableClassLoader{color}.*
#* In Kryo's constructor a classLoader field is set: {{this.classLoader = this.getClass().getClassLoader();}}. This ends up being the {color:red}system classloader{color}, since that is what loaded Kryo.
# Further down the code, Kryo calls {{type = Class.forName(className, false, this.kryo.getClassLoader());}} and tries to load a Hive class using the classloader that was set in Kryo, which is the {color:red}system classloader{color}.
# Since there are no Hive jars on the _system classpath_, this fails.
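
To illustrate the last two steps in isolation, here is a minimal sketch (not the actual Pig, Hive or Kryo code). It assumes a hypothetical class {{OnlyOnChildClasspath}} standing in for the Hive class, and a {{URLClassLoader}} with no URLs and no parent standing in for a classloader that does not have the Hive jars. The same {{Class.forName(name, false, loader)}} call succeeds or fails depending only on which classloader is passed in:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderVisibilityDemo {

    /** Hypothetical stand-in for a Hive class that only one classloader can see. */
    public static class OnlyOnChildClasspath {}

    /** Returns true if the given loader can resolve the class name. */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            // Same form of lookup as in step 6: the result depends on the loader argument.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = OnlyOnChildClasspath.class.getName();

        // The loader that actually loaded this demo can see the nested class.
        ClassLoader visible = ClassLoaderVisibilityDemo.class.getClassLoader();

        // A loader with no URLs and no parent (assumption: plays the role of a
        // classloader without the Hive jars) cannot see it.
        try (URLClassLoader restricted = new URLClassLoader(new URL[0], null)) {
            System.out.println("visible loader:    " + canLoad(name, visible));
            System.out.println("restricted loader: " + canLoad(name, restricted));
        }
    }
}
```

In the failing Spark case the "restricted" role is played by the system classloader, because that is the loader Kryo stored in its constructor.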

As a workaround, I manually removed Kryo from spark-assembly.jar. This fixed the issue because the {color:red}system classloader{color} is then unable to load Kryo, so the JVM uses Spark's {color:green}MutableClassLoader{color} to load it. Kryo then uses the same classloader that loaded it, and since that is {color:green}MutableClassLoader{color}, the ClassNotFoundExceptions for Hive classes are gone.

[~kellyzly], [~xuefuz], how do you think we could fix this?

> Orc_1 and Orc_Pushdown_* tests fail on Spark
> --------------------------------------------
>
>                 Key: PIG-5200
>                 URL: https://issues.apache.org/jira/browse/PIG-5200
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>
> Orc_1 and all of the Orc_Pushdown E2E tests produce the following exception:
> {code}
> 2017-03-27 03:16:50,293 [task-result-getter-1] WARN  org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 1.0 (TID 1, example-2.com, executor 1): java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Unable to find class:
>  org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionTree
> Serialization trace:
> expression (org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:263)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.<init>(PigRecordReader.java:121)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:117)
>         at org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark.createRecordReader(PigInputFormatSpark.java:64)
>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:166)
>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionTree
> Serialization trace:
> expression (org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl)
>         at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>         at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>         at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:599)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
>         at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl.fromKryo(SearchArgumentImpl.java:1006)
>         at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory.create(SearchArgumentFactory.java:44)
>         at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory.createFromConf(SearchArgumentFactory.java:52)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:312)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:229)
>         at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat$OrcRecordReader.<init>(OrcNewInputFormat.java:69)
>         at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.createRecordReader(OrcNewInputFormat.java:51)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:253)
>         ... 23 more
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$ExpressionTree
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:270)
>         at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
>         ... 36 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)