Posted to dev@pig.apache.org by "Praveen Rachabattuni (JIRA)" <ji...@apache.org> on 2015/02/24 04:52:12 UTC

[jira] [Updated] (PIG-4228) SchemaTupleBackend error when working on a Spark 1.1.0 cluster

     [ https://issues.apache.org/jira/browse/PIG-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Praveen Rachabattuni updated PIG-4228:
--------------------------------------
    Fix Version/s: spark-branch

> SchemaTupleBackend error when working on a Spark 1.1.0 cluster
> --------------------------------------------------------------
>
>                 Key: PIG-4228
>                 URL: https://issues.apache.org/jira/browse/PIG-4228
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.14.0
>         Environment: spark-1.1.0
>            Reporter: Carlos Balduz
>              Labels: spark
>             Fix For: spark-branch
>
>         Attachments: groupby.pig, movies_data.csv
>
>
> Whenever I try to run a script on a Spark cluster, I get the following error:
> ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (...): java.lang.RuntimeException: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at [1-2[-1,-1]]
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:62)
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
>         scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>         scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>         scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:34)
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
>         scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>         scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:745)
> After debugging, I have seen that the problem is inside SchemaTupleBackend. SparkLauncher initializes this class on the driver, but that initialization is lost when the job is shipped to the executors. When POOutputConsumerIterator then tries to fetch the results, SchemaTupleBackend.newSchemaTupleFactory(...) is called on the uninitialized backend and throws a RuntimeException.
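
A minimal sketch of the failure mode described above, assuming SchemaTupleBackend keeps a static per-JVM instance and exposes the initialize(Configuration, PigContext) entry point that Pig's MapReduce task setup uses; the helper class and method names below are hypothetical, an illustration rather than the committed fix:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.pig.data.SchemaTupleBackend;
    import org.apache.pig.impl.PigContext;

    // Hypothetical helper: make sure SchemaTupleBackend is initialized in the
    // current JVM before the operator pipeline (e.g. the ForEach consumed by
    // POOutputConsumerIterator) asks it for a tuple factory.
    public class ExecutorSchemaTupleInit {
        private static boolean initialized = false;

        // Call once per executor JVM, from inside the Spark task, so it runs
        // where the tuples are actually materialized. SparkLauncher calling
        // initialize(...) only covers the driver JVM, which is why executors
        // hit the RuntimeException from newSchemaTupleFactory(...).
        public static synchronized void ensureInitialized(
                Configuration jobConf, PigContext pigContext) throws IOException {
            if (!initialized) {
                // Assumed signature, mirroring the MapReduce backend's setup().
                SchemaTupleBackend.initialize(jobConf, pigContext);
                initialized = true;
            }
        }
    }

On MapReduce this initialize(...) call happens in each task's setup(), so every task JVM gets a live backend; a Spark-side fix presumably needs the equivalent call to run inside the task closure rather than in SparkLauncher, which executes only on the driver.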



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)