Posted to dev@pig.apache.org by "Nandor Kollar (JIRA)" <ji...@apache.org> on 2017/12/13 11:31:00 UTC

[jira] [Commented] (PIG-5320) TestCubeOperator#testRollupBasic is flaky on Spark 2.2

    [ https://issues.apache.org/jira/browse/PIG-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289136#comment-16289136 ] 

Nandor Kollar commented on PIG-5320:
------------------------------------

I think this is a problem with Spark 1.6.x too; checking for the condition in a loop should solve the problem. I also changed the map and set implementations to sorted ones: since we use integer job ids, I hope this slightly improves performance when there are many jobs. [~kellyzly], [~szita] could you please have a look at my patch? My only concern is: is SparkListener#onJobEnd() called when the job fails? If not, then Pig would get stuck in an infinite loop.
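For reference, the guarded-wait pattern described above (re-checking the condition in a loop around {{wait()}}) could look roughly like this. This is an illustrative sketch, not the actual patch: the class name matches JobStatisticCollector, but the field and method names here are assumptions.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch of the proposed fix: wait() guarded by a loop,
// with a sorted set of finished integer job ids (as mentioned above).
public class JobStatisticCollector {
    private final Object sparkListener = new Object();
    // TreeSet keeps integer job ids sorted (hypothetical field name).
    private final Set<Integer> finishedJobIds = new TreeSet<>();

    // Blocks until jobEnded() has been called for the given job id.
    public void waitForJobToEnd(int jobId) throws InterruptedException {
        synchronized (sparkListener) {
            // Guard against spurious wakeups: re-check the condition
            // every time wait() returns, as Object.wait's javadoc advises.
            while (!finishedJobIds.contains(jobId)) {
                sparkListener.wait();
            }
            finishedJobIds.remove(jobId);
        }
    }

    // Called from the listener (e.g. SparkListener#onJobEnd) when a job finishes.
    public void jobEnded(int jobId) {
        synchronized (sparkListener) {
            finishedJobIds.add(jobId);
            sparkListener.notifyAll();
        }
    }
}
```

Note that this pattern also handles the race where the job finishes before the waiter starts waiting: the id is already in the set, so the loop body is never entered.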

> TestCubeOperator#testRollupBasic is flaky on Spark 2.2
> ------------------------------------------------------
>
>                 Key: PIG-5320
>                 URL: https://issues.apache.org/jira/browse/PIG-5320
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>         Attachments: PIG-5320_1.patch
>
>
> TestCubeOperator#testRollupBasic occasionally fails with
> {code}
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias c
> 	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1779)
> 	at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
> 	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1110)
> 	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:512)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
> 	at org.apache.pig.PigServer.registerScript(PigServer.java:781)
> 	at org.apache.pig.PigServer.registerScript(PigServer.java:858)
> 	at org.apache.pig.PigServer.registerScript(PigServer.java:821)
> 	at org.apache.pig.test.Util.registerMultiLineQuery(Util.java:972)
> 	at org.apache.pig.test.TestCubeOperator.testRollupBasic(TestCubeOperator.java:124)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get the rdds of this spark operator: 
> 	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> 	at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> 	at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:237)
> 	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:293)
> 	at org.apache.pig.PigServer.launchPlan(PigServer.java:1475)
> 	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1460)
> 	at org.apache.pig.PigServer.execute(PigServer.java:1449)
> 	at org.apache.pig.PigServer.access$500(PigServer.java:119)
> 	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1774)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
> 	at org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
> 	at org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
> 	at org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {code}
> I think the problem is that in JobStatisticCollector#waitForJobToEnd, {{sparkListener.wait()}} is not called inside a loop, as suggested in wait's javadoc:
> {code}
>      * As in the one argument version, interrupts and spurious wakeups are
>      * possible, and this method should always be used in a loop:
> {code}
> Thus, due to a spurious wakeup, the wait might return without notify ever having been called.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)