Posted to dev@pig.apache.org by "Nandor Kollar (JIRA)" <ji...@apache.org> on 2017/04/10 22:00:43 UTC

[jira] [Commented] (PIG-5176) Several ComputeSpec test cases fail

    [ https://issues.apache.org/jira/browse/PIG-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963551#comment-15963551 ] 

Nandor Kollar commented on PIG-5176:
------------------------------------

It looks like the problem occurs when the Spark file server is Netty-based. The [Netty file server|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L52] has a strict restriction: you can't register the same file name twice, while the [HTTP file server|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/HttpFileServer.scala#L64] doesn't impose this restriction. [~kellyzly], your cluster probably used the HTTP-based file server, which is why you didn't experience this issue, while my cluster uses the Netty-based implementation.
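For illustration, the Netty-based server's duplicate check behaves roughly like the following Java analogue (the real implementation is Scala's NettyStreamManager, which uses require(); this sketch only mirrors the observed behavior, the class name is hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical Java analogue of NettyStreamManager.addFile's duplicate
// check; illustration only, not Spark code.
public class NettyStyleRegistry {
    private final ConcurrentHashMap<String, String> files = new ConcurrentHashMap<>();

    public void addFile(String name, String path) {
        // putIfAbsent returns null only when the name was not yet present,
        // so a second registration under the same name fails.
        if (files.putIfAbsent(name, path) != null) {
            throw new IllegalArgumentException(
                "requirement failed: File " + name + " already registered.");
        }
    }

    public static void main(String[] args) {
        NettyStyleRegistry r = new NettyStyleRegistry();
        r.addFile("PigStreamingDepend.pl", "/tmp/PigStreamingDepend.pl");
        try {
            r.addFile("PigStreamingDepend.pl", "/other/PigStreamingDepend.pl");
        } catch (IllegalArgumentException e) {
            // prints the same message as in the reported exception
            System.out.println(e.getMessage());
        }
    }
}
```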
You can reproduce the problem in a unit test: execute TestStreaming#testInputShipSpecs in yarn client mode (SPARK_MASTER=yarn-client) with the additional VM option -Dspark.rpc.useNettyFileServer=true. It should fail, but when you remove the spark.rpc.useNettyFileServer VM option (which means Spark will use the HTTP-based file server, see [NettyRpcEnv.scala|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala#L59]) it should pass. I'm not sure how we can fix this: should we check the currently used file server implementation property in Pig, set useNettyFileServer to false in our SparkLauncher class, or document that this is not supported and that one should use the HTTP-based file server? Liyun, what do you recommend?
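Another possible direction, sketched below, would be to deduplicate on the Pig side: track the file names already shipped and skip repeats before calling SparkContext#addFile, so the Netty file server never sees the same name twice. This is only a sketch under assumptions; the class and field names (ShipDedup, addedFiles, shipFile) are hypothetical, not Pig's actual SparkLauncher code:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: guard against shipping the same file name twice,
// so the Netty file server's "already registered" check never fires.
public class ShipDedup {
    private final Set<String> addedFiles = new HashSet<>();

    /** Returns true only the first time a given file name is shipped. */
    public boolean shipFile(String fileName) {
        // NettyStreamManager keys files by name, so dedupe on the
        // name, not on the full path. In real code, the caller would
        // invoke sparkContext.addFile(path) only when this returns true.
        return addedFiles.add(fileName);
    }

    public static void main(String[] args) {
        ShipDedup d = new ShipDedup();
        System.out.println(d.shipFile("PigStreamingDepend.pl")); // true: first add
        System.out.println(d.shipFile("PigStreamingDepend.pl")); // false: duplicate skipped
    }
}
```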

> Several ComputeSpec test cases fail
> -----------------------------------
>
>                 Key: PIG-5176
>                 URL: https://issues.apache.org/jira/browse/PIG-5176
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-5176.patch
>
>
> Several ComputeSpec test cases failed on my cluster:
> ComputeSpec_5 - ComputeSpec_13
> These scripts have a ship() part in the define clause, where the ship includes the script file too, so we add the same file to the Spark context twice. This is not a problem with Hadoop, but it looks like Spark doesn't accept adding the same file name twice:
> {code}
> Caused by: java.lang.IllegalArgumentException: requirement failed: File PigStreamingDepend.pl already registered.
>         at scala.Predef$.require(Predef.scala:233)
>         at org.apache.spark.rpc.netty.NettyStreamManager.addFile(NettyStreamManager.scala:69)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1386)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1348)
>         at org.apache.spark.api.java.JavaSparkContext.addFile(JavaSparkContext.scala:662)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.addResourceToSparkJobWorkingDirectory(SparkLauncher.java:462)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.shipFiles(SparkLauncher.java:371)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.addFilesToSparkJob(SparkLauncher.java:357)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.uploadResources(SparkLauncher.java:235)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:222)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)