Posted to dev@pig.apache.org by "Adam Szita (JIRA)" <ji...@apache.org> on 2017/03/10 21:40:04 UTC

[jira] [Commented] (PIG-5180) MergeSparseJoin fails with Spark exec type

    [ https://issues.apache.org/jira/browse/PIG-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905742#comment-15905742 ] 

Adam Szita commented on PIG-5180:
---------------------------------

For the sparse merge join we use IndexedStorage from the piggybank library, which implements IndexableLoadFunc.
In SparkCompiler#visitMergeJoin we don't call setIndexFile() on the POMergeJoin instance if the corresponding load func implements IndexableLoadFunc. That's why we end up trying to create a Path object from a null String.
In these cases we don't need to replicate an index file, since the index is already stored on HDFS, so a simple null check will take care of this.
[~kellyzly] can you take a look at [^PIG-5180.0.patch]?
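
To illustrate the idea, here is a minimal, self-contained sketch of such a null check. It is not the content of the attached patch: MergeJoinStub, toPath() and collectIndexFilesToReplicate() are hypothetical stand-ins for POMergeJoin, the Hadoop Path constructor and the replication step in JobGraphBuilder, and the sketch assumes the index file name is simply null when the load func is an IndexableLoadFunc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MergeJoinReplicationSketch {

    /** Hypothetical stand-in for POMergeJoin; only the index file name matters here. */
    static final class MergeJoinStub {
        private final String indexFile; // assumed null when the loader is an IndexableLoadFunc

        MergeJoinStub(String indexFile) {
            this.indexFile = indexFile;
        }

        String getIndexFile() {
            return indexFile;
        }
    }

    /** Hypothetical stand-in for a path constructor that rejects null, as Hadoop's Path does. */
    static String toPath(String name) {
        if (name == null) {
            throw new IllegalArgumentException("Can not create a Path from a null string");
        }
        return name;
    }

    /** Collects the index files that need replication, skipping joins that have none. */
    static List<String> collectIndexFilesToReplicate(List<MergeJoinStub> joins) {
        List<String> toReplicate = new ArrayList<>();
        for (MergeJoinStub join : joins) {
            String indexFile = join.getIndexFile();
            if (indexFile == null) {
                // No separate index file was set, so there is nothing to replicate.
                continue;
            }
            toReplicate.add(toPath(indexFile));
        }
        return toReplicate;
    }

    public static void main(String[] args) {
        List<MergeJoinStub> joins = Arrays.asList(
                new MergeJoinStub("/tmp/index_a"), // regular merge join with an index file
                new MergeJoinStub(null));          // sparse merge join via an indexable loader
        System.out.println(collectIndexFilesToReplicate(joins));
    }
}
```

Without the null check, the second join in main() would reach toPath(null) and fail with the same kind of IllegalArgumentException as in the stack trace below.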

> MergeSparseJoin fails with Spark exec type
> ------------------------------------------
>
>                 Key: PIG-5180
>                 URL: https://issues.apache.org/jira/browse/PIG-5180
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5180.0.patch
>
>
> MergeSparseJoin 1 to 6 all fail because the following exception is thrown on the frontend side:
> {code}
> Caused by: java.lang.IllegalArgumentException: Can not create a Path from a null string
> 	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:122)
> 	at org.apache.hadoop.fs.Path.<init>(Path.java:134)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.setReplicationForMergeJoin(JobGraphBuilder.java:126)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:105)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> 	at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> 	at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:224)
> 	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> 	... 33 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)