Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2016/02/10 10:54:18 UTC

[jira] [Commented] (PIG-4790) Join after union fail due to UnionOptimizer

    [ https://issues.apache.org/jira/browse/PIG-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140582#comment-15140582 ] 

Rohini Palaniswamy commented on PIG-4790:
-----------------------------------------

The difference in the complex script is that one of the edges is a vertex group. The fix has a problem, though: it turns off UnionOptimizer for the simple case as well, where the edges are normal, which the previous patch handled. We should avoid turning off UnionOptimizer as much as possible, because the performance of UnorderedPartitionedKVOutput is currently very bad and has not been fixed yet. It would be good to add the script to TestTezCompiler as well.

> Join after union fail due to UnionOptimizer
> -------------------------------------------
>
>                 Key: PIG-4790
>                 URL: https://issues.apache.org/jira/browse/PIG-4790
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.16.0
>
>         Attachments: PIG-4790-1.patch, PIG-4790-2.patch
>
>
> The following script fails to run:
> {code}
> rmf ooo
> a = load 'student.txt' as (name:chararray, age:int, gpa:double);
> b = filter a by age > 65;
> c = filter a by age <=10;
> d = union b, c;
> e = join a by name left, d by name;
> store e into 'ooo';
> {code}
> Exception stack:
> {code}
> Caused by: java.lang.IllegalArgumentException: Edge [scope-43 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> [scope-55 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ SCATTER_GATHER : org.apache.tez.runtime.library.input.OrderedGroupedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput >> NullEdgeManager }) already defined!
>         at org.apache.tez.dag.api.DAG.addEdge(DAG.java:272)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:311)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:252)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:65)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:111)
>         ... 20 more
> {code}
> With pig.tez.opt.union disabled, the script runs fine.
> It seems we should detect this pattern and disallow merging a vertex group into a vertex pair that already has an edge.
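>
> A minimal sketch of the workaround mentioned above, assuming the property is set at the top of the script with Pig's set command (it could equally be passed as -Dpig.tez.opt.union=false on the command line). This only sidesteps the failure by turning the optimization off for the whole script; it is not the fix itself:
> {code}
> -- Workaround only: disable the Tez union optimizer for this script
> set pig.tez.opt.union false;
> {code}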


