Posted to dev@pig.apache.org by "Thomas Kappler (JIRA)" <ji...@apache.org> on 2011/05/11 17:07:47 UTC

[jira] [Commented] (PIG-1683) New logical plan: Nested foreach plan fail if one inner alias is refered more than once

    [ https://issues.apache.org/jira/browse/PIG-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031767#comment-13031767 ] 

Thomas Kappler commented on PIG-1683:
-------------------------------------

I found a strange problem that looks like a special case of this issue. Apologies if it isn't.

I wanted to use REGEX_EXTRACT in a nested foreach block where I clean up some strings. Pig accepts or rejects the block depending on whether the condition in the generate statement is written as "is null" or "is not null". The simplest example I could come up with that shows the problem is this:

{noformat} 
a = load '1.txt' using PigStorage(',') as (a0:chararray, a1:chararray);
b = foreach a {
    b0 = TRIM(a0);
    b1 = REGEX_EXTRACT(b0, '^\\((.+)\\)$', 1);
    generate ((b1 is null) ? b0 : b1) as cleaned_name; -- FAILS
    -- generate ((b1 is not null) ? b1 : b0) as cleaned_name; -- SUCCEEDS
    -- generate ((b1 is null) ? b0 : b1); -- FAILS
}
store b into 'out';
{noformat}

1.txt is

{noformat}
foo1,bar1
 (foo2),bar2
{noformat}

The "b1 is null" variant fails with the original error message of this issue: "Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject multiple outputs. This operator does not support multiple outputs."

The inverted, logically equivalent "b1 is not null" variant succeeds.
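
To make "logically equivalent" concrete, here is a small sketch in Python (not Pig) of what both generate variants should compute for the two rows of 1.txt. It assumes that REGEX_EXTRACT returns null when the pattern does not match and that TRIM behaves like Python's strip(). The function names are made up for illustration.

```python
import re

# Python form of the Pig pattern '^\\((.+)\\)$': a string wrapped in parentheses.
PATTERN = re.compile(r"^\((.+)\)$")


def regex_extract(s, group):
    """Stand-in for REGEX_EXTRACT; assumed to return None (null) on no match."""
    m = PATTERN.search(s)
    return m.group(group) if m else None


def clean_is_null(a0):
    # generate ((b1 is null) ? b0 : b1)
    b0 = a0.strip()
    b1 = regex_extract(b0, 1)
    return b0 if b1 is None else b1


def clean_is_not_null(a0):
    # generate ((b1 is not null) ? b1 : b0)
    b0 = a0.strip()
    b1 = regex_extract(b0, 1)
    return b1 if b1 is not None else b0


# First column (a0) of the two rows in 1.txt, including the leading space.
rows = ["foo1", " (foo2)"]
results = [(clean_is_null(r), clean_is_not_null(r)) for r in rows]
print(results)
```

Under those assumptions both variants give the same value for every row, so I would expect Pig to either accept both or reject both.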

If I replace the REGEX_EXTRACT call with a simple expression like "b1 = a0", the script works. But the way I read the Pig Latin reference, a function call like REGEX_EXTRACT should be allowed at this point, since it's not a relational operator?

> New logical plan: Nested foreach plan fail if one inner alias is refered more than once
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-1683
>                 URL: https://issues.apache.org/jira/browse/PIG-1683
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>         Attachments: PIG-1683-1.patch
>
>
> The following script fails:
> {code}
> a = load '1.txt' as (a0, a1, a2);
> b = load '2.txt' as (b0, b1);
> c = join a by a0, b by b0;
> d = foreach c {
>     d0 = a::a0;
>     d1 = a::a1;
>     generate ((d0 is not null)? d0 : d1);
> }
> explain d;
> {code}
> Stack:
> ERROR 2015: Invalid physical operators in the physical plan
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias d
>         at org.apache.pig.PigServer.explain(PigServer.java:957)
>         at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:353)
>         at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:285)
>         at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:248)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.Explain(PigScriptParser.java:605)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:327)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
>         at org.apache.pig.Main.run(Main.java:498)
>         at org.apache.pig.Main.main(Main.java:107)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:308)
>         at org.apache.pig.PigServer.compilePp(PigServer.java:1350)
>         at org.apache.pig.PigServer.explain(PigServer.java:926)
>         ... 10 more
> Caused by: org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException: ERROR 2015: Invalid physical operators in the physical plan
>         at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:474)
>         at org.apache.pig.newplan.logical.expression.BinCondExpression.accept(BinCondExpression.java:82)
>         at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
>         at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:519)
>         at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:71)
>         at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:295)
>         ... 12 more
> Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject multiple outputs.  This operator does not support multiple outputs.
>         at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:180)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:133)
>         at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:470)
>         ... 19 more
