Posted to dev@pig.apache.org by "Thomas Kappler (JIRA)" <ji...@apache.org> on 2011/05/11 17:07:47 UTC
[jira] [Commented] (PIG-1683) New logical plan: Nested foreach plan
fail if one inner alias is refered more than once
[ https://issues.apache.org/jira/browse/PIG-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031767#comment-13031767 ]
Thomas Kappler commented on PIG-1683:
-------------------------------------
I found a strange problem that looks like a special case of this issue. Apologies if it isn't.
I wanted to use REGEX_EXTRACT in a nested generate block where I clean up some strings. Pig accepts or rejects the block depending on the order of the "is null" condition. The simplest example I could come up with that shows the problem is this:
{noformat}
a = load '1.txt' using PigStorage(',') as (a0:chararray, a1:chararray);
b = foreach a {
    b0 = TRIM(a0);
    b1 = REGEX_EXTRACT(b0, '^\\((.+)\\)$', 1);
    generate ((b1 is null) ? b0 : b1) as cleaned_name; -- FAILS
    -- generate ((b1 is not null) ? b1 : b0) as cleaned_name; -- SUCCEEDS
    -- generate ((b1 is null) ? b0 : b1); -- FAILS
}
store b into 'out';
{noformat}
1.txt is
{noformat}
foo1,bar1
(foo2),bar2
{noformat}
The "b1 is null" variant fails with the original error message of this issue: "Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject multiple outputs. This operator does not support multiple outputs."
The inverted, logically equivalent "b1 is not null" variant succeeds.
If I replace the REGEX_EXTRACT call with a simple expression like "b1 = a0", it works. But the way I read the Pig Latin reference, a function call such as REGEX_EXTRACT should be allowed at this point, since it's not a relational operator?
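For reference, the output that both orderings of the condition are meant to produce for the sample 1.txt can be sketched outside Pig. The Python below is only an illustration of the intended semantics, not Pig code: the helper name clean_name is hypothetical, str.strip() only approximates TRIM, and re.search only approximates REGEX_EXTRACT (which I assume yields null when the pattern does not match).

```python
import re

# Python form of the Pig pattern '^\\((.+)\\)$': a value fully wrapped in parentheses.
PAREN_PATTERN = re.compile(r'^\((.+)\)$')


def clean_name(a0):
    """Hypothetical helper mirroring the nested foreach block for one a0 value."""
    if a0 is None:
        return None
    b0 = a0.strip()                         # approximates b0 = TRIM(a0)
    match = PAREN_PATTERN.search(b0)        # approximates REGEX_EXTRACT(b0, ..., 1)
    b1 = match.group(1) if match else None  # assumed: no match gives null
    return b0 if b1 is None else b1         # (b1 is null) ? b0 : b1


# The two lines of 1.txt, split on ',' as PigStorage(',') would do.
rows = ["foo1,bar1", "(foo2),bar2"]
cleaned = [clean_name(line.split(",")[0]) for line in rows]
print(cleaned)
```

Swapping the last line of clean_name for "b1 if b1 is not None else b0" gives the same values, which is why I would expect the two generate variants to behave identically.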
> New logical plan: Nested foreach plan fail if one inner alias is refered more than once
> ---------------------------------------------------------------------------------------
>
> Key: PIG-1683
> URL: https://issues.apache.org/jira/browse/PIG-1683
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1683-1.patch
>
>
> The following script fails:
> {code}
> a = load '1.txt' as (a0, a1, a2);
> b = load '2.txt' as (b0, b1);
> c = join a by a0, b by b0;
> d = foreach c {
>     d0 = a::a0;
>     d1 = a::a1;
>     generate ((d0 is not null)? d0 : d1);
> }
> explain d;
> {code}
> Stack:
> ERROR 2015: Invalid physical operators in the physical plan
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias d
> at org.apache.pig.PigServer.explain(PigServer.java:957)
> at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:353)
> at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:285)
> at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:248)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.Explain(PigScriptParser.java:605)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:327)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
> at org.apache.pig.Main.run(Main.java:498)
> at org.apache.pig.Main.main(Main.java:107)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:308)
> at org.apache.pig.PigServer.compilePp(PigServer.java:1350)
> at org.apache.pig.PigServer.explain(PigServer.java:926)
> ... 10 more
> Caused by: org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException: ERROR 2015: Invalid physical operators in the physical plan
> at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:474)
> at org.apache.pig.newplan.logical.expression.BinCondExpression.accept(BinCondExpression.java:82)
> at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
> at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:519)
> at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:71)
> at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:295)
> ... 12 more
> Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject multiple outputs. This operator does not support multiple outputs.
> at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:180)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:133)
> at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:470)
> ... 19 more