Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2008/07/18 19:08:31 UTC

[jira] Commented: (PIG-310) Some nested order by queries fail in logical to physical translator

    [ https://issues.apache.org/jira/browse/PIG-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614791#action_12614791 ] 

Alan Gates commented on PIG-310:
--------------------------------

It looks to me like the logical plan isn't being constructed correctly.  It appears that one Project operator is connected in two different places in the plan.

Here is the logical plan for the script in the issue description:

{code}
Logical Plan:
ForEach gates-Fri Jul 18 09:28:13 PDT 2008-8 Schema: {bytearray} Type: bag
|   |
|   Project gates-Fri Jul 18 09:28:13 PDT 2008-6 Projections:  [*]  Overloaded: false FieldSchema: CA: bag Type: bag
|   Input: SORT gates-Fri Jul 18 09:28:13 PDT 2008-5|
|   |---SORT gates-Fri Jul 18 09:28:13 PDT 2008-5 Schema: null Type: bag
|       |   |
|       |   Project gates-Fri Jul 18 09:28:13 PDT 2008-4 Projections: [0] Overloaded: false FieldSchema: bytearray Type: bytearray
|       |   Input: Project gates-Fri Jul 18 09:28:13 PDT 2008-3 Projections: [1] Overloaded: true|
|       |   |---Project gates-Fri Jul 18 09:28:13 PDT 2008-3 Projections: [1] Overloaded: true FieldSchema: A: tuple Type: tuple
|       |       Input: CoGroup gates-Fri Jul 18 09:28:13 PDT 2008-2
|       |
|       |---Project gates-Fri Jul 18 09:28:13 PDT 2008-3 Projections: [1] Overloaded: true FieldSchema: A: tuple Type: tuple
|           Input: CoGroup gates-Fri Jul 18 09:28:13 PDT 2008-2
|
|---CoGroup gates-Fri Jul 18 09:28:13 PDT 2008-2 Schema: {group: bytearray,A: (null)} Type: bag
    |   |
    |   Project gates-Fri Jul 18 09:28:13 PDT 2008-1 Projections: [0] Overloaded: false FieldSchema: bytearray Type: bytearray
    |   Input: Load gates-Fri Jul 18 09:28:13 PDT 2008-
    |
    |---Load gates-Fri Jul 18 09:28:13 PDT 2008-0 Schema: null Type: bag
{code}

Notice that the Project named "Project gates-Fri Jul 18 09:28:13 PDT 2008-3" is connected directly to the sort (that is, it is the sort's input), and it also appears inside the sort's own plan.  This does not look correct.  If you change the script from "order by $1" to "order by *", the two Projects are different operators, and the script works.
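To make the diagnosis concrete, here is a minimal Java sketch of a sanity check that would flag this shape. It walks a plan and any nested plans and reports every operator object that appears in more than one of them. The `Op` and `Plan` classes are hypothetical stand-ins, not Pig's logical plan API, and the plan built in `main` is a hand-made approximation of the dump above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class SharedOperatorCheck {

    // Hypothetical stand-in for a logical operator; it may own nested
    // plans (e.g. a sort's key plan).
    static final class Op {
        final String name;
        final List<Plan> innerPlans = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Hypothetical stand-in for a plan: just an ordered list of operators.
    static final class Plan {
        final List<Op> ops = new ArrayList<>();

        Op add(Op op) {
            ops.add(op);
            return op;
        }
    }

    // Returns every operator object that appears in more than one plan.
    // Identity (==) is used, not equals(), because the problem is one
    // object being reused, not two operators that merely look alike.
    static List<Op> findShared(Plan root) {
        Set<Op> seen = Collections.newSetFromMap(new IdentityHashMap<Op, Boolean>());
        Set<Op> shared = Collections.newSetFromMap(new IdentityHashMap<Op, Boolean>());
        collect(root, seen, shared);
        return new ArrayList<>(shared);
    }

    private static void collect(Plan plan, Set<Op> seen, Set<Op> shared) {
        for (Op op : plan.ops) {
            if (!seen.add(op)) {
                shared.add(op);
                continue; // already descended into this operator once
            }
            for (Plan inner : op.innerPlans) {
                collect(inner, seen, shared);
            }
        }
    }

    public static void main(String[] args) {
        // Hand-built approximation of the foreach's nested plan in the dump.
        Plan nested = new Plan();
        Op project3 = nested.add(new Op("Project-3 [1]"));
        Op sort = nested.add(new Op("SORT-5"));
        nested.add(new Op("Project-6 [*]"));

        // The sort's key plan reuses the very same Project-3 object.
        Plan sortKey = new Plan();
        sortKey.add(project3);
        sortKey.add(new Op("Project-4 [0]"));
        sort.innerPlans.add(sortKey);

        System.out.println(findShared(nested)); // prints [Project-3 [1]]
    }
}
```

Comparing by identity is the point of the design: in the "order by *" variant the two Projects are separate objects, so this check would report nothing for that plan.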

Assigning to Santhosh to take a look.  If this isn't an issue please assign it back to me.

> Some nested order by queries fail in logical to physical translator
> -------------------------------------------------------------------
>
>                 Key: PIG-310
>                 URL: https://issues.apache.org/jira/browse/PIG-310
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Alan Gates
>             Fix For: types_branch
>
>
> The query:
> {code}
> a = load 'myfile';
> b = group a by $0;
> c = foreach b {c1 = order $1 by $1; generate flatten(c1); };
> store c into 'outfile';
> {code}
> dies with the error message:
> java.io.IOException: Unable to store for alias: c [null]
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:232)
>     at org.apache.pig.PigServer.compilePp(PigServer.java:556)
>     at org.apache.pig.PigServer.execute(PigServer.java:482)
>     at org.apache.pig.PigServer.store(PigServer.java:324)
>     at org.apache.pig.PigServer.store(PigServer.java:310)
>     at org.apache.pig.tools.grunt.GruntParser.processStore(GruntParser.java:173)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:317)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:77)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:58)
>     at org.apache.pig.Main.main(Main.java:311)
> Caused by: org.apache.pig.backend.executionengine.ExecException
>     ... 10 more
> Caused by: org.apache.pig.impl.plan.VisitorException
>     at org.apache.pig.impl.logicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:726)
>     at org.apache.pig.impl.logicalLayer.LOSort.visit(LOSort.java:141)
>     at org.apache.pig.impl.logicalLayer.LOSort.visit(LOSort.java:35)
>     at org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
>     at org.apache.pig.impl.logicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:651)
>     at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:87)
>     at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:36)
>     at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
>     at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:229)
>     ... 9 more
> Caused by: org.apache.pig.impl.plan.PlanException: Attempt to connect operator Project[tuple][1] - gates-Sat Jul 12 17:57:09 PDT 2008-16 which is not in the pla
>     at org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:254)
>     at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:140)
>     at org.apache.pig.impl.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:77)
>     at org.apache.pig.impl.logicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:723)
>     ... 18 more
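The innermost frame of the trace is `OperatorPlan.checkInPlan`, reached from `connect` while the translator visits the sort. The Java sketch below reproduces that failure mode under one stated assumption: a plan simply tracks its member operators and rejects edges to anything it does not contain. If the physical Project ended up only in the sort's key plan, wiring it to the sort in the outer plan is refused. `CheckedPlan`, `PlanException`, and the message text are hypothetical and only loosely mimic the error; they are not Pig's `OperatorPlan`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class CheckedPlan {

    // Hypothetical checked exception, standing in for a plan error.
    static final class PlanException extends Exception {
        PlanException(String message) {
            super(message);
        }
    }

    // Operators that have been added to this plan, compared by identity.
    private final Set<Object> members =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
    private final List<Object[]> edges = new ArrayList<>();

    void add(Object op) {
        members.add(op);
    }

    // Refuses to create an edge unless both endpoints belong to this plan.
    void connect(Object from, Object to) throws PlanException {
        checkInPlan(from);
        checkInPlan(to);
        edges.add(new Object[] {from, to});
    }

    private void checkInPlan(Object op) throws PlanException {
        if (!members.contains(op)) {
            throw new PlanException(
                    "Attempt to connect operator " + op + " which is not in the plan");
        }
    }

    int edgeCount() {
        return edges.size();
    }

    public static void main(String[] args) {
        CheckedPlan outer = new CheckedPlan();   // plan that holds the sort
        CheckedPlan sortKey = new CheckedPlan(); // the sort's own key plan

        String project = "Project[tuple][1]";
        String sort = "Sort";

        // Assumption for illustration: the physical Project was added only
        // to the sort's key plan, while the sort itself is in the outer plan.
        sortKey.add(project);
        outer.add(sort);

        try {
            outer.connect(project, sort); // wire the sort's input
        } catch (PlanException e) {
            // prints "Attempt to connect operator Project[tuple][1] which is not in the plan"
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, checking membership at `connect` time is what turns a malformed plan into an immediate error during translation rather than a silently wrong plan.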
