Posted to dev@pig.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2009/06/22 18:40:07 UTC

[jira] Created: (PIG-859) Optimizer throw error on self-joins

Optimizer throw error on self-joins
-----------------------------------

                 Key: PIG-859
                 URL: https://issues.apache.org/jira/browse/PIG-859
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.3.0
            Reporter: Ashutosh Chauhan
             Fix For: 0.4.0


Performing a self-join causes the optimizer to throw an exception. Consider the following query:
{code}
grunt> A = load 'a';
grunt> B = Join A by $0, A by $0;
grunt> explain B;

2009-06-20 15:51:38,303 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1094: Attempt to insert between two nodes that were not connected.
Details at logfile: pig_1245538027026.log
{code}

Relevant stack-trace from log-file:
{code}
Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2047: Internal error. Unable to introduce split operators.
        at org.apache.pig.impl.logicalLayer.optimizer.ImplicitSplitInserter.transform(ImplicitSplitInserter.java:163)
        at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:163)
        at org.apache.pig.PigServer.compileLp(PigServer.java:844)
        at org.apache.pig.PigServer.compileLp(PigServer.java:781)
        at org.apache.pig.PigServer.getStorePlan(PigServer.java:723)
        at org.apache.pig.PigServer.explain(PigServer.java:566)
        ... 8 more
Caused by: org.apache.pig.impl.plan.PlanException: ERROR 1094: Attempt to insert between two nodes that were not connected.
        at org.apache.pig.impl.plan.OperatorPlan.doInsertBetween(OperatorPlan.java:500)
        at org.apache.pig.impl.plan.OperatorPlan.insertBetween(OperatorPlan.java:480)
        at org.apache.pig.impl.logicalLayer.optimizer.ImplicitSplitInserter.transform(ImplicitSplitInserter.java:139)
        ... 13 more
{code}


A possible workaround is to load the same file twice under different aliases:
{code}
grunt> A = load 'a';
grunt> B = load 'a';
grunt> C = join A by $0, B by $0;
grunt> explain C;
{code}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-859) Optimizer throw error on self-joins

Posted by "Philip (flip) Kromer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885011#action_12885011 ] 

Philip (flip) Kromer commented on PIG-859:
------------------------------------------

There are many use cases for a self-join, especially graph exploration. Would it work to write something like this?

{code:sql|title=bfs.pig}
-- Enumerate paths of length two
Edges = LOAD 'a' AS (src, dest);
E2Paths = Join Edges AS InLinks BY dest, Edges AS OutLinks BY src;
{code}
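The intended semantics of the `bfs.pig` proposal can be illustrated with a minimal Python model (not Pig code, and not how Pig executes a join): one copy of the edge list is indexed by `src`, the other probes it by `dest`, and every match is a path of length two. The function name `two_paths` and the `(src, dest)` tuple representation are assumptions made for this sketch.

```python
from collections import defaultdict


def two_paths(edges):
    """Return all paths (a, b, c) such that (a, b) and (b, c) are both edges.

    Models the self-join `Edges AS InLinks BY dest, Edges AS OutLinks BY src`:
    the same edge list plays both roles.
    """
    # Index the "OutLinks" side by its join key (src).
    out_by_src = defaultdict(list)
    for src, dest in edges:
        out_by_src[src].append(dest)

    # Probe with the "InLinks" side, whose join key is dest.
    paths = []
    for src, mid in edges:
        for dest in out_by_src.get(mid, []):
            paths.append((src, mid, dest))
    return paths


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (2, 4), (3, 1)]
    print(sorted(two_paths(edges)))  # prints [(1, 2, 3), (1, 2, 4), (2, 3, 1), (3, 1, 2)]
```

Both join inputs are the same list, which is exactly why the two sides need distinct names (`InLinks`, `OutLinks`) to be told apart.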



[jira] Commented: (PIG-859) Optimizer throw error on self-joins

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930192#action_12930192 ] 

Alan Gates commented on PIG-859:
--------------------------------

We should also consider this from a performance angle. Since we can detect that this is a self-join, we should read the table only once, not twice. This should hold whether the second input comes from A1 = A or from a second load.
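The "read it once" idea can be illustrated with a minimal Python model of a reduce-side join. This is an assumption about execution strategy, not a description of Pig's implementation: the input is scanned a single time, and each record is routed to both the left and the right side of the join, which is what the second `load` otherwise provides. `CountingSource` and `self_join_single_scan` are hypothetical names used only for this sketch.

```python
from collections import defaultdict


class CountingSource:
    """Hypothetical input that records how many times it is scanned."""

    def __init__(self, records):
        self._records = list(records)
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self._records)


def self_join_single_scan(source, left_key, right_key):
    """Join a relation with itself while scanning its input only once.

    Every record is routed to the left side under left_key(record) and to
    the right side under right_key(record), so no second load is needed.
    """
    groups = defaultdict(lambda: ([], []))
    for record in source.scan():
        groups[left_key(record)][0].append(record)
        groups[right_key(record)][1].append(record)
    # Within each key group, pair every left record with every right record.
    return [
        (left, right)
        for lefts, rights in groups.values()
        for left in lefts
        for right in rights
    ]


if __name__ == "__main__":
    source = CountingSource([("x", 1), ("x", 2), ("y", 3)])
    pairs = self_join_single_scan(source, left_key=lambda r: r[0], right_key=lambda r: r[0])
    print(len(pairs), source.scans)  # prints: 5 1
```

Passing different key functions (for example the destination for the left side and the source for the right) covers the `InLinks BY dest, OutLinks BY src` case from `bfs.pig` with the same single scan.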


[jira] Commented: (PIG-859) Optimizer throw error on self-joins

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917784#action_12917784 ] 

hc busy commented on PIG-859:
-----------------------------

Another workaround:

{code}
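grunt> A = load 'a';
grunt> B = group A all;
-- Inside the foreach, the grouped bag is named A (B's fields are group and A).
-- Flattening two bags in the same GENERATE yields their cross product.
grunt> C = foreach B generate FLATTEN(A.($0,$3)) as (key1, value1), FLATTEN(A.($0,$3)) as (key2, value2);
-- Keep only the pairs whose keys match, i.e. the self-join result.
grunt> D = filter C by key1 == key2;
-- left and right are reserved keywords, so use other names for the output fields.
grunt> E = foreach D generate key1 as key, value1 as left_value, value2 as right_value;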
{code}
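This workaround relies on the fact that flattening two bags in one GENERATE produces their cross product, which the FILTER then narrows to matching keys. A minimal Python model (illustrative function names, not Pig internals) shows that it yields the same pairs as an equi-join, but only after comparing every pair of rows; since GROUP ... ALL also puts the whole relation into a single group, the approach is probably practical only for small inputs.

```python
from collections import defaultdict
from itertools import product


def join_by_cross_and_filter(rows, key):
    """Model of the GROUP ALL workaround: cross every row with every row, then filter."""
    comparisons = 0
    result = []
    for left, right in product(rows, repeat=2):
        comparisons += 1
        if key(left) == key(right):
            result.append((left, right))
    return result, comparisons


def join_by_hash(rows, key):
    """Equi-join of rows with themselves using a hash index on the key."""
    index = defaultdict(list)
    for row in rows:
        index[key(row)].append(row)
    return [(left, right) for left in rows for right in index[key(left)]]


if __name__ == "__main__":
    rows = [("a", 1), ("a", 2), ("b", 3)]
    crossed, comparisons = join_by_cross_and_filter(rows, key=lambda r: r[0])
    hashed = join_by_hash(rows, key=lambda r: r[0])
    print(sorted(crossed) == sorted(hashed), len(crossed), comparisons)  # prints: True 5 9
```

The number of comparisons grows with the square of the input size, regardless of how many keys actually match.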


[jira] Commented: (PIG-859) Optimizer throw error on self-joins

Posted by "David Ciemiewicz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929883#action_12929883 ] 

David Ciemiewicz commented on PIG-859:
--------------------------------------

An alternative workaround is:

{code}
A = load ...;
A1 = foreach A generate *;
AA1 = join A by ..., A1 by ...;
{code}

If Pig supported re-aliasing, then we could do:

{code}
A = load ...;
A1 = A;
AA1 = join A by ..., A1 by ...;
{code}


[jira] Resolved: (PIG-859) Optimizer throw error on self-joins

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-859.
--------------------------------

    Resolution: Won't Fix

The right thing is already happening, since the self-join as written would produce two columns with the same name. A second load is needed for the self-join to work.


[jira] Updated: (PIG-859) Optimizer throw error on self-joins

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-859:
-------------------------------

    Fix Version/s: 0.9.0


[jira] Commented: (PIG-859) Optimizer throw error on self-joins

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929901#action_12929901 ] 

Olga Natkovich commented on PIG-859:
------------------------------------

Viraj, as I pointed out, we can't support this case because it is ambiguous. We are planning to support re-aliasing, as suggested by Ciemo, to avoid the name conflict. I will reassign this to Corinne for documentation purposes.


[jira] Assigned: (PIG-859) Optimizer throw error on self-joins

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-859:
----------------------------------

    Assignee: Corinne Chandel


[jira] Reopened: (PIG-859) Optimizer throw error on self-joins

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat reopened PIG-859:
----------------------------


Hi Olga,
According to the use case in bfs.pig, we need to support this syntax. It would spare users from having to write two load statements, which is non-intuitive.
If you believe this is not required, we need to document that a self-join requires two load statements.
Regards
Viraj


[jira] Commented: (PIG-859) Optimizer throw error on self-joins

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771223#action_12771223 ] 

Daniel Dai commented on PIG-859:
--------------------------------

I was wondering whether Pig Latin should allow self-joins at all. In the script:

{code}
A = load 'a' as (a0, a1);
B = Join A by $0, A by $0;
{code}

the output schema for B is (A::a0, A::a1, A::a0, A::a1), which is bound to cause a schema alias conflict.
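The conflict can be modelled in a few lines of Python. This is a sketch, not Pig's validator: it assumes that join output fields are qualified as `alias::field`, as in the `rec1::a` alias of the ERROR 1108 message reported in this thread, and flags any qualified name that occurs more than once. Joining `A` with itself repeats every name, while joining `A` with a re-aliased copy such as `A1` does not.

```python
from collections import Counter


def join_output_schema(inputs):
    """Qualify each field with its input alias, as in `A::a0`.

    `inputs` is a list of (alias, [field, ...]) pairs, one per join input.
    """
    return [f"{alias}::{field}" for alias, fields in inputs for field in fields]


def duplicate_aliases(schema):
    """Return the qualified names that appear more than once, sorted."""
    return sorted(name for name, count in Counter(schema).items() if count > 1)


if __name__ == "__main__":
    fields = ["a0", "a1"]
    # Self-join: both inputs are called A, so every qualified name repeats.
    print(duplicate_aliases(join_output_schema([("A", fields), ("A", fields)])))  # prints ['A::a0', 'A::a1']
    # Re-aliased copy (e.g. A1 = foreach A generate *): names are unique again.
    print(duplicate_aliases(join_output_schema([("A", fields), ("A1", fields)])))  # prints []
```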


[jira] Commented: (PIG-859) Optimizer throw error on self-joins

Posted by "Jing Huang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771220#action_12771220 ] 

Jing Huang commented on PIG-859:
--------------------------------

If we do:

{code}
joina = join rec1 by (a), rec1 by (a) using "merge";
{code}

a different error is thrown, this time by schema alias validation rather than by the optimizer:

{code}
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 1108: Duplicate schema alias: rec1::a in "joina"
        at org.apache.pig.impl.logicalLayer.validators.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:69)
        at org.apache.pig.impl.logicalLayer.validators.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:115)
        at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:203)
        at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
        at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
        at org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
        ... 13 more
{code}

