Posted to dev@pig.apache.org by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org> on 2011/06/12 14:18:51 UTC

[jira] [Created] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
---------------------------------------------------------------------------------------------

                 Key: PIG-2119
                 URL: https://issues.apache.org/jira/browse/PIG-2119
             Project: Pig
          Issue Type: Bug
            Reporter: Gianmarco De Francisci Morales


The input:
{code}
grunt> cat b.txt
a       11
b       3
c       10
a       12
b       10
c       15
{code}

The script:
{code}
a = load 'b.txt' AS (id:chararray, num:int);
b = group a by id;
c = foreach b { 
  d = order a by num DESC;
  n = COUNT(a);
  e = limit d 1;
  generate n;
}
{code}

The exception:
{code}
Caused by: java.lang.ClassCastException: org.apache.pig.newplan.logical.relational.LOLimit cannot be cast to org.apache.pig.newplan.logical.relational.LOGenerate
        at org.apache.pig.newplan.logical.rules.DuplicateForEachColumnRewrite$DuplicateForEachColumnRewriteTransformer.check(DuplicateForEachColumnRewrite.java:87)
        at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:108)

{code}

I know the script is a bit pointless, but I was just testing and modifying the script bit by bit.
If I remove the limit, I still get the same exception, but with LOSort instead of LOLimit.

The problem, I think, is that the rule assumes there is only 1 sink in the nested block and that this sink is a LOGenerate.
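
To illustrate the point about sinks, here is a minimal sketch, not Pig code. The class NestedPlanSinks, its methods, and the operator names are all hypothetical, and modelling the bag "a" as a single node is a simplification. It treats the nested block of the script above as a plain graph and lists the operators whose output nobody consumes:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedPlanSinks {
    // Toy plan (hypothetical, not Pig's API): each operator name maps to
    // the operators that consume its output.
    private final Map<String, List<String>> successors = new LinkedHashMap<>();

    // Record an edge "from feeds to", registering both operators.
    public void connect(String from, String to) {
        successors.putIfAbsent(from, new ArrayList<>());
        successors.putIfAbsent(to, new ArrayList<>());
        successors.get(from).add(to);
    }

    // A sink is an operator whose output nobody consumes.
    public List<String> sinks() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : successors.entrySet()) {
            if (entry.getValue().isEmpty()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Simplified model of the nested block in the reported script.
    public static NestedPlanSinks reportedScript() {
        NestedPlanSinks plan = new NestedPlanSinks();
        plan.connect("a", "sort");      // d = order a by num DESC
        plan.connect("sort", "limit");  // e = limit d 1 (e is never used)
        plan.connect("a", "generate");  // generate n, with n = COUNT(a)
        return plan;
    }

    public static void main(String[] args) {
        // Two sinks: the unused limit and the generate.
        System.out.println(reportedScript().sinks()); // prints [limit, generate]
    }
}
```

In this toy model the unused alias e leaves a second sink next to the generate, so code that takes "the" sink and casts it without checking its type can end up holding the limit instead.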



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048748#comment-13048748 ] 

Daniel Dai commented on PIG-2119:
---------------------------------

LOGenerate is the only sink in a nested plan. This is one of the basic assumptions we make throughout the logical plan. To solve the issue above, we need to prune the dangling branch before proceeding.
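
As a rough illustration of what pruning a dangling branch means, here is a minimal sketch under stated assumptions. It is not the attached patch and does not use Pig's plan classes; DanglingBranchPruner, its methods, and the operator names are hypothetical. It keeps only the operators that the generate depends on and drops everything else:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DanglingBranchPruner {
    // Toy plan (hypothetical, not Pig's API): each operator name maps to
    // the operators it reads from.
    private final Map<String, List<String>> predecessors = new LinkedHashMap<>();

    // Record an edge "from feeds to", registering both operators.
    public void connect(String from, String to) {
        predecessors.putIfAbsent(from, new ArrayList<>());
        predecessors.putIfAbsent(to, new ArrayList<>());
        predecessors.get(to).add(from);
    }

    // Snapshot of the operators currently in the plan.
    public Set<String> operators() {
        return new LinkedHashSet<>(predecessors.keySet());
    }

    // Keep the given sink and everything it depends on, directly or
    // transitively; remove every other operator from the plan.
    public void pruneExceptAncestorsOf(String sink) {
        if (!predecessors.containsKey(sink)) {
            throw new IllegalArgumentException("unknown operator: " + sink);
        }
        Set<String> keep = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(sink);
        while (!pending.isEmpty()) {
            String op = pending.pop();
            if (keep.add(op)) {
                for (String input : predecessors.get(op)) {
                    pending.push(input);
                }
            }
        }
        predecessors.keySet().retainAll(keep);
    }

    // Simplified model of the nested block in the reported script.
    public static DanglingBranchPruner reportedScript() {
        DanglingBranchPruner plan = new DanglingBranchPruner();
        plan.connect("a", "sort");      // d = order a by num DESC
        plan.connect("sort", "limit");  // e = limit d 1 (e is never used)
        plan.connect("a", "generate");  // generate n, with n = COUNT(a)
        return plan;
    }

    public static void main(String[] args) {
        DanglingBranchPruner plan = reportedScript();
        plan.pruneExceptAncestorsOf("generate");
        System.out.println(plan.operators()); // prints [a, generate]
    }
}
```

After pruning, the sort and limit operators are gone and the generate is again the only sink in the toy plan, which is the assumption later rules rely on.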


        

[jira] [Updated] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Daniel Dai (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2119:
----------------------------

    Fix Version/s: 0.9.2
    
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Daniel Dai
>             Fix For: 0.9.2, 0.10, 0.11
>
>         Attachments: PIG-2119-1.patch

        

[jira] [Commented] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Vivek Padmanabhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118016#comment-13118016 ] 

Vivek Padmanabhan commented on PIG-2119:
----------------------------------------

I faced this issue with the script below:
{code}
A = load '3char_1long_tab' as (f1:chararray, f2:chararray, f3:chararray,ct:long);
B = GROUP A BY f1;
C = FOREACH B {
        zip_ordered = ORDER A BY f3 ASC;
        GENERATE
                FLATTEN(group) AS f1,
                A.(f3, ct),
                --COUNT(zip_ordered),
                SUM(A.ct) AS total;
};

dump C;
{code}

The zip_ordered alias was left in by accident and is not used. Pig 0.8 silently ignores it, while Pig 0.9 throws an exception.
I believe the affected version should be 0.9.

                

        

[jira] [Updated] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Daniel Dai (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2119:
----------------------------

    Attachment:     (was: PIG-2119-1.patch)
    
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Daniel Dai
>             Fix For: 0.10
>
>         Attachments: PIG-2119-1.patch

        

[jira] [Updated] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Daniel Dai (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2119:
----------------------------

    Attachment: PIG-2119-1.patch
    
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Daniel Dai
>             Fix For: 0.10
>
>         Attachments: PIG-2119-1.patch

        

[jira] [Updated] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Daniel Dai (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2119:
----------------------------

    Attachment: PIG-2119-1.patch
    
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Daniel Dai
>             Fix For: 0.10
>
>         Attachments: PIG-2119-1.patch

        

[jira] [Updated] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Daniel Dai (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2119:
----------------------------

    Attachment:     (was: PIG-2119-1.patch)
    
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Daniel Dai
>             Fix For: 0.10
>
>         Attachments: PIG-2119-1.patch

        

[jira] [Updated] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Thejas M Nair (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-2119:
-------------------------------

    Affects Version/s: 0.9.1
                       0.9.0

+1
                
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Daniel Dai
>             Fix For: 0.10
>
>         Attachments: PIG-2119-1.patch

        

[jira] [Assigned] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Daniel Dai (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-2119:
-------------------------------

    Assignee: Daniel Dai
    
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Daniel Dai
>             Fix For: 0.10
>
>         Attachments: PIG-2119-1.patch

        

[jira] [Commented] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Daniel Dai (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186010#comment-13186010 ] 

Daniel Dai commented on PIG-2119:
---------------------------------

Patch committed to 0.9 branch as per Dmitriy's request (PIG-2474)
                
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Daniel Dai
>             Fix For: 0.9.2, 0.10, 0.11
>
>         Attachments: PIG-2119-1.patch

        

[jira] [Updated] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Daniel Dai (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2119:
----------------------------

    Attachment: PIG-2119-1.patch
    
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>             Fix For: 0.10
>
>         Attachments: PIG-2119-1.patch

        

[jira] [Resolved] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Daniel Dai (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-2119.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.11
     Hadoop Flags: Reviewed

Unit tests pass. Test-patch:
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     -1 release audit.  The applied patch generated 458 release audit warnings (more than the trunk's current 447 warnings).

All new files have the proper header.

Patch committed to both trunk and 0.10 branch.
                
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Daniel Dai
>             Fix For: 0.10, 0.11
>
>         Attachments: PIG-2119-1.patch

        

[jira] [Updated] (PIG-2119) DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan

Posted by "Olga Natkovich (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-2119:
--------------------------------

    Fix Version/s: 0.10
    
> DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-2119
>                 URL: https://issues.apache.org/jira/browse/PIG-2119
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>             Fix For: 0.10