Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2009/06/04 23:41:07 UTC

[jira] Created: (PIG-834) incorrect plan when algebraic functions are nested

incorrect plan when algebraic functions are nested
--------------------------------------------------

                 Key: PIG-834
                 URL: https://issues.apache.org/jira/browse/PIG-834
             Project: Pig
          Issue Type: Bug
          Components: impl
            Reporter: Thejas M Nair
            Priority: Critical


a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that the Distinct UDF is missing from the combine and reduce stages of the plan below. As a result, Distinct is applied only in the map stage, and incorrect results are produced.
Distinct should have been evaluated in all three stages (map, combine, and reduce), and the output of Distinct should be given to COUNT in the reduce stage.


# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
    |       |
    |       |---Project[bag][2] - 1-123
    |           |
    |           |---Project[bag][1] - 1-124
    |   |
    |   Project[bytearray][0] - 1-133
    |
    |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
        |
        |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
    |   |
    |   |---Project[bag][0] - 1-135
    |   |
    |   Project[bytearray][1] - 1-134
    |
    |---POCombinerPackage[tuple]{bytearray} - 1-137--------
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
    |   |
    |   |---Project[bag][0] - 1-136
    |
    |---POCombinerPackage[tuple]{bytearray} - 1-145--------
Global sort: false
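
To illustrate why a plan like the one above gives wrong counts, here is a minimal Python sketch. It is a simplified model, not Pig's actual execution: the function names and the sample data are hypothetical, and each inner list stands in for the records seen by one map task. The "buggy" version de-duplicates only within a map task and then just adds up partial counts, which is roughly what happens when Distinct is absent from the combine and reduce stages.

```python
from collections import defaultdict

def correct_count_distinct(splits):
    """COUNT(Distinct(values)) per key, de-duplicating across all splits."""
    seen = defaultdict(set)
    for split in splits:
        for key, value in split:
            seen[key].add(value)
    return {key: len(values) for key, values in seen.items()}

def buggy_count_distinct(splits):
    """Distinct runs only inside each map task; later stages only add counts."""
    totals = defaultdict(int)
    for split in splits:
        per_split = defaultdict(set)
        for key, value in split:
            per_split[key].add(value)      # map stage: Distinct, then partial count
        for key, values in per_split.items():
            totals[key] += len(values)     # combine/reduce: sum of partial counts
    return dict(totals)

# Hypothetical (group key, value) records, split across two map tasks.
splits = [
    [("g1", "a"), ("g1", "b")],
    [("g1", "a"), ("g1", "c")],
]
print(correct_count_distinct(splits))  # {'g1': 3}
print(buggy_count_distinct(splits))    # {'g1': 4}
```

The value "a" appears in both splits, so the per-split counts overcount it once the partial results are summed.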

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832358#action_12832358 ] 

Hadoop QA commented on PIG-834:
-------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435493/pig-834_3.patch
  against trunk revision 908324.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/208/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/208/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/208/console

This message is automatically generated.

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch, pig-834_3.patch


[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Attachment: pig-834_2.patch

The correct approach is as follows. If the leaf of the ForEach's inner plan is not combinable, we do not use the combiner in any case. If it is combinable, there must not be any other combinable POUserFunc in the ForEach's inner plan. The first check already exists in trunk. This patch adds the second check and makes sure the combiner is not used when the ForEach's inner plan contains any combinable POUserFunc apart from the leaf POUserFunc.
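
The two conditions above can be sketched roughly as follows. This is an illustrative Python model, not the code in the patch: the `Op` class, its `combinable` flag, and both function names are assumptions made up for this example, with `inputs` standing in for the operators that feed a given operator in the inner plan.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Op:
    """Hypothetical stand-in for an operator in a ForEach inner plan."""
    name: str
    combinable: bool = False                      # True for an algebraic UDF
    inputs: List["Op"] = field(default_factory=list)

def has_combinable(op):
    """True if this operator or anything feeding it is combinable."""
    return op.combinable or any(has_combinable(i) for i in op.inputs)

def can_use_combiner(leaf):
    # First check: the leaf of the inner plan must itself be combinable.
    if not leaf.combinable:
        return False
    # Second check: no other combinable function anywhere below the leaf.
    return not any(has_combinable(i) for i in leaf.inputs)

nested = Op("COUNT", True, [Op("Distinct", True, [Op("Project")])])
simple = Op("COUNT", True, [Op("Project")])
print(can_use_combiner(nested))  # False
print(can_use_combiner(simple))  # True
```

Under this model the nested COUNT(Distinct(...)) expression from the issue would run without the combiner, while a plain COUNT over a projection would still use it.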

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch


[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Status: Open  (was: Patch Available)

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch


[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Status: Patch Available  (was: Open)

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch


[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch checked in.

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch, pig-834_3.patch


[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-834:
-------------------------------

    Priority: Major  (was: Critical)

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>             Fix For: 0.7.0


[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Status: Patch Available  (was: Open)

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch, pig-834_3.patch


[jira] Commented: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831535#action_12831535 ] 

Ashutosh Chauhan commented on PIG-834:
--------------------------------------

Another Hudson quirk :( The failed test passes on my local machine. The patch is ready for review.

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch


[jira] Closed: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-834.
--------------------------


> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch, pig-834_3.patch


[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831421#action_12831421 ] 

Hadoop QA commented on PIG-834:
-------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435027/pig-834_2.patch
  against trunk revision 907760.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/195/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/195/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/195/console

This message is automatically generated.



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Attachment: pig-834.patch

In this patch, I look for a POUserFunc followed by another POUserFunc in the inner plan of the ForEach; if that pattern is found, I flag the combiner optimizer not to fire. This disables the combiner for this particular query (test case included). Is this fix sufficient for this bug?
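
The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ExprNode and NestedUdfCheck are hypothetical stand-ins, not Pig's real operator classes, and the sketch treats a user function found anywhere beneath another user function as a match, which may be broader than what the patch actually tests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one operator in a ForEach inner plan.
// This is NOT Pig's real PhysicalOperator / POUserFunc hierarchy.
class ExprNode {
    final String name;
    final boolean isUserFunc;
    final List<ExprNode> inputs = new ArrayList<>();

    ExprNode(String name, boolean isUserFunc, ExprNode... inputs) {
        this.name = name;
        this.isUserFunc = isUserFunc;
        for (ExprNode in : inputs) {
            this.inputs.add(in);
        }
    }
}

class NestedUdfCheck {
    // Returns true if some user function has another user function
    // among its direct or transitive inputs.
    static boolean hasNestedUdf(ExprNode root) {
        return walk(root, false);
    }

    private static boolean walk(ExprNode node, boolean insideUdf) {
        if (node.isUserFunc && insideUdf) {
            return true;
        }
        boolean nowInside = insideUdf || node.isUserFunc;
        for (ExprNode in : node.inputs) {
            if (walk(in, nowInside)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // COUNT(Distinct($1.$2)): a user function fed by another user function.
        ExprNode projection = new ExprNode("Project[bag][2]", false,
                new ExprNode("Project[bag][1]", false));
        ExprNode nested = new ExprNode("COUNT", true,
                new ExprNode("Distinct", true, projection));
        // COUNT($1): a single user function over a projection.
        ExprNode flat = new ExprNode("COUNT", true,
                new ExprNode("Project[bag][1]", false));

        System.out.println(hasNestedUdf(nested)); // prints true: skip the combiner
        System.out.println(hasNestedUdf(flat));   // prints false: combiner may be used
    }
}
```

In this sketch a caller would consult hasNestedUdf before applying the combiner rewrite and leave the plan untouched when it returns true.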



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Status: Patch Available  (was: Open)

Trying to get Hudson going on this.



[jira] Commented: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832613#action_12832613 ] 

Richard Ding commented on PIG-834:
----------------------------------

+1 for commit.



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-834:
------------------------------

    Fix Version/s: 0.7.0



[jira] Commented: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805708#action_12805708 ] 

Olga Natkovich commented on PIG-834:
------------------------------------

The short-term solution will be to catch this case and not enable the combiner.
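
To illustrate why the combiner has to stay off for this query, here is a small sketch that contrasts the two evaluation orders. It does not use any Pig code; CombinerDemo and its methods are made up for illustration, and each "chunk" stands for the part of one group's bag that a single early-stage call happens to see.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class CombinerDemo {
    // Intended semantics: distinct over the whole group's bag, then count.
    static long countDistinctWholeGroup(List<List<String>> chunks) {
        HashSet<String> seen = new HashSet<>();
        for (List<String> chunk : chunks) {
            seen.addAll(chunk);
        }
        return seen.size();
    }

    // Mimics the faulty plan: distinct is applied only inside each chunk,
    // and the later stages merely add up the partial counts.
    static long countDistinctPerChunkThenSum(List<List<String>> chunks) {
        long total = 0;
        for (List<String> chunk : chunks) {
            total += new HashSet<>(chunk).size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> chunks = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c"));
        System.out.println(countDistinctWholeGroup(chunks));      // prints 3
        System.out.println(countDistinctPerChunkThenSum(chunks)); // prints 4
    }
}
```

The value "b" appears in both chunks, so summing per-chunk distinct counts overcounts it. With the combiner disabled, the whole bag reaches a single call and the first method's behaviour applies.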



[jira] Assigned: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-834:
----------------------------------

    Assignee: Ashutosh Chauhan



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-834:
------------------------------

    Description: 
a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that Distinct udf is missing in Combiner and reduce stage. As a result distinct does not function, and incorrect results are produced.
Distinct should have been evaluated in the 3 stages and output of Distinct should be given to COUNT in reduce stage.

{code}
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
    |       |
    |       |---Project[bag][2] - 1-123
    |           |
    |           |---Project[bag][1] - 1-124
    |   |
    |   Project[bytearray][0] - 1-133
    |
    |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
        |
        |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
    |   |
    |   |---Project[bag][0] - 1-135
    |   |
    |   Project[bytearray][1] - 1-134
    |
    |---POCombinerPackage[tuple]{bytearray} - 1-137--------
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
    |   |
    |   |---Project[bag][0] - 1-136
    |
    |---POCombinerPackage[tuple]{bytearray} - 1-145--------
Global sort: false
{code}

  was:
a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that Distinct udf is missing in Combiner and reduce stage. As a result distinct does not function, and incorrect results are produced.
Distinct should have been evaluated in the 3 stages and output of Distinct should be given to COUNT in reduce stage.


# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
    |       |
    |       |---Project[bag][2] - 1-123
    |           |
    |           |---Project[bag][1] - 1-124
    |   |
    |   Project[bytearray][0] - 1-133
    |
    |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
        |
        |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
    |   |
    |   |---Project[bag][0] - 1-135
    |   |
    |   Project[bytearray][1] - 1-134
    |
    |---POCombinerPackage[tuple]{bytearray} - 1-137--------
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
    |   |
    |   |---Project[bag][0] - 1-136
    |
    |---POCombinerPackage[tuple]{bytearray} - 1-145--------
Global sort: false




[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Attachment: pig-834_3.patch

Instead of having a recursive function walk the plan, it is better to have a visitor do that. So this patch replaces that function with a visitor.
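
As a rough idea of what a visitor-based check can look like, here is a minimal sketch. All types in it (PlanOp, ProjectOp, UserFuncOp, PlanOpVisitor, NestedUdfVisitor) are hypothetical and chosen for illustration; Pig's own visitor and walker classes are different, and the patch itself may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical operator types; Pig's real physical operators differ.
abstract class PlanOp {
    final List<PlanOp> inputs = new ArrayList<>();
    abstract void accept(PlanOpVisitor visitor);
}

class ProjectOp extends PlanOp {
    ProjectOp() { }
    ProjectOp(PlanOp input) { inputs.add(input); }
    void accept(PlanOpVisitor visitor) { visitor.visitProject(this); }
}

class UserFuncOp extends PlanOp {
    final String funcName;
    UserFuncOp(String funcName, PlanOp input) {
        this.funcName = funcName;
        inputs.add(input);
    }
    void accept(PlanOpVisitor visitor) { visitor.visitUserFunc(this); }
}

interface PlanOpVisitor {
    void visitProject(ProjectOp op);
    void visitUserFunc(UserFuncOp op);
}

// Records whether any user function sits below another user function.
class NestedUdfVisitor implements PlanOpVisitor {
    private int udfDepth = 0;
    private boolean nestedFound = false;

    boolean foundNestedUdf() { return nestedFound; }

    public void visitProject(ProjectOp op) { visitInputs(op); }

    public void visitUserFunc(UserFuncOp op) {
        if (udfDepth > 0) {
            nestedFound = true;
        }
        udfDepth++;
        visitInputs(op);
        udfDepth--;
    }

    private void visitInputs(PlanOp op) {
        for (PlanOp in : op.inputs) {
            in.accept(this);
        }
    }
}

class NestedUdfVisitorDemo {
    public static void main(String[] args) {
        PlanOp nested = new UserFuncOp("COUNT",
                new UserFuncOp("Distinct", new ProjectOp(new ProjectOp())));
        NestedUdfVisitor visitor = new NestedUdfVisitor();
        nested.accept(visitor);
        System.out.println(visitor.foundNestedUdf()); // prints true
    }
}
```

Compared with a standalone recursive helper, the detection state lives in the visitor object, so the operator classes only need an accept method and other checks can reuse the same traversal hooks.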

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch, pig-834_3.patch
>
>
> a = load 'students.txt' as (c1,c2,c3,c4); 
> c = group a by c2;  
> f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));
> Notice that Distinct udf is missing in Combiner and reduce stage. As a result distinct does not function, and incorrect results are produced.
> Distinct should have been evaluated in the 3 stages and output of Distinct should be given to COUNT in reduce stage.
> {code}
> # Map Reduce Plan                                  
> #--------------------------------------------------
> MapReduce node 1-122
> Map Plan
> Local Rearrange[tuple]{bytearray}(false) - 1-139
> |   |
> |   Project[bytearray][1] - 1-140
> |
> |---New For Each(false,false)[bag] - 1-127
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
>     |   |
>     |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
>     |       |
>     |       |---Project[bag][2] - 1-123
>     |           |
>     |           |---Project[bag][1] - 1-124
>     |   |
>     |   Project[bytearray][0] - 1-133
>     |
>     |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
>         |
>         |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
> Combine Plan
> Local Rearrange[tuple]{bytearray}(false) - 1-143
> |   |
> |   Project[bytearray][1] - 1-144
> |
> |---New For Each(false,false)[bag] - 1-132
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
>     |   |
>     |   |---Project[bag][0] - 1-135
>     |   |
>     |   Project[bytearray][1] - 1-134
>     |
>     |---POCombinerPackage[tuple]{bytearray} - 1-137--------
> Reduce Plan
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
> |
> |---New For Each(false)[bag] - 1-120
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
>     |   |
>     |   |---Project[bag][0] - 1-136
>     |
>     |---POCombinerPackage[tuple]{bytearray} - 1-145--------
> Global sort: false
> {code}
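
To make the reported symptom concrete, the following is a rough plain-Java simulation, not Pig code. It assumes that the map-side Initial function is invoked on bags holding a single tuple, and it models bags as lists of strings; the class and method names are hypothetical. The first method mimics the plan above, where Distinct appears only under COUNT$Initial. The second mimics the behaviour the description asks for, where distinct sets are carried through all stages and COUNT is applied only at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NestedAlgebraicSketch {

    /** Plan from the issue: Distinct runs only in the map, on one-tuple bags. */
    static long buggyCountDistinct(List<List<String>> mapSplits) {
        long total = 0;
        for (List<String> split : mapSplits) {
            for (String value : split) {
                // Distinct over a bag of one tuple always yields one tuple.
                Set<String> distinctOfOne =
                        new HashSet<>(Collections.singletonList(value));
                total += distinctOfOne.size();   // COUNT$Initial
            }
        }
        // COUNT$Intermediate and COUNT$Final only add up partial counts.
        return total;
    }

    /** Intended behaviour: Distinct in every stage, COUNT only at the end. */
    static long stagedCountDistinct(List<List<String>> mapSplits) {
        // Map and combine stages: one partial distinct set per split.
        List<Set<String>> partials = new ArrayList<>();
        for (List<String> split : mapSplits) {
            partials.add(new HashSet<>(split));
        }
        // Reduce stage: merge the partial sets, then count the result.
        Set<String> merged = new HashSet<>();
        for (Set<String> partial : partials) {
            merged.addAll(partial);
        }
        return merged.size();
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("a", "c"));
        System.out.println("plan from the issue: " + buggyCountDistinct(splits));
        System.out.println("distinct in all stages: " + stagedCountDistinct(splits));
    }
}
```

Under these assumptions the first method degenerates into a plain row count, which matches the observation that distinct "does not function".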
