Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2009/06/04 23:41:07 UTC

[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-834:
------------------------------

    Description: 
a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice in the plan below that the Distinct UDF is missing from the combine and reduce stages. As a result, distinct is never applied across the whole group, and incorrect results are produced.
Distinct should have been evaluated in all 3 stages (map, combine, reduce), and the output of Distinct should be given to COUNT in the reduce stage.

{code}
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
    |       |
    |       |---Project[bag][2] - 1-123
    |           |
    |           |---Project[bag][1] - 1-124
    |   |
    |   Project[bytearray][0] - 1-133
    |
    |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
        |
        |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
    |   |
    |   |---Project[bag][0] - 1-135
    |   |
    |   Project[bytearray][1] - 1-134
    |
    |---POCombinerPackage[tuple]{bytearray} - 1-137--------
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
    |   |
    |   |---Project[bag][0] - 1-136
    |
    |---POCombinerPackage[tuple]{bytearray} - 1-145--------
Global sort: false
{code}
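
To illustrate why counting before the distinct step has finished gives wrong answers, here is a minimal Python sketch. It is not Pig code and does not reproduce the plan above exactly: it assumes each map split contributes one partial bag for a group, and the function names (buggy_count_distinct, staged_count_distinct) are hypothetical, chosen only for this example.

```python
def buggy_count_distinct(splits):
    """Model of the faulty plan: distinct runs only on the map side.

    Each split is de-duplicated and counted locally; the later stages
    only add up those partial counts, so values that occur in more
    than one split are counted more than once.
    """
    partial_counts = [len(set(split)) for split in splits]  # map stage
    return sum(partial_counts)  # combine and reduce stages just sum


def staged_count_distinct(splits):
    """Model of the expected plan: distinct runs in all three stages.

    The partial distinct sets are merged in the combine and reduce
    stages, and the count is taken only on the final merged set.
    """
    map_out = [set(split) for split in splits]  # map stage
    combined = set().union(*map_out)            # combine stage
    reduced = set(combined)                     # reduce stage
    return len(reduced)                         # count applied last


if __name__ == "__main__":
    # Two map splits for the same group key; "b" appears in both.
    splits = [["a", "b", "a"], ["b", "c"]]
    print(buggy_count_distinct(splits))   # 4
    print(staged_count_distinct(splits))  # 3
```

In this model the two results differ whenever a value appears in more than one split, which is the kind of discrepancy the report describes.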

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Priority: Critical
