Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2008/07/10 21:20:31 UTC

[jira] Created: (PIG-302) Cross products of two flattens is incorrectly including records with null values

Cross products of two flattens is incorrectly including records with null values
--------------------------------------------------------------------------------

                 Key: PIG-302
                 URL: https://issues.apache.org/jira/browse/PIG-302
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
            Reporter: Alan Gates
            Assignee: Alan Gates


Given two data sets:

studenttab10
alex garcia,39,3.81
bob jones,40,2.77
zach johnson,23,4.00
tony mendleson,87,2.10
todd wellington,55,3.32
melany smith,19,3.98
jane wesley,62,1.98
irene chan,34,3.14
laverne shirley,58,2.43
marcia tently,32,3.48

and

votertab10
alex garcia,39,republican,1.50
bob jones,40,democrat,1000.30
zach johnson,23,independent,0.00
tony mendleson,87,socialist,101012.92
todd wellington,55,green,99.89
melany smith,29,republican,88787.29
john wesley,62,democrat,0.89
bob smith,18,independent,0.99
johnny appleseed,234,green,99.95
barak obama,47,democrat,3.48

and the script:

a = load '/Users/gates/test/data/studenttab10' using PigStorage(',') as (name, age, gpa);
b = load '/Users/gates/test/data/votertab10' using PigStorage(',') as (name, age, registration, contributions);
c = filter a by age < 40;
d = filter b by age < 40;
e = cogroup c by name, d by name;
f = foreach e generate flatten (c), flatten(d);
dump f;

The result is:

(NULL, bob smith, 18, independent, 0.99)
(alex garcia, 39, 3.81, alex garcia, 39, republican, 1.50)
(melany smith, 19, 3.98, melany smith, 29, republican, 88787.29)
(zach johnson, 23, 4.00, zach johnson, 23, independent, 0.00)

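The first record should not be there. Flattening an empty bag is supposed to produce no output rows, so a group with no matching record in c (here, bob smith) should be dropped from the result.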
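
As a cross-check of the expected output, here is a minimal Python sketch that models the script's semantics. It is not Pig's implementation. The helper names cogroup and flatten_both are hypothetical, the data is copied from the two sets above, and the model assumes that flattening two bags yields their cross product, so an empty bag on either side yields no rows.

```python
from itertools import product

# Data copied from the two sets in the report.
students = [
    ("alex garcia", 39, 3.81), ("bob jones", 40, 2.77),
    ("zach johnson", 23, 4.00), ("tony mendleson", 87, 2.10),
    ("todd wellington", 55, 3.32), ("melany smith", 19, 3.98),
    ("jane wesley", 62, 1.98), ("irene chan", 34, 3.14),
    ("laverne shirley", 58, 2.43), ("marcia tently", 32, 3.48),
]
voters = [
    ("alex garcia", 39, "republican", 1.50), ("bob jones", 40, "democrat", 1000.30),
    ("zach johnson", 23, "independent", 0.00), ("tony mendleson", 87, "socialist", 101012.92),
    ("todd wellington", 55, "green", 99.89), ("melany smith", 29, "republican", 88787.29),
    ("john wesley", 62, "democrat", 0.89), ("bob smith", 18, "independent", 0.99),
    ("johnny appleseed", 234, "green", 99.95), ("barak obama", 47, "democrat", 3.48),
]


def cogroup(left, right):
    """Hypothetical helper: group both inputs by name into (bag_c, bag_d) pairs."""
    groups = {}
    for row in left:
        groups.setdefault(row[0], ([], []))[0].append(row)
    for row in right:
        groups.setdefault(row[0], ([], []))[1].append(row)
    return groups


def flatten_both(groups):
    """Hypothetical helper: cross product of the two bags in each group.

    product() over an empty bag yields nothing, so a group that has
    rows on only one side contributes no output rows.
    """
    rows = []
    for name in sorted(groups):
        bag_c, bag_d = groups[name]
        for c_row, d_row in product(bag_c, bag_d):
            rows.append(c_row + d_row)
    return rows


c = [row for row in students if row[1] < 40]  # c = filter a by age < 40
d = [row for row in voters if row[1] < 40]    # d = filter b by age < 40
f = flatten_both(cogroup(c, d))               # e = cogroup ...; f = foreach ...

for row in f:
    print(row)
```

Under these assumptions only alex garcia, melany smith and zach johnson survive. bob smith appears only in d, and irene chan and marcia tently appear only in c, so none of them should produce a row.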

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-302) Cross products of two flattens is incorrectly including records with null values

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-302:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

cogroup patch checked in.



[jira] Updated: (PIG-302) Cross products of two flattens is incorrectly including records with null values

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-302:
---------------------------

    Attachment: cogroup.patch



[jira] Updated: (PIG-302) Cross products of two flattens is incorrectly including records with null values

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-302:
---------------------------

    Status: Patch Available  (was: Open)

Flatten logic was not properly resetting a couple of variables, which resulted in extraneous rows being included in the results.



[jira] Updated: (PIG-302) Cross products of two flattens is incorrectly including records with null values

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-302:
-------------------------------

    Fix Version/s: types_branch
