Posted to dev@pig.apache.org by "Philip (flip) Kromer (JIRA)" <ji...@apache.org> on 2014/06/11 08:27:01 UTC

[jira] [Updated] (PIG-4007) GROUP ALL on an empty table produces no output

     [ https://issues.apache.org/jira/browse/PIG-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip (flip) Kromer updated PIG-4007:
--------------------------------------

    Description: 
Using GROUP ALL on an empty table produces no output. I would expect it to produce a single row with key 'all' and an empty bag. This seems inconsistent with PIG-514.

{code}
vals = LOAD 'test/data/pigunit/top_queries_input_data.txt' AS (name:chararray, val:int); 
empty = FILTER vals BY (1 == 2); 
empty_g_1 = GROUP empty ALL; 
empty_g_2 = GROUP empty BY 1;
empty_g_1_stats = FOREACH empty_g_1 GENERATE 
  COUNT_STAR(empty); 
DUMP empty_g_1; 
DUMP empty_g_2; 
DUMP empty_g_1_stats; 
{code}

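None of the three DUMP statements above produces any output. My workaround is to COGROUP the empty relation with a one-row relation, which yields a single group whose bag for the empty relation is empty: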

{code} 
one_line = FOREACH (LIMIT vals 1) GENERATE 1 AS uno; 
empty_cog = COGROUP one_line BY uno, empty BY 1; 
DUMP empty_cog;
{code}

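A practical case where this complicates things is testing set equality, which is most easily done by checking whether the symmetric difference is empty. When the two sets are equal, a_xor_b is empty, so the GROUP ALL below emits no row at all rather than a row with is_equal = 1: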

{code}
vals_a = LOAD 'test/data/pigunit/top_queries_input_data.txt' AS (name:chararray, val:int); 
vals_b = LOAD 'test/data/pigunit/top_queries_input_data.txt' AS (name:chararray, val:int); 
a_xor_b = FILTER (COGROUP vals_a BY name, vals_b BY name)
  BY ((COUNT_STAR(vals_a) == 0L) OR (COUNT_STAR(vals_b) == 0L));
a_equals_b = FOREACH (GROUP a_xor_b ALL) GENERATE
  ((COUNT_STAR(a_xor_b) == 0) ? 1 : 0) AS is_equal; 
DUMP a_equals_b;
{code}
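
For illustration, here is a minimal sketch of the COGROUP workaround applied to the set-equality test. It has not been run. The names one_line_a and a_xor_b_cog are hypothetical, and the sketch assumes vals_a contains at least one row; if it does not, one_line_a is itself empty and the same problem recurs.

{code}
vals_a = LOAD 'test/data/pigunit/top_queries_input_data.txt' AS (name:chararray, val:int);
vals_b = LOAD 'test/data/pigunit/top_queries_input_data.txt' AS (name:chararray, val:int);
-- Symmetric difference: keys that appear in only one of the two relations.
a_xor_b = FILTER (COGROUP vals_a BY name, vals_b BY name)
  BY ((COUNT_STAR(vals_a) == 0L) OR (COUNT_STAR(vals_b) == 0L));
-- One-row helper relation (hypothetical name; assumes vals_a is non-empty).
one_line_a = FOREACH (LIMIT vals_a 1) GENERATE 1 AS uno;
-- COGROUP on a constant key instead of GROUP ALL, so that a group should
-- still be emitted when a_xor_b is empty.
a_xor_b_cog = COGROUP one_line_a BY uno, a_xor_b BY 1;
a_equals_b = FOREACH a_xor_b_cog GENERATE
  ((COUNT_STAR(a_xor_b) == 0L) ? 1 : 0) AS is_equal;
DUMP a_equals_b;
{code}

With this shape, an empty a_xor_b should give a single group whose a_xor_b bag is empty, so is_equal can evaluate to 1 instead of the row being absent.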



         Labels: cogroup empty filtered group group_all no records rows  (was: )

> GROUP ALL on an empty table produces no output
> ----------------------------------------------
>
>                 Key: PIG-4007
>                 URL: https://issues.apache.org/jira/browse/PIG-4007
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Philip (flip) Kromer
>              Labels: cogroup, empty, filtered, group, group_all, no, records, rows



--
This message was sent by Atlassian JIRA
(v6.2#6252)