Posted to dev@pig.apache.org by "Arnab Guin (JIRA)" <ji...@apache.org> on 2012/12/21 00:37:13 UTC

[jira] [Commented] (PIG-3060) FLATTEN in nested foreach fails when the input contains an empty bag

    [ https://issues.apache.org/jira/browse/PIG-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537492#comment-13537492 ] 

Arnab Guin commented on PIG-3060:
---------------------------------



I am not sure whether the issue was fixed on trunk between the date it was filed and today. I tried the example on the latest trunk.

Here is my input file (with empty bags):

flatten.txt:

2 {}
3 {}
4 {}

I used essentially the attached program with some minor modifications (adding load, dump, store, etc.). The count is 0 for each group, as expected.

flatten.pig:

A = load './flatten.txt' using PigStorage(' ') as (a0:int, a1:bag{(t:chararray)});
B = group A by $0;
dump B;
C = foreach B {
    c1 = foreach A generate FLATTEN(a1);
    generate COUNT(c1);
};
dump C;


shell> pig -x local -f flatten.pig 
(2,{(2,{})})
(3,{(3,{})})
(4,{(4,{})})
(0)
(0)
(0)

Here is another example, where the bags are non-empty:

flatten.txt:
2 {(a),(b),(c)}
3 {(x),(y),(z)}

shell> pig -x local -f flatten.pig 
(2,{(2,{(a),(b),(c)})})
(3,{(3,{(x),(y),(z)})})
(3)
(3)

Did I get something wrong?

-Arnab
                
> FLATTEN in nested foreach fails when the input contains an empty bag
> --------------------------------------------------------------------
>
>                 Key: PIG-3060
>                 URL: https://issues.apache.org/jira/browse/PIG-3060
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.10.0
>            Reporter: Youngwook Kim
>
> FLATTEN inside a foreach statement produces wrong results if the input contains an empty bag.
> {code}
> A = load 'flatten.txt' as (a0:int, a1:bag{(t:chararray)});
> B = group A by a0;
> C = foreach B {
>   c1 = foreach A generate FLATTEN(a1);
>   generate COUNT(c1);
> };
> {code}
> The easy workaround is to filter out empty bags.

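For reference, the workaround mentioned in the issue description (filtering out empty bags) might look roughly like the sketch below. It is only an illustration: the alias A1 is a hypothetical name that does not appear in the original report, and the load statement is borrowed from the flatten.pig script in the comment above.

```pig
-- Sketch of the "filter out empty bags" workaround; A1 is a hypothetical alias.
A = load './flatten.txt' using PigStorage(' ') as (a0:int, a1:bag{(t:chararray)});

-- Drop rows whose bag is empty before grouping (IsEmpty is a Pig builtin).
A1 = filter A by not IsEmpty(a1);

B = group A1 by a0;
C = foreach B {
    -- The grouped bag takes the name of the filtered alias, A1.
    c1 = foreach A1 generate FLATTEN(a1);
    generate COUNT(c1);
};
dump C;
```

One consequence of this approach: a key whose rows all carry empty bags is removed before the group, so it disappears from C instead of showing up with a count of 0. With the first flatten.txt above, every row would be filtered out.
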