Posted to user@pig.apache.org by Sungho Ryu <ry...@gmail.com> on 2011/08/22 09:47:07 UTC

Flattening after limit

I'm trying to select a few groups of tuples using LIMIT and FLATTEN,
but the result differs from what I expected.

Is this intended behavior or a bug?

--------------------
Example (selecting 2 groups based on the value of 'k'):

data = LOAD 'data' AS (k, v);
DUMP data;

(1, A)
(1, B)
(2, C)
(3, D)
(3, E)
(3, F)

grouped = GROUP data BY k;
selected = LIMIT grouped 2;
flattened = FOREACH selected GENERATE FLATTEN (data);

DUMP flattened;

(1, A)
(1, B)

What I expected was all tuples from 2 groups, e.g.:
(1, A)
(1, B)
(2, C)
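
To make the difference concrete, here is a minimal Python sketch (not Pig) that models the two behaviours on the sample data above. The helper names group_by_key, limit and flatten_bags are hypothetical and only imitate GROUP, LIMIT and FLATTEN. The sketch also assumes the first two groups in key order are the ones kept, which Pig's LIMIT does not guarantee without an ORDER BY.

```python
from itertools import groupby, islice

# Sample rows from the post: (k, v)
data = [(1, "A"), (1, "B"), (2, "C"), (3, "D"), (3, "E"), (3, "F")]


def group_by_key(rows):
    """Imitate GROUP ... BY k: return a list of (key, bag) pairs."""
    ordered = sorted(rows, key=lambda r: r[0])  # stable sort keeps A before B
    return [(k, list(bag)) for k, bag in groupby(ordered, key=lambda r: r[0])]


def limit(rows, n):
    """Imitate LIMIT: keep at most the first n items."""
    return list(islice(rows, n))


def flatten_bags(groups):
    """Imitate FOREACH ... GENERATE FLATTEN(bag): one output row per bag element."""
    return [row for _key, bag in groups for row in bag]


grouped = group_by_key(data)
selected = limit(grouped, 2)

# Expected: the limit applies to groups only, then every tuple is emitted.
expected = flatten_bags(selected)

# Observed: the same limit is applied a second time, after flattening.
observed = limit(flatten_bags(selected), 2)

print(expected)
print(observed)
```

In this model the expected list has three tuples (both tuples with k = 1 plus the one with k = 2), while the observed list stops after two, matching the output shown above.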


EXPLAIN showed that the LIMIT 2 was also being applied to 'flattened', not
only to 'grouped'.

Is this intended behavior? If so, what is the correct way to get
the desired result?

P.S. I tried this on Pig 0.8.0 and 0.8.1, with and without -t All or -t LimitOptimizer.
The results were the same in every case.

Re: Flattening after limit

Posted by Daniel Dai <da...@hortonworks.com>.
It is a bug. I opened https://issues.apache.org/jira/browse/PIG-2231 for it.

Thanks,
Daniel
