Posted to dev@pig.apache.org by Jonathan Coveney <jc...@gmail.com> on 2012/06/21 20:29:16 UTC

Possible bug in replicated join?

I'm posting before filing a ticket, just to make sure I'm not doing something
stupid or missing something obvious.


$ cat data
1
2
3
4
5

a = load 'data' as (x:int);
b = foreach a generate TOTUPLE(x);

c = load 'data' as (x:int);
d = foreach c generate TOTUPLE(x);

e = join b by $0, d by $0;
dump e;

((1),(1))
((2),(2))
((3),(3))
((4),(4))
((5),(5))

ok....
but

f = join b by $0, d by $0 using 'replicated';
dump f;

(1,1)
(2,2)
(3,3)
(4,4)
(5,5)

!!!!

Re: Possible bug in replicated join?

Posted by Thejas Nair <th...@hortonworks.com>.
That certainly looks like a bug. The replicated join should not flatten
the tuple.
I didn't actually know that Pig supported joins on tuples (I guess
it does not allow that on maps and bags).

-Thejas
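
One way to narrow this down before filing the ticket might be to compare the schemas Pig reports for the two joins. The sketch below assumes the aliases b, d, e and f from the original message and uses only the standard describe operator. No output is shown because it was not part of the thread.

```pig
-- Assumes b and d are defined as in the original message.
e = join b by $0, d by $0;
f = join b by $0, d by $0 using 'replicated';

-- Print the schema Pig infers for each join result.
describe e;
describe f;
```

If both schemas show tuple-typed fields while dump f still prints flat values, that would suggest the flattening happens when the replicated join executes rather than when the schema is computed. That reading is a guess, not something confirmed in this thread.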

