You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by Alexander Schätzle <al...@yahoo.com> on 2010/04/22 11:40:41 UTC

result of OUTER JOIN

Hello,

can anybody tell me what the LEFT OUTER JOIN produces in case of non matching tuples?
I thought it would produce nulls for the right relation but a later test for IS NULL does not produce the desired solution.

Example:

left_rel has schema (s: bytearray)
right_rel has schema (s: bytearray, p: bytearray)

A = JOIN left_rel BY s LEFT OUTER, right_rel BY s;

So what is the resulting tuple in case that a value for s in 'left_rel' doesn't find a join partner in 'right_rel'?
I thought it would be: (value for s in left_rel, null, null)

But a later FILTER for null values does not produce the right result:

B = FILTER A BY right_rel : : s is null

Can anybody tell me what I got wrong?


Thx in advance,
Alex



RE: result of OUTER JOIN

Posted by Alexander Schätzle <al...@yahoo.com>.
Consider the following example:

left_rel (s: bytearray)
################
(a)
(b)
(c)
(d)
(e)

right_rel (s: bytearray, p: bytearray)
##########################
(a, 1)
(b, 2)
(e, 3)



A = JOIN left_rel BY s LEFT OUTER, right_rel BY s;
###########################################
(a, a, 1)
(b, b, 2)
(c, , )
(d, , )
(e, e, 3)

B = FILTER A BY $1 is null;

DUMP B; -> produces an ERROR!

STORE B INTO 'outfile';
###################
a
b
c
d
e

But what I expect is:
c
d

I'm very confused at the moment and I absolutely don't understand what's happening!
Does anybody knows the answer?

Thx,
Alex


Re: result of OUTER JOIN

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Hi Alex,

   This is a bug in pig imo where it is pushing the filter before the 
join : when it should not.


To validate, simply introduce an intermediate store/load pair to see 
right results.
There probably already is some JIRA similar to this, if yes - please do 
add to that or please do create a new one.

Someone from pig team should probably resolve this (if you are not on 
latest pig already).



Regards,
Mridul


On Thursday 22 April 2010 03:10 PM, Alexander Schätzle wrote:
> Hello,
>
> can anybody tell me what the LEFT OUTER JOIN produces in case of non matching tuples?
> I thought it would produce nulls for the right relation but a later test for IS NULL does not produce the desired solution.
>
> Example:
>
> left_rel has schema (s: bytearray)
> right_rel has schema (s: bytearray, p: bytearray)
>
> A = JOIN left_rel BY s LEFT OUTER, right_rel BY s;
>
> So what is the resulting tuple in case that a value for s in 'left_rel' doesn't find a join partner in 'right_rel'?
> I thought it would be: (value for s in left_rel, null, null)
>
> But a later FILTER for null values does not produce the right result:
>
> B = FILTER A BY right_rel : : s is null
>
> Can anybody tell me what I got wrong?
>
>
> Thx in advance,
> Alex
>
>