You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by Alexander Schätzle <al...@yahoo.com> on 2010/04/22 11:40:41 UTC
result of OUTER JOIN
Hello,
can anybody tell me what the LEFT OUTER JOIN produces in case of non matching tuples?
I thought it would produce nulls for the right relation but a later test for IS NULL does not produce the desired solution.
Example:
left_rel has schema (s: bytearray)
right_rel has schema (s: bytearray, p: bytearray)
A = JOIN left_rel BY s LEFT OUTER, right_rel BY s;
So what is the resulting tuple in case that a value for s in 'left_rel' doesn't find a join partner in 'right_rel'?
I thought it would be: (value for s in left_rel, null, null)
But a later FILTER for null values does not produce the right result:
B = FILTER A BY right_rel : : s is null
Can anybody tell me what I got wrong?
Thx in advance,
Alex
RE: result of OUTER JOIN
Posted by Alexander Schätzle <al...@yahoo.com>.
Consider the following example:
left_rel (s: bytearray)
################
(a)
(b)
(c)
(d)
(e)
right_rel (s: bytearray, p: bytearray)
##########################
(a, 1)
(b, 2)
(e, 3)
A = JOIN left_rel BY s LEFT OUTER, right_rel BY s;
###########################################
(a, a, 1)
(b, b, 2)
(c, , )
(d, , )
(e, e, 3)
B = FILTER A BY $1 is null;
DUMP B; -> produces an ERROR!
STORE B INTO 'outfile';
###################
a
b
c
d
e
But what I expect is:
c
d
I'm very confused at the moment and I absolutely don't understand what's happening!
Does anybody knows the answer?
Thx,
Alex
Re: result of OUTER JOIN
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Hi Alex,
This is a bug in pig imo where it is pushing the filter before the
join : when it should not.
To validate, simply introduce an intermediate store/load pair to see
right results.
There probably already is some JIRA similar to this, if yes - please do
add to that or please do create a new one.
Someone from pig team should probably resolve this (if you are not on
latest pig already).
Regards,
Mridul
On Thursday 22 April 2010 03:10 PM, Alexander Schätzle wrote:
> Hello,
>
> can anybody tell me what the LEFT OUTER JOIN produces in case of non matching tuples?
> I thought it would produce nulls for the right relation but a later test for IS NULL does not produce the desired solution.
>
> Example:
>
> left_rel has schema (s: bytearray)
> right_rel has schema (s: bytearray, p: bytearray)
>
> A = JOIN left_rel BY s LEFT OUTER, right_rel BY s;
>
> So what is the resulting tuple in case that a value for s in 'left_rel' doesn't find a join partner in 'right_rel'?
> I thought it would be: (value for s in left_rel, null, null)
>
> But a later FILTER for null values does not produce the right result:
>
> B = FILTER A BY right_rel : : s is null
>
> Can anybody tell me what I got wrong?
>
>
> Thx in advance,
> Alex
>
>