You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by Marko Musnjak <ma...@gmail.com> on 2011/04/07 20:19:56 UTC

Join issue

Hi,

I'm trying to do a left outer join of two files, on eight keys, but it
always seems that the keys don't match. I'm able to reproduce this with the
code and files below.
I expect to see lines like these:
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;...;10;31;;;0;3
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;...;0;-1;;;1;0

But all I get are values from j2, with no matches from "failed".
Is this a bug, or just a too-long day? I'm using CDH 3b3, pig version Apache
Pig version 0.7.0+16

Thanks,
Marko

failed = LOAD 'hdfs:///user/hadoop/out/failed.csv' USING PigStorage(';') AS
    (y:int, m:int, d:int, n:int, i:chararray, t:chararray,
    l2:int,l3:int,r1:int,r2:int,r3:int,r4:int,dclr:int, bclr:int);


j2 = LOAD 'hdfs:///user/hadoop/out/j2.csv' USING PigStorage(';') AS
    (y:int, m:int, d:int, n:int, i:chararray, t:chararray,
    total:int, dc:int, bc:int,l2:int, l3:int,
    total_l: int, dcl:int, bcl:int);

j3 = JOIN j2 BY (y,m,d,n,i,t,l2,l3) LEFT OUTER,
    failed BY (y,m,d,n,i,t,l2,l3);

j2:
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2213;;5;1;3
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2963;;37;0;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;6861;;1;0;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;241;;3;0;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;301;;6;2;2
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2652;;5;0;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2212;;1;0;0

failed:
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;10;31;;;0;2
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;10;31;;;0;3
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;0;-1;;;1;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;9;21;;;1;0
2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;0;-1;;;1;0

Re: Join issue

Posted by Marko Musnjak <ma...@gmail.com>.
As I suspected, the problem was between keyboard and chair :)

Thanks,
Marko

On Fri, Apr 8, 2011 at 00:43, Daniel Dai <ji...@yahoo-inc.com> wrote:

> null column from different relation does not redeemed as equal in join.
> This is consistent with SQL.
>
> Daniel
>
>
> On 04/07/2011 11:19 AM, Marko Musnjak wrote:
>
>> Hi,
>>
>> I'm trying to do a left outer join of two files, on eight keys, but it
>> always seems that the keys don't match. I'm able to reproduce this with
>> the
>> code and files below.
>> I expect to see lines like these:
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;...;10;31;;;0;3
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;...;0;-1;;;1;0
>>
>> But all I get are values from j2, with no matches from "failed".
>> Is this a bug, or just a too-long day? I'm using CDH 3b3, pig version
>> Apache
>> Pig version 0.7.0+16
>>
>> Thanks,
>> Marko
>>
>> failed = LOAD 'hdfs:///user/hadoop/out/failed.csv' USING PigStorage(';')
>> AS
>>     (y:int, m:int, d:int, n:int, i:chararray, t:chararray,
>>     l2:int,l3:int,r1:int,r2:int,r3:int,r4:int,dclr:int, bclr:int);
>>
>>
>> j2 = LOAD 'hdfs:///user/hadoop/out/j2.csv' USING PigStorage(';') AS
>>     (y:int, m:int, d:int, n:int, i:chararray, t:chararray,
>>     total:int, dc:int, bc:int,l2:int, l3:int,
>>     total_l: int, dcl:int, bcl:int);
>>
>> j3 = JOIN j2 BY (y,m,d,n,i,t,l2,l3) LEFT OUTER,
>>     failed BY (y,m,d,n,i,t,l2,l3);
>>
>> j2:
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2213;;5;1;3
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2963;;37;0;0
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;6861;;1;0;0
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;241;;3;0;0
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;301;;6;2;2
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2652;;5;0;0
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2212;;1;0;0
>>
>> failed:
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;10;31;;;0;2
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;10;31;;;0;3
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;0;-1;;;1;0
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;9;21;;;1;0
>> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;0;-1;;;1;0
>>
>
>

Re: Join issue

Posted by Daniel Dai <ji...@yahoo-inc.com>.
null column from different relation does not redeemed as equal in join. 
This is consistent with SQL.

Daniel

On 04/07/2011 11:19 AM, Marko Musnjak wrote:
> Hi,
>
> I'm trying to do a left outer join of two files, on eight keys, but it
> always seems that the keys don't match. I'm able to reproduce this with the
> code and files below.
> I expect to see lines like these:
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;...;10;31;;;0;3
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;...;0;-1;;;1;0
>
> But all I get are values from j2, with no matches from "failed".
> Is this a bug, or just a too-long day? I'm using CDH 3b3, pig version Apache
> Pig version 0.7.0+16
>
> Thanks,
> Marko
>
> failed = LOAD 'hdfs:///user/hadoop/out/failed.csv' USING PigStorage(';') AS
>      (y:int, m:int, d:int, n:int, i:chararray, t:chararray,
>      l2:int,l3:int,r1:int,r2:int,r3:int,r4:int,dclr:int, bclr:int);
>
>
> j2 = LOAD 'hdfs:///user/hadoop/out/j2.csv' USING PigStorage(';') AS
>      (y:int, m:int, d:int, n:int, i:chararray, t:chararray,
>      total:int, dc:int, bc:int,l2:int, l3:int,
>      total_l: int, dcl:int, bcl:int);
>
> j3 = JOIN j2 BY (y,m,d,n,i,t,l2,l3) LEFT OUTER,
>      failed BY (y,m,d,n,i,t,l2,l3);
>
> j2:
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2213;;5;1;3
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2963;;37;0;0
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;6861;;1;0;0
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;241;;3;0;0
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;301;;6;2;2
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2652;;5;0;0
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;58;3;5;2212;;1;0;0
>
> failed:
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;10;31;;;0;2
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;10;31;;;0;3
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;0;-1;;;1;0
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;301;;9;21;;;1;0
> 2011;3;31;2;AAAAAAAAAAAAAAA;TTTTTTTT;2213;;0;-1;;;1;0