Posted to user@pig.apache.org by Chloe Huang <ch...@ussuning.com> on 2016/04/20 03:10:24 UTC
Join does not work in my case
Hi Guys,
I am using Pig for data processing, but the JOIN does not seem to work in my case.
The Pig script is as follows:
A = LOAD './q' USING PigStorage(',') AS (ori_query: chararray, t: chararray, w: chararray);
B = LOAD './word' USING PigStorage('\t') AS (word: chararray, proID: chararray, proScore: chararray);
C = JOIN A by t, B by word;
--DUMP C;
STORE C INTO 'join_out';
First I load my test file 'q' into A, and then I load my test file 'word' into B.
With "JOIN A by t, B by word", I expect an inner join of A's field 't' with B's field 'word'. In my test case, A.t and B.word share many common values.
But I get nothing in my result C. The output file is also empty.
Here is a small piece of 'q' (the file 'q' is attached):
dark shoes for lady,dark,3.234
dark shoes for lady,shoes,2.261
dark shoes for lady,for,1.223
dark shoes for lady,lady,2.345
casual male shoes,casual,3.478
casual male shoes,male,2.675
casual male shoes,shoes,4.265
casual sporty,casual,2.678
Here is a small piece of 'word' (the file 'word' is attached):
for,104365130,0.588235294118
male,104365130, 0.588235294118
35,104365130,0.588235294118
ar,104365132,0.588235294118
cow,104365132,0.652521008403
mm,104365132,0.588235294118
45109,104365135,0.588235294118
medium,104365135,0.588235294118
casual,104365135,0.588235294118
fur,104365135,0.652521008403
lady,104365135,0.652521008403
shoes,104365135,0.6
st,104366010,0.533333333333
ad,104366010,0.533333333333
ray,104366010,0.597619047619
chic,104366010,0.533333333333
d,104394306,0.519480519481
dark,104394306,0.519480519481
comf,104394306,0.574358568261
casual,104394306,0.574358568261
sporty,104394306,0.574358568261
PEACEPRINCESS,104394306,0.0
shoes,104394889,1.15914601533
A.t and B.word are both defined as chararray. I have included my test files 'q' and 'word' in the attachment.
Does anyone have an idea why the JOIN is not working here?
Many Thanks,
Chloe H
Re: Join does not work in my case
Posted by Arthur Kho <be...@gmail.com>.
Your code to read word uses '\t' as a delimiter but your file uses , as a delimiter.
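If 'word' really is comma-separated like the sample in your message, then with PigStorage('\t') each whole line ends up in the first field (word), and proID and proScore come back null, so no value of A.t can ever match. A minimal sketch of the script with only the delimiter changed, assuming the attached file looks like the sample you pasted:

```pig
-- Load both files with the delimiter the samples appear to use (a comma).
A = LOAD './q' USING PigStorage(',') AS (ori_query: chararray, t: chararray, w: chararray);
B = LOAD './word' USING PigStorage(',') AS (word: chararray, proID: chararray, proScore: chararray);

-- Optional sanity check before joining: each tuple of B should show three
-- separate fields, not one long string followed by empty fields.
-- DUMP B;

-- Inner join on the term column of A and the word column of B.
C = JOIN A BY t, B BY word;
STORE C INTO 'join_out';
```

If the real file turns out to be tab-separated after all, the original LOAD for B is fine and the cause is somewhere else; dumping B on its own should show which case you are in.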