Posted to user@pig.apache.org by Chloe Huang <ch...@ussuning.com> on 2016/04/20 03:10:24 UTC

Join does not work in my case

Hi Guys,


I am using Pig for data processing, but the join does not seem to work in my case.


The Pig script is as follows:


A = LOAD './q' USING PigStorage(',') AS (ori_query: chararray, t: chararray, w: chararray);

B = LOAD './word' USING PigStorage('\t') AS (word: chararray, proID: chararray, proScore: chararray);

C = JOIN A by t, B by word;
--DUMP C;

STORE C INTO 'join_out';


First I load my test file 'q' into A, and then load my test file 'word' into B.

With "JOIN A by t, B by word", I expect an inner join of A's field 't' with B's field 'word'. My test data includes many values that appear in both A.t and B.word.

But I get nothing in my result C. The output file is also empty.


Here is a small piece of 'q' (the document 'q' is attached):

dark shoes for lady,dark,3.234
dark shoes for lady,shoes,2.261
dark shoes for lady,for,1.223
dark shoes for lady,lady,2.345
casual male shoes,casual,3.478
casual male shoes,male,2.675
casual male shoes,shoes,4.265
casual sporty,casual,2.678


Here is a small piece of 'word' (the document 'word' is attached):

for,104365130,0.588235294118
male,104365130, 0.588235294118
35,104365130,0.588235294118
ar,104365132,0.588235294118
cow,104365132,0.652521008403
mm,104365132,0.588235294118
45109,104365135,0.588235294118
medium,104365135,0.588235294118
casual,104365135,0.588235294118
fur,104365135,0.652521008403
lady,104365135,0.652521008403
shoes,104365135,0.6
st,104366010,0.533333333333
ad,104366010,0.533333333333
ray,104366010,0.597619047619
chic,104366010,0.533333333333
d,104394306,0.519480519481
dark,104394306,0.519480519481
comf,104394306,0.574358568261
casual,104394306,0.574358568261
sporty,104394306,0.574358568261
PEACEPRINCESS,104394306,0.0
shoes,104394889,1.15914601533



A.t and B.word are both defined as chararray. I have included my test files 'q' and 'word' in the attachment.

Does anyone have an idea why JOIN is not working here?



Many Thanks,

Chloe H






Re: Join does not work in my case

Posted by Arthur Kho <be...@gmail.com>.
Your script loads 'word' with '\t' as the delimiter, but the file is comma-delimited. Each line therefore ends up whole in the 'word' field, so it never matches a value of A.t.
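
A minimal sketch of the script with the delimiter changed, assuming both './q' and './word' are comma-delimited as in the samples from the question. Only the second LOAD differs from the original script; the paths and field names are taken from it unchanged.

```pig
-- Assumption: './q' is comma-delimited, as in the posted sample.
A = LOAD './q' USING PigStorage(',') AS (ori_query: chararray, t: chararray, w: chararray);

-- Changed from PigStorage('\t'). The posted sample of './word' contains no tabs,
-- so a tab delimiter puts each whole line into 'word' and leaves
-- proID and proScore null.
B = LOAD './word' USING PigStorage(',') AS (word: chararray, proID: chararray, proScore: chararray);

-- Inner join on the term column of A and the word column of B.
C = JOIN A BY t, B BY word;

STORE C INTO 'join_out';
```

Running DUMP B; before the join is a quick way to check that 'word' holds only the first column rather than the whole line.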
