
Posted to common-user@hadoop.apache.org by lzg <lz...@163.com> on 2012/05/29 11:08:46 UTC

How to mapreduce in the scenario

Hi,
 
I wonder whether Hadoop can effectively solve the following problem:
 
==========================================
input file: a.txt, b.txt
result: c.txt
 
a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...
 
b.txt:
id1,address1,...
id2,address2,...
id3,address3,...

c.txt:
id1,name1,age1,address1,...
id2,name2,age2,address2,...
========================================
 
I know that a database can do this well.
But I would like to handle it with Hadoop if possible.
Can Hadoop meet this requirement?
 
Any suggestions would help me. Thank you very much!
 
Best Regards,
 
Gump

Re: How to mapreduce in the scenario

Posted by Robert Evans <ev...@yahoo-inc.com>.

Yes, you can do it. In Pig you would write something like:

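-- PigStorage(',') is needed because the files are comma-separated (Pig's default delimiter is tab)
A = LOAD 'a.txt' USING PigStorage(',') AS (id, name, age, ...);
B = LOAD 'b.txt' USING PigStorage(',') AS (id, address, ...);
C = JOIN A BY id, B BY id;
STORE C INTO 'c.txt' USING PigStorage(',');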

Hive can do it similarly. Or you could write your own join directly in map/reduce, or use the data_join jar.
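
For the hand-written map/reduce route, the usual shape is a reduce-side join: tag each record with the file it came from, group by id, and combine the two sides in the reducer. Below is a minimal sketch of that idea in plain Python, not actual Hadoop code. The function names (map_record, shuffle, reduce_join, join) are hypothetical, the shuffle is simulated in memory, and it assumes the id is the first comma-separated field of every line.

```python
from collections import defaultdict


def map_record(source, line):
    """Map phase: emit (id, (source_tag, remaining_fields)) for one input line."""
    key, _, rest = line.strip().partition(",")
    return key, (source, rest)


def shuffle(pairs):
    """Stand-in for Hadoop's shuffle: group all mapped values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    """Reduce phase: pair every a.txt record with every b.txt record for this id."""
    left = [rest for source, rest in values if source == "a"]
    right = [rest for source, rest in values if source == "b"]
    # Ids present in only one file yield no output rows (inner join).
    return [f"{key},{l},{r}" for l in left for r in right]


def join(a_lines, b_lines):
    """Run the simulated map, shuffle, and reduce steps over both inputs."""
    pairs = [map_record("a", line) for line in a_lines]
    pairs += [map_record("b", line) for line in b_lines]
    groups = shuffle(pairs)
    out = []
    for key in sorted(groups):
        out.extend(reduce_join(key, groups[key]))
    return out


if __name__ == "__main__":
    a = ["id1,name1,age1", "id2,name2,age2", "id3,name3,age3", "id4,name4,age4"]
    b = ["id1,address1", "id2,address2", "id3,address3"]
    for row in join(a, b):
        print(row)
```

In a real job the map function would learn the source tag from the input split's file name, and the framework would do the grouping. An id that appears in only one file (id4 here) produces no output row, which matches the inner-join behaviour of the Pig script above.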

--Bobby Evans
