Posted to common-user@hadoop.apache.org by lzg <lz...@163.com> on 2012/05/29 11:08:46 UTC
How to mapreduce in the scenario
Hi,
I wonder whether Hadoop can solve the following problem efficiently:
==========================================
input file: a.txt, b.txt
result: c.txt
a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...
b.txt:
id1,address1,...
id2,address2,...
id3,address3,...
c.txt:
id1,name1,age1,address1,...
id2,name2,age2,address2,...
========================================
I know this can be done well with a database,
but I would like to handle it with Hadoop if possible.
Can Hadoop meet this requirement?
Any suggestions would be a great help. Thank you very much!
Best Regards,
Gump
Re: How to mapreduce in the scenario
Posted by Robert Evans <ev...@yahoo-inc.com>.
Yes, you can. In Pig you would write something like:
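A = LOAD 'a.txt' USING PigStorage(',') AS (id, name, age, ...);
B = LOAD 'b.txt' USING PigStorage(',') AS (id, address, ...);
-- inner join on id; each output row holds A's fields followed by B's, so id appears twice
C = JOIN A BY id, B BY id;
-- the result is written as part files inside a directory named c.txt
STORE C INTO 'c.txt' USING PigStorage(',');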
Hive can do it in a similar way. Alternatively, you could write the join yourself directly in MapReduce, or use the data_join jar.
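To make the "write your own in MapReduce" option more concrete, here is a minimal sketch of a reduce-side join, simulated locally in Python rather than run on Hadoop. The mapper tags each record with the file it came from and emits the id as the key. The framework's shuffle, imitated here with a sort and `groupby`, brings all records for one id to the same reduce call, and the reducer pairs every a.txt record with every b.txt record for that id. The function names (`map_record`, `reduce_key`, `reduce_side_join`) and the "a"/"b" tags are hypothetical. The sketch also assumes comma-separated lines with the id in the first field. In a real job the mapper would typically derive the tag from the input file it is reading.

```python
from itertools import groupby
from operator import itemgetter


def map_record(line, tag):
    """Map phase (hypothetical helper): emit (join_key, (source_tag, remaining_fields))."""
    fields = line.rstrip("\n").split(",")
    return fields[0], (tag, fields[1:])


def reduce_key(key, tagged_values):
    """Reduce phase: inner join of all 'a' records with all 'b' records for one key."""
    # Buffers one key's values in memory; fine for a sketch, not for huge key groups.
    left = [rest for tag, rest in tagged_values if tag == "a"]
    right = [rest for tag, rest in tagged_values if tag == "b"]
    for l in left:
        for r in right:
            yield ",".join([key] + l + r)


def reduce_side_join(a_lines, b_lines):
    """Local stand-in for a MapReduce job: map, shuffle/sort by key, reduce."""
    mapped = [map_record(line, "a") for line in a_lines]
    mapped += [map_record(line, "b") for line in b_lines]
    # Hadoop's shuffle groups map output by key; a stable sort plus groupby imitates that.
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        values = [value for _, value in group]
        output.extend(reduce_key(key, values))
    return output


if __name__ == "__main__":
    a = ["id1,name1,age1", "id2,name2,age2", "id3,name3,age3", "id4,name4,age4"]
    b = ["id1,address1", "id2,address2", "id3,address3"]
    for row in reduce_side_join(a, b):
        print(row)
```

With the sample rows above, id4 produces no output because it has no match in b.txt, which is the same inner-join behaviour as the Pig `JOIN`.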
--Bobby Evans