Posted to mapreduce-user@hadoop.apache.org by Sh...@cognizant.com on 2011/09/09 08:24:17 UTC

Multiple files as input to a mapreduce job

Hi,

The following is the scenario I have:

I have a Java program that reads multiple files from the disk.

* There are 3 files (A, B, C) that are read and loaded into 3
  collections (ArrayLists).

* There are 2 files, input1 and input2, that act as input to my
  program.

* I search for a keyword in file input1 and find the IDs
  corresponding to the matching entries.

* These IDs are used in file input2 to fetch the entries
  corresponding to them.

In other words, there has to be a join between input1 and input2.
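
For illustration, the two lookup steps above might look roughly like this in plain Java. This is only a sketch and not the actual program: the record layout (one `id<TAB>text` record per line), the class name SequentialJoinSketch, and the method names are all assumptions, and files A, B and C are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; assumes every record is "id<TAB>text".
class SequentialJoinSketch {

    // Step 1: collect the IDs of input1 records whose text contains the keyword.
    static Set<String> matchingIds(List<String> input1, String keyword) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : input1) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2 && parts[1].contains(keyword)) {
                ids.add(parts[0]);
            }
        }
        return ids;
    }

    // Step 2: keep the input2 records whose ID was found in step 1.
    static List<String> fetchEntries(List<String> input2, Set<String> ids) {
        List<String> result = new ArrayList<>();
        for (String line : input2) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2 && ids.contains(parts[0])) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the contents of input1 and input2.
        List<String> input1 = Arrays.asList("1\thadoop cluster", "2\tplain text", "3\thadoop job");
        List<String> input2 = Arrays.asList("1\tentry-a", "2\tentry-b", "3\tentry-c", "1\tentry-d");
        Set<String> ids = matchingIds(input1, "hadoop");
        System.out.println(fetchEntries(input2, ids));
    }
}
```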


I want to convert it into a MapReduce program. Is that possible?

How can we read multiple input files in the mapper? Can we read files
from the disk?

Please advise.

Regards,

Shreya



This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information.
If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. 
Any unauthorised review, use, disclosure, dissemination, forwarding, printing or copying of this email or any action taken in reliance on this e-mail is strictly 
prohibited and may be unlawful.

RE: Multiple files as input to a mapreduce job

Posted by Subroto Sanyal <su...@huawei.com>.
Hi Shreya,

The functionality you need is provided by Hive, which runs MapReduce
jobs internally.

You could take a look at Hive's join logic, or use Hive directly.

For more details, you can also look into
org.apache.hadoop.mapred.lib.CombineFileInputFormat<K,V> and related
classes.
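
As a rough illustration of what such a join involves, here is a plain-Java sketch of a reduce-side join. It is not Hive's actual implementation and it uses no Hadoop APIs: it only simulates the map, shuffle and reduce phases in memory. The record layout (`id<TAB>text`), the source tags "1" and "2", the `|` output separator, and all class and method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a reduce-side join; no Hadoop classes are used.
class ReduceSideJoinSketch {

    // "Map" phase: tag each record with its source and emit it under its ID.
    static void map(String tag, List<String> lines, String keyword,
                    Map<String, List<String>> shuffle) {
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) continue; // skip malformed lines
            // Only input1 records (tag "1") are filtered by the keyword.
            if (tag.equals("1") && !parts[1].contains(keyword)) continue;
            shuffle.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(tag + ":" + parts[1]);
        }
    }

    // "Reduce" phase: for one ID, pair every input1 value with every input2 value.
    static List<String> reduce(String id, List<String> tagged) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (String v : tagged) {
            if (v.startsWith("1:")) left.add(v.substring(2));
            else right.add(v.substring(2));
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(id + "|" + l + "|" + r);
            }
        }
        return out;
    }

    // Drives the three phases; the TreeMap stands in for the sort/shuffle step.
    static List<String> join(List<String> input1, List<String> input2, String keyword) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        map("1", input1, keyword, shuffle);
        map("2", input2, keyword, shuffle);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input1 = Arrays.asList("1\thadoop cluster", "2\tplain text", "3\thadoop job");
        List<String> input2 = Arrays.asList("1\tentry-a", "2\tentry-b", "3\tentry-c", "1\tentry-d");
        for (String row : join(input1, input2, "hadoop")) {
            System.out.println(row);
        }
    }
}
```

IDs that have no keyword match in input1 produce no output, because their input1 side is empty in the reducer. In a real Hadoop job the two inputs could be given separate mappers with MultipleInputs.addInputPath; that wiring is omitted here.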
 

Regards, 
Subroto Sanyal
