Posted to common-user@hadoop.apache.org by biro lehel <le...@yahoo.com> on 2012/05/21 10:12:18 UTC

Transfer archives (or any file) from Mapper to Reducer?

Dear all,

In my Mapper, I run a script that processes my set of input text files and creates other text files from them (this is done locally, on the file system of each node); as a result, each MapTask produces an archive. My issue is that I'm looking for a way for the Reducer to "take" these archives as some kind of input. I understand that communication between the Mapper and the Reducer happens through the key-value pairs in the Context, but what I need is to transfer these archive files to the respective Reducer (I will probably have a single Reducer, so all the files should be transferred/copied there somehow).

Is this possible? Is there a way to transfer files from Mapper to Reducer? If not, what is the best approach in scenarios like mine? Any suggestions would be greatly appreciated.

Thank you in advance,
Lehel.
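A minimal sketch of the packaging step, assuming each archive is small enough to hold in memory: instead of leaving a zip file on the node's local disk, a map task could build the archive as a byte array, which is the form that could travel as a value through the key-value pairs. The class InMemoryArchive and its pack/unpack methods are hypothetical names used only for illustration, and only the JDK's java.util.zip is used. In a real job the bytes would typically be wrapped in Hadoop's BytesWritable before calling context.write.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class InMemoryArchive {

    /** Map side: packs (file name -> text) entries into a zip archive held entirely in memory. */
    static byte[] pack(Map<String, String> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        // Closing the ZipOutputStream writes the central directory, so take the bytes only afterwards.
        return buffer.toByteArray();
    }

    /** Reduce side: reverses pack() and reads every entry back into a (file name -> text) map. */
    static Map<String, String> unpack(byte[] archive) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // readAllBytes() stops at the end of the current entry.
                files.put(entry.getName(), new String(zip.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-ins for the text files the mapper's script generates.
        Map<String, String> generated = new LinkedHashMap<>();
        generated.put("part-a.txt", "first generated file\n");
        generated.put("part-b.txt", "second generated file\n");
        byte[] archive = pack(generated);
        System.out.println("archive size in bytes: " + archive.length);
        System.out.println(unpack(archive).keySet());
    }
}
```

On the reducer side, keep in mind that BytesWritable.getBytes() may return a padded buffer, so only the first getLength() bytes should be handed to unpack().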




Re: Transfer archives (or any file) from Mapper to Reducer?

Posted by Robert Evans <ev...@yahoo-inc.com>.
Be careful putting them in HDFS. It does not scale very well, as the number of file opens will be on the order of (number of mappers) * (number of reducers). You can quickly cause a denial of service on the NameNode if you have a lot of mappers and reducers.
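To make the scaling concern concrete, here is a small worked sketch of the estimate, assuming every map task leaves one file per reducer in HDFS and every reducer opens each of those files. The class and method names are hypothetical, and the job sizes in main are illustrative numbers, not measurements. With a single reducer, as in the original question, the count reduces to the number of map tasks.

```java
final class SideFileOpens {

    /**
     * Rough count of HDFS file opens when each map task writes one side file per reducer
     * and each reducer opens all of them: mappers * reducers.
     * Uses long arithmetic so large task counts cannot overflow an int.
     */
    static long estimateOpens(int mappers, int reducers) {
        if (mappers < 0 || reducers < 0) {
            throw new IllegalArgumentException("task counts must be non-negative");
        }
        return (long) mappers * reducers;
    }

    public static void main(String[] args) {
        // Hypothetical job sizes, for illustration only.
        System.out.println(estimateOpens(1_000, 1));    // 1000: single reducer
        System.out.println(estimateOpens(1_000, 200));  // 200000
        System.out.println(estimateOpens(20_000, 500)); // 10000000
    }
}
```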

--Bobby Evans

On 5/21/12 4:02 AM, "Harsh J" <ha...@cloudera.com> wrote:

Biro,

I guess you could write these archives onto HDFS and have your
reducers read them from a location there, but this method may be a bit
ugly. See http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
for how to properly write files from tasks onto a DFS, or look at the
MultipleOutputs API class.
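The core of the FAQ's advice is that a task should not write side files straight to their final location, because a failed attempt and a speculative attempt of the same task would collide; instead each attempt writes into its own scratch directory, and only the files of the attempt that succeeds are promoted. The sketch below models that pattern on the local file system with java.nio.file; it is not Hadoop's API. AttemptSideFiles, writeSideFile, commit, abort and the attempt IDs are hypothetical names. In a real job, FileOutputFormat.getWorkOutputPath(context) returns the attempt's work directory, and the job's output committer performs the promote step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class AttemptSideFiles {

    /** Scratch directory private to one task attempt (hypothetical layout). */
    static Path workDir(Path outputDir, String attemptId) {
        return outputDir.resolve("_temporary").resolve(attemptId);
    }

    /** Writes a side file into the attempt's scratch directory, never into the final one. */
    static Path writeSideFile(Path outputDir, String attemptId, String fileName, byte[] data)
            throws IOException {
        Path dir = workDir(outputDir, attemptId);
        Files.createDirectories(dir);
        return Files.write(dir.resolve(fileName), data);
    }

    /** Lists the entries currently in a directory. */
    private static List<Path> filesOf(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.collect(Collectors.toList());
        }
    }

    /** Successful attempt: move its files into the final output directory. */
    static void commit(Path outputDir, String attemptId) throws IOException {
        Path dir = workDir(outputDir, attemptId);
        for (Path file : filesOf(dir)) {
            Files.move(file, outputDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        Files.delete(dir);
    }

    /** Failed or losing speculative attempt: discard everything it wrote. */
    static void abort(Path outputDir, String attemptId) throws IOException {
        Path dir = workDir(outputDir, attemptId);
        for (Path file : filesOf(dir)) {
            Files.delete(file);
        }
        Files.delete(dir);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        // Two attempts of the same map task, e.g. the original and a speculative copy.
        writeSideFile(out, "attempt_m_000000_0", "map-000000.zip", new byte[] {1, 2, 3});
        writeSideFile(out, "attempt_m_000000_1", "map-000000.zip", new byte[] {9});
        commit(out, "attempt_m_000000_0");
        abort(out, "attempt_m_000000_1");
        System.out.println(filesOf(out));
    }
}
```

Because only one attempt's directory is ever promoted, the reducer (or a later job) reading the final directory sees exactly one archive per map task, regardless of how many attempts ran.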

Depending on how large these files are, you could also ship them
via the KV pairs themselves. A custom key or sort comparator can
further ensure that they are delivered in the first iteration of the
reducer, if the file is required before regular reduce() ops can
begin.
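A minimal model of the ordering trick, assuming archive records and ordinary records share one key type: the key carries a small tag, and the sort order compares the tag first, so every archive key sorts ahead of every ordinary key and the reducer sees the archives in its first reduce() calls. This is plain Java standing in for the shuffle's sort; in a Hadoop job the tag would live in a custom WritableComparable key, or the ordering would come from a comparator registered with Job.setSortComparatorClass. TaggedKey, ARCHIVE, RECORD, SORT_ORDER and deliveryOrder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ArchiveFirstOrdering {

    // Hypothetical tags: lower values sort first, so archives precede ordinary records.
    static final int ARCHIVE = 0;
    static final int RECORD = 1;

    /** Hypothetical composite key: a tag plus the ordinary key text. */
    static final class TaggedKey {
        final int kind;
        final String name;

        TaggedKey(int kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        @Override
        public String toString() {
            return (kind == ARCHIVE ? "ARCHIVE:" : "RECORD:") + name;
        }
    }

    /** Sort order: compare the tag first, then the key text. */
    static final Comparator<TaggedKey> SORT_ORDER =
            Comparator.comparingInt((TaggedKey k) -> k.kind).thenComparing(k -> k.name);

    /** Stands in for the shuffle sort feeding a single reducer; returns keys in delivery order. */
    static List<String> deliveryOrder(List<TaggedKey> mapOutputKeys) {
        List<TaggedKey> sorted = new ArrayList<>(mapOutputKeys);
        sorted.sort(SORT_ORDER);
        List<String> order = new ArrayList<>();
        for (TaggedKey key : sorted) {
            order.add(key.toString());
        }
        return order;
    }

    public static void main(String[] args) {
        List<TaggedKey> keys = List.of(
                new TaggedKey(RECORD, "apple"),
                new TaggedKey(ARCHIVE, "map-1.zip"),
                new TaggedKey(RECORD, "aardvark"),
                new TaggedKey(ARCHIVE, "map-0.zip"));
        System.out.println(deliveryOrder(keys));
        // prints [ARCHIVE:map-0.zip, ARCHIVE:map-1.zip, RECORD:aardvark, RECORD:apple]
    }
}
```

With more than one reducer, the archive records would additionally have to be routed (for example, emitted once per target partition) to every reducer that needs them; with the single reducer Lehel expects, every key already ends up in the same place.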




--
Harsh J

