Posted to mapreduce-user@hadoop.apache.org by "Chinni, Ravi" <rc...@syncsort.com> on 2010/07/09 22:07:12 UTC

Setting the number of mappers to 0

I am trying to develop an MR application. Because of the kind of
application I am developing, the mapper is a dummy task (it passes its
input straight to its output), and I am only interested in having a
partitioner and a reducer.

The MR framework allows us to set the number of reducers to 0. Is there
a way to set the number of mappers to 0? Basically, I want to avoid the
overhead of creating mapper tasks and calling the map function once per
record.



Is it feasible to modify the MR framework so that it does not create the
mapper tasks and instead has the partitioners read the input data from
HDFS directly?



Any pointers are appreciated.



Thanks,

Ravi Chinni





_____________________________________________________________________________

ATTENTION:

The information contained in this message (including any files transmitted 
with this message) may contain proprietary, trade secret or other 
confidential and/or legally privileged information. Any pricing 
information contained in this message or in any files transmitted with 
this message is always confidential and cannot be shared with any third 
parties without prior written approval from Syncsort. This message is 
intended to be read only by the individual or entity to whom it is 
addressed or by their designee. If the reader of this message is not the 
intended recipient, you are on notice that any use, disclosure, copying or 
distribution of this message, in any form, is strictly prohibited. If you 
have received this message in error, please immediately notify the sender 
and/or Syncsort and destroy all copies of this message in your possession, 
custody or control.

Re: Setting the number of mappers to 0

Posted by Eric Sammer <es...@cloudera.com>.

Ravi:

Currently there's no way to avoid the map stage and the sort and
shuffle that come with it. The only real option is to have an
identity mapper that passes the keys / values through, as you're doing
now.
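
For readers following along, here is a rough sketch of what "identity
mapper plus partitioner" amounts to. It is plain Java with no Hadoop
dependency, so it only imitates the data flow and is not the framework's
own code. The class and method names (IdentityMapSketch, identityMap,
partitionFor, run) are made up for illustration. The partition formula is
the usual hash-modulo scheme, which I believe matches Hadoop's default
HashPartitioner, but treat that as an assumption and check it against your
version.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the map + partition steps; not Hadoop code.
public class IdentityMapSketch {

    // Identity "map": emit the input record unchanged.
    public static <K, V> Map.Entry<K, V> identityMap(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Hash partitioning: clear the sign bit so the result is
    // non-negative, then take it modulo the number of reducers.
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Pass every record through the identity map and drop it into the
    // bucket of the reducer chosen by the partitioner.
    public static <K, V> List<List<Map.Entry<K, V>>> run(
            List<Map.Entry<K, V>> input, int numReduceTasks) {
        List<List<Map.Entry<K, V>>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> record : input) {
            Map.Entry<K, V> out = identityMap(record.getKey(), record.getValue());
            buckets.get(partitionFor(out.getKey(), numReduceTasks)).add(out);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = new ArrayList<>();
        input.add(new SimpleEntry<>("a", 1));
        input.add(new SimpleEntry<>("b", 2));
        input.add(new SimpleEntry<>("a", 3));

        List<List<Map.Entry<String, Integer>>> buckets = run(input, 4);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + " <- " + buckets.get(i));
        }
    }
}
```

In a real job you would not write the pass-through yourself: the old API
ships org.apache.hadoop.mapred.lib.IdentityMapper, and in the new API the
base org.apache.hadoop.mapreduce.Mapper class already passes records
through unchanged if map() is not overridden. The map tasks still run
either way, which is the overhead being asked about.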




-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com