Posted to common-user@hadoop.apache.org by john smith <js...@gmail.com> on 2011/09/10 11:06:04 UTC

Disable Sorting?

Hi,

Some of the MR jobs I run don't need the map output sorted within each
partition. Is there some way I can disable it?

Any help?

Thanks
jS

Re: Disable Sorting?

Posted by Joey Echeverria <jo...@cloudera.com>.
The sort is what implements the group-by-key function. You can't
have one without the other in Hadoop. Are you trying to disable the
sort because you think it's too slow?
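
To make the link between sorting and grouping concrete, here is a minimal
plain-Java sketch (not Hadoop code; the class and method names are made up
for illustration). Once the keys are sorted, equal keys sit next to each
other, so each group can be found in a single pass by watching for a key
change, without holding all groups in memory at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: grouping by key via sort plus one linear scan.
final class SortGroupingSketch {

    /** Returns one "key=count" entry per distinct key, in sorted key order. */
    static List<String> countRuns(String[] keys) {
        String[] sorted = keys.clone();
        Arrays.sort(sorted); // after this, equal keys are adjacent

        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // Advance j to the end of the run of keys equal to sorted[i].
            while (j < sorted.length && sorted[j].equals(sorted[i])) {
                j++;
            }
            out.add(sorted[i] + "=" + (j - i)); // one group is one contiguous run
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(countRuns(new String[] {"b", "a", "b", "c", "a", "b"}));
    }
}
```

A hash table would also group the keys, but it has to keep every distinct
key in memory (or spill), which is the trade-off the sort avoids.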

-Joey

On Sun, Sep 11, 2011 at 2:43 AM, john smith <js...@gmail.com> wrote:
> Hi Arun,
>
> Suppose I am doing a simple wordcount and the map phase is over. After the
> shuffle, the inputs to the reducer in each partition arrive sorted by key.
> I want to disable this.
>
> Take the same case of wordcount. I don't mind the order in which my reduce
> gets the keys of a single partition. I guess Hadoop does an external sort
> for this. I want to disable that.
>
> Thanks,
> jS



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: Disable Sorting?

Posted by john smith <js...@gmail.com>.
Hi Arun,

Suppose I am doing a simple wordcount and the map phase is over. After the
shuffle, the inputs to the reducer in each partition arrive sorted by key.
I want to disable this.

Take the same case of wordcount. I don't mind the order in which my reduce
gets the keys of a single partition. I guess Hadoop does an external sort
for this. I want to disable that.
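
The behaviour described above can be mimicked in a few lines of plain Java.
This is only a sketch under assumptions: String keys stand in for Hadoop's
Text, a TreeMap stands in for the framework's sort and merge, and the class
and method names are hypothetical. The partition formula is the one used by
Hadoop's default HashPartitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only: wordcount keys partitioned, then sorted per partition.
final class ShuffleSketch {

    /** Same arithmetic as the default HashPartitioner: non-negative hash modulo #reduces. */
    static int partitionFor(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    /** Returns, for each reduce partition, a key-sorted map of word -> count. */
    static List<TreeMap<String, Integer>> shuffle(List<String> mapOutputKeys, int numReduces) {
        List<TreeMap<String, Integer>> partitions = new ArrayList<>();
        for (int r = 0; r < numReduces; r++) {
            partitions.add(new TreeMap<>()); // TreeMap keeps keys in sorted order
        }
        for (String word : mapOutputKeys) {
            partitions.get(partitionFor(word, numReduces)).merge(word, 1, Integer::sum);
        }
        return partitions;
    }
}
```

Each reducer sees only the keys of its own partition, and within that
partition it sees them in sorted order, which is the ordering the question
is about.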

Thanks,
jS

On Sun, Sep 11, 2011 at 7:03 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> The point of a 'reduce phase' is to aggregate keys from different maps
> (i.e. all inputs).
>
> I'm not sure what you are trying to do, but a use-case will help.
>
> In any case, the only way to achieve what you are trying to do is to run
> two jobs, with the first being a map-only job (i.e. #reduces = 0).
>
> Arun

Re: Disable Sorting?

Posted by Arun C Murthy <ac...@hortonworks.com>.
The point of a 'reduce phase' is to aggregate keys from different maps (i.e. all inputs).

I'm not sure what you are trying to do, but a use case would help.

In any case, the only way to achieve what you are trying to do is to run two jobs, with the first being a map-only job (i.e. #reduces = 0).
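
As a rough plain-Java sketch of what a map-only pass produces (not Hadoop
code; the class and method names are hypothetical): each map task writes its
own output in emission order, so nothing is sorted, but the same key can
show up in several outputs and is not aggregated across maps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a wordcount-style map with no reduce phase.
final class MapOnlySketch {

    /** Returns one output list per input split, like one output file per map task. */
    static List<List<String>> runMapOnly(List<String> splits) {
        List<List<String>> outputs = new ArrayList<>();
        for (String split : splits) {
            List<String> out = new ArrayList<>();
            for (String word : split.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(word + "\t1"); // written as emitted: no sort, no grouping
                }
            }
            outputs.add(out);
        }
        return outputs;
    }
}
```

With the splits "b a" and "a c", the key "a" appears once in each output,
which is why a second job (or a reduce phase) is still needed to combine
keys coming from different maps.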

Arun

On Sep 10, 2011, at 10:19 PM, john smith wrote:

> Hey,
> 
> I have reduce phases too. But for each reduce, I don't need sorted input
> (the map output for that reduce task).
> Setting #reduces to 0 completely removes the reduce phase.
> 
> Am I missing something?
> 
> Thanks,

Re: Disable Sorting?

Posted by john smith <js...@gmail.com>.
Hey,

I have reduce phases too. But for each reduce, I don't need sorted input
(the map output for that reduce task).
Setting #reduces to 0 completely removes the reduce phase.

Am I missing something?

Thanks,

On Sun, Sep 11, 2011 at 12:18 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Run a map-only job with #reduces set to 0.
>
> Arun

Re: Disable Sorting?

Posted by Owen O'Malley <ow...@hortonworks.com>.
On Sat, Sep 10, 2011 at 12:33 PM, Meng Mao <me...@gmail.com> wrote:

> Is there a way to collate the possibly large number of map output files,
> though?


You can get fewer mappers by setting mapred.min.split.size, which defines
the smallest input that will be given to a mapper.
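
To see how the minimum split size changes the mapper count, here is a small
arithmetic sketch in plain Java. It assumes the max(minSize, min(maxSize,
blockSize)) rule used by the newer FileInputFormat (the older API uses a
goal size in place of maxSize), and the mapper count is approximate because
it ignores the framework's handling of a small final split. The class and
method names are hypothetical.

```java
// Hypothetical illustration only: effect of the minimum split size on mapper count.
final class SplitSizeSketch {

    /** Split size rule: never below minSize, otherwise capped by maxSize and blockSize. */
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Approximate number of mappers for a single file: ceil(fileSize / splitSize). */
    static long approxMappers(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long file = 1024 * mb;  // a 1 GB input file
        long block = 64 * mb;   // assumed 64 MB block size

        long defaultSplit = splitSize(1L, Long.MAX_VALUE, block);
        long raisedSplit = splitSize(256 * mb, Long.MAX_VALUE, block);

        System.out.println(approxMappers(file, defaultSplit)); // mappers at the block size
        System.out.println(approxMappers(file, raisedSplit));  // mappers with a 256 MB minimum
    }
}
```

Under these assumed sizes, raising the minimum split size from the default
to 256 MB cuts the estimate from 16 mappers to 4, and with it the number of
map output files.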

There isn't currently a way of getting a collated but unsorted list of
key/value pairs. For most applications, the in-memory sort is fairly cheap
relative to the shuffle and other parts of the processing.

-- Owen

Re: Disable Sorting?

Posted by Meng Mao <me...@gmail.com>.
Is there a way to collate the possibly large number of map output files,
though?

On Sat, Sep 10, 2011 at 2:48 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Run a map-only job with #reduces set to 0.
>
> Arun

Re: Disable Sorting?

Posted by Arun C Murthy <ac...@hortonworks.com>.
Run a map-only job with #reduces set to 0.

Arun

On Sep 10, 2011, at 2:06 AM, john smith wrote:

> Hi,
> 
> Some of the MR jobs I run don't need the map output sorted within each
> partition. Is there some way I can disable it?
> 
> Any help?
> 
> Thanks
> jS