Posted to common-user@hadoop.apache.org by Vadim Zaliva <kr...@gmail.com> on 2009/02/25 06:04:31 UTC

Re: Skip Reduce Phase

On Thu, Feb 7, 2008 at 10:07, Owen O'Malley <oo...@yahoo-inc.com> wrote:

> Setting it to 0 skips all of the buffering, sorting, merging, and shuffling.
> It passes the objects straight from the mapper to the output format, which
> writes it straight to hdfs.

I just tried setting the number of reduce tasks to 0, but the JobTracker
still shows a reduce task running, in "reduce > sort". I have a big data
set and it takes a while. It would be good to find a way to skip it.

Vadim

Re: Skip Reduce Phase

Posted by Vadim Zaliva <kr...@gmail.com>.
I am sorry, it was my fault. I had not updated the JAR.
Now it seems to be working as expected. Thanks!

Vadim


Re: Skip Reduce Phase

Posted by Jothi Padmanabhan <jo...@yahoo-inc.com>.
Sorry, this mail was intended for somebody else. Please disregard.




On 2/25/09 2:33 PM, "Jothi Padmanabhan" <jo...@yahoo-inc.com> wrote:

> Just to clarify -- setting test.build.data on the command line to point to
> some arbitrary directory in /tmp should work
> 
> ant -Dtestcase=TestMapReduceLocal -Dtest.output=yes
> -Dtest.build.data=/tmp/foo test-core
> 
> Jothi


Re: Skip Reduce Phase

Posted by Jothi Padmanabhan <jo...@yahoo-inc.com>.
Just to clarify -- setting test.build.data on the command line to point to
some arbitrary directory in /tmp should work

ant -Dtestcase=TestMapReduceLocal -Dtest.output=yes -Dtest.build.data=/tmp/foo test-core

Jothi




Re: Skip Reduce Phase

Posted by Jothi Padmanabhan <jo...@yahoo-inc.com>.
If you had set the number of reduce tasks to 0, you should not see
reduce>sort. How did you set the number of reducers?
You can do that by calling

job.setNumReduceTasks(0);

Jothi
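
For context, the call above might sit in a driver roughly like the one below. This is only a sketch, assuming the old org.apache.hadoop.mapred API (JobConf, JobClient) and the Hadoop JARs on the classpath. The class name MapOnlyJob, the job name, and the use of IdentityMapper in place of a real mapper are illustrative assumptions, not taken from the thread.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;

// Hypothetical driver class; the name and job name are illustrative only.
public class MapOnlyJob {

    // Builds a job configuration with the reduce phase switched off.
    static JobConf configure(Path input, Path output) {
        JobConf job = new JobConf(MapOnlyJob.class);
        job.setJobName("map-only-example");

        // IdentityMapper is a stand-in for a real mapper (assumption).
        job.setMapperClass(IdentityMapper.class);

        // The default TextInputFormat yields (LongWritable offset, Text line).
        // With no reducers, the mapper's output types are the job's output types.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Zero reduce tasks: map output goes straight to the output format.
        job.setNumReduceTasks(0);

        FileInputFormat.setInputPaths(job, input);
        FileOutputFormat.setOutputPath(job, output);
        return job;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = input path, args[1] = output path (must not exist yet).
        JobConf job = configure(new Path(args[0]), new Path(args[1]));
        JobClient.runJob(job);
    }
}
```

If a reduce task still shows up after a change like this, it is worth checking that the cluster is running a freshly rebuilt JAR, which turned out to be the cause in this thread.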

