Posted to common-user@hadoop.apache.org by Austin Chungath <au...@gmail.com> on 2013/03/04 20:57:31 UTC

Need help optimizing reducer

Hi all,

I have one reducer, and around 600 thousand unique keys are coming to it.
The total data is only around 30 MB.
My logic doesn't allow me to have more than one reducer.
It's taking too long to complete, around 2 hours. (Up to 66% it's fast, then
it slows down. I don't really think it has started doing anything until 66%,
so why does the progress show up like that?)
Are there any job execution parameters that can help improve reducer
performance?
Any suggestions to improve things when we have to live with just one
reducer?

thanks,
Austin

Re: Need help optimizing reducer

Posted by Fatih Haltas <fa...@nyu.edu>.
I mean: while adding each newly arriving reducer input value to the values
you have already merged, in order to build up the complete set of values for
a given key, you might be re-reading every input value (that is, the mapper's
output values) from beginning to end each time.
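
For illustration, a minimal sketch of the pattern described above, not taken
from the actual job: a reducer that rebuilds the whole merged string on every
pass through the loop. The Text key/value types and the comma delimiter are
assumptions.

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer showing the quadratic merge: every concatenation
// re-reads and re-copies everything merged so far, so a key with N values
// costs O(N^2) in copied bytes.
public class SlowMergeReducer extends Reducer<Text, Text, Text, Text> {

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    String merged = "";
    for (Text value : values) {
      // Rebuilds the entire accumulated string on each iteration.
      merged = merged + "," + value.toString();
    }
    context.write(key, new Text(merged));
  }
}

With only a few values per key this is harmless, but for keys with long value
lists it can dominate the reduce time.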

Re: Need help optimizing reducer

Posted by Fatih Haltas <fa...@nyu.edu>.
Hi Austin,

I am not sure whether you have made this kind of mistake or not, but in any
case I would like to point it out: you might be reading the whole list of
input values for a key (that is, the mapper's output values) from beginning
to end inside the reduce function each time you merge them into one output.

If you can send your reducer code, you may get more useful replies.
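
For comparison, a hedged sketch of a single-pass version of such a merge:
each value is read exactly once and appended, so nothing is re-read from the
beginning. The Text types and the delimiter are again illustrative
assumptions, not Austin's actual code.

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Single-pass merge: append each value once and never re-read what has
// already been accumulated, so the cost is linear in the total value size.
public class SinglePassMergeReducer extends Reducer<Text, Text, Text, Text> {

  private final Text result = new Text();

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    StringBuilder merged = new StringBuilder();
    for (Text value : values) {
      if (merged.length() > 0) {
        merged.append(',');
      }
      merged.append(value.toString());
    }
    result.set(merged.toString());
    context.write(key, result);
  }
}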

Re: Need help optimizing reducer

Posted by Mahesh Balija <ba...@gmail.com>.
The reason the reducer looks fast up to 66% is the shuffle and sort phases of
the reduce side; at that point the actual reduce task has NOT yet started.

The reduce side is divided into 3 phases of roughly 33% each: shuffle (fetch
data), sort, and finally user code (reduce). That is why your reduce might
appear fast up to 66%. In order to speed up your program you either have to
use more reducers or make your reducer code as optimized as possible.
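
For the "more reducers" half of this advice, the reducer count is a job-level
setting; a minimal driver sketch, assuming the org.apache.hadoop.mapreduce
API (the class and job name here are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Illustrative driver fragment: with more than one reduce task, the ~600k
// keys are hash-partitioned across the reducers and processed in parallel.
public class ReducerCountExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "reduce-heavy job");
    job.setNumReduceTasks(4);  // four reduce tasks instead of a single one
    // ... mapper, reducer, and input/output paths would be set here ...
  }
}

This only applies if the logic can tolerate more than one reducer, which
Austin says it cannot, so optimizing the reducer code itself is the part of
the advice that fits here.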

Best,
Mahesh Balija,
Calsoft Labs.

Re: Need help optimizing reducer

Posted by samir das mohapatra <sa...@gmail.com>.
Austin,
  I think you have to use a partitioner to spawn more than one reducer for a
small data set.
  The default partitioner will allow you only one reducer; you have to
override it and implement your own logic to spawn more than one reducer.
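
For what it's worth, a hedged sketch of the kind of custom partitioner being
suggested, assuming Text keys and values (the routing rule is purely
illustrative). Note that the partitioner decides which reduce task each key
is sent to, while the number of reduce tasks itself is still set on the job.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: spread keys across however many reduce tasks the
// job has configured, using the key's hash.
public class CustomPartitioner extends Partitioner<Text, Text> {

  @Override
  public int getPartition(Text key, Text value, int numPartitions) {
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

It would be wired in with job.setPartitionerClass(CustomPartitioner.class)
together with job.setNumReduceTasks(n).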

Re: Need help optimizing reducer

Posted by Ajay Srivastava <Aj...@guavus.com>.
Are you using a combiner? If not, that would be the first thing to do.
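
A minimal sketch of what that looks like, assuming the reduce logic is
associative and commutative (a sum is used here purely for illustration),
since only then can the same class safely double as the combiner:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative sum reducer; registered as the combiner it pre-aggregates map
// output locally, so far fewer records are shuffled to the single reducer.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  private final IntWritable total = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    total.set(sum);
    context.write(key, total);
  }
}

In the driver it is enabled with job.setCombinerClass(SumReducer.class);
whether a combiner helps depends on whether Austin's reduce logic can be
applied to partial groups of values.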


