Posted to user@hadoop.apache.org by Chih-Hsien Wu <ch...@gmail.com> on 2013/11/26 00:08:51 UTC

Only one reducer running on canopy generator

Hi all,  I have been running into memory issues while using the Mahout
canopy algorithm on a large data set on Hadoop. I noticed that only one
reducer was running while the other nodes sat idle, and I wondered whether
increasing the number of reduce tasks would ease memory usage and speed up
the job. However, I found that setting "mapred.reduce.tasks" on Hadoop has
no effect on the canopy reduce tasks; the job still runs with a single
reducer. So my question is: is canopy designed to work that way, or am I
misconfiguring Hadoop?
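For anyone following along, here is a minimal single-process sketch of the canopy generation idea (T2 threshold only; the function and variable names are illustrative, not Mahout's API). The point is that deciding whether a candidate center survives depends on every center accepted so far, which matches the single-reducer behavior seen above:

```python
import math

def generate_canopy_centers(points, t2):
    """Greedy sketch of canopy generation: a point becomes a new canopy
    center only if no already-accepted center lies within distance T2.
    (Mahout also uses a looser T1 threshold for canopy membership;
    that part is omitted here.) Each decision depends on all previously
    accepted centers, so the merge is inherently sequential and cannot
    be split across independent reducers."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > t2 for c in centers):
            centers.append(p)
    return centers

centers = generate_canopy_centers(
    [(0, 0), (0.1, 0), (5, 5), (5, 5.1), (9, 9)], t2=1.0)
# three well-separated groups -> three centers
```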

Re: Only one reducer running on canopy generator

Posted by Chih-Hsien Wu <ch...@gmail.com>.
I have a follow-up question. The "Java heap space" error is fairly broad,
and I don't know exactly where I am running out of memory. Amazon Web
Services lets me configure each daemon's heap size, but which heap should I
adjust, e.g. the datanode, tasktracker, or namenode?
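Not an authoritative answer, but for orientation: in Hadoop 1.x the map and reduce tasks run in separate child JVMs whose heap is set per task, so the datanode/namenode/tasktracker daemon heaps do not bound what a single reducer can hold. The relevant knob is the one already mentioned in this thread; a mapred-site.xml sketch (the value is illustrative):

```xml
<configuration>
  <!-- Heap for each map/reduce child JVM; this, not the daemon
       heap sizes, bounds the memory available to one reducer. -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
  </property>
</configuration>
```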


On Tue, Nov 26, 2013 at 8:59 AM, Chih-Hsien Wu <ch...@gmail.com> wrote:

> Hey Suneel, I did hit the OOM during the generation phase. I increased the
> JVM heap by raising "mapred.child.java.opts" to the maximum (around 8g),
> but to no avail. I also noticed that a ton of free memory was going
> unused, which may line up with what you said about generation using only
> one reducer. So my question is: would increasing the heap size of the
> worker nodes or the namenode help in this case?
>
>
> On Mon, Nov 25, 2013 at 6:59 PM, Suneel Marthi <su...@yahoo.com> wrote:
>
>> Canopy Clustering is a two-step process: Canopy Generation followed by
>> Canopy Clustering.
>>
>> Canopy Generation uses a single reducer (and this cannot be
>> overridden), while the Clustering step uses multiple reducers.
>>
>> You seem to be hitting OOM during the Canopy generation phase.

Re: Only one reducer running on canopy generator

Posted by Chih-Hsien Wu <ch...@gmail.com>.
Hey Suneel, I did hit the OOM during the generation phase. I increased the
JVM heap by raising "mapred.child.java.opts" to the maximum (around 8g),
but to no avail. I also noticed that a ton of free memory was going
unused, which may line up with what you said about generation using only
one reducer. So my question is: would increasing the heap size of the
worker nodes or the namenode help in this case?


On Mon, Nov 25, 2013 at 6:59 PM, Suneel Marthi <su...@yahoo.com> wrote:

> Canopy Clustering is a two-step process: Canopy Generation followed by
> Canopy Clustering.
>
> Canopy Generation uses a single reducer (and this cannot be
> overridden), while the Clustering step uses multiple reducers.
>
> You seem to be hitting OOM during the Canopy generation phase.

Re: Only one reducer running on canopy generator

Posted by Suneel Marthi <su...@yahoo.com>.
Canopy Clustering is a two-step process: Canopy Generation followed by Canopy Clustering.

Canopy Generation uses a single reducer (and this cannot be overridden), while the Clustering step uses multiple reducers.

You seem to be hitting OOM during the Canopy generation phase.
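To illustrate why the second step parallelizes while generation does not: once the canopy centers exist, each point's assignment depends only on that point and the shared list of centers, so the work shards cleanly across many reducers. A toy sketch (illustrative names, not Mahout's API):

```python
import math

def assign_to_canopies(points, centers, t1):
    """Canopy clustering step: each point joins every canopy whose
    center lies within the loose threshold T1. Each assignment is
    independent of every other point, which is why this step can
    run on many reducers while generation cannot."""
    return {p: [c for c in centers if math.dist(p, c) <= t1]
            for p in points}

membership = assign_to_canopies(
    [(0, 0), (4, 4)], centers=[(0, 1), (5, 5)], t1=2.0)
```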

On Monday, November 25, 2013 6:09 PM, Chih-Hsien Wu <ch...@gmail.com> wrote:
 
Hi all,  I have been running into memory issues while using the Mahout
canopy algorithm on a large data set on Hadoop. I noticed that only one
reducer was running while the other nodes sat idle, and I wondered whether
increasing the number of reduce tasks would ease memory usage and speed up
the job. However, I found that setting "mapred.reduce.tasks" on Hadoop has
no effect on the canopy reduce tasks; the job still runs with a single
reducer. So my question is: is canopy designed to work that way, or am I
misconfiguring Hadoop?