Posted to user@mahout.apache.org by pricila rr <pr...@gmail.com> on 2012/11/10 20:11:38 UTC

Jobs Hadoop-Mahout: Full Capacity

Hello,
How can I run Hadoop-Mahout jobs at the processors' full capacity?
I have 10 slaves and 1 master, each with an i5 CPU, but the Hadoop-Mahout
jobs do not use all of this capacity.

Thank you,
Pricila

Re: Jobs Hadoop-Mahout: Full Capacity

Posted by Gustavo Enrique Salazar Torres <gs...@ime.usp.br>.
Hi Pricila:

I believe you need to set this property:

mapreduce.tasktracker.map.tasks.maximum

to more than 2, which is the default. You can also configure the number of
reducers by changing mapreduce.tasktracker.reduce.tasks.maximum.
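As an illustrative sketch only (exact property names vary across Hadoop
versions, and the values below assume a quad-core i5 per node), these slot
counts would go in mapred-site.xml on each tasktracker node:

```xml
<!-- mapred-site.xml on each tasktracker node (illustrative values) -->
<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- roughly one map slot per core -->
</property>
<property>
  <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```

The tasktracker daemons must be restarted for slot changes to take effect.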

Regards
Gustavo


On Sat, Nov 10, 2012 at 5:11 PM, pricila rr <pr...@gmail.com> wrote:

> Hello,
> How can I run Hadoop-Mahout jobs at the processors' full capacity?
> I have 10 slaves and 1 master, each with an i5 CPU, but the Hadoop-Mahout
> jobs do not use all of this capacity.
>
> Thank you,
> Pricila
>



-- 
Gustavo Salazar Torres
---------------------------------
- "When describing your own work, be humble and don’t use superlatives of
praise, either explicitly or implicitly, even if you are enthusiastic"
- Mathematical Writing, Donald Knuth

Re: Jobs Hadoop-Mahout: Full Capacity

Posted by Ted Dunning <te...@gmail.com>.
If you want k-means speed, see the new k-means code:
https://github.com/tdunning/knn

Can you describe your data a bit?

On Sat, Nov 10, 2012 at 11:22 AM, pricila rr <pr...@gmail.com> wrote:

> I am running the k-means algorithm.
> Does increasing the number of tasktrackers and datanodes increase the speed?
>
> Thank you
>
> 2012/11/10 Dmitriy Lyubimov <dl...@gmail.com>
>
> > I would imagine optimizing Mahout jobs is not fundamentally different
> > from optimizing any Hadoop job. Make sure you have an optimal number of
> > tasks per node configured, as well as an optimal amount of memory to
> > prevent GC thrashing. (Iterative Mahout batches tend to create GC churn
> > at a somewhat respectable rate.) When optimized correctly, individual
> > Mahout tasks tend to be CPU bound.
> >
> > Could you tell us which Mahout method specifically you are talking about?
> >
> >
> > On Sat, Nov 10, 2012 at 11:11 AM, pricila rr <pr...@gmail.com> wrote:
> >
> > > Hello,
> > > How can I run Hadoop-Mahout jobs at the processors' full capacity?
> > > I have 10 slaves and 1 master, each with an i5 CPU, but the
> > > Hadoop-Mahout jobs do not use all of this capacity.
> > >
> > > Thank you,
> > > Pricila
> > >
> >
>

Re: Jobs Hadoop-Mahout: Full Capacity

Posted by Sean Owen <sr...@gmail.com>.
That means you are having Hadoop run at most 1 reducer at a time across the
whole cluster. In any Hadoop job this needs to be set to about the number
of open reduce slots.
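As a sketch (the input/output paths and cluster parameters below are
illustrative, not from the thread), the reducer count can be passed straight
on the Mahout command line; with 10 slaves and, say, 2 reduce slots per
node, around 20 is a reasonable starting point:

```shell
# Illustrative: run Mahout k-means with enough reducers to fill the cluster.
# 10 slaves x 2 reduce slots per node = 20 open reduce slots.
mahout kmeans \
  -Dmapred.reduce.tasks=20 \
  -i /user/pricila/vectors \
  -c /user/pricila/initial-centroids \
  -o /user/pricila/clusters \
  -k 50 -x 10
```

The -D flag sets a per-job Hadoop property, so no cluster-wide config change
is needed for this one.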


On Sat, Nov 10, 2012 at 7:28 PM, pricila rr <pr...@gmail.com> wrote:

> No, it is the default.
>
> 2012/11/10 Sean Owen <sr...@gmail.com>
>
> > Did you set -Dmapred.reduce.tasks? It defaults to 1.
> >
> >
> > On Sat, Nov 10, 2012 at 7:22 PM, pricila rr <pr...@gmail.com>
> wrote:
> >
> > > I am running the k-means algorithm.
> > > Does increasing the number of tasktrackers and datanodes increase the
> > > speed?
> > >
> > > Thank you
> > >
> > > 2012/11/10 Dmitriy Lyubimov <dl...@gmail.com>
> > >
> > > > I would imagine optimizing Mahout jobs is not fundamentally
> > > > different from optimizing any Hadoop job. Make sure you have an
> > > > optimal number of tasks per node configured, as well as an optimal
> > > > amount of memory to prevent GC thrashing. (Iterative Mahout batches
> > > > tend to create GC churn at a somewhat respectable rate.) When
> > > > optimized correctly, individual Mahout tasks tend to be CPU bound.
> > > >
> > > > Could you tell us which Mahout method specifically you are talking
> > > > about?
> > > >
> > > >
> > > > On Sat, Nov 10, 2012 at 11:11 AM, pricila rr <pr...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > > How can I run Hadoop-Mahout jobs at the processors' full capacity?
> > > > > I have 10 slaves and 1 master, each with an i5 CPU, but the
> > > > > Hadoop-Mahout jobs do not use all of this capacity.
> > > > >
> > > > > Thank you,
> > > > > Pricila
> > > > >
> > > >
> > >
> >
>

Re: Jobs Hadoop-Mahout: Full Capacity

Posted by pricila rr <pr...@gmail.com>.
No, it is the default.

2012/11/10 Sean Owen <sr...@gmail.com>

> Did you set -Dmapred.reduce.tasks? It defaults to 1.
>
>
> On Sat, Nov 10, 2012 at 7:22 PM, pricila rr <pr...@gmail.com> wrote:
>
> > I am running the k-means algorithm.
> > Does increasing the number of tasktrackers and datanodes increase the
> > speed?
> >
> > Thank you
> >
> > 2012/11/10 Dmitriy Lyubimov <dl...@gmail.com>
> >
> > > I would imagine optimizing Mahout jobs is not fundamentally different
> > > from optimizing any Hadoop job. Make sure you have an optimal number
> > > of tasks per node configured, as well as an optimal amount of memory
> > > to prevent GC thrashing. (Iterative Mahout batches tend to create GC
> > > churn at a somewhat respectable rate.) When optimized correctly,
> > > individual Mahout tasks tend to be CPU bound.
> > >
> > > Could you tell us which Mahout method specifically you are talking
> > > about?
> > >
> > >
> > > On Sat, Nov 10, 2012 at 11:11 AM, pricila rr <pr...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > > How can I run Hadoop-Mahout jobs at the processors' full capacity?
> > > > I have 10 slaves and 1 master, each with an i5 CPU, but the
> > > > Hadoop-Mahout jobs do not use all of this capacity.
> > > >
> > > > Thank you,
> > > > Pricila
> > > >
> > >
> >
>

Re: Jobs Hadoop-Mahout: Full Capacity

Posted by Sean Owen <sr...@gmail.com>.
Did you set -Dmapred.reduce.tasks? It defaults to 1.


On Sat, Nov 10, 2012 at 7:22 PM, pricila rr <pr...@gmail.com> wrote:

> I am running the k-means algorithm.
> Does increasing the number of tasktrackers and datanodes increase the
> speed?
>
> Thank you
>
> 2012/11/10 Dmitriy Lyubimov <dl...@gmail.com>
>
> > I would imagine optimizing Mahout jobs is not fundamentally different
> > from optimizing any Hadoop job. Make sure you have an optimal number of
> > tasks per node configured, as well as an optimal amount of memory to
> > prevent GC thrashing. (Iterative Mahout batches tend to create GC churn
> > at a somewhat respectable rate.) When optimized correctly, individual
> > Mahout tasks tend to be CPU bound.
> >
> > Could you tell us which Mahout method specifically you are talking about?
> >
> >
> > On Sat, Nov 10, 2012 at 11:11 AM, pricila rr <pr...@gmail.com> wrote:
> >
> > > Hello,
> > > How can I run Hadoop-Mahout jobs at the processors' full capacity?
> > > I have 10 slaves and 1 master, each with an i5 CPU, but the
> > > Hadoop-Mahout jobs do not use all of this capacity.
> > >
> > > Thank you,
> > > Pricila
> > >
> >
>

Re: Jobs Hadoop-Mahout: Full Capacity

Posted by pricila rr <pr...@gmail.com>.
I am running the k-means algorithm.
Does increasing the number of tasktrackers and datanodes increase the speed?

Thank you

2012/11/10 Dmitriy Lyubimov <dl...@gmail.com>

> I would imagine optimizing Mahout jobs is not fundamentally different from
> optimizing any Hadoop job. Make sure you have an optimal number of tasks
> per node configured, as well as an optimal amount of memory to prevent GC
> thrashing. (Iterative Mahout batches tend to create GC churn at a somewhat
> respectable rate.) When optimized correctly, individual Mahout tasks tend
> to be CPU bound.
>
> Could you tell us which Mahout method specifically you are talking about?
>
>
> On Sat, Nov 10, 2012 at 11:11 AM, pricila rr <pr...@gmail.com> wrote:
>
> > Hello,
> > How can I run Hadoop-Mahout jobs at the processors' full capacity?
> > I have 10 slaves and 1 master, each with an i5 CPU, but the
> > Hadoop-Mahout jobs do not use all of this capacity.
> >
> > Thank you,
> > Pricila
> >
>

Re: Jobs Hadoop-Mahout: Full Capacity

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I would imagine optimizing Mahout jobs is not fundamentally different from
optimizing any Hadoop job. Make sure you have an optimal number of tasks per
node configured, as well as an optimal amount of memory to prevent GC
thrashing. (Iterative Mahout batches tend to create GC churn at a somewhat
respectable rate.) When optimized correctly, individual Mahout tasks tend
to be CPU bound.

Could you tell us which Mahout method specifically you are talking about?
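On the memory point, as an illustrative sketch (Hadoop 1.x-era property
name; the heap size is an assumption you would tune to your nodes' RAM and
slot count), the per-task JVM heap is set in mapred-site.xml:

```xml
<!-- mapred-site.xml: illustrative per-task JVM settings -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value> <!-- raise if tasks spend too much time in GC -->
</property>
```

Total heap across all slots on a node should stay well under the node's
physical RAM, or the OS will start swapping and make things worse.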


On Sat, Nov 10, 2012 at 11:11 AM, pricila rr <pr...@gmail.com> wrote:

> Hello,
> How can I run Hadoop-Mahout jobs at the processors' full capacity?
> I have 10 slaves and 1 master, each with an i5 CPU, but the Hadoop-Mahout
> jobs do not use all of this capacity.
>
> Thank you,
> Pricila
>