Posted to user@mahout.apache.org by pricila rr <pr...@gmail.com> on 2012/11/10 20:11:38 UTC
Jobs Hadoop-Mahout: Full Capacity
Hello,
How can I run Hadoop-Mahout jobs using the processors' full capacity?
I have 10 slaves and 1 master, each with an i5 CPU, but the Hadoop-Mahout
jobs do not use all of this capacity.
Thank you,
Pricila
Re: Jobs Hadoop-Mahout: Full Capacity
Posted by Gustavo Enrique Salazar Torres <gs...@ime.usp.br>.
Hi Pricila:
I believe you need to configure this property:
mapreduce.tasktracker.map.tasks.maximum
to more than 2, which is the default. You can also configure the number of
reducers per node by changing
mapreduce.tasktracker.reduce.tasks.maximum.
Regards
Gustavo
On Sat, Nov 10, 2012 at 5:11 PM, pricila rr <pr...@gmail.com> wrote:
> Hello,
> How to run jobs on Hadoop-Mahout, using processor full capacity?
> I have 10 slaves and 1 master, with i5 CPU. But the jobs Hadoop-Mahout not
> use all this capacity.
>
> Thank you,
> Pricila
>
--
Gustavo Salazar Torres
---------------------------------
- "When describing your own work, be humble and don’t use superlatives of
praise, either explicitly or implicitly, even if you are enthusiastic"
- Mathematical Writing, Donald Knuth
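[Editor's note: Gustavo's suggestion goes in mapred-site.xml on each slave node. The sketch below assumes 4 map slots and 2 reduce slots per node, a plausible split for a quad-core i5; tune the values to your own hardware, and restart the tasktrackers for the change to take effect.]

```xml
<!-- mapred-site.xml on each slave (tasktracker) node -->
<configuration>
  <property>
    <!-- concurrent map tasks per node; the default is 2 -->
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <!-- concurrent reduce tasks per node; the default is 2 -->
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```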
Re: Jobs Hadoop-Mahout: Full Capacity
Posted by Ted Dunning <te...@gmail.com>.
If you want k-means speed, see the new k-means code:
https://github.com/tdunning/knn
Can you describe your data a bit?
Re: Jobs Hadoop-Mahout: Full Capacity
Posted by Sean Owen <sr...@gmail.com>.
That means you are having Hadoop run at most 1 reducer at a time across the
whole cluster. In any Hadoop job this needs to be set to about the number
of open reduce slots.
Re: Jobs Hadoop-Mahout: Full Capacity
Posted by pricila rr <pr...@gmail.com>.
No, I left it at the default.
Re: Jobs Hadoop-Mahout: Full Capacity
Posted by Sean Owen <sr...@gmail.com>.
Did you set -Dmapred.reduce.tasks ? it defaults to 1.
Re: Jobs Hadoop-Mahout: Full Capacity
Posted by pricila rr <pr...@gmail.com>.
I am running the k-means algorithm.
Would increasing the number of tasktrackers and datanodes increase the speed?
Thank you
Re: Jobs Hadoop-Mahout: Full Capacity
Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I would imagine optimizing Mahout jobs is not fundamentally different from
optimizing any Hadoop job. Make sure you have an optimal number of tasks per
node configured, as well as an optimal amount of memory to prevent GC
thrashing. (Iterative Mahout batches tend to create GC churn at a somewhat
respectable rate.) When optimized correctly, individual Mahout tasks tend
to be CPU bound.
Could you tell which Mahout method specifically you are talking about?