Posted to user@mahout.apache.org by rmx <ru...@hotmail.com> on 2010/11/26 23:45:11 UTC

Is it necessary to set mapred.map.tasks when running Mahout on a cluster?

Hi,

I read in Mahout in Action that I should set -Dmapred.map.tasks=X, where X
is the number of cores in my cluster.
I have been running experiments on Amazon EC2 m.large instances,
using k-means over a 1.1 GB dataset.
I have never set that flag. I noticed that on a 10-machine cluster, CPU usage
peaks at 60%.

Am I doing this right? Should I set the flag, and if so, how?

thanks

-- 
View this message in context: http://lucene.472066.n3.nabble.com/is-it-necessary-set-mapred-map-tasks-running-mahout-on-a-cluster-tp1975103p1975103.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: Is it necessary to set mapred.map.tasks when running Mahout on a cluster?

Posted by Sean Owen <sr...@gmail.com>.
I tend to let the cluster decide these things based on the input size
and splits. But yes, if you're not getting enough CPU utilization, you
can try running more mappers. If you're I/O-bound, it won't
necessarily help, but if not, it should increase throughput.
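
As a rough illustration of how input size and splits drive the mapper count, here is a small Python sketch of the split heuristic used by the older org.apache.hadoop.mapred FileInputFormat, where mapred.map.tasks is only a hint. It assumes a single, splittable input file and a 64 MB HDFS block size; both are assumptions, not details from this thread, and the function names are made up for illustration. Jobs written against the newer org.apache.hadoop.mapreduce API do not use this hint when computing splits, so treat this as a sketch rather than a description of what your particular job does.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than the others


def split_size(total_bytes, requested_maps, block_size, min_split_size=1):
    """Split size under the old-API heuristic (hypothetical helper name)."""
    # The hint is turned into a goal size, then capped by the block size.
    goal_size = total_bytes // max(requested_maps, 1)
    return max(min_split_size, min(goal_size, block_size))


def estimated_map_tasks(total_bytes, requested_maps, block_size, min_split_size=1):
    """Approximate number of splits (and so map tasks) for one input file."""
    size = split_size(total_bytes, requested_maps, block_size, min_split_size)
    remaining = total_bytes
    splits = 0
    # Carve off full-size splits while more than 1.1 splits' worth remains.
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    # Whatever is left over becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    total = 1181116006          # roughly 1.1 GB, as in the question
    block = 64 * 1024 * 1024    # assumed 64 MB block size
    for hint in (2, 20, 40):
        print(hint, estimated_map_tasks(total, hint, block))
```

Under these assumptions, a small hint such as 2 still yields 18 splits because the block size caps the split size, while a hint of 40 shrinks the splits and yields 40 map tasks. Asking for fewer maps than the block count has no effect; asking for more can help if the job is CPU-bound and the cluster has free map slots.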
