Posted to user@hadoop.apache.org by Elaine Gan <el...@gmo.jp> on 2012/09/13 04:24:45 UTC

Is mahout kmeans slow ?

Hi,

I'm trying to do some text analysis using Mahout kmeans (clustering),
processing the data on Hadoop, with these settings:
--numClusters = 160
--maxIter (-x) = 200

My data is small, around 500 MB.
I have 4 servers, each with 4 CPUs, and the TaskTrackers are set to a
maximum of 4.
When I run the Mahout job, I can see that the number of map tasks is
at most 3, so I guess I do not need to do any tuning on this at the
moment.
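
As a rough check of the numbers above, here is a small Python sketch (not part of Mahout or Hadoop; the function name is made up for illustration). It assumes that "set to 4 as maximum" means 4 concurrent map slots per TaskTracker, which is only my reading of the setup.

```python
def map_slot_usage(servers, slots_per_server, running_map_tasks):
    """Return (total map slots, fraction of slots in use).

    Assumption: every server runs one TaskTracker with
    `slots_per_server` concurrent map slots.
    """
    total_slots = servers * slots_per_server
    return total_slots, running_map_tasks / total_slots


# Numbers from the post: 4 servers, 4 slots each, at most 3 map tasks.
total, fraction = map_slot_usage(4, 4, 3)
print(f"{total} map slots, {fraction:.0%} in use")
```

Under that assumption, the 3 map tasks occupy only a small part of the available slots.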

One iteration takes around 1.5 to 2 minutes to finish.
I am not sure whether this is normal or considered slow. Can anyone
give me some advice on this?

With x = 200, the whole analysis took around 200 x 2 minutes, i.e.
roughly 6 to 7 hours, to finish.
Is this unavoidable?
Does a bigger "x" always mean the kmeans job takes longer to finish?
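
To make the arithmetic above concrete, here is a minimal Python sketch (again not Mahout code; the function name is hypothetical). It assumes every one of the maxIter iterations actually runs and that each one takes about the same time, so it gives a worst-case estimate rather than a measured figure.

```python
def estimate_total_hours(max_iter, minutes_per_iteration):
    """Worst-case runtime in hours if all max_iter iterations run."""
    return max_iter * minutes_per_iteration / 60.0


# Numbers from the post: x = 200, 1.5 to 2 minutes per iteration.
low = estimate_total_hours(200, 1.5)
high = estimate_total_hours(200, 2.0)
print(f"{low:.1f} to {high:.1f} hours")  # prints "5.0 to 6.7 hours"
```

With this simple model the total grows linearly with "x", as long as the job really uses all of its iterations.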

Are there any ways to speed up Mahout kmeans?

Thank you.





Re: Is mahout kmeans slow ?

Posted by Elaine Gan <el...@gmo.jp>.
Hi,

Sorry, I sent this to the wrong mailing list.
Please ignore it.

Thank you.


