Posted to dev@mahout.apache.org by "Sebastian Schelter (JIRA)" <ji...@apache.org> on 2014/04/20 11:39:15 UTC
[jira] [Resolved] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR
[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Schelter resolved MAHOUT-1431.
----------------------------------------
Resolution: Cannot Reproduce
Closing this, as we haven't received a reply in 6 weeks.
> Comparison of Mahout 0.8 vs mahout 0.9 in EMR
> ---------------------------------------------
>
> Key: MAHOUT-1431
> URL: https://issues.apache.org/jira/browse/MAHOUT-1431
> Project: Mahout
> Issue Type: Question
> Components: Clustering
> Affects Versions: 0.8, 0.9
> Reporter: yannis ats
> Labels: performance
>
> Hi all,
> I tested Mahout 0.8 and 0.9 on Amazon EMR with a large input dataset, running k-means experiments with both versions.
> I found that Mahout 0.8 is faster than Mahout 0.9.
> In particular, Mahout 0.8 performs fewer iterations, and each k-means iteration takes about half as long as in 0.9.
> The Hadoop version was 1.0.x, and the input was roughly 2 million data points with a dimensionality of 1800.
> The input parameters were exactly the same in both experiments, except for the initialization, which was random in both cases. I understand that this may affect convergence (the number of iterations), but I am baffled that every iteration takes almost twice as long in 0.9 as in 0.8.
> Is this normal? Is this expected?
> Thank you in advance for your time.
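The reporter's reasoning (random initialization can change the number of iterations but should not change the cost of each one) can be illustrated with a minimal single-machine sketch of Lloyd's k-means. This is an illustration only, not Mahout's MapReduce implementation; the function names are hypothetical, the number of clusters k is not given in the report, and the storage estimate assumes dense 8-byte doubles, which the report also does not state.

```python
import random


def dense_input_bytes(n, d, bytes_per_value=8):
    """Hypothetical helper: size of an n x d input stored as dense doubles (assumption)."""
    return n * d * bytes_per_value


def lloyd_iteration(points, centroids):
    """One Lloyd's-style k-means iteration on in-memory lists.

    Returns the updated centroids and the number of squared-difference terms
    evaluated. That count is always n * k * d: every point is compared with
    every centroid in every dimension, wherever the centroids started.
    """
    k = len(centroids)
    d = len(points[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    work = 0
    for p in points:
        # Assignment step: find the nearest centroid by squared Euclidean distance.
        best, best_dist = 0, float("inf")
        for j, c in enumerate(centroids):
            dist = sum((a - b) ** 2 for a, b in zip(p, c))
            work += d
            if dist < best_dist:
                best, best_dist = j, dist
        counts[best] += 1
        for i in range(d):
            sums[best][i] += p[i]
    # Update step: move each centroid to the mean of its points
    # (an empty cluster keeps its previous centroid).
    new_centroids = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]
    return new_centroids, work


def run_kmeans(points, k, max_iter, seed):
    """Hypothetical driver: random initialization, then iterate until the
    centroids stop changing or max_iter is reached. Returns per-iteration work."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initialization
    per_iter_work = []
    for _ in range(max_iter):
        new_centroids, work = lloyd_iteration(points, centroids)
        per_iter_work.append(work)
        if new_centroids == centroids:
            break  # assignments no longer change, so the centroids are fixed
        centroids = new_centroids
    return per_iter_work
```

For example, with 200 random 5-dimensional points and k = 3, `run_kmeans(pts, k=3, max_iter=20, seed=1)` and the same call with `seed=2` may return lists of different lengths (different iteration counts), but every entry equals 200 * 3 * 5 = 3000. Under this model, a per-iteration slowdown between two versions would have to come from something other than the starting centroids. At the reported scale, `dense_input_bytes(2_000_000, 1800)` gives 28,800,000,000 bytes (about 28.8 GB), if the input were stored densely.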
--
This message was sent by Atlassian JIRA
(v6.2#6252)