Posted to user@mahout.apache.org by Frank Scholten <fr...@frankscholten.nl> on 2014/03/31 19:19:09 UTC

Difference between CIMapper and ClusterIterator

Hi all,

I noticed that in CIMapper the policy.update() call is made once, in the
mapper's setup() method, while in ClusterIterator it is called for every
vector in the iteration.

In the sequential version there is only a single policy, while in the MR
version we get one policy per mapper. Which implementation is correct?
If I recall correctly, in the previous K-means implementation the
update-centroids step was done at the end of each iteration, so I think
the policy.update() call should be moved outside of the vector loop in
ClusterIterator.
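
To make the two placements concrete, here is a minimal toy sketch of a
1-D k-means loop. It is not Mahout code: UpdatePlacementSketch,
CountingPolicy, nearest() and run() are hypothetical names made up for
illustration, and the policy only counts how often update() is called.
Whether the placement changes the clustering result depends on what a
real policy's update() does, which this toy does not model.

```java
import java.util.Arrays;

/** Toy 1-D k-means; none of these names are Mahout classes. */
public class UpdatePlacementSketch {

    /** Hypothetical stand-in for a clustering policy: it only counts update() calls. */
    static final class CountingPolicy {
        int updates = 0;
        void update() { updates++; }
    }

    /** Returns the index of the centroid closest to x (first one wins on ties). */
    static int nearest(double[] centroids, double x) {
        int best = 0;
        for (int i = 1; i < centroids.length; i++) {
            if (Math.abs(x - centroids[i]) < Math.abs(x - centroids[best])) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Runs the toy k-means and returns the final centroids.
     * If updatePerVector is true, policy.update() is called inside the vector
     * loop; otherwise it is called once at the end of each iteration.
     */
    static double[] run(double[] data, double[] initial, int iterations,
                        boolean updatePerVector, CountingPolicy policy) {
        double[] centroids = initial.clone();
        for (int it = 0; it < iterations; it++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            for (double x : data) {
                int c = nearest(centroids, x);
                sum[c] += x;
                count[c]++;
                if (updatePerVector) {
                    policy.update();   // placement A: once per vector
                }
            }
            // Centroid update step, done at the end of the iteration.
            for (int i = 0; i < centroids.length; i++) {
                if (count[i] > 0) {
                    centroids[i] = sum[i] / count[i];
                }
            }
            if (!updatePerVector) {
                policy.update();       // placement B: once per iteration
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 9.0, 10.0};
        double[] initial = {0.0, 5.0};
        CountingPolicy perVector = new CountingPolicy();
        CountingPolicy perIteration = new CountingPolicy();
        run(data, initial, 3, true, perVector);
        double[] centroids = run(data, initial, 3, false, perIteration);
        System.out.println("per-vector updates:    " + perVector.updates);
        System.out.println("per-iteration updates: " + perIteration.updates);
        System.out.println("centroids: " + Arrays.toString(centroids));
    }
}
```

With 4 vectors and 3 iterations, placement A calls update() 12 times and
placement B calls it 3 times, which is the difference I am asking about.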

Thoughts?

Cheers,

Frank