Posted to user@mahout.apache.org by hiroshi leon <hi...@hotmail.com> on 2014/03/25 08:32:06 UTC
CIMapper and CIReducer Mahout k-Means implementation
Hello everybody,
I am new to MapReduce and Mahout. Over the past few days I have been reading the code of Mahout's MapReduce implementation of k-means, and there are things I still do not understand. For example:
In the CIMapper we have three main methods:
setup()
map()
cleanup()
In the CIMapper setup() I noticed that there is a ClusterClassifier and a policy.
1 - What do the classifier and the policy represent?
2 - What cluster types can be classified? Is this related to hierarchical clustering, centroid-based clustering, distribution-based clustering, etc.?
3 - At a high level, what do the map() and cleanup() methods of CIMapper do?
  protected void map(WritableComparable<?> key, VectorWritable value, Context context) throws IOException,
      InterruptedException {
    Vector probabilities = classifier.classify(value.get());
    Vector selections = policy.select(probabilities);
    for (Iterator<Element> it = selections.iterateNonZero(); it.hasNext();) {
      Element el = it.next();
      classifier.train(el.index(), value.get(), el.get());
    }
  }
  protected void cleanup(Context context) throws IOException, InterruptedException {
    List<Cluster> clusters = classifier.getModels();
    ClusterWritable cw = new ClusterWritable();
    for (int index = 0; index < clusters.size(); index++) {
      cw.setValue(clusters.get(index));
      context.write(new IntWritable(index), cw);
    }
    super.cleanup(context);
  }
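To make question 3 concrete, here is a minimal plain-Java sketch of the pattern the two methods above appear to follow: map() only updates in-memory models, and cleanup() emits one (index, model) pair per cluster. It uses no Mahout or Hadoop classes. LocalCluster, MapperSketch and their methods are hypothetical stand-ins, and the sketch assumes a hard-assignment (k-means style) policy that selects only the nearest cluster with weight 1.0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a cluster model; this is NOT Mahout's Cluster class.
class LocalCluster {
    final double[] center; // centroid from the previous iteration, used to classify
    final double[] sum;    // running weighted sum of the points assigned here
    double weight;         // running total weight of the points assigned here

    LocalCluster(double[] center) {
        this.center = center.clone();
        this.sum = new double[center.length];
    }

    // Accumulate one point; loosely analogous to classifier.train(index, point, weight).
    void observe(double[] point, double w) {
        for (int i = 0; i < point.length; i++) {
            sum[i] += w * point[i];
        }
        weight += w;
    }
}

class MapperSketch {
    final List<LocalCluster> models = new ArrayList<>();

    // Plays the role of setup(): load the prior cluster centers.
    MapperSketch(double[][] priorCenters) {
        for (double[] c : priorCenters) {
            models.add(new LocalCluster(c));
        }
    }

    // Plays the role of map(): classify the point, then train the selected model.
    // Nothing is emitted here (assumption: hard assignment to the nearest center).
    void map(double[] point) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int k = 0; k < models.size(); k++) {
            double[] c = models.get(k).center;
            double d = 0.0;
            for (int i = 0; i < point.length; i++) {
                d += (point[i] - c[i]) * (point[i] - c[i]);
            }
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        models.get(best).observe(point, 1.0);
    }

    // Plays the role of cleanup(): emit one (cluster index, partial model) pair per cluster.
    Map<Integer, LocalCluster> cleanup() {
        Map<Integer, LocalCluster> out = new LinkedHashMap<>();
        for (int index = 0; index < models.size(); index++) {
            out.put(index, models.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        MapperSketch mapper = new MapperSketch(new double[][] {{0, 0}, {10, 10}});
        mapper.map(new double[] {1, 1});
        mapper.map(new double[] {2, 0});
        mapper.map(new double[] {9, 11});
        for (Map.Entry<Integer, LocalCluster> e : mapper.cleanup().entrySet()) {
            System.out.println(e.getKey() + " -> weight " + e.getValue().weight);
        }
    }
}
```

If this reading is right, each mapper writes only as many records as there are clusters, regardless of how many input points it sees.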
4 - What does the reduce() method of CIReducer do?
  protected void reduce(IntWritable key, Iterable<ClusterWritable> values, Context context) throws IOException,
      InterruptedException {
    Iterator<ClusterWritable> iter = values.iterator();
    ClusterWritable first = null;
    while (iter.hasNext()) {
      ClusterWritable cw = iter.next();
      if (first == null) {
        first = cw;
      } else {
        first.getValue().observe(cw.getValue());
      }
    }
    List<Cluster> models = new ArrayList<Cluster>();
    models.add(first.getValue());
    classifier = new ClusterClassifier(models, policy);
    classifier.close();
    context.write(key, first);
  }
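To make question 4 concrete, here is a minimal plain-Java sketch of what the reduce code above appears to do for one cluster index: fold every partial model into the first one, then finalize the merged model. It uses no Mahout or Hadoop classes. PartialCluster, ReducerSketch and computeCentroid() are hypothetical names, and the assumption that closing the classifier recomputes the centroid from the accumulated statistics is my reading, not something the code shown here confirms.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the partial statistics one mapper emits for one cluster index.
class PartialCluster {
    final double[] sum; // weighted sum of the points one mapper assigned to this cluster
    double weight;      // total weight of those points

    PartialCluster(double[] sum, double weight) {
        this.sum = sum.clone();
        this.weight = weight;
    }

    // Merge another partial into this one; loosely analogous to observe(cw.getValue()).
    void observe(PartialCluster other) {
        for (int i = 0; i < sum.length; i++) {
            sum[i] += other.sum[i];
        }
        weight += other.weight;
    }

    // Assumption: finalizing the model turns the accumulated statistics into a new centroid.
    double[] computeCentroid() {
        double[] c = new double[sum.length];
        for (int i = 0; i < sum.length; i++) {
            c[i] = sum[i] / weight;
        }
        return c;
    }
}

class ReducerSketch {
    // Plays the role of reduce(): all values share one cluster index (the key).
    static double[] reduce(List<PartialCluster> values) {
        PartialCluster first = null;
        for (PartialCluster p : values) {
            if (first == null) {
                first = p;        // keep the first partial as the accumulator
            } else {
                first.observe(p); // fold every later partial into it
            }
        }
        return first.computeCentroid();
    }

    public static void main(String[] args) {
        List<PartialCluster> partials = Arrays.asList(
            new PartialCluster(new double[] {3, 1}, 2),  // from one mapper
            new PartialCluster(new double[] {3, 5}, 1)); // from another mapper
        System.out.println(Arrays.toString(reduce(partials))); // prints [2.0, 2.0]
    }
}
```

Under this reading, the reducer output for each key would be the updated cluster that the next iteration starts from.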
Thank you so much in advance; I would really appreciate any ideas or guidance.
Best regards,