Posted to user@mahout.apache.org by hiroshi leon <hi...@hotmail.com> on 2014/03/25 08:32:06 UTC

CIMapper and CIReducer Mahout k-Means implementation

Hello everybody,

I am new to MapReduce and Mahout. For the past few days I have been reading the code of Mahout's MapReduce implementation of k-means, and there are things I still do not understand. For example:

CIMapper has three main methods:

setup()
map()
cleanup()

In CIMapper's setup() I noticed that there is a ClusterClassifier and a policy.

1 - What do the classifier and the policy represent?
2 - What cluster types can be classified? Is this related to hierarchical clustering, centroid-based clustering, distribution-based clustering, etc.?

3 - At a high level, what are the map() and cleanup() methods of CIMapper doing?

  protected void map(WritableComparable<?> key, VectorWritable value, Context context) throws IOException,
      InterruptedException {
    Vector probabilities = classifier.classify(value.get());
    Vector selections = policy.select(probabilities);
    for (Iterator<Element> it = selections.iterateNonZero(); it.hasNext();) {
      Element el = it.next();
      classifier.train(el.index(), value.get(), el.get());
    }
  }

  protected void cleanup(Context context) throws IOException, InterruptedException {
    List<Cluster> clusters = classifier.getModels();
    ClusterWritable cw = new ClusterWritable();
    for (int index = 0; index < clusters.size(); index++) {
      cw.setValue(clusters.get(index));
      context.write(new IntWritable(index), cw);
    }
    super.cleanup(context);
  }

4 - What is the reduce() method of CIReducer doing?

  protected void reduce(IntWritable key, Iterable<ClusterWritable> values, Context context) throws IOException,
      InterruptedException {
    Iterator<ClusterWritable> iter = values.iterator();
    ClusterWritable first = null;
    while (iter.hasNext()) {
      ClusterWritable cw = iter.next();
      if (first == null) {
        first = cw;
      } else {
        first.getValue().observe(cw.getValue());
      }
    }
    List<Cluster> models = new ArrayList<Cluster>();
    models.add(first.getValue());
    classifier = new ClusterClassifier(models, policy);
    classifier.close();
    context.write(key, first);
  }
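
To make the questions more concrete, here is how I currently read the overall pattern, written as a small standalone sketch. It is only my guess and it does not use Mahout or Hadoop at all: ClusterIterationSketch, PartialCluster, mapSplit and reduceIndex are names I made up, and I am assuming hard assignment to the nearest center by squared Euclidean distance, which may not be what the classifier and policy really do. Each "mapper" accumulates per-cluster sums for its own split, and the "reducer" merges the partial sums that share a cluster index and then recomputes the center.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClusterIterationSketch {

  /** Hypothetical stand-in for a cluster model: a center plus running sums. */
  static final class PartialCluster {
    final double[] center;     // center used for assignment in this iteration
    double weightSum;          // total weight of the points observed so far
    final double[] vectorSum;  // weighted sum of the points observed so far

    PartialCluster(double[] center) {
      this.center = center.clone();
      this.vectorSum = new double[center.length];
    }

    /** Accumulate one point (my guess at what "train" amounts to). */
    void observe(double[] x, double weight) {
      weightSum += weight;
      for (int i = 0; i < x.length; i++) {
        vectorSum[i] += weight * x[i];
      }
    }

    /** Merge the sums of another partial cluster with the same index. */
    void observe(PartialCluster other) {
      weightSum += other.weightSum;
      for (int i = 0; i < vectorSum.length; i++) {
        vectorSum[i] += other.vectorSum[i];
      }
    }

    /** New center = weighted mean; keep the old center if nothing was observed. */
    double[] newCenter() {
      if (weightSum == 0) {
        return center.clone();
      }
      double[] c = new double[vectorSum.length];
      for (int i = 0; i < c.length; i++) {
        c[i] = vectorSum[i] / weightSum;
      }
      return c;
    }
  }

  /** Index of the cluster whose center is closest to x (squared Euclidean). */
  static int nearest(List<PartialCluster> clusters, double[] x) {
    int best = 0;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int k = 0; k < clusters.size(); k++) {
      double d = 0;
      double[] c = clusters.get(k).center;
      for (int i = 0; i < x.length; i++) {
        d += (x[i] - c[i]) * (x[i] - c[i]);
      }
      if (d < bestDist) {
        bestDist = d;
        best = k;
      }
    }
    return best;
  }

  /** "Mapper" side: process one split, then return one partial cluster per index. */
  static List<PartialCluster> mapSplit(double[][] centers, double[][] points) {
    List<PartialCluster> clusters = new ArrayList<>();
    for (double[] c : centers) {
      clusters.add(new PartialCluster(c));
    }
    for (double[] p : points) {
      clusters.get(nearest(clusters, p)).observe(p, 1.0);
    }
    return clusters;
  }

  /** "Reducer" side: merge all partials for one index and recompute the center. */
  static double[] reduceIndex(List<PartialCluster> partials) {
    PartialCluster first = null;
    for (PartialCluster pc : partials) {
      if (first == null) {
        first = pc;
      } else {
        first.observe(pc);
      }
    }
    return first.newCenter();
  }

  public static void main(String[] args) {
    double[][] centers = {{0, 0}, {10, 10}};
    // Two splits, as if handled by two separate mappers.
    List<PartialCluster> a = mapSplit(centers, new double[][] {{1, 1}, {9, 9}});
    List<PartialCluster> b = mapSplit(centers, new double[][] {{3, 1}, {11, 13}});
    for (int k = 0; k < centers.length; k++) {
      double[] c = reduceIndex(Arrays.asList(a.get(k), b.get(k)));
      System.out.println(k + " -> " + Arrays.toString(c));
    }
  }
}
```

With these made-up points the sketch prints 0 -> [2.0, 1.0] and 1 -> [10.0, 11.0]. If this reading is wrong, in particular about what classify(), select() and close() do, please correct me.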


Thank you very much in advance. I would really appreciate any ideas or guidance.

Best regards