Posted to user@mahout.apache.org by "Scott C. Cote" <sc...@gmail.com> on 2013/12/23 23:22:21 UTC

Questions related to MiA and Quick tour of text analysis...

All,

Two questions related to "Quick tour of text analysis using the Mahout
command line"

1.  metrics:
When moving through the process of performing the cluster analysis, one can
use many different metrics.  In the tour, the choice was made to use the
Cosine metric.  Are there any problems that can arise from using the cosine
metric to define the clusters, but using tanimoto or euclid to dump the
clusters?  I have so far remained consistent in that once starting with
Cosine, I go all the way with cosine.  When does it make sense to not do what
I am doing?

To be clear, the current version of the tour does NOT specify that a metric
should be used when dumping a cluster, so the default "Euclid" is used.
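
To make the concern concrete, here is a minimal sketch of how the three
metrics can disagree about the same pair of vectors.  These are textbook
definitions written in plain Python, not Mahout's DistanceMeasure classes,
and the vectors a and b are made up for illustration.

```python
import math

def cosine_distance(a, b):
    # 1 - cos(angle between a and b); ignores vector length
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def tanimoto_distance(a, b):
    # 1 - a.b / (|a|^2 + |b|^2 - a.b); sensitive to length as well as angle
    dot = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    return 1.0 - dot / (aa + bb - dot)

def euclidean_distance(a, b):
    # straight-line distance; fully sensitive to length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical vectors: same direction, b is twice as long as a.
a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]

print("cosine   ", cosine_distance(a, b))
print("tanimoto ", tanimoto_distance(a, b))
print("euclidean", euclidean_distance(a, b))
```

Under cosine the two vectors are essentially identical, while tanimoto and
euclid both see them as clearly separated, so distances reported at dump
time under a different metric may not reflect how the clusters were formed.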

2. Parameters around canopy cluster:
What are parameters t3 and t4?  I know that they are optional thresholds
for the reducer, and that t1 and t2 are used in their place if t3 and t4
are not specified.

https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering

Lots of discussion about t1 and t2, but t3 and t4 are not covered in MiA
either.  Are these params that I should ignore for now?
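
For reference, this is my current mental model of where t3 and t4 would
fit, written as a rough Python sketch.  It is NOT Mahout's code: the function
names, the "splits" stand-in for mapper inputs, and the use of canopy centers
(rather than centroids) as mapper output are all my own assumptions.  The
only part taken from what I described above is that t3/t4 fall back to t1/t2.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopy_centers(points, t1, t2, distance=euclidean):
    """Textbook single-pass canopy generation (expects t1 > t2)."""
    centers = []
    remaining = list(points)
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        # Points within t1 would belong to this canopy; only points within
        # t2 are removed from further consideration as new centers.
        remaining = [p for p in remaining if distance(center, p) >= t2]
    return centers

def two_stage_canopy(splits, t1, t2, t3=None, t4=None, distance=euclidean):
    """Hypothetical map/reduce layout: t1/t2 per split, t3/t4 when merging."""
    t3 = t1 if t3 is None else t3   # fall back to t1
    t4 = t2 if t4 is None else t4   # fall back to t2
    local_centers = []
    for split in splits:            # "mapper" side
        local_centers.extend(canopy_centers(split, t1, t2, distance))
    # "reducer" side: run canopy again over the mapper centers
    return canopy_centers(local_centers, t3, t4, distance)

# Made-up data: two input splits with points near (0, 0) and (10, 10).
splits = [
    [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)],
    [(0.2, 0.1), (10.5, 10.0)],
]

print(two_stage_canopy(splits, t1=3.0, t2=1.5))           # t3/t4 default to t1/t2
print(two_stage_canopy(splits, t1=3.0, t2=1.5, t4=0.1))   # tighter merge threshold
```

If this picture is right, t3 and t4 only matter when the merge step should
be stricter or looser than the per-split step.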

Scott