Posted to user@mahout.apache.org by Alex Luya <al...@gmail.com> on 2010/09/07 15:57:37 UTC
How to analyze the result of clustering based on mahout 0.4?
Hello:
I found that the directory output/clusteredPoints doesn't exist, and the result of the dump looks like this:
---------------------------------------------------------------------------------------
VL-21569{n=760 c=[0.4:0.009, 0.68:0.012, 0.75:0.011, 0.79:0.013, 00:0.062, 00.11:0.012,
Top Terms:
quarter => 2.8133782223651282
share => 2.619699128050553
earnings => 2.210144190411819
dlrs => 2.1388998663739156
cts => 2.0921635480303515
dividend => 2.0305285077346
company => 1.9935854278112712
said => 1.9911234617233275
its => 1.8312319523409792
year => 1.6385857475431342
---------------------------------------------------------------------------------------
What does the first line mean?
Re: How to analyze the result of clustering based on mahout 0.4?
Posted by Jeff Eastman <jd...@windwardsolutions.com>.
It's a little cryptic I suppose. This looks like ClusterDumper output.
The first line is a formatted representation of a converged k-Means
cluster (VL) id = 21569. It observed (was assigned) 760 points during
the last iteration. It has a center vector (c=[...]) with several terms
and looks to be sparse. The sparse vector terms print index:value and it
looks like the term dictionary you provided contains some floating point
coefficients at the beginning. From the top terms printout following, it
looks like only some of your terms are numeric and indeed the top 10 for
this cluster all have textual values (quarter, share, earnings, ...,
year). Buried in the other term printouts you should see
"quarter:2.813", "share:2.62" and so forth. The cluster also has a
radius vector (r=[...]) which is the standard deviation of the 760
observed data points.
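
To make the layout of that first line concrete, here is a minimal Python sketch that splits it into the pieces described above: the prefix, the cluster id, the point count n, and the term:value entries of the center vector. It is not part of Mahout; the regular expression and the helper names (parse_vector, parse_cluster_header) are assumptions based only on the sample line in this thread, and the radius vector is left out because the sample is truncated before it appears.

```python
import re

# Assumed header shape, inferred from the sample in this thread:
#   VL-21569{n=760 c=[term:value, term:value, ...
# The closing bracket is optional because the sample line is truncated.
HEADER = re.compile(
    r"^(?P<kind>[A-Z]+)-(?P<id>\d+)\{n=(?P<n>\d+)\s+c=\[(?P<center>[^\]]*)\]?"
)


def parse_vector(text):
    """Turn 'term:value, term:value' into a dict mapping term -> float."""
    entries = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so a term containing ':' stays intact.
        term, sep, value = item.rpartition(":")
        if not sep:
            continue  # entry cut off before its colon
        try:
            entries[term] = float(value)
        except ValueError:
            continue  # entry cut off in the middle of its value
    return entries


def parse_cluster_header(line):
    """Extract prefix, id, point count and center terms from a header line."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError("line does not look like a cluster header")
    return {
        "kind": match.group("kind"),
        "id": int(match.group("id")),
        "n": int(match.group("n")),
        "center": parse_vector(match.group("center")),
    }


sample = ("VL-21569{n=760 c=[0.4:0.009, 0.68:0.012, 0.75:0.011, "
          "0.79:0.013, 00:0.062, 00.11:0.012,")
info = parse_cluster_header(sample)
print(info["kind"], info["id"], info["n"])
print(info["center"])
```

On the sample line this reports the prefix VL, id 21569 and n=760, and it shows that the first center entries are keyed by numeric-looking dictionary terms such as "0.4" and "00" rather than by words.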