Posted to user@mahout.apache.org by "Hasan, Maryam" <mh...@WPI.EDU> on 2013/09/12 17:05:53 UTC

clusterdump returns wrong results!

Hi all,

I am using Mahout's Java API for k-means clustering of Twitter messages. For each tweet I created a TF-IDF vector, wrapped it in a NamedVector, and used it as input to k-means.
When I run clusterdump to get the top terms in each cluster, it returns terms that don't exist in the cluster's points. I highlighted them in the sample result.
Here is a sample of the clusterdump output:
:VL-1116{n=130 c=[0M:0.004, 0NK:0.002, 0R:0.002, A:0.064, A0R:0.002, AF:0.003, AFN:0.001, AFR:0.007,
Top Terms:
KL                                      => 0.07769706134287822
A                                       => 0.06352121615635654
M                                       => 0.03582733435331364
H                                       =>0.021431033796627138
S                                       =>0.020221392207273467
MK                                      =>0.019890449212907294
AM                                      =>0.019525728541802116
RT                                      => 0.01929212447882407
LK                                      =>0.017212190989566795
T                                       =>  0.0170344367682126
KT                                      =>0.016276377419697542
KLT                                     =>0.015241326420350601
PT                                      =>0.014453650382358636
AL                                      => 0.01389273778624652
KL AM                                   =>0.013482969807870861
N                                       =>0.012985948754913663
TN                                      =>0.012417229729768981
LF                                      =>0.012310466840955945
SR                                      =>0.012184438393888218
AR                                      =>0.011200981758363569
Weight : [props - optional]:  Point:
1.0: //My therapist literally made me feel 10x worse like wow can I kill myself now? = [A:0.089, ARS:0.130, FL:0.120, KL:0.105, KN:0.111, LK:0.113, LTR:0.126, M:0.103, MT:0.120, N:0.104, S:0.091]
1.0: 67 people killed on the hands of #Assad regime forces in #Syria today. #Lebanon = [AST:0.120, KL:0.103, PPL:0.119, SR:0.110, TT:0.111]
1.0: @GGOD_4life should have kept her company she was there for a minute = [HF:0.104, HR:0.104, KMPN:0.109, KT:0.091, LF:0.102, MNT:0.108, X:0.106, XLT:0.111]
1.0: @Il_Comico91 @lorenzo2168 no,ho nokia👍 = [AL:0.131, H:0.131, NK:0.145]
1.0: @Necro_Nom_Icon yeah man like don't be messing with my chakras or i'll kill your vibe too huffff = [A:0.087, AKN:0.123, AR:0.097, KL:0.101, L:0.108, LK:0.109, M:0.089, MN:0.112, MS:0.123, NM:0.124, T:0.085, TN:0.111, TN T:0.119]
1.0: @SEC_Logo @DanWetzel The @NCAA has no teeth!  No bite.  I guess they could suspend all college football for a year = [A:0.077, AL:0.101, AR:0.098, FTPL:0.111, H:0.102, KLT:0.114, KS:0.118, LK:0.109, NK:0.115, PT:0.108, SK:0.114]
1.0: @Sir_Jaydee baba no vex, Godwon kill sinzu, let's not argue . = [ARK:0.121, KL:0.103, LT:0.117, PP:0.124, S:0.091, SR:0.111]
1.0: @Vi0lent_Y0uth @2_AZREAL_9 {Lands behind Noah} No I won't hurt her, but I could kill her. = [A:0.109, AN:0.121, F:0.121, HR:0.151, HRT:0.140, KL:0.120, KLT:0.136, N:0.118, T:0.088]
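
For reference, here is a minimal sketch of how a "Top Terms" list like the one above can be derived. It assumes that the top terms are simply the highest-weighted entries of the cluster centroid, with each vector index translated to a label through a dictionary. The class name, method name, and the tiny dictionary and centroid are hypothetical and only for illustration; this is plain Java, not Mahout's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not part of Mahout.
class TopTermsSketch {

    // Returns the labels of the k highest-weighted non-zero centroid entries.
    // dictionary[i] is assumed to be the term stored at vector index i.
    static List<String> topTerms(double[] centroid, String[] dictionary, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < centroid.length; i++) {
            if (centroid[i] != 0.0) {
                indices.add(i);
            }
        }
        // Sort indices by centroid weight, largest first.
        indices.sort(Comparator.comparingDouble((Integer i) -> centroid[i]).reversed());

        List<String> terms = new ArrayList<>();
        for (int i = 0; i < Math.min(k, indices.size()); i++) {
            terms.add(dictionary[indices.get(i)]);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Made-up values, loosely shaped like the output above.
        String[] dictionary = {"A", "KL", "M", "SR"};
        double[] centroid = {0.064, 0.078, 0.036, 0.0};
        System.out.println(topTerms(centroid, dictionary, 3));
    }
}
```

Under this assumption, the printed labels depend entirely on the dictionary: the same centroid read with a dictionary that does not match the one used to build the vectors would show different terms.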

Thanks,
Maryam