Posted to user@mahout.apache.org by rmx <ru...@hotmail.com> on 2010/11/17 22:28:52 UTC

k-means output missing some cluster centers coordinates

Hi.

I am running k-means over a numerical dataset with 41 variables.

The output is missing some of the cluster center coordinates.

For example, this center has no entries for coordinates 6, 8, 10, 12,
13, 14, ...:
"VL-489651{n=290923 c=[0:0.081, 1:2.931, 2:8.846, 3:1.000, 4:1009.936,
5:10.337, 7:0.000, 9:0.000, 11:0.033, 15:0.000, 16:0.001, 17:0.000,
18:0.000, 22:489.476, 23:489.492, 24:0.000, 25:0.000, 26:0.000, 27:0.000,
28:0.999, 29:0.001, 30:0.013, 31:250.361, 32:250.280, 33:0.986, 34:0.002,
35:0.969, 36:0.000, 37:0.000, 38:0.000, 39:0.000, 40:0.000, 41:8.723]
r=[0:2.377, 1:0.366, 2:1.750, 3:0.044, 4:949.196, 5:69.735, 7:0.009,
9:0.010, 11:0.179, 15:0.009, 16:0.031, 17:0.011, 18:0.020, 22:94.125,
23:94.032, 24:0.005, 25:0.005, 26:0.003, 27:0.003, 28:0.019, 29:0.031,
30:0.112, 31:28.323, 32:26.883, 33:0.087, 34:0.024, 35:0.169, 36:0.004,
37:0.011, 38:0.005, 39:0.014, 40:0.002, 41:1.470]}"

Thank you in advance
-- 
View this message in context: http://lucene.472066.n3.nabble.com/k-means-output-missing-some-cluster-centers-coordinates-tp1919928p1919928.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: k-means output missing some cluster centers coordinates

Posted by Matt Tanquary <ma...@gmail.com>.
I believe your thought is correct, although I would not assume that the
value was rounded. It may simply have been truncated for display. You might
want to look at the code that prints your output to see whether you can
increase the displayed precision.
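
To illustrate the difference between the two, here is a rough sketch in plain Python. It is not Mahout's formatting code, and the helper name show is made up for this example. Rounding and truncating to three decimal places can disagree for small values, and showing more digits makes the value visible again:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN


def show(value: float, places: int = 3, truncate: bool = False) -> str:
    """Format value with a fixed number of decimals, rounding or truncating."""
    quantum = Decimal(10) ** -places  # e.g. 0.001 for places=3
    mode = ROUND_DOWN if truncate else ROUND_HALF_EVEN
    return str(Decimal(str(value)).quantize(quantum, rounding=mode))


print(show(0.0006))                 # 0.001 (rounded)
print(show(0.0006, truncate=True))  # 0.000 (truncated)
print(show(0.0004))                 # 0.000 either way
print(show(0.0004, places=6))       # 0.000400 once more digits are shown
```

Either way, a stored value below 0.0005 prints as 0.000 at three decimals, so the printed output alone cannot tell you which of the two was applied.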

On Thu, Nov 18, 2010 at 3:23 AM, rmx <ru...@hotmail.com> wrote:

>
> That is what I thought at first.
> But if you check the value of the 7th coordinate, you will see that it is
> 0.000.
> This confused me. It seems that the results are still not being presented
> in a sparse way.
> One explanation may be that the value of the 7th coordinate is not zero
> but was rounded to 0.000 in the output.
> Can someone please confirm whether my explanation is correct?




Re: k-means output missing some cluster centers coordinates

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Here is a quick walkthrough of doing k-means clustering and looking at
the input and output:
https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
Be aware that some command-line parameters have changed since it was
written for 0.6. For instance, -s has changed to -i in some cases (as I
recall). Also, clusterdump now needs an output file, so it will not print
to the terminal. When in doubt, run the command with no parameters to get
help.

The Mahout documentation needs a bit of cleanup. To see all the
available docs, try the "view in hierarchy" format on cwiki.apache.org;
it shows some docs that are not linked from anywhere else that I can find:
https://cwiki.apache.org/confluence/pages/listpages-dirview.action?key=MAHOUT&openId=74539#selectedPageInHierarchy

I also highly recommend Mahout in Action, published by Manning.

On 7/20/12 1:59 AM, Videnova, Svetlana wrote:
> That's a very good question; I was expecting an answer too...
>
> This was the answer given to me by Mahout users:
> "The type of input and output depends on the job you want to run."
>
> I was clustering .txt files for the moment.
>
> -----Original Message-----
> From: shriram [mailto:ghai12000@gmail.com]
> Sent: Friday, July 20, 2012 10:52
> To: mahout-user@lucene.apache.org
> Subject: RE: k-means output missing some cluster centers coordinates
>
> What should the input format for Mahout be? Can anybody tell me? I'm confused. I can't make head or tail of the output that I'm getting.



RE: k-means output missing some cluster centers coordinates

Posted by "Videnova, Svetlana" <sv...@logica.com>.
That's a very good question; I was expecting an answer too...

This was the answer given to me by Mahout users:
"The type of input and output depends on the job you want to run."

I was clustering .txt files for the moment.

-----Original Message-----
From: shriram [mailto:ghai12000@gmail.com]
Sent: Friday, July 20, 2012 10:52
To: mahout-user@lucene.apache.org
Subject: RE: k-means output missing some cluster centers coordinates

What should the input format for Mahout be? Can anybody tell me? I'm confused. I can't make head or tail of the output that I'm getting.






RE: k-means output missing some cluster centers coordinates

Posted by shriram <gh...@gmail.com>.
What should the input format for Mahout be? Can anybody tell me? I'm
confused. I can't make head or tail of the output that I'm getting.




RE: k-means output missing some cluster centers coordinates

Posted by Jeff Eastman <je...@Narus.com>.
+1. The formatting prints only three decimal places, so any non-zero value too small to register at that precision will look like 0.000.
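
A small sketch of how the two effects combine, written in plain Python rather than taken from Mahout. The function name format_sparse and the sample values are assumptions made up for illustration. Exact zeros are never stored, so they never print, while a tiny non-zero value is stored and prints as 0.000 at three decimals:

```python
from typing import Dict


def format_sparse(vector: Dict[int, float], places: int = 3) -> str:
    """Render only the non-zero entries as index:value pairs."""
    parts = [
        f"{index}:{value:.{places}f}"
        for index, value in sorted(vector.items())
        if value != 0.0  # exact zeros are skipped, as in a sparse vector
    ]
    return "[" + ", ".join(parts) + "]"


# Hypothetical center: coordinate 6 is exactly zero, coordinate 7 is tiny.
center = {0: 0.081, 6: 0.0, 7: 0.0003, 16: 0.001}
print(format_sparse(center))            # [0:0.081, 7:0.000, 16:0.001]
print(format_sparse(center, places=6))  # [0:0.081000, 7:0.000300, 16:0.001000]
```

Under these assumptions, a printed "7:0.000" means "non-zero but below the displayed precision", and a missing index means "exactly zero".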

-----Original Message-----
From: rmx [mailto:ruimaximo@hotmail.com] 
Sent: Thursday, November 18, 2010 2:24 AM
To: mahout-user@lucene.apache.org
Subject: Re: k-means output missing some cluster centers coordinates


That is what I thought at first.
But if you check the value of the 7th coordinate, you will see that it is 0.000.
This confused me. It seems that the results are still not being presented
in a sparse way.
One explanation may be that the value of the 7th coordinate is not zero but
was rounded to 0.000 in the output.
Can someone please confirm whether my explanation is correct?


Re: k-means output missing some cluster centers coordinates

Posted by rmx <ru...@hotmail.com>.
That is what I thought at first.
But if you check the value of the 7th coordinate, you will see that it is 0.000.
This confused me. It seems that the results are still not being presented
in a sparse way.
One explanation may be that the value of the 7th coordinate is not zero but
was rounded to 0.000 in the output.
Can someone please confirm whether my explanation is correct?


RE: k-means output missing some cluster centers coordinates

Posted by Jeff Eastman <je...@Narus.com>.
Clustering uses sparse vectors by default. The missing coordinate values must be zeros.
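
If you need every coordinate, the printed sparse form can be expanded by filling the missing indices with zeros. The following is only a sketch in plain Python that parses the text shown in the dump; it does not use any Mahout API. The function name to_dense is made up, and the caller has to supply the vector size:

```python
import re
from typing import List

# Matches "index:value" pairs such as "4:1009.936".
PAIR = re.compile(r"(\d+):(-?\d+(?:\.\d+)?)")


def to_dense(sparse_text: str, size: int) -> List[float]:
    """Expand printed index:value pairs into a dense list, zeros elsewhere."""
    dense = [0.0] * size
    for index, value in PAIR.findall(sparse_text):
        dense[int(index)] = float(value)
    return dense


# First few entries of the center from the original post.
c = "[0:0.081, 1:2.931, 2:8.846, 3:1.000, 4:1009.936, 5:10.337, 7:0.000, 9:0.000]"
dense = to_dense(c, size=10)
print(dense[6], dense[8])  # 0.0 0.0 (missing in the dump, so exactly zero)
```

Note that indices 7 and 9 also come back as 0.0 here, but only because the dump had already reduced them to three decimals; the stored values may be small non-zero numbers.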

-----Original Message-----
From: rmx [mailto:ruimaximo@hotmail.com] 
Sent: Wednesday, November 17, 2010 1:29 PM
To: mahout-user@lucene.apache.org
Subject: k-means output missing some cluster centers coordinates


Hi.

I am running k-means over a numerical dataset with 41 variables.

The output is missing some of the cluster center coordinates.

For example, this center has no entries for coordinates 6, 8, 10, 12,
13, 14, ...:
"VL-489651{n=290923 c=[0:0.081, 1:2.931, 2:8.846, 3:1.000, 4:1009.936,
5:10.337, 7:0.000, 9:0.000, 11:0.033, 15:0.000, 16:0.001, 17:0.000,
18:0.000, 22:489.476, 23:489.492, 24:0.000, 25:0.000, 26:0.000, 27:0.000,
28:0.999, 29:0.001, 30:0.013, 31:250.361, 32:250.280, 33:0.986, 34:0.002,
35:0.969, 36:0.000, 37:0.000, 38:0.000, 39:0.000, 40:0.000, 41:8.723]
r=[0:2.377, 1:0.366, 2:1.750, 3:0.044, 4:949.196, 5:69.735, 7:0.009,
9:0.010, 11:0.179, 15:0.009, 16:0.031, 17:0.011, 18:0.020, 22:94.125,
23:94.032, 24:0.005, 25:0.005, 26:0.003, 27:0.003, 28:0.019, 29:0.031,
30:0.112, 31:28.323, 32:26.883, 33:0.087, 34:0.024, 35:0.169, 36:0.004,
37:0.011, 38:0.005, 39:0.014, 40:0.002, 41:1.470]}"

Thank you in advance