Posted to user@mahout.apache.org by Baoqiang Cao <bq...@gmail.com> on 2012/03/19 03:08:44 UTC

empty vector out of clusterdump

Hi,

I ran Mahout kmeans and then clusterdump. Here is the result for the
biggest cluster, which has 844992 members:

VL-1705919{n=844992 c=[] r=[]}
        Top Terms:
        Weight : [props - optional]:  Point:
        1.0 : [distance=0.0]: []
        1.0 : [distance=0.0]: []
        1.0 : [distance=0.0]: []
        1.0 : [distance=0.0]: []
        1.0 : [distance=0.0]: []
        1.0 : [distance=0.0]: []

What does this mean? Is this whole cluster made up of empty vectors (members)?

Best,
Baoqiang

Re: empty vector out of clusterdump

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Empty? Note that Mahout's vector printout shows only the non-zero
elements. It looks like you had many all-zero vectors, and they were
clustered into VL-1705919, whose center and radius are both zero. That
would also explain why every point shows distance=0.0. If your other
clusters look different, then I think this result is probably correct.
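A minimal plain-Java sketch of this point follows. It does not use Mahout's classes; SparseVec and ZeroClusterDemo are hypothetical names. It assumes that a sparse vector stores and prints only its non-zero elements, and it treats the radius as the per-component standard deviation (an assumption about what r= reports). Under those assumptions, a cluster built only from all-zero vectors has a center and a radius that both print as [], just like c=[] r=[] in the original output.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal sparse vector (not Mahout's class): only non-zero entries are stored.
final class SparseVec {
    final Map<Integer, Double> entries = new TreeMap<>();

    void set(int index, double value) {
        if (value == 0.0) {
            entries.remove(index); // zeros are never stored
        } else {
            entries.put(index, value);
        }
    }

    double get(int index) {
        return entries.getOrDefault(index, 0.0);
    }

    // Prints only non-zero elements, e.g. "[0:1.5, 3:2.0]"; an all-zero vector prints "[]".
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            if (sb.length() > 1) sb.append(", ");
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.append(']').toString();
    }
}

final class ZeroClusterDemo {
    // Component-wise mean of the member vectors (the cluster center).
    static SparseVec center(List<SparseVec> points, int dim) {
        SparseVec c = new SparseVec();
        for (int i = 0; i < dim; i++) {
            double sum = 0.0;
            for (SparseVec p : points) sum += p.get(i);
            c.set(i, sum / points.size());
        }
        return c;
    }

    // Component-wise population standard deviation, used here as the "radius" (an assumption).
    static SparseVec radius(List<SparseVec> points, int dim) {
        SparseVec mean = center(points, dim);
        SparseVec r = new SparseVec();
        for (int i = 0; i < dim; i++) {
            double sq = 0.0;
            for (SparseVec p : points) {
                double d = p.get(i) - mean.get(i);
                sq += d * d;
            }
            r.set(i, Math.sqrt(sq / points.size()));
        }
        return r;
    }

    public static void main(String[] args) {
        int dim = 5;
        List<SparseVec> zeros = List.of(new SparseVec(), new SparseVec(), new SparseVec());
        System.out.println("c=" + center(zeros, dim) + " r=" + radius(zeros, dim));
    }
}
```

Running main prints c=[] r=[]: every component of the mean and of the spread is zero, so nothing is stored and nothing is printed.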


On 3/20/12 6:10 AM, Baoqiang wrote:
> Yes, I used -cl in the kmeans step. Only the biggest cluster is empty; all the others are non-empty. I don't know why.


Re: empty vector out of clusterdump

Posted by Paritosh Ranjan <pr...@xebia.com>.
Can you try running the cluster output post processor (clusterpp) once?

You can find documentation on how to use it here:
https://cwiki.apache.org/MAHOUT/top-down-clustering.html

If clusterpp also produces empty vectors, then the problem is somewhere
in the clustering step; otherwise, it is in the cluster dumper. Either
way, this will at least help narrow down the problem area.

On 20-03-2012 17:40, Baoqiang wrote:
> Yes, I used -cl in the kmeans step. Only the biggest cluster is empty; all the others are non-empty. I don't know why.


Re: empty vector out of clusterdump

Posted by Baoqiang <bq...@gmail.com>.
Yes, I used -cl in the kmeans step. Only the biggest cluster is empty; all the others are non-empty. I don't know why.

On Mar 20, 2012, at 1:36 AM, Paritosh Ranjan <pr...@xebia.com> wrote:

> Did you run kmeans with the -cl (run input vector clustering) option set to "true"?

Re: empty vector out of clusterdump

Posted by Paritosh Ranjan <pr...@xebia.com>.
Did you run kmeans with the -cl (run input vector clustering) option set to "true"?
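For context, -cl makes kmeans also assign every input vector to its nearest final cluster, and as I understand it those assignments are the points clusterdump lists under each cluster. The sketch below is a simplified plain-Java illustration, not Mahout code: NearestCentroid is a hypothetical name, and it assumes dense double[] vectors and Euclidean distance (Mahout's distance measure is configurable). An all-zero point is closest to an all-zero center, at distance 0.0, which matches the distance=0.0 lines in the original post.

```java
// Simplified sketch of input-vector clustering (hypothetical, not Mahout's implementation):
// each point is assigned to the center with the smallest Euclidean distance.
final class NearestCentroid {
    // Euclidean distance between two dense vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the index of the closest center for the given point.
    static int assign(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.length; k++) {
            double d = distance(point, centers[k]);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = { {0.0, 0.0, 0.0}, {1.0, 2.0, 0.5} };
        double[] zeroPoint = {0.0, 0.0, 0.0};
        int k = assign(zeroPoint, centers);
        // prints: cluster=0 distance=0.0
        System.out.println("cluster=" + k + " distance=" + distance(zeroPoint, centers[k]));
    }
}
```

If the input contains many all-zero vectors, they all end up on the same zero center, producing one large cluster whose printout looks empty.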
