Posted to user@mahout.apache.org by sri krishna <kr...@yahoo.com> on 2013/02/01 12:13:41 UTC

Initial centroids for fuzzy k-means algorithm

Hi,


I have a sample set of a few documents for each cluster (the number of clusters is known, and a few documents in each cluster are known in advance). How can I initialize the centroids with these known documents, so that the algorithm runs using the given data points as centroids in Mahout?

Re: Initial centroids for fuzzy k-means algorithm

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Clusters have a constructor that accepts a vector that you can use for this.

On 2/2/13 2:17 PM, sri krishna wrote:
>
> I checked the source code to see how ClusterWritables are used to write
> centroids to a sequence file, and I found this:
>
> SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
>     Text.class, ClusterWritable.class);
> ClusterWritable clusterWritable = new ClusterWritable();
>
> clusterWritable.setValue(canopy);
> writer.append(new Text(canopy.getIdentifier()), clusterWritable);
>
> Here setValue expects a Cluster. How can I convert a raw
> vector (SequentialAccessSparseVector) to a Cluster type?
>
>
>
> ________________________________
>   From: Jeff Eastman <jd...@windwardsolutions.com>
> To: user@mahout.apache.org
> Sent: Saturday, 2 February 2013 12:18 AM
> Subject: Re: Initial centroids for fuzzy k-means algorithm
>   
> If you don't specify a -k value but specify a -ci directory that
> contains clusters you want to use for the prior then the ClusterIterator
> will use them for kmeans and fuzzyk. You will need to create one or more
> sequence files containing ClusterWritables to do this.
>
> On 2/1/13 9:08 AM, sri krishna wrote:
>> My question was more about how I can generate new centroids based on the predefined points I give, since in my case I know the number of clusters and also a few points in each cluster.
>>
>>    
>>
>>
>>
>>
>> ________________________________
>>     From: Rajesh Nikam <ra...@gmail.com>
>> To: user@mahout.apache.org; sri krishna <kr...@yahoo.com>
>> Sent: Friday, 1 February 2013 5:07 PM
>> Subject: Re: Initial centroids for fuzzy k-means algorithm
>>
>> You could use canopy clustering from Mahout to initialize the centroids.
>>
>> Thanks
>> Rajesh
>>
>>
>> On Fri, Feb 1, 2013 at 4:43 PM, sri krishna <kr...@yahoo.com> wrote:
>>
>>> Hi,
>>>
>>>
>>> I have a sample set of a few documents for each cluster (the number of
>>> clusters is known, and a few documents in each cluster are known in
>>> advance). How can I initialize the centroids with these known documents,
>>> so that the algorithm runs using the given data points as centroids in Mahout?


Re: Initial centroids for fuzzy k-means algorithm

Posted by sri krishna <kr...@yahoo.com>.

I checked the source code to see how ClusterWritables are used to write
centroids to a sequence file, and I found this:

SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
    Text.class, ClusterWritable.class);
ClusterWritable clusterWritable = new ClusterWritable();

clusterWritable.setValue(canopy);
writer.append(new Text(canopy.getIdentifier()), clusterWritable);

Here setValue expects a Cluster. How can I convert a raw
vector (SequentialAccessSparseVector) to a Cluster type?


Re: Initial centroids for fuzzy k-means algorithm

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
If you don't specify a -k value but specify a -ci directory that 
contains clusters you want to use for the prior then the ClusterIterator 
will use them for kmeans and fuzzyk. You will need to create one or more 
sequence files containing ClusterWritables to do this.



Re: Initial centroids for fuzzy k-means algorithm

Posted by sri krishna <kr...@yahoo.com>.
My question was more about how I can generate new centroids based on the predefined points I give, since in my case I know the number of clusters and also a few points in each cluster.
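
One possible way to turn a few known points per cluster into starting centroids is to average them. The sketch below is plain Java with no Mahout dependency; averaging the seed points is an assumption on my part rather than something prescribed in this thread, and the names SeedCentroids, mean and initialCentroids are hypothetical. Each resulting array would still need to be wrapped in a Mahout vector and cluster and written out as a ClusterWritable, as the replies in this thread describe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: derives one starting centroid per known cluster
// by averaging the seed points supplied for that cluster.
final class SeedCentroids {

    // Component-wise mean of equal-length vectors.
    static double[] mean(List<double[]> points) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("need at least one seed point");
        }
        int dim = points.get(0).length;
        double[] centroid = new double[dim];
        for (double[] p : points) {
            if (p.length != dim) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            for (int i = 0; i < dim; i++) {
                centroid[i] += p[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            centroid[i] /= points.size();
        }
        return centroid;
    }

    // One centroid per cluster label, in the order the labels were supplied.
    static Map<String, double[]> initialCentroids(Map<String, List<double[]>> seeds) {
        Map<String, double[]> centroids = new LinkedHashMap<>();
        for (Map.Entry<String, List<double[]>> e : seeds.entrySet()) {
            centroids.put(e.getKey(), mean(e.getValue()));
        }
        return centroids;
    }

    public static void main(String[] args) {
        // Made-up two-dimensional seed points for two known clusters.
        Map<String, List<double[]>> seeds = new LinkedHashMap<>();
        seeds.put("sports", new ArrayList<>(Arrays.asList(
                new double[] {1.0, 0.0}, new double[] {3.0, 2.0})));
        seeds.put("politics", new ArrayList<>(Arrays.asList(
                new double[] {0.0, 4.0})));
        for (Map.Entry<String, double[]> e : initialCentroids(seeds).entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

If only a single document is known for a cluster, the mean is that document's vector itself, so the same code covers using the given data points directly as centroids.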

 





Re: Initial centroids for fuzzy k-means algorithm

Posted by Rajesh Nikam <ra...@gmail.com>.
You could use canopy clustering from Mahout to initialize the centroids.

Thanks
Rajesh

