Posted to user@mahout.apache.org by "Faizan(Aroha)" <fa...@arohalabs.net> on 2011/12/19 09:02:48 UTC

Clustering - k-means as a search

Hello,

I'm trying to implement k-means as a search.

I've performed k-means clustering on a huge dataset.

Now if I have a new (small) dataset or document, how can I determine which
cluster it belongs to?

Thanks in advance.

Regards,

Faizan Shaikh

Aroha Labs(Private) Ltd


Re: Clustering - k-means as a search

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
The KMeansDriver has a method (clusterData) which you can invoke from a 
Java program to cluster (classify) your new data with the old clusters. 
For this to work, the new vectors must be the same size as the training 
vectors, and each element must denote the same attribute. There is 
currently no CLI to invoke this step independently of the buildClusters 
(training) step; that is indeed under development.
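A minimal sketch of what classifying new data against existing clusters amounts to, written in plain Java rather than against Mahout, so it does not show clusterData's actual signature. It assumes the k-means centroids have already been exported as double arrays and that Euclidean distance was the training measure; the class NearestCentroid and its helpers vectorize and assign are hypothetical names. vectorize reuses the training dictionary so each index denotes the same attribute, and assign rejects vectors of the wrong size, reflecting the two requirements above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Sketch only (not Mahout's API): assign new documents to existing k-means
 * clusters by nearest centroid. Assumes Euclidean distance and centroids
 * exported from the training run as double arrays.
 */
public class NearestCentroid {

    /** Hypothetical helper: build a term-count vector with the TRAINING dictionary,
     *  so index i means the same term as in the clustered data. Unseen terms are dropped. */
    static double[] vectorize(List<String> tokens, Map<String, Integer> dictionary) {
        double[] v = new double[dictionary.size()];
        for (String t : tokens) {
            Integer idx = dictionary.get(t);
            if (idx != null) {
                v[idx] += 1.0;
            }
        }
        return v;
    }

    /** Squared Euclidean distance; ranks centroids the same way as Euclidean distance. */
    static double squaredEuclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the index of the closest centroid, or -1 if there are none. */
    static int assign(double[] point, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            // Vectors must be the same size as the training vectors.
            if (centroids[k].length != point.length) {
                throw new IllegalArgumentException("dimension mismatch: centroid " + k
                        + " has " + centroids[k].length + " elements, point has " + point.length);
            }
            double d = squaredEuclidean(point, centroids[k]);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = Map.of("apache", 0, "mahout", 1, "cheese", 2);
        double[][] centroids = { {2.0, 2.0, 0.0}, {0.0, 0.0, 3.0} };
        double[] doc = vectorize(List.of("mahout", "apache", "unknown"), dict);
        System.out.println(Arrays.toString(doc) + " -> cluster " + assign(doc, centroids));
    }
}
```

Running main should print [1.0, 1.0, 0.0] -> cluster 0: the unknown term is ignored, and the document is closer to the first centroid.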

As Paritosh indicates, we are planning to refactor all of the 
clusterData implementations into an independent job so the redundant 
implementations in the various clustering algorithms can be consolidated.

On 12/19/11 3:46 AM, Paritosh Ranjan wrote:
> This feature is still in development.
>
> In the meantime, try using ClusterClassifier: populate it with your 
> existing clusters as its models, then use ClusterIterator with 
> KMeansClusteringPolicy.
>
> Hope this solves your problem.


Re: Clustering - k-means as a search

Posted by Paritosh Ranjan <pr...@xebia.com>.
This feature is still in development.

In the meantime, try using ClusterClassifier: populate it with your 
existing clusters as its models, then use ClusterIterator with 
KMeansClusteringPolicy.

Hope this solves your problem.
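The ClusterClassifier approach treats each existing cluster as a model: the classifier scores a new vector against every model, and a clustering policy turns those scores into an assignment, assumed here to be the single best-scoring cluster for k-means. The sketch below illustrates that idea in plain Java. It is not Mahout's ClusterClassifier or KMeansClusteringPolicy API; the names ClusterModelClassifier and selectHard are hypothetical, and the 1/(1 + distance) scoring is an assumption made for illustration.

```java
import java.util.Arrays;

/**
 * Sketch of a "classifier whose models are clusters". NOT Mahout's
 * ClusterClassifier/KMeansClusteringPolicy; names are hypothetical.
 */
public class ClusterModelClassifier {

    private final double[][] centroids; // existing clusters, used as the models

    ClusterModelClassifier(double[][] centroids) {
        this.centroids = centroids;
    }

    /** Assumed scoring: 1 / (1 + Euclidean distance) per cluster, normalized to sum to 1. */
    double[] classify(double[] point) {
        double[] scores = new double[centroids.length];
        double total = 0.0;
        for (int k = 0; k < centroids.length; k++) {
            if (centroids[k].length != point.length) {
                throw new IllegalArgumentException("dimension mismatch with cluster " + k);
            }
            double sum = 0.0;
            for (int i = 0; i < point.length; i++) {
                double d = point[i] - centroids[k][i];
                sum += d * d;
            }
            scores[k] = 1.0 / (1.0 + Math.sqrt(sum));
            total += scores[k];
        }
        for (int k = 0; k < scores.length; k++) {
            scores[k] /= total;
        }
        return scores;
    }

    /** Hard, k-means-style selection: 1.0 for the highest-scoring cluster, 0.0 elsewhere. */
    static double[] selectHard(double[] scores) {
        double[] selected = new double[scores.length];
        if (scores.length == 0) {
            return selected;
        }
        int best = 0;
        for (int k = 1; k < scores.length; k++) {
            if (scores[k] > scores[best]) {
                best = k;
            }
        }
        selected[best] = 1.0;
        return selected;
    }

    public static void main(String[] args) {
        ClusterModelClassifier classifier =
                new ClusterModelClassifier(new double[][] { {0.0, 0.0}, {10.0, 10.0} });
        double[] scores = classifier.classify(new double[] {1.0, 1.0});
        System.out.println(Arrays.toString(selectHard(scores)));
    }
}
```

Running main should print [1.0, 0.0], since (1, 1) is much closer to the centroid at the origin. Keeping the normalized scores instead of applying selectHard would give a soft assignment.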

On 19-12-2011 15:11, Faizan(Aroha) wrote:
> Yes, you are correct. Do you have any suggestions?


RE: Clustering - k-means as a search

Posted by "Faizan(Aroha)" <fa...@arohalabs.net>.
Yes, you are correct. Do you have any suggestions?

-----Original Message-----
From: Paritosh Ranjan [mailto:pranjan@xebia.com] 
Sent: Monday, December 19, 2011 1:27 PM
To: user@mahout.apache.org
Subject: Re: Clustering - k-means as a search

You want to classify the new vectors (the smaller dataset) with the old
clusters (from the huge dataset). Am I correct?

Paritosh



Re: Clustering - k-means as a search

Posted by Paritosh Ranjan <pr...@xebia.com>.
You want to classify the new vectors (the smaller dataset) with the old 
clusters (from the huge dataset). Am I correct?

Paritosh

On 19-12-2011 13:32, Faizan(Aroha) wrote:
> Hello,
>
> I'm trying to implement k-means as a search.
>
> I've performed k-means clustering on a huge dataset.
>
> Now if I have a new (small) dataset or document, how can I determine
> which cluster it belongs to?
>
> Thanks in advance.
>
> Regards,
>
> Faizan Shaikh
>
> Aroha Labs(Private) Ltd