Posted to user@mahout.apache.org by siddharth0ece <si...@gmail.com> on 2012/05/19 09:46:41 UTC

How to make normal Text suitable for Kmeans using mahout

Friends,

I have a .txt file containing a large number of keywords, in plain Notepad
text format. I want to use Mahout K-Means to cluster similar keywords
together. Can you please help me with how to go about this? I have searched
a lot but have no idea how to do it. Please help me urgently: how should I
convert this text file into a Mahout-friendly format and run the clustering?

I will be highly thankful for your help.

Regards
Siddharth

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-make-normal-Text-suitable-for-Kmeans-using-mahout-tp3984839.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: How to make normal Text suitable for Kmeans using mahout

Posted by vybe3142 <vy...@gmail.com>.
See
https://cwiki.apache.org/MAHOUT/quick-tour-of-text-analysis-using-the-mahout-command-line.html

and

https://cwiki.apache.org/MAHOUT/k-means-clustering.html . Study the shell
script referenced in the link.

Hope that helps




Re: How to make normal Text suitable for Kmeans using mahout

Posted by Paritosh Ranjan <pr...@xebia.com>.
"to cluster similar type of keywords together"

How do you decide how similar two keywords are? Do you already have
similarity information between keywords, or do you want to find similar
strings, e.g. abc being similar to abd?
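
To make the "abc similar to abd" case concrete, here is a minimal sketch of one possible string similarity: Jaccard overlap of character bigrams. It is plain Python, not Mahout code, and the function names (char_ngrams, jaccard) and the "#" padding character are assumptions made only for this illustration.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a lowercased word.

    The word is padded with '#' so that its first and last letters
    also contribute n-grams.
    """
    padded = f"#{word.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of the character bigram sets of two words."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    if not union:
        return 1.0
    return len(grams_a & grams_b) / len(union)


# "abc" and "abd" share 2 bigrams ("#a", "ab") out of 6 distinct ones.
print(jaccard("abc", "abd"))
# "abc" and "xyz" share no bigrams at all.
print(jaccard("abc", "xyz"))
```

A measure like this only captures spelling similarity. If the keywords should be grouped by meaning, the similarity information has to come from somewhere else.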



Re: How to make normal Text suitable for Kmeans using mahout

Posted by Mahesh Balija <ba...@gmail.com>.
Hi Siddharth,

It is not very clear from your question what you want to cluster: keywords
or documents.

If you are looking for document clustering, the approach is
straightforward: it uses term/keyword weights, and you can find
documentation on how to do this in Mahout in Action or in other links.
Through the vectorization process you convert your input data into vectors
that Mahout understands.

If you are looking for keyword clustering, then you probably need to
identify features that help in finding keyword clusters, for example
whether a keyword is a noun, verb or adjective, or the synonyms associated
with a word, depending on your requirements. After feature selection you
create a vector for each keyword, containing the values of all the features
you identified. Finally, you pass these vectors to the K-Means clustering
algorithm to get keyword clusters.

Mahout in Action has better documentation on clustering documents.
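
The keyword route described above (choose features, build one vector per keyword, run K-Means) can be sketched roughly as follows. This is a plain Python illustration, not Mahout's API. Character bigram counts are swapped in as the features, because part-of-speech or synonym features would need an external resource. The keyword list, the function names and the choice of initial centroids are all assumptions made for the example.

```python
from collections import Counter


def bigram_counts(word):
    """Count the character bigrams of a lowercased, '#'-padded word."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def vectorize(keywords):
    """Turn each keyword into a dense vector over the shared bigram vocabulary."""
    counts = [bigram_counts(k) for k in keywords]
    vocab = sorted(set().union(*counts))
    return [[float(c[g]) for g in vocab] for c in counts]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, initial_centroids, max_iterations=20):
    """Lloyd's algorithm: returns the cluster index assigned to each vector."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged, nothing moved
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical keyword file contents, one keyword per line in practice.
keywords = ["cluster", "clusters", "clustering",
            "vector", "vectors", "vectorize"]
vectors = vectorize(keywords)
# Seed the two clusters with the first "cluster*" and first "vector*" word.
labels = kmeans(vectors, initial_centroids=[vectors[0], vectors[3]])
for keyword, label in zip(keywords, labels):
    print(label, keyword)
```

With these seeds the "cluster*" words end up in one group and the "vector*" words in the other. With real data, the choice of features and of k matters far more than the clustering loop itself.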

Best,
Mahesh Balija,
CalsoftLabs.
