Posted to dev@mahout.apache.org by Rajan Gupta <ra...@gmail.com> on 2013/06/24 09:09:10 UTC

Need Help in Clustering

Hi,
I am new to Mahout.

I have text data in the following format:

Id,age,income,perwt,sex,city,product
1,23,2200,40,2,Boston,product #1

I want to perform k-means clustering based on two fields, age and
income. I also want to cluster into a specific number of clusters.

I have already run clustering by converting the file into sequence files and
then vector files, but I get an empty file when I run clusterdump. I guess
there is something wrong in the way the classes are written or in the format
of my input file.

Can anyone help me with how to do this?

Thanks in advance,
Rajan Gupta

Re: Need Help in Clustering

Posted by Ted Dunning <te...@gmail.com>.
On Mon, Jun 24, 2013 at 12:14 PM, Rajan Gupta <ra...@gmail.com> wrote:

> Do I need to write custom code for this? If yes, please help me.
>

Yes.  You definitely need custom code for this.

You also need to think about your data and why you want clusters.

What does age mean to a cluster?  Are people with the same age supposed to
be the same in some sense?  What does 5 years difference mean?  Is the
distance from 20 to 25 the same as the difference between 55 and 60?

What about city?  How many cities are there?  Do you have any sense of
which cities are more alike than others?

What about income?  Should you perhaps use log(income) for computing
distances?

What is "perwt"?

Why is there just one product per line?  What products are more similar
than others?
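
The questions above about scale can be made concrete with a small sketch. This is plain Python, not Mahout code, and the names to_features and age_scale are hypothetical choices made for illustration. It assumes age is divided by a chosen number of years and income is replaced by log(income), so that equal income ratios give equal distances.

```python
import math

def to_features(age, income, age_scale=10.0):
    """Map a raw (age, income) record to the point used for distances.

    age_scale is a hypothetical knob: how many years count as one unit
    of distance. log(income) turns equal income ratios into equal gaps.
    """
    return (age / age_scale, math.log(income))

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Same age in both pairs; income doubles in both pairs.
d_low = euclidean(to_features(30, 2200), to_features(30, 4400))
d_high = euclidean(to_features(30, 4400), to_features(30, 8800))
print(d_low, d_high)  # both are approximately log(2), about 0.693
```

With raw income the second pair would be twice as far apart as the first. Whether that or the log scale is right depends on what the clusters are for.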

Re: Need Help in Clustering

Posted by Rajan Gupta <ra...@gmail.com>.
Thanks for your response.

Yes, I get clustered points after running k-means. I have done clustering
successfully with the 20 newsgroups data and the Reuters data. Clusterdump
also works properly with those examples.

Now, I have text data in the following format:

Id,age,income,perwt,sex,city,product
1,23,2200,40,2,Boston,product #1

--------------------------

I want the output to look like this:

"Id","age","income","perwt","sex","city","product","cluster"
1,23,2200,25,2,"Boston","product #1",1
2,26,6600,30,1,"New york","product #5",3
3,24,4400,48,2,"Portland","product #24",2
4,29,9900,60,1,"San Jose","product #70",4

Can anyone help?


Do I need to write custom code for this? If yes, please help me.

Thanks in advance,

Regards,
Rajan Gupta
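
As a rough illustration of the kind of custom code being asked about, here is a minimal standalone sketch in Python. It is not Mahout and does not use sequence or vector files. It assumes the CSV header shown above, clusters on age and log(income) without any further scaling, seeds the centroids with the first k rows, and numbers clusters from 1. The function names kmeans and add_cluster_column are hypothetical.

```python
import csv
import io
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(range(k),
                                key=lambda c: math.dist(p, centroids[c]))
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignment

def add_cluster_column(csv_text, k):
    """Cluster rows on (age, log income) and append a 'cluster' column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    points = [(float(r["age"]), math.log(float(r["income"]))) for r in rows]
    labels = kmeans(points, k)
    out = io.StringIO()
    writer = csv.DictWriter(out,
                            fieldnames=list(rows[0].keys()) + ["cluster"],
                            lineterminator="\n")
    writer.writeheader()
    for row, label in zip(rows, labels):
        row["cluster"] = label + 1  # 1-based cluster ids, as in the example
        writer.writerow(row)
    return out.getvalue()

sample = """Id,age,income,perwt,sex,city,product
1,23,2200,25,2,Boston,product #1
2,26,6600,30,1,New york,product #5
3,24,4400,48,2,Portland,product #24
4,29,9900,60,1,San Jose,product #70
"""
result = add_cluster_column(sample, k=2)
print(result)
```

Because every original column is carried through unchanged, only age and income influence the cluster assignment. The cluster numbers themselves are arbitrary labels and depend on the seeding.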



On Mon, Jun 24, 2013 at 12:46 PM, Suneel Marthi <su...@yahoo.com> wrote:

> How are you converting your data to a sequence file?
> If you are not sure, check this link:
> http://stackoverflow.com/questions/13663567/mahout-csv-to-vector-and-running-the-program
>
> Are you getting any clustered points after running k-means?
>
> It would help with troubleshooting if you could list the commands you
> executed.

Re: Need Help in Clustering

Posted by Suneel Marthi <su...@yahoo.com>.
How are you converting your data to a sequence file?
If you are not sure, check this link: http://stackoverflow.com/questions/13663567/mahout-csv-to-vector-and-running-the-program

Are you getting any clustered points after running k-means?

It would help with troubleshooting if you could list the commands you executed.


