Posted to user@mahout.apache.org by "Allen, Ronald L." <al...@ornl.gov> on 2014/01/31 13:55:24 UTC

Using Mahout to cluster a large CSV file

Hi all,

Has anyone had any success using Mahout kmeans to cluster the data in a single large CSV file? If so, how did you do it?

Thanks,
Ronnie

Re: Using Mahout to cluster a large CSV file

Posted by Bertrand Dechoux <de...@gmail.com>.
I guess the big (no pun intended) question is: what is your definition of a
large CSV?

Bertrand


RE: Using Mahout to cluster a large CSV file

Posted by "Allen, Ronald L." <al...@ornl.gov>.
Thank you for the response!

I will try this out and let you know how it goes!
________________________________________
From: Suneel Marthi [suneel_marthi@yahoo.com]
Sent: Friday, January 31, 2014 8:17 AM
To: user@mahout.apache.org
Subject: Re: Using Mahout to cluster a large CSV file

Re: Using Mahout to cluster a large CSV file

Posted by Suneel Marthi <su...@yahoo.com>.
Use Mahout's CSVVectorIterator.java to read your input CSV file and generate vectors.

You pass in a java.io.Reader for your CSV file and it generates dense vectors from the CSV.
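The thread only says that Mahout's CSVVectorIterator takes a java.io.Reader and produces dense vectors; the class below is not that class. It is a hypothetical, Mahout-free sketch of the same idea in plain Java (the name CsvDenseVectorIterator is made up), assuming a comma-separated file with purely numeric columns, no quoted fields and no header row. Each line becomes one dense double[] vector, read lazily so only one row is held in memory at a time, which is what matters for a large file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical illustration (not Mahout's CSVVectorIterator): lazily turns each
 * line of a CSV read from a java.io.Reader into a dense double[] vector.
 * Assumes numeric columns, comma separators, no quoting and no header row.
 */
public class CsvDenseVectorIterator implements Iterator<double[]> {
    private final BufferedReader in;
    private String nextLine; // next non-blank line, or null at end of input

    public CsvDenseVectorIterator(Reader reader) {
        this.in = new BufferedReader(reader);
        advance();
    }

    // Read ahead to the next non-blank line so hasNext() is cheap.
    private void advance() {
        try {
            do {
                nextLine = in.readLine();
            } while (nextLine != null && nextLine.trim().isEmpty());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public double[] next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        // Split on commas, keeping trailing empty fields so bad rows fail loudly.
        String[] fields = nextLine.split(",", -1);
        double[] vector = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            vector[i] = Double.parseDouble(fields[i].trim());
        }
        advance();
        return vector;
    }
}
```

Wrapping a FileReader around the CSV and looping with hasNext()/next() yields one vector per row; blank lines are skipped, and a non-numeric field (such as a header) raises NumberFormatException.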

You could then feed the generated vectors into KMeans clustering.
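To make "feed the generated vectors into KMeans clustering" concrete, here is a minimal in-memory sketch of what the k-means step computes (Lloyd's algorithm). It does not use Mahout: SimpleKMeans, cluster, k and maxIterations are hypothetical names, and seeding the centroids with the first k points is a simplification chosen for determinism, not necessarily how Mahout picks its initial clusters.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical in-memory illustration of k-means (Lloyd's algorithm).
 * Not Mahout's implementation: all vectors are held in memory and the
 * centroids are seeded with the first k points.
 */
public class SimpleKMeans {

    /** Returns the cluster index assigned to each point. */
    public static int[] cluster(List<double[]> points, int k, int maxIterations) {
        if (k <= 0 || k > points.size()) {
            throw new IllegalArgumentException("k must be in 1..points.size()");
        }
        int dim = points.get(0).length;
        // Seed centroids with copies of the first k points.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points.get(c).clone();
        }
        int[] assignment = new int[points.size()];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.size(); p++) {
                int best = nearest(points.get(p), centroids);
                if (best != assignment[p]) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.size(); p++) {
                double[] x = points.get(p);
                int c = assignment[p];
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += x[d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its old centroid
                    for (int d = 0; d < dim; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    // Index of the centroid with the smallest squared Euclidean distance to x.
    private static int nearest(double[] x, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0;
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

Because this version keeps every vector in memory, it only suits data that fits on one machine; for a file that does not, which is where the question of what counts as "large" comes in, Mahout's Hadoop-based k-means is the tool the thread is pointing at.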



