Posted to user@mahout.apache.org by "Sengupta, Sohini IN BLR SISL" <so...@siemens.com> on 2011/03/23 10:52:11 UTC

A very basic question

Hi everyone,
Can anyone advise me on how to create n-dimensional feature vectors for running mean shift clustering on Mahout?
I have 60,000 data points, each with around 5,000 dimensions.

Any pointers / references will be helpful.
Thanks,
Sohini

________________________________
Important notice: This e-mail and any attachment there to contains corporate proprietary information. If you have received it by mistake, please notify us immediately by reply e-mail and delete this e-mail and its attachments from your system.
Thank You.

Re: A very basic question

Posted by Grant Ingersoll <gr...@gmail.com>.
On Mar 23, 2011, at 11:17 PM, Sengupta, Sohini IN BLR SISL wrote:

> Hi Grant,
> The data is of type "double". For example, I am attaching part of a feature vector below:
> 
> -0.05212,  0.03613,  0.00007, -0.01740, -0.00507, -0.01592, -0.00183, -0.00275, -0.00718, -0.01380,  0.00535,  0.00303, -0.00592, -0.00254, -0.03029,  0.00042, -0.00261, -0.00585, -0.00380,  0.01916, -0.01359, -0.00324,  0.00225,  0.00113, -0.00261,  0.00134,  0.00239, -0.00077, -0.00056,  0.02134,  0.00247, -0.00275, -0.00113, -0.00345, -0.01380, .................

Probably the easiest way is to build them programmatically.  See the VectorIterable classes in the utils package, for instance; there is an example in there of building and writing Mahout vectors from Lucene.  It shouldn't be too hard to do the same from this format.
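
As a rough illustration of building the vectors programmatically from the comma-separated format quoted above, here is a minimal Java sketch. The class FeatureVectorParser and its methods are hypothetical names made up for this example, not part of Mahout, and the sketch assumes one data point per line. It only parses the text into double[] arrays with the standard library; the closing comment notes, as an assumption to check against your Mahout version, how the arrays might then be handed to Mahout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Mahout): parses comma-separated doubles into arrays. */
final class FeatureVectorParser {

    /** Parses one line such as "-0.05212,  0.03613,  0.00007" into a double[]. */
    static double[] parseLine(String line) {
        // Split on commas, tolerating any whitespace around them.
        String[] tokens = line.trim().split("\\s*,\\s*");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    /** Parses all non-blank lines and rejects rows whose dimension differs from the first row. */
    static List<double[]> parseAll(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            double[] row = parseLine(line);
            if (!rows.isEmpty() && rows.get(0).length != row.length) {
                throw new IllegalArgumentException(
                    "Expected " + rows.get(0).length + " dimensions but found " + row.length);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] v = parseLine("-0.05212,  0.03613,  0.00007, -0.01740");
        System.out.println(v.length + " dimensions, first value " + v[0]);
        // Assumption: with Mahout on the classpath, each array could be wrapped as
        // new org.apache.mahout.math.DenseVector(v) and written out as a
        // VectorWritable to a Hadoop SequenceFile for the clustering job to read.
    }
}
```

The dimension check in parseAll is there because every point fed to a clustering job is expected to have the same number of features, so a short or truncated row is reported early rather than at run time.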



RE: A very basic question

Posted by "Sengupta, Sohini IN BLR SISL" <so...@siemens.com>.
Hi Grant,
The data is of type "double". For example, I am attaching part of a feature vector below:

-0.05212,  0.03613,  0.00007, -0.01740, -0.00507, -0.01592, -0.00183, -0.00275, -0.00718, -0.01380,  0.00535,  0.00303, -0.00592, -0.00254, -0.03029,  0.00042, -0.00261, -0.00585, -0.00380,  0.01916, -0.01359, -0.00324,  0.00225,  0.00113, -0.00261,  0.00134,  0.00239, -0.00077, -0.00056,  0.02134,  0.00247, -0.00275, -0.00113, -0.00345, -0.01380, .................

Thanks and regards,
Sohini


Re: A very basic question

Posted by Grant Ingersoll <gr...@gmail.com>.
Hi Sohini,

Where is your data coming from and what kind of data is it?

On Mar 23, 2011, at 5:52 AM, Sengupta, Sohini IN BLR SISL wrote:

> Hi everyone,
> Can anyone advise me on how to create n-dimensional feature vectors for running mean shift clustering on Mahout?
> I have 60,000 data points, each with around 5,000 dimensions.
> 
> Any pointers / references will be helpful.
> Thanks,
> Sohini

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search