
Posted to user@commons.apache.org by "VanIngen, Erik (FIPS)" <Er...@fao.org> on 2010/10/20 10:04:58 UTC

[math] EuclideanIntegerPoint EuclideanDoublePoint

Good morning!

I need to do cluster analysis on values like this:
1.814263985     -0.633923297
2.501153739     -0.559033358
2.408755862     -0.509902975
1.935495243     -0.330554484
0.728818279     -0.169024633
-0.523861032    0.110392311

I can use EuclideanIntegerPoint, but then I have to convert the values to integers and would lose precision. So my trick would be to multiply by 1000, cluster, and then multiply the resulting values by 0.001. Would that be a valid approach from a methodology point of view?

Are there any plans to develop a EuclideanDoublePoint?

Cheers,
Erik van Ingen

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org

Re: [math] EuclideanIntegerPoint EuclideanDoublePoint

Posted by Luc Maisonobe <Lu...@free.fr>.

On 20/10/2010 10:04, VanIngen, Erik (FIPS) wrote:
> Good morning!
> 
> I need to do cluster analysis on values like this:
> 1.814263985     -0.633923297
> 2.501153739     -0.559033358
> 2.408755862     -0.509902975
> 1.935495243     -0.330554484
> 0.728818279     -0.169024633
> -0.523861032    0.110392311
> 
> I can use EuclideanIntegerPoint, but then I have to convert the values to integers and would lose precision. So my trick would be to multiply by 1000, cluster, and then multiply the resulting values by 0.001. Would that be a valid approach from a methodology point of view?
> 
> Are there any plans to develop a EuclideanDoublePoint?

The K-means++ clusterer can handle any implementation of the Clusterable
interface. The intent is to allow users to provide their own class to
suit their needs. The EuclideanIntegerPoint can be seen as a simple
reference implementation. There are no plans to add other
implementations yet.

In order to avoid data duplication, I would suggest that your existing
class that already holds the values implements the Clusterable interface
by itself. This way, you can directly provide your own data to K-means++.
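
As a rough illustration of that suggestion, here is a minimal sketch of a double-valued point class. It is not library code: DoublePoint is a hypothetical name, and the Clusterable interface declared here is a local stand-in written so the example compiles on its own. It assumes the library interface has the two methods distanceFrom and centroidOf, as in the Commons Math 2.x line; check the Javadoc of your version before relying on that.

```java
import java.util.Collection;

// Local stand-in for the library interface (assumption: same two methods as
// org.apache.commons.math.stat.clustering.Clusterable in Commons Math 2.x).
interface Clusterable<T> {
    double distanceFrom(T p);
    T centroidOf(Collection<T> p);
}

// Hypothetical double-valued point; works for any number of dimensions.
final class DoublePoint implements Clusterable<DoublePoint> {
    private final double[] coords;

    DoublePoint(double[] coords) {
        this.coords = coords.clone(); // defensive copy
    }

    double[] getCoords() {
        return coords.clone();
    }

    // Euclidean distance between this point and p (same dimension assumed).
    @Override
    public double distanceFrom(DoublePoint p) {
        double sum = 0.0;
        for (int i = 0; i < coords.length; i++) {
            double d = coords[i] - p.coords[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Component-wise mean of the given points.
    @Override
    public DoublePoint centroidOf(Collection<DoublePoint> points) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("cannot compute the centroid of no points");
        }
        double[] mean = new double[coords.length];
        for (DoublePoint p : points) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += p.coords[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= points.size();
        }
        return new DoublePoint(mean);
    }
}
```

With Commons Math on the classpath, the local interface would be dropped in favour of the library's own, and a collection of such points could then be handed to the K-means++ clusterer without any integer conversion.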

Hope this helps
Luc


Re: [math] EuclideanIntegerPoint EuclideanDoublePoint

Posted by Ted Dunning <te...@gmail.com>.

On Wed, Oct 20, 2010 at 1:04 AM, VanIngen, Erik (FIPS) <
Erik.VanIngen@fao.org> wrote:

> Good morning!
>
> I need to do cluster analysis on values like this:
> 1.814263985     -0.633923297
> 2.501153739     -0.559033358
> 2.408755862     -0.509902975
> 1.935495243     -0.330554484
> 0.728818279     -0.169024633
> -0.523861032    0.110392311
>
> I can use EuclideanIntegerPoint, but then I have to convert the values to
> integers and would lose precision. So my trick would be to multiply by
> 1000, cluster, and then multiply the resulting values by 0.001. Would that
> be a valid approach from a methodology point of view?
>

Numerically, this approach will often be a disaster.  I wouldn't recommend
it.
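
The reply does not say which numerical problem is meant. One effect that is easy to check is the quantization error of the round trip itself: scaling by 1000 and rounding to an integer keeps only three decimal places. The sketch below, with the hypothetical names ScalingRoundTrip and roundTrip, applies that round trip to the first value of the sample data.

```java
final class ScalingRoundTrip {
    // Scale up, round to the nearest integer, then scale back down.
    static double roundTrip(double value, int scale) {
        long asInteger = Math.round(value * scale);
        return asInteger / (double) scale;
    }

    public static void main(String[] args) {
        double original = 1.814263985; // first value from the sample data
        double restored = roundTrip(original, 1000);
        System.out.println("original = " + original);
        System.out.println("restored = " + restored);
        System.out.println("lost     = " + (original - restored));
    }
}
```

For this value the integer form is 1814, so everything after the third decimal place (roughly 0.00026 here) is gone before clustering starts. Whether that matters depends on how close together the clusters are.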


>
> Are there any plans to develop a EuclideanDoublePoint?
>
Apache Mahout has a bunch of clustering code that you could use.  It isn't
limited to two dimensions, either, as an EDP might be.

RE: KMeansPlusPlusClusterer breaks on division by zero

Posted by "VanIngen, Erik (FIPS)" <Er...@fao.org>.

FYI

The issue was resolved by Apache within 26 hours:
https://issues.apache.org/jira/browse/MATH-429

Thanks a lot!


-----Original Message-----
From: VanIngen, Erik (FIPS)
Sent: 22 October 2010 10:10
To: 'Commons Users List'
Cc: Ellenbroek, Anton (FIPS); Calderini, Francesco (FIPS); Grainger, Richard (FIPS); Sibeni, Fabrizio (FIPS)
Subject: KMeansPlusPlusClusterer breaks on division by zero


Good morning,

I have just filed a bug report in JIRA: https://issues.apache.org/jira/browse/MATH-429

Kind Regards,
Erik van Ingen


KMeansPlusPlusClusterer breaks on division by zero

Posted by "VanIngen, Erik (FIPS)" <Er...@fao.org>.

Good morning,

I have just filed a bug report in JIRA:
https://issues.apache.org/jira/browse/MATH-429

Kind Regards,
Erik van Ingen
