Posted to commits@mahout.apache.org by sr...@apache.org on 2012/10/26 17:49:47 UTC

svn commit: r1402553 - /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java

Author: srowen
Date: Fri Oct 26 15:49:47 2012
New Revision: 1402553

URL: http://svn.apache.org/viewvc?rev=1402553&view=rev
Log:
Fix possible NaN issue in Euclidean distance, per http://stackoverflow.com/questions/13089214/nan-distances-in-mahout-euclidean-implementation

Modified:
    mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java

Modified: mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java
URL: http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java?rev=1402553&r1=1402552&r2=1402553&view=diff
==============================================================================
--- mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java (original)
+++ mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java Fri Oct 26 15:49:47 2012
@@ -46,7 +46,9 @@ public class EuclideanDistanceSimilarity
 
   @Override
   public double similarity(double dots, double normA, double normB, int numberOfColumns) {
-    double euclideanDistance = Math.sqrt(normA - 2 * dots + normB);
+    // Arg can't be negative in theory, but can in practice due to rounding, so cap it.
+    // Also note that normA / normB are actually the squares of the norms.
+    double euclideanDistance = Math.sqrt(Math.max(0.0, normA - 2 * dots + normB));
     return 1.0 / (1.0 + euclideanDistance);
   }
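
For readers of the archive, a minimal sketch of the failure the commit guards against. It is not Mahout code: the class name, method names, and inputs are hypothetical and hand-picked. The dot product is set one ulp above its exact value to stand in for accumulated round-off, which makes the sqrt argument slightly negative.

```java
// Hypothetical sketch, not part of Mahout: names and inputs are illustrative only.
final class ClampedSqrtSketch {

  // Pre-commit form: a slightly negative argument makes Math.sqrt return NaN.
  static double unclampedSimilarity(double dots, double normA, double normB) {
    double euclideanDistance = Math.sqrt(normA - 2 * dots + normB);
    return 1.0 / (1.0 + euclideanDistance);
  }

  // Post-commit form: the argument is capped at zero before the square root.
  static double clampedSimilarity(double dots, double normA, double normB) {
    double euclideanDistance = Math.sqrt(Math.max(0.0, normA - 2 * dots + normB));
    return 1.0 / (1.0 + euclideanDistance);
  }

  public static void main(String[] args) {
    // As the commit comment notes, normA and normB are squared norms.
    double normA = 1.0;
    double normB = 1.0;
    // Assumption: round-off pushed the dot product one ulp above 1.0.
    double dots = Math.nextUp(1.0);

    System.out.println("unclamped: " + unclampedSimilarity(dots, normA, normB)); // NaN
    System.out.println("clamped:   " + clampedSimilarity(dots, normA, normB));   // 1.0
  }
}
```

With these inputs the unclamped form yields NaN, while the clamped form treats the two vectors as coincident and returns a similarity of 1.0.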
 



Re: svn commit: r1402553 - /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java

Posted by Ted Dunning <te...@gmail.com>.
Yeah.. This will prevent negative squared distances and is probably OK.

On Fri, Oct 26, 2012 at 4:54 PM, Sean Owen <sr...@gmail.com> wrote:

> It might matter a little -- given how that particular computation is
> structured, the original vectors aren't available any more and the
> alternative would be a bunch of recalculation anyway. I think the
> speed / elegance benefit probably trumps precision issues.
>
> At least -- stare decisis, that's how it had always been anyway, this
> was just fixing round-off errors. Which is I suppose exactly what you
> mean.
>
> On Fri, Oct 26, 2012 at 9:48 PM, Ted Dunning <te...@gmail.com> wrote:
> > I am not sure if this matters in this context, but using this formula will
> > lose precision for very near points.  That can affect ordering in the limit.
> >
> > By lose precision, I mean it can degrade to 7-8 sig figs instead of 16 or
> > so.  I doubt this matters, but I wouldn't know if it does.
>

Re: svn commit: r1402553 - /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java

Posted by Sean Owen <sr...@gmail.com>.
It might matter a little -- given how that particular computation is
structured, the original vectors aren't available any more and the
alternative would be a bunch of recalculation anyway. I think the
speed / elegance benefit probably trumps precision issues.

At least -- stare decisis, that's how it had always been anyway, this
was just fixing round-off errors. Which is I suppose exactly what you
mean.

On Fri, Oct 26, 2012 at 9:48 PM, Ted Dunning <te...@gmail.com> wrote:
> I am not sure if this matters in this context, but using this formula will
> lose precision for very near points.  That can affect ordering in the limit.
>
> By lose precision, I mean it can degrade to 7-8 sig figs instead of 16 or
> so.  I doubt this matters, but I wouldn't know if it does.

Fwd: svn commit: r1402553 - /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java

Posted by Ted Dunning <te...@gmail.com>.
I am not sure if this matters in this context, but using this formula will
lose precision for very near points.  That can affect ordering in the limit.

By lose precision, I mean it can degrade to 7-8 sig figs instead of 16 or
so.  I doubt this matters, but I wouldn't know if it does.
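
A minimal sketch of the precision concern, using hand-picked inputs rather than anything from Mahout. The class and helper names are hypothetical. It compares the expanded form used in the commit (squared norms and a dot product) with a direct sum of squared differences, which needs the original vectors. The inputs are an extreme case chosen so the outcome can be worked out by hand: the two points differ by 2^-27, and the expanded form loses that difference entirely.

```java
// Hypothetical sketch, not part of Mahout: names and inputs are illustrative only.
final class NearPointPrecisionSketch {

  static double dot(double[] x, double[] y) {
    double sum = 0.0;
    for (int i = 0; i < x.length; i++) {
      sum += x[i] * y[i];
    }
    return sum;
  }

  // Expanded form, as in the commit: sqrt(max(0, |a|^2 - 2 a.b + |b|^2)).
  static double expandedDistance(double[] a, double[] b) {
    double normA = dot(a, a); // squared norm of a
    double normB = dot(b, b); // squared norm of b
    double dots = dot(a, b);
    return Math.sqrt(Math.max(0.0, normA - 2 * dots + normB));
  }

  // Direct form: subtract first, then square, so nearby values do not cancel.
  static double directDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    double[] a = {1.0, 0.0};
    // b[0] * b[0] is 1 + 2^-26 + 2^-54; the 2^-54 term is below double
    // resolution near 1 and is rounded away when the squared norm is formed.
    double[] b = {1.0 + 0x1.0p-27, 0.0};

    System.out.println("expanded: " + expandedDistance(a, b)); // 0.0
    System.out.println("direct:   " + directDistance(a, b));   // 2^-27
  }
}
```

Here the expanded form reports a distance of zero for two distinct points, so they would tie with truly identical points in any ordering. For less extreme inputs the loss is partial rather than total.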

---------- Forwarded message ----------
From: <sr...@apache.org>
Date: Fri, Oct 26, 2012 at 11:49 AM
Subject: svn commit: r1402553 - /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java
To: commits@mahout.apache.org


Author: srowen
Date: Fri Oct 26 15:49:47 2012
New Revision: 1402553

URL: http://svn.apache.org/viewvc?rev=1402553&view=rev
Log:
Fix possible NaN issue in Euclidean distance, per http://stackoverflow.com/questions/13089214/nan-distances-in-mahout-euclidean-implementation

Modified:
    mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java

Modified: mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java
URL: http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java?rev=1402553&r1=1402552&r2=1402553&view=diff
==============================================================================
--- mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java (original)
+++ mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java Fri Oct 26 15:49:47 2012
@@ -46,7 +46,9 @@ public class EuclideanDistanceSimilarity
 
   @Override
   public double similarity(double dots, double normA, double normB, int numberOfColumns) {
-    double euclideanDistance = Math.sqrt(normA - 2 * dots + normB);
+    // Arg can't be negative in theory, but can in practice due to rounding, so cap it.
+    // Also note that normA / normB are actually the squares of the norms.
+    double euclideanDistance = Math.sqrt(Math.max(0.0, normA - 2 * dots + normB));
     return 1.0 / (1.0 + euclideanDistance);
   }