Posted to dev@mahout.apache.org by "Jake Mannix (JIRA)" <ji...@apache.org> on 2009/10/03 23:03:25 UTC

[jira] Commented: (MAHOUT-181) DistanceMeasure is broken: iteration is done over nonZeroElements of v1.plus(v2), not v1.minus(v2)

    [ https://issues.apache.org/jira/browse/MAHOUT-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761955#action_12761955 ] 

Jake Mannix commented on MAHOUT-181:
------------------------------------

*bump*  

Has anyone looked at this?  The patch fixes the bug in every measure except TanimotoDistanceMeasure. I left that one alone because I assumed whoever contributed it knows better what they really intended, but if nobody else wants to take it, I can update the patch to fix that as well, using the correct definition given on Wikipedia.
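
For reference, here is a minimal sketch of the Tanimoto distance as defined on Wikipedia for real-valued vectors: 1 - (a.b) / (|a|^2 + |b|^2 - a.b). It is not the Mahout implementation and not part of the patch. The class name TanimotoSketch, the use of plain double[] arrays, and the choice to return 0 for two all-zero vectors are assumptions made for illustration only.

```java
// Illustrative sketch only; not Mahout code.
final class TanimotoSketch {

    // Dot product of two dense vectors of equal length.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Tanimoto distance: 1 - (a.b) / (|a|^2 + |b|^2 - a.b).
    static double distance(double[] a, double[] b) {
        double ab = dot(a, b);
        double denominator = dot(a, a) + dot(b, b) - ab;
        if (denominator == 0.0) {
            // Assumption: both vectors are all zeros; treat them as identical.
            return 0.0;
        }
        return 1.0 - ab / denominator;
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.0};
        double[] negated = {-1.0, -2.0};
        System.out.println(distance(v, v));
        System.out.println(distance(v, negated));
    }
}
```

Written this way, the denominator equals |a - b|^2 + a.b, so the ratio never exceeds 1 and the distance cannot go negative.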

> DistanceMeasure is broken: iteration is done over nonZeroElements of v1.plus(v2), not v1.minus(v2)
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-181
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-181
>             Project: Mahout
>          Issue Type: Bug
>          Components: Matrix
>    Affects Versions: 0.2
>         Environment: all
>            Reporter: Jake Mannix
>             Fix For: 0.2
>
>         Attachments: MAHOUT-181.patch
>
>
> SquaredEuclideanDistanceMeasure iterates over the nonzero elements of v1.plus(v2), which has the right set of nonzero elements only if v1.get(i) != -v2.get(i) for all i indexing nonzero elements. For example, in the simple case of SquaredEuclideanDistanceMeasure.distance(v, v.assign(new NegateFunction())), current trunk yields zero instead of 4*v.lengthSquared().
> Attached is a patch with a unit test which checks that DistanceMeasure.distance always returns nonnegative results and in particular also does not return , as well as a fix for ManhattanDistanceMeasure, SquaredEuclideanDistanceMeasure, and EuclideanDistanceMeasure.
> Unfortunately, the attached unit test reveals that TanimotoDistanceMeasure is more broken than I can fix at present.  It doesn't appear to use the referenced formula from Wikipedia properly, and in fact it sometimes returns negative results.  This means that with this patch applied, TestTanimotoDistanceMeasure fails (and rightfully so).
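
The failure mode described in the issue can be reproduced outside Mahout with a small self-contained sketch. It does not use the Mahout Vector API: sparse vectors are modelled as Map<Integer, Double>, and the names SparseDistanceSketch, squaredDistanceOverSum, and squaredDistanceOverDifference are hypothetical. The first method mimics iterating over the nonzero elements of the sum, the second accumulates over the difference.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; sparse vectors are index -> value maps, not Mahout Vectors.
final class SparseDistanceSketch {

    // All indices that are nonzero in at least one of the two vectors.
    static Set<Integer> unionOfIndices(Map<Integer, Double> v1, Map<Integer, Double> v2) {
        Set<Integer> indices = new HashSet<>(v1.keySet());
        indices.addAll(v2.keySet());
        return indices;
    }

    // Mimics the bug: only indices where v1[i] + v2[i] is nonzero are visited.
    static double squaredDistanceOverSum(Map<Integer, Double> v1, Map<Integer, Double> v2) {
        double result = 0.0;
        for (int i : unionOfIndices(v1, v2)) {
            double a = v1.getOrDefault(i, 0.0);
            double b = v2.getOrDefault(i, 0.0);
            if (a + b != 0.0) {
                double d = a - b;
                result += d * d;
            }
        }
        return result;
    }

    // Accumulates over the difference, so cancelling entries are not skipped.
    static double squaredDistanceOverDifference(Map<Integer, Double> v1, Map<Integer, Double> v2) {
        double result = 0.0;
        for (int i : unionOfIndices(v1, v2)) {
            double d = v1.getOrDefault(i, 0.0) - v2.getOrDefault(i, 0.0);
            result += d * d;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Double> v = new HashMap<>();
        v.put(0, 1.0);
        v.put(3, 2.0);
        Map<Integer, Double> negated = new HashMap<>();
        negated.put(0, -1.0);
        negated.put(3, -2.0);
        // The length squared of v is 5, so the correct distance is 4 * 5 = 20.
        System.out.println(squaredDistanceOverSum(v, negated));        // prints 0.0
        System.out.println(squaredDistanceOverDifference(v, negated)); // prints 20.0
    }
}
```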

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.