Posted to dev@mahout.apache.org by "Karl Wettin (JIRA)" <ji...@apache.org> on 2008/04/14 20:33:05 UTC

[jira] Created: (MAHOUT-42) Tanimoto coefficient distance measure

Tanimoto coefficient distance measure
-------------------------------------

                 Key: MAHOUT-42
                 URL: https://issues.apache.org/jira/browse/MAHOUT-42
             Project: Mahout
          Issue Type: New Feature
            Reporter: Karl Wettin
            Assignee: Karl Wettin


http://en.wikipedia.org/wiki/Jaccard_index
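
For reference, the Tanimoto coefficient for real-valued vectors, as it is commonly defined, together with the distance derived from it. The symbols a, b, T and d are notation chosen here for illustration, not taken from the issue or from any patch:

```latex
T(a, b) = \frac{a \cdot b}{\lVert a \rVert^{2} + \lVert b \rVert^{2} - a \cdot b},
\qquad
d(a, b) = 1 - T(a, b)
```

For binary vectors this reduces to the Jaccard index of the two feature sets, which is presumably why the issue links to that article.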

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-42) Tanimoto coefficient distance measure

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590599#action_12590599 ] 

Karl Wettin commented on MAHOUT-42:
-----------------------------------

{quote}
This is probably not what you had in mind.
This is the kind of bug you would expect to show up with vector0=sparsevector{1,0,2} and vector1=sparsevector{3,3,2}.
You probably expect a2=5 and b2=22, but you will get a2=14 and b2=13.
{quote}

Ahh, right. Thanks!

The new patch should fix that. 

> Tanimoto coefficient distance measure
> -------------------------------------
>
>                 Key: MAHOUT-42
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-42
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>         Attachments: MAHOUT-42.txt, MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index



[jira] Commented: (MAHOUT-42) Tanimoto coefficient distance measure

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588693#action_12588693 ] 

Karl Wettin commented on MAHOUT-42:
-----------------------------------

There must be a much better solution than using a Set<Feature> as I do here.

> Tanimoto coefficient distance measure
> -------------------------------------
>
>                 Key: MAHOUT-42
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-42
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>         Attachments: MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index



[jira] Updated: (MAHOUT-42) Tanimoto coefficient distance measure

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated MAHOUT-42:
------------------------------

    Attachment: MAHOUT-42.txt

> Tanimoto coefficient distance measure
> -------------------------------------
>
>                 Key: MAHOUT-42
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-42
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>         Attachments: MAHOUT-42.txt, MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index



[jira] Updated: (MAHOUT-42) Tanimoto coefficient distance measure

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated MAHOUT-42:
------------------------------

    Attachment: MAHOUT-42.txt

> Tanimoto coefficient distance measure
> -------------------------------------
>
>                 Key: MAHOUT-42
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-42
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>         Attachments: MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index



[jira] Closed: (MAHOUT-42) Tanimoto coefficient distance measure

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin closed MAHOUT-42.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.1

> Tanimoto coefficient distance measure
> -------------------------------------
>
>                 Key: MAHOUT-42
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-42
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>             Fix For: 0.1
>
>         Attachments: MAHOUT-42.txt, MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index



[jira] Commented: (MAHOUT-42) Tanimoto coefficient distance measure

Posted by "Samee Zahur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589396#action_12589396 ] 

Samee Zahur commented on MAHOUT-42:
-----------------------------------

I think what you are doing here is exactly what I tried to hide when designing VectorPair in MAHOUT-34, which did basically the same thing (maybe the committers were hoping for a more general solution).

In any case: say you call calculate(vector0,vector1) and features 0 and 2 get visited. The variables are now:

{noformat}
a2 = vector0[0]^2 + vector0[2]^2
b2 = vector1[0]^2 + vector1[2]^2
{noformat}

Then when you call calculate(vector1,vector0), let's say feature 1 gets visited. But the method was invoked with its parameters reversed, so the variables now get these values:

{noformat}
a2 = vector0[0]^2 + vector0[2]^2 + vector1[1]^2
b2 = vector1[0]^2 + vector1[2]^2 + vector0[1]^2
{noformat}

This is probably not what you had in mind.
This is the kind of bug you would expect to show up with vector0=sparsevector{1,0,2} and vector1=sparsevector{3,3,2}.
You probably expect a2=5 and b2=22, but you will get a2=14 and b2=13.
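
The walk-through above can be checked with a small sketch. It is a reconstruction from the description in this comment, not the code in the attached patch, and it does not use Mahout's vector API: the class TanimotoSketch, the Map-based sparse representation and every method name are hypothetical. sharedAccumulators replays the two passes with the roles swapped on the second one, while squaredNorm computes what a2 and b2 are presumably meant to be. Returning 0 for two all-zero vectors is also an assumption.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class TanimotoSketch {

    /** Hypothetical sparse vector: feature index -> non-zero value. */
    static Map<Integer, Double> sparse(double... values) {
        Map<Integer, Double> v = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0.0) {
                v.put(i, values[i]);
            }
        }
        return v;
    }

    /** Value of a feature, or 0 if the sparse vector does not store it. */
    static double valueAt(Map<Integer, Double> v, int feature) {
        Double x = v.get(feature);
        return x == null ? 0.0 : x;
    }

    /** Sum of squared values of one vector: the intended a2 (or b2). */
    static double squaredNorm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return sum;
    }

    /** Dot product over the features stored in both vectors. */
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            sum += e.getValue() * valueAt(b, e.getKey());
        }
        return sum;
    }

    /** 1 - Tanimoto coefficient; two all-zero vectors are treated as identical (assumption). */
    static double tanimotoDistance(Map<Integer, Double> a, Map<Integer, Double> b) {
        double ab = dot(a, b);
        double denominator = squaredNorm(a) + squaredNorm(b) - ab;
        return denominator == 0.0 ? 0.0 : 1.0 - ab / denominator;
    }

    /** Replays the described bug: shared a2/b2, second pass run with the arguments reversed. */
    static double[] sharedAccumulators(Map<Integer, Double> v0, Map<Integer, Double> v1) {
        double a2 = 0.0;
        double b2 = 0.0;
        Set<Integer> visited = new HashSet<>();
        // Pass 1, calculate(v0, v1): visits the features stored in v0.
        for (int f : v0.keySet()) {
            visited.add(f);
            a2 += valueAt(v0, f) * valueAt(v0, f);
            b2 += valueAt(v1, f) * valueAt(v1, f);
        }
        // Pass 2, calculate(v1, v0): visits the remaining features of v1,
        // but "first argument" now means v1, so the sums are mixed up.
        for (int f : v1.keySet()) {
            if (visited.add(f)) {
                a2 += valueAt(v1, f) * valueAt(v1, f);
                b2 += valueAt(v0, f) * valueAt(v0, f);
            }
        }
        return new double[] {a2, b2};
    }

    public static void main(String[] args) {
        Map<Integer, Double> v0 = sparse(1, 0, 2);
        Map<Integer, Double> v1 = sparse(3, 3, 2);
        double[] shared = sharedAccumulators(v0, v1);
        System.out.println("shared accumulators: a2=" + shared[0] + " b2=" + shared[1]);
        System.out.println("per-vector norms:    a2=" + squaredNorm(v0) + " b2=" + squaredNorm(v1));
        System.out.println("tanimoto distance:   " + tanimotoDistance(v0, v1));
    }
}
```

Keeping each squared norm in its own local sum, tied to one vector rather than to an argument position, is one way to make the result independent of the order in which calculate is invoked.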

> Tanimoto coefficient distance measure
> -------------------------------------
>
>                 Key: MAHOUT-42
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-42
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>         Attachments: MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index
