Posted to dev@mahout.apache.org by "Karl Wettin (JIRA)" <ji...@apache.org> on 2008/04/14 20:33:05 UTC
[jira] Created: (MAHOUT-42) Tanimoto coefficient distance measure
Tanimoto coefficient distance measure
-------------------------------------
Key: MAHOUT-42
URL: https://issues.apache.org/jira/browse/MAHOUT-42
Project: Mahout
Issue Type: New Feature
Reporter: Karl Wettin
Assignee: Karl Wettin
http://en.wikipedia.org/wiki/Jaccard_index
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-42) Tanimoto coefficient distance measure
Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590599#action_12590599 ]
Karl Wettin commented on MAHOUT-42:
-----------------------------------
{quote}
This is probably not what you had in mind.
This is the kind of bug you would expect to show up if we have vector0=sparsevector{1,0,2} and vector1=sparsevector{3,3,2}
You probably expect a2=5 and b2=22, but you will get a2=14, b2=13
{quote}
Ahh, right. Thanks!
The new patch should fix that.
> Tanimoto coefficient distance measure
> -------------------------------------
>
> Key: MAHOUT-42
> URL: https://issues.apache.org/jira/browse/MAHOUT-42
> Project: Mahout
> Issue Type: New Feature
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Attachments: MAHOUT-42.txt, MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index
[jira] Commented: (MAHOUT-42) Tanimoto coefficient distance measure
Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588693#action_12588693 ]
Karl Wettin commented on MAHOUT-42:
-----------------------------------
There must be a much better solution than using a Set<Feature> as I do here.
> Tanimoto coefficient distance measure
> -------------------------------------
>
> Key: MAHOUT-42
> URL: https://issues.apache.org/jira/browse/MAHOUT-42
> Project: Mahout
> Issue Type: New Feature
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Attachments: MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index
[jira] Updated: (MAHOUT-42) Tanimoto coefficient distance measure
Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated MAHOUT-42:
------------------------------
Attachment: MAHOUT-42.txt
> Tanimoto coefficient distance measure
> -------------------------------------
>
> Key: MAHOUT-42
> URL: https://issues.apache.org/jira/browse/MAHOUT-42
> Project: Mahout
> Issue Type: New Feature
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Attachments: MAHOUT-42.txt, MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index
[jira] Updated: (MAHOUT-42) Tanimoto coefficient distance measure
Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated MAHOUT-42:
------------------------------
Attachment: MAHOUT-42.txt
> Tanimoto coefficient distance measure
> -------------------------------------
>
> Key: MAHOUT-42
> URL: https://issues.apache.org/jira/browse/MAHOUT-42
> Project: Mahout
> Issue Type: New Feature
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Attachments: MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index
[jira] Closed: (MAHOUT-42) Tanimoto coefficient distance measure
Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin closed MAHOUT-42.
-----------------------------
Resolution: Fixed
Fix Version/s: 0.1
> Tanimoto coefficient distance measure
> -------------------------------------
>
> Key: MAHOUT-42
> URL: https://issues.apache.org/jira/browse/MAHOUT-42
> Project: Mahout
> Issue Type: New Feature
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Fix For: 0.1
>
> Attachments: MAHOUT-42.txt, MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index
[jira] Commented: (MAHOUT-42) Tanimoto coefficient distance measure
Posted by "Samee Zahur (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589396#action_12589396 ]
Samee Zahur commented on MAHOUT-42:
-----------------------------------
I think what you are doing here is exactly what I tried to hide when designing VectorPair in MAHOUT-34, which basically did the same thing (maybe the committers were hoping for a more general solution).
In any case: say that when you call calculate(vector0, vector1), features 0 and 2 get visited. The variables are now:
{noformat}
a2 = vector0[0]^2 + vector0[2]^2
b2 = vector1[0]^2 + vector1[2]^2
{noformat}
Then when you call calculate(vector1, vector0), let's say feature 1 gets visited. But the method was invoked with the parameters reversed, so the variables now get these values:
{noformat}
a2 = vector0[0]^2 + vector0[2]^2 + vector1[1]^2
b2 = vector1[0]^2 + vector1[2]^2 + vector0[1]^2
{noformat}
This is probably not what you had in mind.
This is the kind of bug you would expect to show up if we have vector0=sparsevector{1,0,2} and vector1=sparsevector{3,3,2}.
You probably expect a2=5 and b2=22 (3^2 + 3^2 + 2^2), but you will get a2=14, b2=13.
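To make the mix-up concrete, here is a minimal Java sketch of the two accumulation strategies. It is not the code from the attached patch: the class TanimotoSketch, the Map-based sparse vector, and every method name are hypothetical, chosen only for illustration. The sketch assumes the usual real-valued Tanimoto coefficient, a.b / (a2 + b2 - a.b), and (as a further assumption) treats two all-zero vectors as having distance 0. For the example vectors the straightforward sums give a2=5 and b2=22, while the reversed second pass gives a2=14 and b2=13.

```java
import java.util.Map;
import java.util.TreeMap;

final class TanimotoSketch {

    /** Builds a hypothetical sparse vector that stores only non-zero entries (index -> value). */
    static Map<Integer, Double> sparse(double... values) {
        Map<Integer, Double> v = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0.0) {
                v.put(i, values[i]);
            }
        }
        return v;
    }

    /** Sum of squares of the stored entries; absent entries contribute nothing. */
    static double sumOfSquares(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return sum;
    }

    /** Dot product over the features stored in both vectors. */
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    /** 1 - a.b / (a2 + b2 - a.b); each sum is computed from its own vector only. */
    static double tanimotoDistance(Map<Integer, Double> a, Map<Integer, Double> b) {
        double ab = dot(a, b);
        double denominator = sumOfSquares(a) + sumOfSquares(b) - ab;
        // Assumption: two all-zero vectors are treated as identical (distance 0).
        return denominator == 0.0 ? 0.0 : 1.0 - ab / denominator;
    }

    /** Reproduces the mix-up: the second pass adds each value to the wrong accumulator. */
    static double[] buggySums(Map<Integer, Double> v0, Map<Integer, Double> v1) {
        double a2 = 0.0;
        double b2 = 0.0;
        // First pass: features stored in v0.
        for (int f : v0.keySet()) {
            double x = v0.get(f);
            double y = v1.getOrDefault(f, 0.0);
            a2 += x * x;
            b2 += y * y;
        }
        // Second pass with the arguments reversed: features stored only in v1.
        for (int f : v1.keySet()) {
            if (v0.containsKey(f)) {
                continue; // already visited in the first pass
            }
            double x = v1.get(f);               // should have gone into b2
            double y = v0.getOrDefault(f, 0.0); // should have gone into a2
            a2 += x * x;
            b2 += y * y;
        }
        return new double[] {a2, b2};
    }

    public static void main(String[] args) {
        Map<Integer, Double> vector0 = sparse(1, 0, 2);
        Map<Integer, Double> vector1 = sparse(3, 3, 2);
        double[] buggy = buggySums(vector0, vector1);
        System.out.println("correct: a2=" + sumOfSquares(vector0) + " b2=" + sumOfSquares(vector1));
        System.out.println("buggy:   a2=" + buggy[0] + " b2=" + buggy[1]);
        System.out.println("distance=" + tanimotoDistance(vector0, vector1));
    }
}
```

Computing each sum of squares from its own vector, independently of the visiting order, avoids the need to track which argument a feature came from.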
> Tanimoto coefficient distance measure
> -------------------------------------
>
> Key: MAHOUT-42
> URL: https://issues.apache.org/jira/browse/MAHOUT-42
> Project: Mahout
> Issue Type: New Feature
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Attachments: MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index