Posted to dev@mahout.apache.org by "Sebastian Schelter (Created) (JIRA)" <ji...@apache.org> on 2011/11/29 14:13:41 UTC
[jira] [Created] (MAHOUT-902) TanimotoCoefficientSimilarity should
return Double.NaN for two items that have zero overlap
TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap
-------------------------------------------------------------------------------------------
Key: MAHOUT-902
URL: https://issues.apache.org/jira/browse/MAHOUT-902
Project: Mahout
Issue Type: Improvement
Components: Collaborative Filtering
Affects Versions: 0.6
Reporter: Sebastian Schelter
org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap. Please also provide a unit-test for this.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAHOUT-902) TanimotoCoefficientSimilarity should
return Double.NaN for two items that have zero overlap
Posted by "Sean Owen (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-902:
-----------------------------
Resolution: Fixed
Fix Version/s: 0.6
Assignee: Sean Owen
Status: Resolved (was: Patch Available)
Tests pass, this looks like a good fix, there is no controversy, and the patch comes from the interested party, so I just committed it.
> TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap
> -------------------------------------------------------------------------------------------
>
> Key: MAHOUT-902
> URL: https://issues.apache.org/jira/browse/MAHOUT-902
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.6
> Reporter: Sebastian Schelter
> Assignee: Sean Owen
> Labels: MAHOUT_INTRO_CONTRIBUTE
> Fix For: 0.6
>
> Attachments: MAHOUT-902.patch
>
>
> org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap. Please also provide a unit-test for this.
[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity
should return Double.NaN for two items that have zero overlap
Posted by "Remi Melisson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162931#comment-13162931 ]
Remi Melisson commented on MAHOUT-902:
--------------------------------------
cool ;)
[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity
should return Double.NaN for two items that have zero overlap
Posted by "Remi Melisson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162393#comment-13162393 ]
Remi Melisson commented on MAHOUT-902:
--------------------------------------
Hi,
Maybe I misunderstand the description, but this already seems to be the case:
(line 77)
http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/similarity/TanimotoCoefficientSimilarity.java?view=markup
tested by:
(testNoCorrelation)
http://svn.apache.org/viewvc/mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/similarity/TanimotoCoefficientSimilarityTest.java?view=markup
Correct me if I'm wrong.
[jira] [Updated] (MAHOUT-902) TanimotoCoefficientSimilarity should
return Double.NaN for two items that have zero overlap
Posted by "Remi Melisson (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Remi Melisson updated MAHOUT-902:
---------------------------------
Status: Patch Available (was: Open)
A small patch for this issue, with the corresponding test case.
Hope it's ok.
[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity
should return Double.NaN for two items that have zero overlap
Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162885#comment-13162885 ]
Sean Owen commented on MAHOUT-902:
----------------------------------
That's fine. I think we also need to change the distributed code, which is the one-liner I mentioned above. If there are no objections, I'll put it all together and commit.
[jira] [Updated] (MAHOUT-902) TanimotoCoefficientSimilarity should
return Double.NaN for two items that have zero overlap
Posted by "Remi Melisson (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Remi Melisson updated MAHOUT-902:
---------------------------------
Attachment: MAHOUT-902.patch
[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity
should return Double.NaN for two items that have zero overlap
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163431#comment-13163431 ]
Hudson commented on MAHOUT-902:
-------------------------------
Integrated in Mahout-Quality #1227 (See [https://builds.apache.org/job/Mahout-Quality/1227/])
MAHOUT-902 oops do not return NaN from distributed item-item similarity as 0 means 'ignore'
srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1210603
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/TanimotoCoefficientSimilarity.java
[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity
should return Double.NaN for two items that have zero overlap
Posted by "Sebastian Schelter (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162726#comment-13162726 ]
Sebastian Schelter commented on MAHOUT-902:
-------------------------------------------
@Remi this is for the user-user similarity computation only.
@Sean Yes, this change is trivial. I thought it would make a nice first ticket for people who want to start contributing to Mahout (that's why it's marked with MAHOUT_INTRO_CONTRIBUTE).
[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity
should return Double.NaN for two items that have zero overlap
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163027#comment-13163027 ]
Hudson commented on MAHOUT-902:
-------------------------------
Integrated in Mahout-Quality #1226 (See [https://builds.apache.org/job/Mahout-Quality/1226/])
MAHOUT-902 item similarity is now NaN for no overlap
srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1210544
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/similarity/TanimotoCoefficientSimilarity.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/TanimotoCoefficientSimilarity.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/similarity/TanimotoCoefficientSimilarityTest.java
[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity
should return Double.NaN for two items that have zero overlap
Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162999#comment-13162999 ]
Sean Owen commented on MAHOUT-902:
----------------------------------
Ah, I think I misunderstood from the start what was supposed to change. I'll revert that bit.
[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity
should return Double.NaN for two items that have zero overlap
Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162398#comment-13162398 ]
Sean Owen commented on MAHOUT-902:
----------------------------------
Sebastian, is this turned around? Yes, the non-distributed implementation already has this behavior. I understood from your message that you proposed to make the distributed version return the same thing.
I think it's as simple as changing its similarity computation to:
return dots == 0 ? Double.NaN : dots / (normA + normB - dots);
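To make the one-liner above concrete, here is a minimal standalone sketch. It assumes that dots is the number of entries two binary vectors share and that normA and normB are the number of entries in each vector; the class name TanimotoSketch and the method signature are hypothetical and are not taken from the Mahout source.

```java
public final class TanimotoSketch {

  private TanimotoSketch() {}

  // Hypothetical helper: Tanimoto coefficient with a guard for zero overlap.
  // dots  = size of the intersection (assumed binary vectors)
  // normA = number of entries in A, normB = number of entries in B
  public static double similarity(double dots, double normA, double normB) {
    // With no overlap the similarity is reported as "undefined" (NaN)
    // instead of 0; otherwise it is |A and B| / |A or B|.
    return dots == 0 ? Double.NaN : dots / (normA + normB - dots);
  }

  public static void main(String[] args) {
    System.out.println(similarity(2, 3, 4)); // 2 / (3 + 4 - 2) = 0.4
    System.out.println(similarity(0, 3, 4)); // NaN
  }
}
```

Whether NaN is the right answer depends on how the caller treats it, which is what the rest of this thread is about.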
[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity
should return Double.NaN for two items that have zero overlap
Posted by "Sebastian Schelter (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162992#comment-13162992 ]
Sebastian Schelter commented on MAHOUT-902:
-------------------------------------------
We must not change the distributed code. In the distributed case, 0 is equivalent to non-existent and will already be ignored by iterateNonZero().
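The point about 0 being equivalent to non-existent can be illustrated with a small sketch of a sparse row. This is not Mahout's Vector API (only iterateNonZero() is named in the comment above); the class SparseRowSketch and its methods are hypothetical, and the sketch only assumes that a sparse structure stores non-zero values and drops zeros.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class SparseRowSketch {

  // Only non-zero similarities are kept; a missing index means "no entry".
  private final Map<Integer, Double> entries = new LinkedHashMap<>();

  // Hypothetical setter: writing 0 is the same as writing nothing.
  public void set(int index, double value) {
    if (value == 0.0) {
      entries.remove(index);
    } else {
      // NaN compares unequal to 0.0, so it is stored like any other value.
      entries.put(index, value);
    }
  }

  // Number of entries a non-zero iteration would visit.
  public int nonZeroCount() {
    return entries.size();
  }

  public static void main(String[] args) {
    SparseRowSketch row = new SparseRowSketch();
    row.set(1, 0.5);        // real similarity, stored
    row.set(2, 0.0);        // zero overlap as 0, dropped
    row.set(3, Double.NaN); // zero overlap as NaN, stored
    System.out.println(row.nonZeroCount()); // 2
  }
}
```

Under that assumption, a 0 costs nothing and is skipped automatically, while a NaN would be carried along as a real entry, which is presumably why the distributed code should keep returning 0.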
[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity
should return Double.NaN for two items that have zero overlap
Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162760#comment-13162760 ]
Sean Owen commented on MAHOUT-902:
----------------------------------
Ah, right, it affects the item-item computation in the non-distributed version too. Well, I have a patch ready. I'll wait a bit before I post the 'answer'.