Posted to dev@mahout.apache.org by "Sebastian Schelter (Created) (JIRA)" <ji...@apache.org> on 2011/11/29 14:13:41 UTC

[jira] [Created] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap
-------------------------------------------------------------------------------------------

                 Key: MAHOUT-902
                 URL: https://issues.apache.org/jira/browse/MAHOUT-902
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
    Affects Versions: 0.6
            Reporter: Sebastian Schelter


org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap. Please also provide a unit-test for this.
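
For illustration only, the requested behavior might look like the following minimal sketch. It assumes each item is represented by the set of IDs of the users who have a preference for it. The class TanimotoSketch and its similarity method are hypothetical stand-ins and are not the actual Mahout code.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-alone sketch; not Mahout's TanimotoCoefficientSimilarity. */
public final class TanimotoSketch {

  private TanimotoSketch() {}

  /**
   * Tanimoto coefficient of two ID sets: intersection size / union size.
   * Returns Double.NaN when the sets share no element.
   */
  public static double similarity(Set<Long> a, Set<Long> b) {
    Set<Long> intersection = new HashSet<>(a);
    intersection.retainAll(b);
    if (intersection.isEmpty()) {
      // Zero overlap: report "undefined" rather than 0.0.
      return Double.NaN;
    }
    int unionSize = a.size() + b.size() - intersection.size();
    return (double) intersection.size() / unionSize;
  }
}
```

A unit test for the zero-overlap case has to check the result with Double.isNaN(...), since NaN never compares equal to anything, including itself.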

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Sean Owen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-902:
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.6
         Assignee: Sean Owen
           Status: Resolved  (was: Patch Available)

Tests pass, this looks like a good fix, there is no controversy, and it comes with a patch from the interested party, so I just committed it.
                
> TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap
> -------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-902
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-902
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Sebastian Schelter
>            Assignee: Sean Owen
>              Labels: MAHOUT_INTRO_CONTRIBUTE
>             Fix For: 0.6
>
>         Attachments: MAHOUT-902.patch
>
>
> org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap. Please also provide a unit-test for this.


        

[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Remi Melisson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162931#comment-13162931 ] 

Remi Melisson commented on MAHOUT-902:
--------------------------------------

cool ;)
                


        

[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Remi Melisson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162393#comment-13162393 ] 

Remi Melisson commented on MAHOUT-902:
--------------------------------------

Hi,
Maybe I misunderstand the description, but this already seems to be the case (line 77):
http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/similarity/TanimotoCoefficientSimilarity.java?view=markup

It is tested by testNoCorrelation:
http://svn.apache.org/viewvc/mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/similarity/TanimotoCoefficientSimilarityTest.java?view=markup

Correct me if I'm wrong.
                


        

[jira] [Updated] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Remi Melisson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Remi Melisson updated MAHOUT-902:
---------------------------------

    Status: Patch Available  (was: Open)

A small patch for this issue, with the corresponding test case.
Hope it's ok.
                


        

[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162885#comment-13162885 ] 

Sean Owen commented on MAHOUT-902:
----------------------------------

That's fine. I think we also need to change the distributed code, which is the one-liner I mentioned earlier. If there are no objections, I'll put it all together and commit.
                


        

[jira] [Updated] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Remi Melisson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Remi Melisson updated MAHOUT-902:
---------------------------------

    Attachment: MAHOUT-902.patch
    


        

[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163431#comment-13163431 ] 

Hudson commented on MAHOUT-902:
-------------------------------

Integrated in Mahout-Quality #1227 (See [https://builds.apache.org/job/Mahout-Quality/1227/])
    MAHOUT-902 oops do not return NaN from distributed item-item similarity as 0 means 'ignore'

srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1210603
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/TanimotoCoefficientSimilarity.java

                


        

[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Sebastian Schelter (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162726#comment-13162726 ] 

Sebastian Schelter commented on MAHOUT-902:
-------------------------------------------

@Remi this is for the user-user similarity computation only.

@Sean Yes, this change is trivial. I thought it would make a nice first ticket for people who want to start contributing to Mahout (that's why it's marked with MAHOUT_INTRO_CONTRIBUTE).
                


        

[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163027#comment-13163027 ] 

Hudson commented on MAHOUT-902:
-------------------------------

Integrated in Mahout-Quality #1226 (See [https://builds.apache.org/job/Mahout-Quality/1226/])
    MAHOUT-902 item similarity is now NaN for no overlap

srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1210544
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/similarity/TanimotoCoefficientSimilarity.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/TanimotoCoefficientSimilarity.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/similarity/TanimotoCoefficientSimilarityTest.java

                


        

[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162999#comment-13162999 ] 

Sean Owen commented on MAHOUT-902:
----------------------------------

Ah, I think I misunderstood from the start what was to change. I'll revert that bit.
                


        

[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162398#comment-13162398 ] 

Sean Owen commented on MAHOUT-902:
----------------------------------

Sebastian, is this turned around? Yes, the non-distributed implementation already has this behavior. I understood from your message that you proposed to make the distributed version return the same thing.

I think it's as simple as changing its similarity computation to:

    return dots == 0 ? Double.NaN : dots / (normA + normB - dots);
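
To make the proposed one-liner concrete, here is a minimal, self-contained sketch of it wrapped in a method. Only dots, normA and normB come from the comment above. The class name, the method name and the reading of the parameters (dots as the overlap count, normA and normB as the sizes of the two sides) are assumptions made for illustration. Other comments in this thread argue that the distributed code should not be changed this way.

```java
/** Hypothetical wrapper around the proposed one-liner; not the actual Mahout class. */
public final class DistributedTanimotoSketch {

  private DistributedTanimotoSketch() {}

  /**
   * @param dots  overlap between the two vectors (assumed meaning)
   * @param normA size of the first vector (assumed meaning)
   * @param normB size of the second vector (assumed meaning)
   */
  public static double similarity(double dots, double normA, double normB) {
    // With zero overlap, return NaN instead of 0 / (normA + normB).
    return dots == 0 ? Double.NaN : dots / (normA + normB - dots);
  }
}
```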

                


        

[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Sebastian Schelter (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162992#comment-13162992 ] 

Sebastian Schelter commented on MAHOUT-902:
-------------------------------------------

We must not change the distributed code. In the distributed case, 0 is equivalent to non-existent and will already be ignored by iterateNonZero(). 
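
The point about zeros can be illustrated with a toy sparse row. This is only a sketch under stated assumptions: SparseRowSketch is a hypothetical class, not Mahout's Vector, and it only mimics the idea that an exact 0 is treated as "no entry" and is therefore skipped when iterating over non-zero entries. A NaN is not equal to 0, so in this toy model it would be kept as a regular entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy sparse row; only mimics the idea of iterating non-zero entries. */
public final class SparseRowSketch {

  private final Map<Integer, Double> entries = new LinkedHashMap<>();

  /** Stores a value; an exact 0.0 is treated as "no entry" and dropped. */
  public void set(int index, double value) {
    if (value == 0.0) {
      entries.remove(index);
    } else {
      // NaN == 0.0 is false, so NaN is stored like any other value.
      entries.put(index, value);
    }
  }

  /** Indices that a non-zero iteration over this row would visit, in insertion order. */
  public List<Integer> nonZeroIndices() {
    return new ArrayList<>(entries.keySet());
  }
}
```

In this model, a similarity of 0 written into the row simply disappears, while a NaN would be carried along to whatever consumes the row.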
                


        

[jira] [Commented] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162760#comment-13162760 ] 

Sean Owen commented on MAHOUT-902:
----------------------------------

Ah, right, it affects the item-item computation in the non-distributed version too. Well, I have a patch ready. I'll wait a bit before I post the 'answer'.
                
