Posted to dev@mahout.apache.org by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2008/11/07 19:04:54 UTC

[jira] Created: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

UserSimilarity-based NearestNNeighborhood
-----------------------------------------

                 Key: MAHOUT-95
                 URL: https://issues.apache.org/jira/browse/MAHOUT-95
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
            Reporter: Otis Gospodnetic
            Priority: Minor
         Attachments: UserSimilarityNearestNUserNeighborhood.java

A variation of NearestNUserNeighborhood.  This version adds the minSimilarity parameter, which is the primary factor for including/excluding other users from the target user's neighbourhood.  Additionally, the 'n' parameter was renamed to maxHoodSize and is used to optionally limit the size of the neighbourhood.
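
To make the two parameters concrete, here is a minimal sketch of the selection rule described above: keep users whose similarity to the target user is at least minSimilarity, then optionally cap the result at maxHoodSize. It is an illustration only, not the attached class. The class name, the method name, the String user IDs and the precomputed similarity map are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; NOT the attached UserSimilarityNearestNUserNeighborhood.java.
final class ThresholdNeighborhoodSketch {

  /**
   * Returns the IDs of users whose similarity to the target user is at least
   * minSimilarity, most similar first, truncated to at most maxHoodSize entries.
   * similarityToTarget is an assumed, precomputed userID -> similarity map.
   */
  static List<String> neighborhood(Map<String, Double> similarityToTarget,
                                   double minSimilarity,
                                   int maxHoodSize) {
    List<Map.Entry<String, Double>> kept = new ArrayList<>();
    for (Map.Entry<String, Double> e : similarityToTarget.entrySet()) {
      double s = e.getValue();
      // NaN >= x is always false, so users with an undefined similarity are dropped.
      if (s >= minSimilarity) {
        kept.add(e);
      }
    }
    // Most similar users first.
    kept.sort(Map.Entry.<String, Double>comparingByValue().reversed());

    // maxHoodSize only caps the result; minSimilarity is the primary filter.
    List<String> ids = new ArrayList<>();
    for (Map.Entry<String, Double> e : kept) {
      if (ids.size() >= maxHoodSize) {
        break;
      }
      ids.add(e.getKey());
    }
    return ids;
  }
}
```

Passing Integer.MAX_VALUE as maxHoodSize leaves the neighbourhood limited by minSimilarity alone, which matches the "optionally limit" wording above.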

The patch is for a brand new class, but we may really want just a single class (either keep this one and axe NearestNUserNeighborhood or add this functionality to NearestNUserNeighborhood), if this sounds good.

I'll update the unit test and provide a patch for that if others think this can go in.

Thoughts?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by Sean Owen <sr...@gmail.com>.
Change this.minSim = 0; to this.minSim = Double.NEGATIVE_INFINITY;
and then you can remove the later check for it being > 0. This is
because similarity can be negative.
Then you can cull the commented-out lines, and it looks good to me.

Yeah I like just pushing it together rather than making two classes here.
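
A small sketch of why the Double.NEGATIVE_INFINITY default makes a separate "> 0" check unnecessary: with that default, one comparison handles both the "no threshold" and the "threshold" case, and negative similarities are not silently excluded. The class and method names below are hypothetical, and the use of >= for the comparison is an assumption, not taken from the patch.

```java
// Hypothetical stand-in for the constructor logic discussed above; not the Mahout source.
final class MinSimilarityDefaultSketch {
  private final double minSim;

  /** No threshold requested: every real similarity, including negative ones, passes. */
  MinSimilarityDefaultSketch() {
    this(Double.NEGATIVE_INFINITY);
  }

  /** Explicit threshold supplied by the caller. */
  MinSimilarityDefaultSketch(double minSim) {
    this.minSim = minSim;
  }

  /** A single comparison covers both cases; NaN never passes because NaN >= x is false. */
  boolean accepts(double similarity) {
    return similarity >= minSim;
  }
}
```

With a default of 0, a user with similarity -0.8 would be rejected even when the caller never asked for a threshold; with negative infinity it is accepted.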

On Thu, Jan 15, 2009 at 9:36 PM, Otis Gospodnetic (JIRA)
<ji...@apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Otis Gospodnetic updated MAHOUT-95:
> -----------------------------------
>
>    Attachment: MAHOUT-95.patch
>
> Something like this?
>
> The other patch was a bit more code, but was a bit more "explicit" for both a person reading and the JVM.  Will the JVM inline those if tests in the now modified estimate(User user) method?

[jira] Updated: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated MAHOUT-95:
-----------------------------------

    Attachment: UserSimilarityNearestNUserNeighborhood.java



[jira] Resolved: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved MAHOUT-95.
------------------------------------

    Resolution: Fixed
      Assignee: Otis Gospodnetic

Finito.

Sending        core/src/main/java/org/apache/mahout/cf/taste/impl/neighborhood/NearestNUserNeighborhood.java
Transmitting file data .
Committed revision 738980.




[jira] Updated: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated MAHOUT-95:
-----------------------------------

    Attachment: MAHOUT-95.patch

I think this is it.



[jira] Commented: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645989#action_12645989 ] 

Sean Owen commented on MAHOUT-95:
---------------------------------

Looks OK to me. I agree it makes sense to maintain only 1 class if it encapsulates the other two. I changed the original today so you may want to look at the original again.



[jira] Commented: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662213#action_12662213 ] 

Sean Owen commented on MAHOUT-95:
---------------------------------

It's been a while since I looked at this so apologies if I lost the original point --

I think the bit you want to change is the inner class Estimator. estimate() should return NaN if the similarity it computes is below your threshold. Then you don't need changes to getTopUsers() or Rescorers (indeed Rescorer<Item> doesn't seem to be what you want). The only change you need is what's in your patch now -- a new parameter to the constructor that is made available to Estimator then.
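
A rough sketch of that idea, for illustration only. The SimilaritySketch interface, the String user IDs and the class name are local stand-ins so the example compiles on its own; they are not Mahout's Estimator or UserSimilarity types, and the real inner class may differ.

```java
// Local stand-in for a user-user similarity measure; hypothetical, not a Mahout type.
interface SimilaritySketch {
  double similarity(String userA, String userB);
}

// Hypothetical estimator: the threshold arrives through the constructor,
// and estimate() reports NaN for any user below it.
final class ThresholdEstimatorSketch {
  private final String theUser;
  private final SimilaritySketch similarity;
  private final double minSim;

  ThresholdEstimatorSketch(String theUser, SimilaritySketch similarity, double minSim) {
    this.theUser = theUser;
    this.similarity = similarity;
    this.minSim = minSim;
  }

  /** Returns the similarity to theUser, or NaN when it falls below minSim. */
  double estimate(String otherUser) {
    double sim = similarity.similarity(theUser, otherUser);
    // A NaN similarity also fails this comparison and is passed through as NaN.
    return sim >= minSim ? sim : Double.NaN;
  }
}
```

Because the caller already has to cope with NaN estimates, nothing outside the estimator needs to know that a threshold exists.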



[jira] Commented: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662139#action_12662139 ] 

Otis Gospodnetic commented on MAHOUT-95:
----------------------------------------

I just had a look at the code and your suggestion after a pause.  Yes, that would work...

But how about this:
I want to ignore users whose theUser-user similarity is < N.
But I can't do that with the current API if I provide my implementation of Rescorer<User> -- I can only pass in Rescorer<Item> rescorers.  I tried adding the API that takes Rescorer<User>, but quickly found myself doing surgery on a number of classes (Recommender, TopItems, NearestNUserNeighborhood and maybe more), so I cowardly backed out...

Maybe I'm missing the right way to add support for passing in an instance of Rescorer<User>?




[jira] Commented: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668180#action_12668180 ] 

Sean Owen commented on MAHOUT-95:
---------------------------------

LGTM, commit away.



[jira] Commented: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663502#action_12663502 ] 

Sean Owen commented on MAHOUT-95:
---------------------------------

Looks OK. I think you don't need two Estimators -- just overload one to serve both purposes. With that, feel free to commit.



[jira] Updated: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated MAHOUT-95:
-----------------------------------

    Attachment: MAHOUT-95.patch

Something like this?

The other patch was a bit more code, but was a bit more "explicit" for both a person reading and the JVM.  Will the JVM inline those if tests in the now modified estimate(User user) method?




[jira] Commented: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646512#action_12646512 ] 

Sean Owen commented on MAHOUT-95:
---------------------------------

I don't think you need to copy getTopUsers(). This could be accomplished with a Rescorer. Pass in one that returns the original value if the similarity is at least the threshold, otherwise NaN.



[jira] Updated: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated MAHOUT-95:
-----------------------------------

    Attachment: MAHOUT-95-diff-against-nearestN.txt
                MAHOUT-95.patch



[jira] Updated: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated MAHOUT-95:
-----------------------------------

    Fix Version/s: 0.1



[jira] Updated: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated MAHOUT-95:
-----------------------------------

    Attachment: MAHOUT-95.patch

Cleaned up version:
* No TopItems modifications
* Addition of NearestNUserNeighborhood.MinSimilarityEstimator
* Additional NearestNUserNeighborhood ctor that takes minSimilarity and uses MinSimilarityEstimator if minSimilarity > 0.0




[jira] Issue Comment Edited: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662139#action_12662139 ] 

otis edited comment on MAHOUT-95 at 1/8/09 2:15 PM:
----------------------------------------------------------------

I just had a look at the code and your suggestion after a pause.  Yes, that would work...

But how about this:
I want to ignore users whose theUser-user similarity is < N.
But I can't do that with the current API if I provide my implementation of Rescorer<User> -- I can only pass in Rescorer<Item> rescorers.  I tried adding the API that takes Rescorer<User>, but quickly found myself doing surgery on a number of classes (Recommender, TopItems, NearestNUserNeighborhood and maybe more), so I cowardly backed out...

Maybe I'm missing the right way to add support for passing in an instance of Rescorer<User>?

Note that I could still create something like:
{code}
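import org.apache.mahout.cf.taste.model.Item;
import org.apache.mahout.cf.taste.recommender.Rescorer;

public class MinItemSimilarityRescorer<T> implements Rescorer<Item> {
  // Scores below this threshold are replaced with NaN
  private final double minSimilarity;

  public MinItemSimilarityRescorer(double minSimilarity) {
    this.minSimilarity = minSimilarity;
  }

  // Never filter outright; exclusion is signalled by the NaN from rescore()
  public boolean isFiltered(Item thing) {
    return false;
  }

  // IGNORE Item and make this type-agnostic
  public double rescore(Item thing, double originalScore) {
    return (originalScore < minSimilarity) ? Double.NaN : originalScore;
  }
}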
{code}

And have this Rescorer never even look at its Item argument (or at the <T>), thus making it type-agnostic... but it feels wrong to have <Item> there if the scores I'm really looking at are user-user similarity scores.

To put this in context, this rescorer gets called in TopItems.getTopUsers, so that's what I'm looking at:

{code}
double similarity = estimator.estimate(user);
double rescoredSimilarity = rescorer == null ? similarity : rescorer.rescore(user, similarity);
{code}
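
To show how a NaN coming back from the rescorer ends up excluding a user, here is a simplified, hypothetical version of such a loop. It is not the actual TopItems.getTopUsers code: the nested Estimator and Rescorer interfaces, the String user IDs and the heap-based top-N bookkeeping are all assumptions made so the sketch is self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a "top users" loop; NOT the Mahout TopItems implementation.
final class TopUsersSketch {

  interface Estimator { double estimate(String user); }
  interface Rescorer { double rescore(String user, double originalScore); }

  /** Pairs a user with a score; natural order is ascending by score. */
  private static final class Scored implements Comparable<Scored> {
    final String user;
    final double score;
    Scored(String user, double score) { this.user = user; this.score = score; }
    @Override public int compareTo(Scored o) { return Double.compare(score, o.score); }
  }

  /** Returns up to howMany users, highest rescored similarity first. */
  static List<String> topUsers(int howMany, Iterable<String> allUsers,
                               Estimator estimator, Rescorer rescorer) {
    PriorityQueue<Scored> heap = new PriorityQueue<>(); // min-heap: weakest kept user on top
    for (String user : allUsers) {
      double similarity = estimator.estimate(user);
      double rescoredSimilarity =
          rescorer == null ? similarity : rescorer.rescore(user, similarity);
      if (Double.isNaN(rescoredSimilarity)) {
        continue; // NaN means "exclude this user", e.g. below the minimum similarity
      }
      heap.add(new Scored(user, rescoredSimilarity));
      if (heap.size() > howMany) {
        heap.poll(); // evict the currently weakest candidate
      }
    }
    List<Scored> sorted = new ArrayList<>(heap);
    Collections.sort(sorted, Collections.reverseOrder());
    List<String> result = new ArrayList<>();
    for (Scored s : sorted) {
      result.add(s.user);
    }
    return result;
  }
}
```

In this sketch the same NaN check serves both an estimator that returns NaN and a rescorer that does, which is why either place can carry the threshold.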




[jira] Updated: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated MAHOUT-95:
-----------------------------------

    Attachment: MAHOUT-95.patch

Changed NearestNUserNeighborhood and TopItems classes.
(ignore earlier patches)




[jira] Updated: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated MAHOUT-95:
-----------------------------------

    Attachment:     (was: UserSimilarityNearestNUserNeighborhood.java)
