Posted to dev@mahout.apache.org by "Sebastian Schelter (JIRA)" <ji...@apache.org> on 2010/07/22 12:55:50 UTC

[jira] Created: (MAHOUT-445) Customizable strategies for candidate item fetching

Customizable strategies for candidate item fetching
---------------------------------------------------

                 Key: MAHOUT-445
                 URL: https://issues.apache.org/jira/browse/MAHOUT-445
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
            Reporter: Sebastian Schelter


At the beginning of the recommendation process, a recommender has to identify a set of "candidate items": items that could possibly be recommended to the user. The final result of the recommender's computation will be a subset of those.

The current approach in AbstractRecommender.getAllOtherItems(...) turns out to be very slow when the data contains a high number of cooccurrences (as in the GroupLens 1M dataset, for example). The aim of this patch is to make the way these candidate items are identified customizable.
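
To illustrate what "customizable" could mean here, the following is a minimal, self-contained sketch of a pluggable candidate-fetching strategy. It does not reproduce the patch: the interface name CandidateItemsStrategy, its method signature, both implementations, and the Map-based data model are assumptions made for this example only.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CandidateStrategies {

    /** Hypothetical strategy interface; the patch's real signature is not shown in this thread. */
    public interface CandidateItemsStrategy {
        Set<Long> candidateItems(long userId, Map<Long, Set<Long>> itemsByUser);
    }

    /** Cheap to compute: every item in the model that the user has not rated yet. */
    public static final class AllUnknownItems implements CandidateItemsStrategy {
        @Override
        public Set<Long> candidateItems(long userId, Map<Long, Set<Long>> itemsByUser) {
            Set<Long> candidates = new HashSet<>();
            for (Set<Long> items : itemsByUser.values()) {
                candidates.addAll(items);
            }
            candidates.removeAll(itemsByUser.getOrDefault(userId, Collections.<Long>emptySet()));
            return candidates;
        }
    }

    /** Narrower: only items rated by users who share at least one item with this user. */
    public static final class CooccurringItems implements CandidateItemsStrategy {
        @Override
        public Set<Long> candidateItems(long userId, Map<Long, Set<Long>> itemsByUser) {
            Set<Long> own = itemsByUser.getOrDefault(userId, Collections.<Long>emptySet());
            Set<Long> candidates = new HashSet<>();
            for (Map.Entry<Long, Set<Long>> entry : itemsByUser.entrySet()) {
                if (entry.getKey() == userId) {
                    continue; // skip the user's own preferences
                }
                // Another user contributes items only if they overlap with this user.
                if (!Collections.disjoint(own, entry.getValue())) {
                    candidates.addAll(entry.getValue());
                }
            }
            candidates.removeAll(own);
            return candidates;
        }
    }
}
```

A recommender written against the interface can then be handed either implementation, trading candidate coverage against the cost of walking cooccurrences.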

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (MAHOUT-445) Customizable strategies for candidate item fetching

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-445:
--------------------------------------

    Attachment: MAHOUT-445.patch


[jira] Updated: (MAHOUT-445) Customizable strategies for candidate item fetching

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-445:
--------------------------------------

    Attachment: MAHOUT-445-2.patch


[jira] Updated: (MAHOUT-445) Customizable strategies for candidate item fetching

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-445:
--------------------------------------

    Attachment: MAHOUT-445-3.patch


[jira] Updated: (MAHOUT-445) Customizable strategies for candidate item fetching

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-445:
--------------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (MAHOUT-445) Customizable strategies for candidate item fetching

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893646#action_12893646 ] 

Sebastian Schelter commented on MAHOUT-445:
-------------------------------------------

Patch updated. I added another fetching strategy that considers only max(100, 20*log(max(N_users, N_items))) preferences per item, as suggested by Ted Dunning. I hope I understood it correctly.
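
As a worked illustration of the formula above, here is a small sketch that computes the per-item cap. The class and method names are hypothetical, and two details are assumptions because the comment does not state them: the logarithm is taken as the natural log, and the result is truncated to an integer.

```java
public class PreferenceCap {

    /**
     * Computes max(100, 20 * log(max(numUsers, numItems))).
     * Assumptions: natural logarithm, result truncated toward zero.
     */
    public static int maxPreferencesPerItem(int numUsers, int numItems) {
        double scaled = 20.0 * Math.log(Math.max(numUsers, numItems));
        return (int) Math.max(100.0, scaled);
    }
}
```

Under the natural-log assumption, the logarithmic term only exceeds the floor of 100 once the larger of the two counts passes roughly 148 (e^5), and it grows slowly after that.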


[jira] Updated: (MAHOUT-445) Customizable strategies for candidate item fetching

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-445:
-----------------------------

           Status: Resolved  (was: Patch Available)
         Assignee: Sean Owen
    Fix Version/s: 0.4
       Resolution: Fixed


[jira] Commented: (MAHOUT-445) Customizable strategies for candidate item fetching

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894259#action_12894259 ] 

Sebastian Schelter commented on MAHOUT-445:
-------------------------------------------

Good points, Sean.

I've added randomized sampling via the FixedSizeSamplingIterator, and I've added constructors that take the strategy object as a parameter to KnnItemBasedRecommender and SVDRecommender.
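
For readers unfamiliar with fixed-size random sampling, the sketch below shows one common technique, reservoir sampling, which draws a uniform sample of a fixed size from an iterator in a single pass. Whether FixedSizeSamplingIterator works this way is not stated in this thread; the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class FixedSizeSample {

    /** Returns up to 'size' elements chosen uniformly at random from 'source'. */
    public static <T> List<T> sample(Iterator<T> source, int size, Random random) {
        List<T> reservoir = new ArrayList<>(size);
        long seen = 0;
        while (source.hasNext()) {
            T next = source.next();
            seen++;
            if (reservoir.size() < size) {
                // Fill the reservoir with the first 'size' elements.
                reservoir.add(next);
            } else {
                // Pick a slot in [0, seen); the new element replaces a
                // reservoir entry only if the slot falls inside the reservoir.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < size) {
                    reservoir.set((int) slot, next);
                }
            }
        }
        return reservoir;
    }
}
```

Unlike taking the first few preferences, every element of the source has the same chance of ending up in the result, and the source never has to be materialized in full.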


[jira] Commented: (MAHOUT-445) Customizable strategies for candidate item fetching

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894293#action_12894293 ] 

Hudson commented on MAHOUT-445:
-------------------------------

Integrated in Mahout-Quality #164 (See [http://hudson.zones.apache.org/hudson/job/Mahout-Quality/164/])
    MAHOUT-445



[jira] Commented: (MAHOUT-445) Customizable strategies for candidate item fetching

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894181#action_12894181 ] 

Sean Owen commented on MAHOUT-445:
----------------------------------

I like the patch. The sampling implementation isn't quite sampling randomly, though, is it? It seems to just take the first few, which is less than ideal. There is a SamplingIterator, and a counterpart for long primitives, that could be useful here.

I suppose all Recommender implementations should now have at least one constructor that takes the strategy object as a parameter?
