Posted to dev@mahout.apache.org by "Sebastian Schelter (JIRA)" <ji...@apache.org> on 2010/04/04 19:25:27 UTC

[jira] Created: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

Computation of pairwise cosine similarities for Item-Based Collaborative Filtering
----------------------------------------------------------------------------------

                 Key: MAHOUT-362
                 URL: https://issues.apache.org/jira/browse/MAHOUT-362
             Project: Mahout
          Issue Type: New Feature
          Components: Collaborative Filtering
            Reporter: Sebastian Schelter


Provides a map/reduce job to precompute the pairwise cosine similarities between the item vectors of the user-item matrix.

The code uses a slightly modified version of the algorithm suggested in "Elsayed et al: Pairwise Document Similarity in Large Collections with MapReduce" (http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf)
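
For readers who want a concrete picture of the computation, here is a minimal in-memory sketch in Python. It follows the general idea of the cited paper (group preferences by user, emit one partial product per co-rated item pair, sum the products per pair, then divide by the item vector norms). It is not the code in the attached patch and does not reflect the patch's modifications. The function name, the (user, item, value) input format, and the sample data are assumptions made for illustration, and each (user, item) pair is assumed to occur at most once.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pairwise_cosine(ratings):
    """Return {(item_a, item_b): cosine} for items co-rated by at least one user.

    ratings is an iterable of (user, item, value) triples (hypothetical format).
    """
    # Phase 1: group preferences by user and accumulate the squared norm
    # of every item vector.
    prefs_by_user = defaultdict(list)
    norm_sq = defaultdict(float)
    for user, item, value in ratings:
        prefs_by_user[user].append((item, value))
        norm_sq[item] += value * value

    # Phase 2: the "map" step emits one partial product per co-rated item
    # pair of a user; the "reduce" step sums the products per pair, which
    # yields the dot product of the two item vectors.
    dot = defaultdict(float)
    for prefs in prefs_by_user.values():
        for (item_a, val_a), (item_b, val_b) in combinations(sorted(prefs), 2):
            dot[(item_a, item_b)] += val_a * val_b

    # Phase 3: normalise each dot product by the two vector lengths.
    sims = {}
    for (item_a, item_b), product in dot.items():
        denom = sqrt(norm_sq[item_a]) * sqrt(norm_sq[item_b])
        if denom > 0.0:
            sims[(item_a, item_b)] = product / denom
    return sims


if __name__ == "__main__":
    sample = [
        ("u1", "A", 1.0), ("u1", "B", 1.0),
        ("u2", "A", 1.0), ("u2", "C", 1.0),
        ("u3", "B", 2.0),
    ]
    for pair, sim in sorted(pairwise_cosine(sample).items()):
        print(pair, round(sim, 4))
```

Item pairs that no user has rated together never produce a partial product, so they never show up in the output. That is what keeps the approach workable on a sparse user-item matrix.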

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853312#action_12853312 ] 

Sean Owen commented on MAHOUT-362:
----------------------------------

Agreed. I'm taking a first pass at unifying this with the recommender code. That should make it somewhat easier and clearer to think about refactoring it further.




[jira] Updated: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-362:
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.4
         Assignee: Sean Owen
           Status: Resolved  (was: Patch Available)

After much tweaking and polish and refactoring, I'm done integrating this for now. 




[jira] Commented: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853311#action_12853311 ] 

Jake Mannix commented on MAHOUT-362:
------------------------------------

It would be really nice if this code could be extended outside of the realm of Taste. This is a very general sparse matrix operation, and should probably live in the core o.a.m.math.hadoop package, operating on general sparse vectors.

I can adapt it to that package after it's committed though, no need to hold it up on account of "insufficient generality" :)




[jira] Updated: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-362:
--------------------------------------

    Attachment: MAHOUT-362.patch




[jira] Commented: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853310#action_12853310 ] 

Sean Owen commented on MAHOUT-362:
----------------------------------

Really nice job on the code, very clean. I'm going to spend a little time tweaking it to match my style whims, streamlining a few things, and maybe moving stuff out to 'common' areas and hooking into other common code to increase reuse in a few cases. Will commit soon.




[jira] Updated: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-362:
--------------------------------------

    Status: Patch Available  (was: Open)

