Posted to dev@mahout.apache.org by "Ted Dunning (JIRA)" <ji...@apache.org> on 2014/04/09 17:54:16 UTC

[jira] [Comment Edited] (MAHOUT-1422) Make a version of RSJ that uses two inputs

    [ https://issues.apache.org/jira/browse/MAHOUT-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964303#comment-13964303 ] 

Ted Dunning edited comment on MAHOUT-1422 at 4/9/14 3:52 PM:
-------------------------------------------------------------

The easy way to look at this is through the model of matrix multiplication (even though the details may change depending on the similarity metric).

If you have multiple behavior types, you can view the overall behavior history as an adjoined matrix:

{noformat}     A = [A_1 A_2 ... ]{noformat}

Each A_i is the history matrix for one kind of behavior. The rows are still users, but there are more columns: one block of item columns per behavior type. Let us assume that we want to recommend items of type 1 while using all types of behavior.
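A minimal sketch of the adjoined form, assuming binary histories stored sparsely as user -> set of integer column indices (the `adjoin` helper and this layout are hypothetical illustrations, not Mahout's DistributedRowMatrix API). Adjoining only shifts the column indices of each A_i; the rows remain the same users:

```python
from typing import Dict, List, Set, Tuple

# Sparse binary history matrix: user id -> set of column (item) indices.
SparseRows = Dict[str, Set[int]]


def adjoin(blocks: List[Tuple[SparseRows, int]]) -> Tuple[SparseRows, List[int]]:
    """Build A = [A_1 A_2 ...] from (A_i, number of columns of A_i) pairs.

    Rows (users) are shared across blocks; the columns of A_i are shifted by
    the total width of the blocks before it. Returns A and each block's offset.
    """
    adjoined: SparseRows = {}
    offsets: List[int] = []
    offset = 0
    for rows, width in blocks:
        offsets.append(offset)
        for user, cols in rows.items():
            adjoined.setdefault(user, set()).update(c + offset for c in cols)
        offset += width
    return adjoined, offsets


purchases = {"u1": {0}, "u2": {0, 1}}  # A_1: 2 item columns (hypothetical data)
views = {"u1": {0, 2}, "u3": {1}}      # A_2: 3 item columns (hypothetical data)
A, offsets = adjoin([(purchases, 2), (views, 3)])
# offsets == [0, 2]; A["u1"] == {0, 2, 4}; A["u3"] == {3}
```

The offsets are kept so that a score computed for column c of A can be mapped back to item c - offsets[i] of behavior type i + 1.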

The full cooccurrence matrix can be computed pretty straightforwardly:
{noformat}
    A' A = [A_1' A_1   A_1' A_2   ...]
           [A_2' A_1   A_2' A_2   ...]
           [  ...                    ]
{noformat}
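A sketch of the blocks of A'A, assuming binary histories held in memory as user -> set of item ids (the helpers `cross_occurrence` and `full_cooccurrence` are hypothetical, not Mahout code). Entry (a, b) of A_i' A_j counts the users who have item a in A_i and item b in A_j; the diagonal blocks are ordinary cooccurrence and the off-diagonal blocks are cross-occurrence:

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Set, Tuple

# Sparse binary history matrix: user id -> set of item ids.
History = Dict[str, Set[str]]
Block = Counter  # (item_a, item_b) -> number of users


def cross_occurrence(a_i: History, a_j: History) -> Block:
    """Sparse A_i' A_j: entry (a, b) counts users with a in A_i and b in A_j."""
    block: Block = Counter()
    for user, items_i in a_i.items():
        # Each user contributes the outer product of their two rows
        for pair in product(items_i, a_j.get(user, set())):
            block[pair] += 1
    return block


def full_cooccurrence(histories: List[History]) -> Dict[Tuple[int, int], Block]:
    """Every block of A'A, keyed by 1-based (i, j); (i, i) blocks are plain cooccurrence."""
    return {
        (i, j): cross_occurrence(a_i, a_j)
        for i, a_i in enumerate(histories, start=1)
        for j, a_j in enumerate(histories, start=1)
    }


purchases = {"u1": {"ipad"}, "u2": {"ipad", "case"}}
views = {"u1": {"ipad", "kindle"}, "u2": {"case"}, "u3": {"case"}}
blocks = full_cooccurrence([purchases, views])
```

Since A_j' A_i is the transpose of A_i' A_j, a real job would only need to compute the blocks with i <= j.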
Recommendations for a user with history vectors h_1, h_2, ... (one per behavior type) then look like this:
{noformat}
    [r_1]      [A_1' A_1   A_1' A_2   ...] [h_1]
    [r_2]   =  [A_2' A_1   A_2' A_2   ...] [h_2]
    [...]      [  ...                    ] [...]
{noformat}
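A small numeric check of the block structure, using plain Python lists as dense toy matrices (the data and helper names are made up for illustration). It computes r = A'A h once on the adjoined matrix and once block row by block row, including the second row r_2 = A_2' A_1 h_1 + A_2' A_2 h_2:

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    # Each row of x dotted with each column of y
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gram(x: Matrix, y: Matrix) -> Matrix:
    """X'Y for two matrices that share rows (users)."""
    return matmul(transpose(x), y)


def add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


# Toy binary histories: 3 users, 2 items of type 1, 3 items of type 2
A_1 = [[1, 0], [1, 1], [0, 1]]
A_2 = [[1, 0, 1], [0, 1, 0], [0, 1, 1]]
h_1 = [1, 0]     # the user's type-1 history
h_2 = [0, 1, 1]  # the user's type-2 history

# Direct route: adjoin the columns, then r = A'A h with h = [h_1; h_2]
A = [row_1 + row_2 for row_1, row_2 in zip(A_1, A_2)]
r = matvec(gram(A, A), h_1 + h_2)

# Block route: one block row of A'A per item type
r_1 = add(matvec(gram(A_1, A_1), h_1), matvec(gram(A_1, A_2), h_2))
r_2 = add(matvec(gram(A_2, A_1), h_1), matvec(gram(A_2, A_2), h_2))

print(r == r_1 + r_2)  # True
```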
If we only want recommendations of type 1, but use the history of all types, only the first block row matters and we get

{noformat}
r_1 = A_1' A_1 h_1 + A_1' A_2 h_2 + ...
{noformat}
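A sketch of scoring only type-1 items for one user, assuming the same in-memory user -> set-of-items layout (the `type1_scores` helper is hypothetical). It uses raw counts in place of whatever similarity metric is chosen, and it evaluates each term as A_1' (A_j h_j), so the first block row of A'A never has to be materialized:

```python
from collections import Counter
from typing import Dict, List, Set

# Sparse binary history matrix: user id -> set of item ids.
History = Dict[str, Set[str]]


def type1_scores(histories: List[History], h: List[Set[str]]) -> Counter:
    """r_1 = A_1' A_1 h_1 + A_1' A_2 h_2 + ... for one user.

    histories[j] is A_{j+1}; h[j] is the user's items of behavior type j+1.
    Raw counts are used; a real system would typically apply a similarity such as LLR.
    """
    a_1 = histories[0]
    r_1: Counter = Counter()
    for a_j, h_j in zip(histories, h):
        for user, items_1 in a_1.items():
            # (A_j h_j)[user]: how many of the query items this user has in A_j
            overlap = len(a_j.get(user, set()) & h_j)
            if overlap:
                # A_1' (A_j h_j): credit that overlap to the user's type-1 items
                for item in items_1:
                    r_1[item] += overlap
    return r_1


purchases = {"u1": {"ipad", "case"}, "u2": {"ipad"}, "u3": {"kindle"}}
views = {"u1": {"ipad"}, "u2": {"ipad", "kindle"}, "u3": {"kindle"}}
# A user who has bought nothing yet but has viewed the ipad
r_1 = type1_scores([purchases, views], [set(), {"ipad"}])
# ipad scores 2 (bought by u1 and u2, who both viewed it); case scores 1 (u1)
```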

This line of argument tells us two things about down-sampling in cross-recommendation:

- resampling (down-sampling) by column works exactly as with simple cooccurrence, because the adjoined matrix A = [A_1 A_2 ...] has the same form as a single history matrix, just with more columns

- the computational cost of the co- and cross-occurrence computation can be bounded by limiting each A_i separately to a maximum number of non-zeros per row.
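The cost bound in the second point can be made concrete. Write a_{i,u} for user u's row of A_i (as a row vector), m for the number of users, and k_i for the cap on non-zeros per row of A_i; these symbols are introduced here only for illustration. Each block of A'A is a sum of per-user outer products, so:

{noformat}
A_i' A_j = \sum_{u=1}^{m} a_{i,u}' a_{j,u}

\mathrm{cost}(A_i' A_j) = \sum_{u=1}^{m} \mathrm{nnz}(a_{i,u}) \, \mathrm{nnz}(a_{j,u}) \le m \, k_i \, k_j

\mathrm{cost}(A_1' A) = \sum_j \mathrm{cost}(A_1' A_j) \le m \, k_1 (k_1 + k_2 + \dots)
{noformat}

Capping each A_i independently therefore bounds every cooccurrence and cross-occurrence block, including the first block row needed for r_1.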

This means that the down-sampling that we currently have should work just fine in the new case of cross-recommendation.  Each action matrix should be handled separately.
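A sketch of down-sampling each action matrix on its own, assuming a uniform random sample of each user's row (a simplification chosen for illustration; Mahout's actual down-sampling may differ). The `downsample_rows` and `pair_cost` helpers are hypothetical; `pair_cost` counts the per-user outer-product terms that computing A_i' A_j requires:

```python
import random
from typing import Dict, Set

# Sparse binary history matrix: user id -> set of item ids.
History = Dict[str, Set[str]]


def downsample_rows(history: History, max_per_row: int, seed: int = 42) -> History:
    """Keep at most max_per_row non-zeros in each user's row.

    Uniform random sampling is an illustrative assumption, not Mahout's sampler.
    """
    rng = random.Random(seed)
    sampled: History = {}
    for user, items in history.items():
        if len(items) <= max_per_row:
            sampled[user] = set(items)
        else:
            # sorted() makes the sample reproducible for a fixed seed
            sampled[user] = set(rng.sample(sorted(items), max_per_row))
    return sampled


def pair_cost(a_i: History, a_j: History) -> int:
    """Number of outer-product terms needed to compute A_i' A_j."""
    return sum(len(items) * len(a_j.get(user, set())) for user, items in a_i.items())


# Each action matrix gets its own cap k_i and is down-sampled separately.
purchases = {"u1": {"p1", "p2", "p3"}, "u2": {"p1"}}
views = {"u1": {f"v{n}" for n in range(10)}, "u2": {"v3"}}
k_purchases, k_views = 2, 4
small_purchases = downsample_rows(purchases, k_purchases)
small_views = downsample_rows(views, k_views)

# Cross-occurrence cost drops from 31 terms to 9, within m * k_1 * k_2 = 16
print(pair_cost(purchases, views), pair_cost(small_purchases, small_views))
```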



> Make a version of RSJ that uses two inputs
> ------------------------------------------
>
>                 Key: MAHOUT-1422
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1422
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 1.0
>         Environment: mapreduce
>            Reporter: Pat Ferrel
>              Labels: recommender, similarity
>             Fix For: 1.0
>
>
> Currently the RowSimilarityJob uses a similarity measure to pairwise compare all rows in a DistributedRowMatrix.
> For many applications, including a cross-action recommender, we need something like RSJ that takes two DRMs and compares matching rows of each. The output would have the same form as RSJ's, and ideally it would allow the use of any similarity type already defined, especially LLR.
> There are two implementations of a Cross-Recommender, one based on the Mahout RecommenderJob and another based on Solr, that can immediately benefit from a Cross-RSJ.
> A modification of the matrix multiply job may be a place to start, since the current RSJ seems to rely heavily on self-similarity.
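The request above asks for a two-input RSJ that can use LLR. A minimal in-memory sketch of the scoring step, assuming binary histories stored as user -> set of items (`llr` and `cross_llr` are hypothetical helpers, not Mahout's API). The entropy formulation is intended to mirror the one in Mahout's `LogLikelihood.logLikelihoodRatio`; the 2x2 table compares item a of A with item b of B over the users the two matrices share as rows:

```python
import math
from typing import Dict, Set

# Sparse binary history matrix: user id -> set of item ids.
History = Dict[str, Set[str]]


def x_log_x(x: int) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    """Unnormalized entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))


def cross_llr(a: History, b: History, item_a: str, item_b: str, num_users: int) -> float:
    """Compare item_a's column in A with item_b's column in B over the shared user rows.

    num_users is passed explicitly because users with no interactions may be absent.
    """
    users_a = {u for u, items in a.items() if item_a in items}
    users_b = {u for u, items in b.items() if item_b in items}
    k11 = len(users_a & users_b)       # users with both
    k12 = len(users_a - users_b)       # item_a only
    k21 = len(users_b - users_a)       # item_b only
    k22 = num_users - k11 - k12 - k21  # neither
    return llr(k11, k12, k21, k22)


purchases = {"u1": {"ipad"}, "u2": {"ipad"}, "u3": {"kindle"}}
views = {"u1": {"case"}, "u2": {"case"}, "u3": {"kindle"}, "u4": {"kindle"}}
score = cross_llr(purchases, views, "ipad", "case", num_users=4)
```

LLR is unsigned, so strongly anti-correlated pairs also score high; a distributed Cross-RSJ would get k11 from the sparse A' B product instead of scanning users per pair.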



--
This message was sent by Atlassian JIRA
(v6.2#6252)