Posted to dev@mahout.apache.org by "Pat Ferrel (JIRA)" <ji...@apache.org> on 2014/02/21 17:36:19 UTC

[jira] [Commented] (MAHOUT-1422) Make a version of RSJ that uses two inputs

    [ https://issues.apache.org/jira/browse/MAHOUT-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908487#comment-13908487 ] 

Pat Ferrel commented on MAHOUT-1422:
------------------------------------

This job should recreate AB' for cooccurrence similarity, so the row space (number of dimensions) of both input matrices must be the same. In some applications the column spaces of the two differ, and this should be allowed, as it is in the matrix multiply special case. The column id spaces should not be interpreted as representing the same things, whereas the row id spaces are identical.
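As a rough illustration of the shape constraints described above, here is a minimal sketch in plain Python (not Mahout code). It assumes both inputs are dense lists of rows that share the same row ids, and that the product is accumulated over that shared row dimension. The orientation is an assumption made for the example; with transposed inputs the same computation would be written the other way around. The function name cross_cooccurrence is hypothetical.

```python
def cross_cooccurrence(a, b):
    """Count, for every (column of a, column of b) pair, the rows in which
    both entries are nonzero. The inputs must have the same number of rows;
    their column counts may differ."""
    if len(a) != len(b):
        raise ValueError("inputs must share the same row id space")
    n_cols_a = len(a[0]) if a else 0
    n_cols_b = len(b[0]) if b else 0
    result = [[0] * n_cols_b for _ in range(n_cols_a)]
    # Walk the shared row ids and accumulate cross-cooccurrence counts.
    for row_a, row_b in zip(a, b):
        for i, x in enumerate(row_a):
            if not x:
                continue
            for j, y in enumerate(row_b):
                if y:
                    result[i][j] += 1
    return result


# Hypothetical data: 3 shared rows, 3 columns in the first input, 2 in the second.
primary = [[1, 1, 0],
           [0, 1, 1],
           [1, 0, 1]]
secondary = [[1, 0],
             [1, 1],
             [0, 1]]

print(cross_cooccurrence(primary, secondary))  # [[1, 1], [2, 1], [1, 2]]
```

The result here is 3 x 2, so it cannot be symmetric, which is why an output format that assumes symmetry would not carry over.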

The options for the CrossRowSimilarityJob can be a superset of those of the current RSJ, with the addition of a second input matrix. --input will need to become --input1 and --input2, and --numberOfColumns will need to become --numberOfColumns1 and --numberOfColumns2, or some such.
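To make the proposed option split concrete, here is a small sketch using Python's argparse. It only mirrors the flag names suggested above; it is not the actual Mahout command line. The program name and the choice of which flags are required are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the proposed two-input options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="crossrowsimilarity")  # hypothetical name
    parser.add_argument("--input1", required=True, help="path to the first input matrix")
    parser.add_argument("--input2", required=True, help="path to the second input matrix")
    # Separate column counts, since the two column spaces may differ.
    parser.add_argument("--numberOfColumns1", type=int, help="columns in the first input")
    parser.add_argument("--numberOfColumns2", type=int, help="columns in the second input")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input1, args.input2, args.numberOfColumns1, args.numberOfColumns2)
```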
 
See Ted for further description of these asymmetric applications, where the two column spaces are not the same.

Also note that the current job assumes a symmetric DRM as output, and this will not be the case for an XRSJ.

> Make a version of RSJ that uses two inputs
> ------------------------------------------
>
>                 Key: MAHOUT-1422
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1422
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 1.0
>         Environment: mapreduce
>            Reporter: Pat Ferrel
>              Labels: recommender, similarity
>             Fix For: 1.0
>
>
> Currently the RowSimilarityJob uses a similarity measure to pairwise compare all rows in a DistributedRowMatrix.
> For many applications including a cross-action recommender we need something like RSJ that takes two DRMs and compares matching rows of each.  The output would be the same form as RSJ, and ideally would allow the use of any similarity type already defined--especially LLR.
> There are two implementations of a Cross-Recommender, one based on the Mahout RecommenderJob and another based on Solr, that can immediately benefit from a Cross-RSJ.
> A modification of the matrix multiply job may be a place to start, since the current RSJ seems to rely heavily on self-similarity.
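For readers unfamiliar with LLR, the following is a minimal Python sketch of the log-likelihood ratio computed from a 2x2 contingency table of counts, using the usual entropy-based formulation. It is an illustration only, not the code used by RSJ. The parameter names k11, k12, k21 and k22 are assumptions, with k11 being the number of times both events occur together.

```python
import math


def x_log_x(x):
    """Return x * ln(x), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR score for a 2x2 table of cooccurrence counts."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Guard against small negative values caused by floating point round-off.
    if row_entropy + column_entropy < matrix_entropy:
        return 0.0
    return 2.0 * (row_entropy + column_entropy - matrix_entropy)
```

In a cross-similarity setting the four counts would come from one column of each input, counted over the shared rows, so nothing in the score itself requires the two column spaces to match.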


