
Posted to issues@spark.apache.org by "Debasish Das (JIRA)" <ji...@apache.org> on 2014/11/12 00:08:33 UTC

[jira] [Commented] (SPARK-3066) Support recommendAll in matrix factorization model

    [ https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207298#comment-14207298 ] 

Debasish Das commented on SPARK-3066:
-------------------------------------

[~mengxr] I am testing the recommendAllUsers and recommendAllProducts APIs, and I will add the code to the RankingMetrics PR:
https://github.com/apache/spark/pull/3098

I have not used level-3 BLAS yet, since we should be able to reuse the DistributedMatrix API that is coming online (all the matrices here are dense). I used ideas 1 and 2, and I also added a skipRatings argument to the API: with it you can skip the ratings that each user has already provided. For validation, I basically skip the training set.

Example API:

  def recommendAllUsers(num: Int, skipUserRatings: RDD[Rating]) = {
    // Key the ratings to skip by (user, product).
    val skipUsers = skipUserRatings.map { x => ((x.user, x.product), x.rating) }
    // Collect the product factors to the driver.
    val productVectors = productFeatures.collect
    recommend(productVectors, userFeatures, num, skipUsers)
  }

  def recommendAllProducts(num: Int, skipProductRatings: RDD[Rating]) = {
    // Key the ratings to skip by (product, user).
    val skipProducts = skipProductRatings.map { x => ((x.product, x.user), x.rating) }
    // Collect the user factors to the driver.
    val userVectors = userFeatures.collect
    recommend(userVectors, productFeatures, num, skipProducts)
  }
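
For illustration only, here is a minimal local sketch of what a helper like recommend might compute. It is not the code from the PR: the object name RecommendAllSketch, the function recommendAll, and the use of plain Scala collections instead of RDDs are all assumptions made so the example is self-contained. It scores every (user, product) pair by inner product, drops the pairs listed in the skip set, and keeps the num best products per user.

```scala
// Hypothetical local stand-in for a `recommend`-style helper.
// Plain Scala collections are used instead of RDDs so the sketch runs on its own.
object RecommendAllSketch {
  // (id, latent factor vector)
  type Factor = (Int, Array[Double])

  // Inner product of two factor vectors of equal rank.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    var sum = 0.0
    var i = 0
    while (i < a.length) { sum += a(i) * b(i); i += 1 }
    sum
  }

  // For every user: score all products, drop the (user, product) pairs
  // found in `skip`, and keep the `num` highest-scoring products.
  def recommendAll(
      userFactors: Seq[Factor],
      productFactors: Seq[Factor],
      num: Int,
      skip: Set[(Int, Int)]): Map[Int, Seq[(Int, Double)]] = {
    userFactors.map { case (user, uVec) =>
      val scored = productFactors
        .filterNot { case (product, _) => skip.contains((user, product)) }
        .map { case (product, pVec) => (product, dot(uVec, pVec)) }
        .toVector
      // A full sort is the simplest choice here; a bounded top-k
      // selection would avoid sorting every score.
      user -> scored.sortBy { case (_, score) => -score }.take(num)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val users = Seq(1 -> Array(1.0, 0.0), 2 -> Array(0.0, 1.0))
    val products = Seq(
      10 -> Array(2.0, 1.0),
      20 -> Array(1.0, 3.0),
      30 -> Array(0.5, 0.5))
    // User 1 has already rated product 10, so it is skipped for that user.
    val recs = recommendAll(users, products, 2, Set((1, 10)))
    recs.toSeq.sortBy(_._1).foreach { case (user, top) =>
      println(s"user $user -> ${top.mkString(", ")}")
    }
  }
}
```

In a distributed version, the collected side would presumably be shipped to the executors (for example as a broadcast variable) and the per-user scoring would run inside a map over the other side's RDD; that part is left out of the sketch.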

> Support recommendAll in matrix factorization model
> --------------------------------------------------
>
>                 Key: SPARK-3066
>                 URL: https://issues.apache.org/jira/browse/SPARK-3066
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>
> ALS returns a matrix factorization model, which we can use to predict ratings for individual queries as well as small batches. In practice, users may want to compute top-k recommendations offline for all users. This is very expensive, but it is a common problem. We can apply some optimizations, such as:
> 1) collect one side (either user or product) and broadcast it as a matrix
> 2) use level-3 BLAS to compute inner products
> 3) use Utils.takeOrdered to find top-k


