Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2015/04/06 21:46:12 UTC
[jira] [Closed] (SPARK-6711) Support parallelized online matrix factorization for Collaborative Filtering
[ https://issues.apache.org/jira/browse/SPARK-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng closed SPARK-6711.
--------------------------------
Resolution: Duplicate
> Support parallelized online matrix factorization for Collaborative Filtering
> -----------------------------------------------------------------------------
>
> Key: SPARK-6711
> URL: https://issues.apache.org/jira/browse/SPARK-6711
> Project: Spark
> Issue Type: Improvement
> Components: MLlib, Streaming
> Reporter: Chunnan Yao
> Original Estimate: 840h
> Remaining Estimate: 840h
>
> Online Collaborative Filtering (CF) has been widely used and studied. Re-training a CF model from scratch every time new data arrives is very inefficient (http://stackoverflow.com/questions/27734329/apache-spark-incremental-training-of-als-model). However, in the Spark community there has been little discussion of collaborative filtering on streaming data. Given streaming k-means, streaming logistic regression, and the ongoing incremental model training for the Naive Bayes classifier (SPARK-4144), we think it is worthwhile to consider streaming Collaborative Filtering support in MLlib.
> We have been considering this issue over the past week. We plan to follow this paper
> (https://www.cs.utexas.edu/~cjohnson/ParallelCollabFilt.pdf). It is based on SGD instead of ALS, which is easier to adapt to streaming data.
> Fortunately, the authors of this paper have implemented their algorithm as a GitHub project based on Storm:
> https://github.com/MrChrisJohnson/CollabStream
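To illustrate why SGD suits the streaming setting better than ALS: each incoming rating can update just the two affected factor vectors, instead of refitting whole factor matrices. The sketch below shows a generic per-rating SGD step for matrix factorization; the function name, learning rate, and regularization constant are illustrative assumptions, not MLlib or CollabStream API.

```python
# Minimal sketch (assumed names/values) of one online SGD update for
# matrix factorization: given an observed rating r(u, i), nudge only the
# user factor vector u and item factor vector v toward reducing the
# squared prediction error, with L2 regularization.
def sgd_step(u, v, rating, lr=0.05, lam=0.01):
    # Current prediction is the dot product of the two factor vectors.
    pred = sum(a * b for a, b in zip(u, v))
    err = rating - pred
    for k in range(len(u)):
        uk = u[k]  # keep the pre-update value for the symmetric update
        u[k] += lr * (err * v[k] - lam * uk)
        v[k] += lr * (err * uk - lam * v[k])
    return err

# Replaying one observed rating drives the prediction toward it,
# without ever touching any other user's or item's factors.
u = [0.1, 0.1]
v = [0.1, 0.1]
for _ in range(200):
    sgd_step(u, v, rating=4.0)
print(sum(a * b for a, b in zip(u, v)))  # close to the observed rating 4.0
```

In a streaming job, each micro-batch of new ratings would apply this step to the relevant rows of the user and item factor matrices, which is what makes the model incrementally trainable.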
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org