Posted to dev@mahout.apache.org by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org> on 2014/06/02 19:50:02 UTC

[jira] [Commented] (MAHOUT-1365) Weighted ALS-WR iterator for Spark

    [ https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015639#comment-14015639 ] 

Dmitriy Lyubimov commented on MAHOUT-1365:
------------------------------------------

[~ssc] Since you've done this before, can you please eyeball this and make a suggestion?
My current implementation proceeds from formula (7) in the pdf, which in its turn is derived directly from both papers. (We split off the baseline confidence, which I denote c_0; the expression under inversion then comes apart as U'U, which is common and tiny for all item vectors, so it is computed once and broadcast, plus an individual per-item correction U'D^(i)U that takes only the rows of U where confidence is non-trivial, i.e. c != c_0.)
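
To make the split concrete, the per-item system is (c_0 U'U + U'D^(i)U + lambda I) v_i = U'C^(i) p_i, with D^(i) carrying (c - c_0) on its diagonal for the non-trivial users only. Here is a minimal sketch of that per-item solve, assuming Mahout's math-scala bindings (solve, eye, cross from org.apache.mahout.math.scalabindings); the names solveItem, uu, uRows, lambda and c0 are illustrative, not from the patch:

    // Per-item ALS solve with the baseline-confidence split: the shared
    // Gramian U'U is broadcast once; only users with c != c0 contribute
    // rank-one corrections. All identifiers here are illustrative.
    import org.apache.mahout.math.{DenseVector, Matrix, Vector}
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._

    def solveItem(uu: Matrix,                           // k x k Gramian U'U (broadcast)
                  uRows: Seq[(Vector, Double, Double)], // (u_row, c, p), only c != c0
                  lambda: Double,
                  c0: Double): Vector = {
      val k = uu.nrow
      // Shared part of the left-hand side: c0 * U'U + lambda * I.
      // (uu * c0 makes a copy, so the broadcast Gramian is never mutated.)
      val lhs = uu * c0 + eye(k) * lambda
      val rhs: Vector = new DenseVector(k)
      // Per-item correction U'D^(i)U and right-hand side U'C^(i)p_i,
      // accumulated only over the non-trivial rows.
      for ((u, c, p) <- uRows) {
        lhs += (u cross u) * (c - c0)                   // rank-one update u u' (c - c0)
        rhs += u * (c * p)
      }
      solve(lhs, rhs)                                   // dense k x k solve
    }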

That effectively means every row of U has to send a message to every row of V for which c != c_0. I previously did this with Pregel. It turns out that in Spark, Bagel is a moot point, since it simply uses groupBy underneath rather than custom multicast communication; so if I did it today, I would have to do a coGroup or something similar to achieve the same effect (see the sketch below). The question is whether there's a neat way to translate this into our current set of linear algebra primitives, or whether this would be our first case of a method that is in part tightly coupled to Spark. Any thoughts?
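
For reference, one way the exchange could look with plain RDD primitives, i.e. a join plus groupByKey (the same shuffle Bagel would do underneath); all identifiers here are hypothetical, and solveItem is the sketch above:

    import org.apache.mahout.math.{Matrix, Vector}
    import org.apache.spark.SparkContext._
    import org.apache.spark.broadcast.Broadcast
    import org.apache.spark.rdd.RDD

    // ratings: (userId, (itemId, c, p)) -- only entries with c != c0 materialized
    // uRows:   (userId, u_row)          -- current user-factor rows
    def itemIteration(ratings: RDD[(Int, (Int, Double, Double))],
                      uRows: RDD[(Int, Vector)],
                      uuBcast: Broadcast[Matrix],  // the broadcast Gramian U'U
                      lambda: Double,
                      c0: Double): RDD[(Int, Vector)] =
      ratings.join(uRows)                          // ship each u_row to its non-trivial items
        .map { case (_, ((itemId, c, p), u)) => (itemId, (u, c, p)) }
        .groupByKey()                              // gather the per-item "messages"
        .mapValues(rows => solveItem(uuBcast.value, rows.toSeq, lambda, c0))

The groupByKey here is exactly the multicast step; a coGroup on the user key would express the first join equally well, which is why none of this maps cleanly onto the current linear algebra primitives.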

> Weighted ALS-WR iterator for Spark
> ----------------------------------
>
>                 Key: MAHOUT-1365
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1365
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>         Attachments: distributed-als-with-confidence.pdf
>
>
> Given preference P and confidence C distributed sparse matrices, compute the ALS-WR solution for implicit feedback (Spark Bagel version).
> Following the Hu-Koren-Volinsky method (stripping off any concrete methodology to build the C matrix), with a parameterized test for convergence.
> The computational scheme follows the ALS-WR method (which should be slightly more efficient for sparser inputs).
> The best performance will be achieved if non-sparse anomalies are prefiltered out (such as an anomalously active user, which doesn't represent a typical user anyway).
> The work is going on here: https://github.com/dlyubimov/mahout-commits/tree/dev-0.9.x-scala. I am porting our (A1) implementation over, so there are a few issues associated with that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)