Posted to user@spark.apache.org by Michael Allman <ms...@allman.ms> on 2014/03/11 23:18:43 UTC

possible bug in Spark's ALS implementation...

Hi,

I'm implementing a recommender based on the algorithm described in 
http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms the 
basis for Spark's ALS implementation for data sets with implicit features. 
The data set I'm working with is proprietary and I cannot share it, 
however I can say that it's based on the same kind of data in the 
paper---relative viewing time of videos. (Specifically, the "rating" for 
each video is defined as total viewing time across all visitors divided by 
video duration).

I'm seeing counterintuitive, sometimes nonsensical recommendations. For 
comparison, I've run the training data through Oryx's in-VM implementation 
of implicit ALS with the same parameters. Oryx uses the same algorithm. 
(Source in this file: 
https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java)

The recommendations made by each system compared to one another are very 
different---more so than I think could be explained by differences in 
initial state. The recommendations made by the Oryx models look much 
better, especially as I increase the number of latent factors and the 
iterations. The Spark models' recommendations don't improve with increases 
in either latent factors or iterations. Sometimes, they get worse.

Because of the (understandably) highly-optimized and terse style of 
Spark's ALS implementation, I've had a very hard time following it well 
enough to debug the issue definitively. However, I have found a section of 
code that looks incorrect. As described in the paper, part of the implicit 
ALS algorithm involves computing a matrix product YtCuY (equation 4 in the 
paper). To optimize this computation, this expression is rewritten as YtY 
+ Yt(Cu - I)Y. I believe that's what should be happening here:

https://github.com/apache/incubator-spark/blob/v0.9.0-incubating/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L376

However, it looks like this code is in fact computing YtY + YtY(Cu - I), 
which is the same as YtYCu. If so, that's a bug. Can someone familiar with 
this code evaluate my claim?
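
As a sanity check on the algebra itself, here's a tiny Breeze sketch with
made-up numbers (not the MLlib code): the rewritten form agrees with YtCuY,
while the form I suspect is being computed does not.

    import breeze.linalg.{DenseMatrix, DenseVector, diag}

    // Tiny made-up example: 2 items, 2 latent factors, one user u.
    val Y  = DenseMatrix((1.0, 2.0), (0.5, -1.0))
    val Cu = diag(DenseVector(3.0, 1.5))          // Cu = diag(1 + alpha * r_ui)
    val I  = DenseMatrix.eye[Double](2)

    val direct  = Y.t * Cu * Y                    // YtCuY, equation 4 in the paper
    val rewrite = Y.t * Y + Y.t * (Cu - I) * Y    // intended optimization; equals direct
    val suspect = Y.t * Y + (Y.t * Y) * (Cu - I)  // what I suspect is computed; differs

    println(direct)
    println(rewrite)   // same values as direct
    println(suspect)   // different values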

Cheers,

Michael

Re: possible bug in Spark's ALS implementation...

Posted by Sean Owen <so...@cloudera.com>.
On Tue, Mar 11, 2014 at 10:18 PM, Michael Allman <ms...@allman.ms> wrote:
> I'm seeing counterintuitive, sometimes nonsensical recommendations. For
> comparison, I've run the training data through Oryx's in-VM implementation
> of implicit ALS with the same parameters. Oryx uses the same algorithm.
> (Source in this file:
> https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java)

On this note, I should say that Oryx varies from that paper in a
couple of small ways. In particular, the regularization parameter that
is used in the end is not just lambda, but lambda * alpha. (There are
decent reasons for this.)

So the difference with the "same" parameters could be down to this.
What parameter values are you using? That might be the difference.

(There is another difference in handling of negative values, but that
is probably irrelevant to you? It is in Spark now too though. It was
not in 0.9.0 but is in HEAD.)


> However, it looks like this code is in fact computing YtY + YtY(Cu - I),
> which is the same as YtYCu. If so, that's a bug. Can someone familiar with
> this code evaluate my claim?

I too can't be 100% certain I'm not missing something, but from a look
at that line, I don't think it is computing YtY(Cu-I). It is indeed
trying to accumulate the value Yt(Cu-I)Y by building it up from
pieces, from rows of Y. For one row of Y, that piece is (excusing my
notation) Y(i)t (Cu(i)-1) Y(i). The middle term is just a scalar, so
it's fine to multiply it at the end, as you see in that line.
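
In rough Scala, the accumulation for one user looks like this (a sketch
only; the real code works on flattened arrays, so treat the data layout
here as hypothetical):

    // Yt(Cu - I)Y for one user = sum_i (c_i - 1) * y_i * y_i^T, with c_i = 1 + alpha * r_i.
    def accumulate(userRatings: Seq[(Array[Double], Double)],  // (y_i, r_i) pairs
                   rank: Int, alpha: Double): Array[Array[Double]] = {
      val acc = Array.fill(rank, rank)(0.0)
      for ((yi, ri) <- userRatings) {
        val scale = alpha * ri                 // the scalar (c_i - 1), applied at the end
        for (p <- 0 until rank; q <- 0 until rank)
          acc(p)(q) += scale * yi(p) * yi(q)
      }
      acc                                      // YtY and lambda*I get added before solving
    }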

You may wish to follow HEAD, which is a bit different:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L390

The computation is actually the same as before (for positive input),
expressed a little differently.

Happy to help on this given that I know this code a little and the
code you are comparing it to a lot.

Re: possible bug in Spark's ALS implementation...

Posted by Sebastian Schelter <ss...@apache.org>.
The Mahout implementation is just a straightforward port of the paper. 
No changes have been made.

On 03/12/2014 08:36 AM, Nick Pentreath wrote:
> It would be helpful to know what parameter inputs you are using.
>
> If the regularization schemes are different (by a factor of alpha, which
> can often be quite high) this will mean that the same parameter settings
> could give very different results. A higher lambda would be required with
> Spark's version to be comparable.
>
> When I submitted the PR for this, I verified (on ml-100k, ml-1m and ml-10m
> data) that this version gives the same RMSE as Mahout's implicit model, as
> well as a separate Spark version that I wrote that was a from-scratch port
> of the Mahout algorithm (though I didn't compare vs Myrrix/Oryx). I'm
> fairly confident things are correct but if there is a bug let's definitely
> find and fix it!
>
> @Sean, would it be a good idea to look at changing the regularization in
> Spark's ALS to alpha * lambda? What is the thinking behind this? If I
> recall, the Mahout version added something like (# ratings * lambda) as
> regularization in each factor update (for explicit), but implicit it was
> just lambda (I may be wrong here).
>
>
>
> On Wed, Mar 12, 2014 at 4:57 AM, Xiangrui Meng <me...@gmail.com> wrote:
>
>> Line 376 should be correct as it is computing \sum_i (c_i - 1) x_i
>> x_i^T, = \sum_i (alpha * r_i) x_i x_i^T. Are you computing some
>> metrics to tell which recommendation is better? -Xiangrui
>>
>> On Tue, Mar 11, 2014 at 6:38 PM, Xiangrui Meng <me...@gmail.com> wrote:
>>> Hi Michael,
>>>
>>> I can help check the current implementation. Would you please go to
>>> https://spark-project.atlassian.net/browse/SPARK and create a ticket
>>> about this issue with component "MLlib"? Thanks!
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Tue, Mar 11, 2014 at 3:18 PM, Michael Allman <ms...@allman.ms> wrote:
>>>> Hi,
>>>>
>>>> I'm implementing a recommender based on the algorithm described in
>>>> http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms
>> the
>>>> basis for Spark's ALS implementation for data sets with implicit
>> features.
>>>> The data set I'm working with is proprietary and I cannot share it,
>> however
>>>> I can say that it's based on the same kind of data in the
>> paper---relative
>>>> viewing time of videos. (Specifically, the "rating" for each video is
>>>> defined as total viewing time across all visitors divided by video
>>>> duration).
>>>>
>>>> I'm seeing counterintuitive, sometimes nonsensical recommendations. For
>>>> comparison, I've run the training data through Oryx's in-VM
>> implementation
>>>> of implicit ALS with the same parameters. Oryx uses the same algorithm.
>>>> (Source in this file:
>>>>
>> https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java
>> )
>>>>
>>>> The recommendations made by each system compared to one other are very
>>>> different---moreso than I think could be explained by differences in
>> initial
>>>> state. The recommendations made by the Oryx models look much better,
>>>> especially as I increase the number of latent factors and the
>> iterations.
>>>> The Spark models' recommendations don't improve with increases in either
>>>> latent factors or iterations. Sometimes, they get worse.
>>>>
>>>> Because of the (understandably) highly-optimized and terse style of
>> Spark's
>>>> ALS implementation, I've had a very hard time following it well enough
>> to
>>>> debug the issue definitively. However, I have found a section of code
>> that
>>>> looks incorrect. As described in the paper, part of the implicit ALS
>>>> algorithm involves computing a matrix product YtCuY (equation 4 in the
>>>> paper). To optimize this computation, this expression is rewritten as
>> YtY +
>>>> Yt(Cu - I)Y. I believe that's what should be happening here:
>>>>
>>>>
>> https://github.com/apache/incubator-spark/blob/v0.9.0-incubating/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L376
>>>>
>>>> However, it looks like this code is in fact computing YtY + YtY(Cu - I),
>>>> which is the same as YtYCu. If so, that's a bug. Can someone familiar
>> with
>>>> this code evaluate my claim?
>>>>
>>>> Cheers,
>>>>
>>>> Michael
>>
>


Re: possible bug in Spark's ALS implementation...

Posted by Nick Pentreath <ni...@gmail.com>.
Great work Xiangrui, thanks for the enhancement!

Sent from Mailbox for iPhone

On Wed, Mar 19, 2014 at 12:08 AM, Xiangrui Meng <me...@gmail.com> wrote:

> Glad to hear the speed-up. Wish we can improve the implementation
> further in the future. -Xiangrui
> On Tue, Mar 18, 2014 at 1:55 PM, Michael Allman <ms...@allman.ms> wrote:
>> I just ran a runtime performance comparison between 0.9.0-incubating and your
>> als branch. I saw a 1.5x improvement in performance.
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2823.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Xiangrui Meng <me...@gmail.com>.
Glad to hear about the speed-up. I hope we can improve the implementation
further in the future. -Xiangrui

On Tue, Mar 18, 2014 at 1:55 PM, Michael Allman <ms...@allman.ms> wrote:
> I just ran a runtime performance comparison between 0.9.0-incubating and your
> als branch. I saw a 1.5x improvement in performance.
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2823.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Michael Allman <ms...@allman.ms>.
I just ran a runtime performance comparison between 0.9.0-incubating and your
als branch. I saw a 1.5x improvement in performance.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2823.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Xiangrui Meng <me...@gmail.com>.
Sorry, the link was wrong. Should be
https://github.com/apache/spark/pull/131 -Xiangrui


On Tue, Mar 18, 2014 at 10:20 AM, Michael Allman <ms...@allman.ms> wrote:
> Hi Xiangrui,
>
> I don't see how https://github.com/apache/spark/pull/161 relates to ALS. Can
> you explain?
>
> Also, thanks for addressing the issue with factor matrix persistence in PR
> 165. I was probably not going to get to that for a while.
>
> I will try to test your changes today for speed improvements.
>
> Cheers,
>
> Michael
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2817.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Michael Allman <ms...@allman.ms>.
Hi Xiangrui,

I don't see how https://github.com/apache/spark/pull/161 relates to ALS. Can
you explain?

Also, thanks for addressing the issue with factor matrix persistence in PR
165. I was probably not going to get to that for a while.

I will try to test your changes today for speed improvements.

Cheers,

Michael



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2817.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Xiangrui Meng <me...@gmail.com>.
Hi Michael,

I made a couple of changes to implicit ALS. One gives faster construction
of YtY (https://github.com/apache/spark/pull/161), which was merged
into master. The other caches intermediate matrix factors properly
(https://github.com/apache/spark/pull/165). They should give you the
same result as before, but faster (~2x in my local tests). If you have
time to try the improved version, please let me know the speed-up on
your data. Thanks!

Best,
Xiangrui

On Mon, Mar 17, 2014 at 5:07 PM, Michael Allman <ms...@allman.ms> wrote:
> I've created https://spark-project.atlassian.net/browse/SPARK-1263 to address
> the issue of the factor matrix recomputation. I'm planning to submit a
> related pull request shortly.
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2785.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Michael Allman <ms...@allman.ms>.
I've created https://spark-project.atlassian.net/browse/SPARK-1263 to address
the issue of the factor matrix recomputation. I'm planning to submit a
related pull request shortly.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2785.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Xiangrui Meng <me...@gmail.com>.
The factor matrix Y is used twice in the implicit ALS computation: once to
compute the global Y^T Y, and again to compute the local Y_i^T C_i Y_i.
-Xiangrui
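
A toy sketch of that pattern (hypothetical RDD shapes, not the actual ALS
internals): the same factor RDD feeds both a global reduce and a per-user
join, so persisting it keeps its lineage from being recomputed twice.

    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    def twoUsesOfY(itemFactors: RDD[(Int, Array[Double])],
                   ratingsByItem: RDD[(Int, Double)]): Unit = {
      itemFactors.persist()
      // "Global" use: accumulate Y^T Y as a flattened k*k array.
      val k = itemFactors.first()._2.length
      val yty = itemFactors
        .map { case (_, y) => Array.tabulate(k * k)(idx => y(idx / k) * y(idx % k)) }
        .reduce((a, b) => a.zip(b).map { case (u, v) => u + v })
      // "Local" use: join the same factors against ratings, as the per-user solve does.
      val joined = ratingsByItem.join(itemFactors).count()
      println(yty.sum + " " + joined)
      itemFactors.unpersist()
    }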

On Sun, Mar 16, 2014 at 1:18 PM, Matei Zaharia <ma...@gmail.com> wrote:
> On Mar 14, 2014, at 5:52 PM, Michael Allman <ms...@allman.ms> wrote:
>
> I also found that the product and user RDDs were being rebuilt many times
> over in my tests, even for tiny data sets. By persisting the RDD returned
> from updateFeatures() I was able to avoid a raft of duplicate computations.
> Is there a reason not to do this?
>
>
> This sounds like a good thing to add, though I'd like to understand why
> these are being recomputed (it seemed that the code would only use each one
> once). Do you have any sense why that is?
>
> Matei

Re: possible bug in Spark's ALS implementation...

Posted by Matei Zaharia <ma...@gmail.com>.
On Mar 14, 2014, at 5:52 PM, Michael Allman <ms...@allman.ms> wrote:

> I also found that the product and user RDDs were being rebuilt many times
> over in my tests, even for tiny data sets. By persisting the RDD returned
> from updateFeatures() I was able to avoid a raft of duplicate computations.
> Is there a reason not to do this?

This sounds like a good thing to add, though I’d like to understand why these are being recomputed (it seemed that the code would only use each one once). Do you have any sense why that is?

Matei

Re: possible bug in Spark's ALS implementation...

Posted by Michael Allman <ms...@allman.ms>.
You are correct; in the long run it doesn't matter which matrix you begin the
iterative process with. I was thinking in terms of doing a side-by-side
comparison to Oryx.

I've posted a bug report as SPARK-1262. I described the problem I found and
the mitigation strategy I've used. I think that this problem has many
possible solutions, so I'm omitting a patch to let the community hash out
the best approach. However, I would suggest moving to a pure Java
implementation of a linear system solver, to provide better assurances of
correctness across platforms (differences in java.lang.Math notwithstanding)
and to make the implementation more transparent. It is not clear exactly
what native code jblas is linked to and using for its solver. I suggested
the QR-decomposition-based solvers provided by Colt and Commons Math as
candidate replacements.
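
To be concrete about the kind of replacement I mean, here's a sketch
against Commons Math 3 (the method name and the assembled inputs are mine,
purely for illustration):

    import org.apache.commons.math3.linear.{Array2DRowRealMatrix, ArrayRealVector, QRDecomposition}

    // Solve A x = b for one user's k x k normal equations via QR (pure JVM, no native BLAS).
    def solveViaQR(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
      val qr = new QRDecomposition(new Array2DRowRealMatrix(a))
      qr.getSolver.solve(new ArrayRealVector(b)).toArray
    }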

Cheers.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2783.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Xiangrui Meng <me...@gmail.com>.
Hi Michael,

Thanks for looking into the details! Computing X first and computing Y
first can deliver different results, because the initial objective
values could differ by a lot. But the algorithm should converge after
a few iterations. It is hard to tell which should go first. After all,
the definitions of "user" and "product" are arbitrary. One trick we
can do is to rescale the columns of X and Y after each iteration such
that they have the same column norms.
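
For example (plain in-memory Scala, purely illustrative): scaling column j
of X by s_j and column j of Y by 1/s_j leaves X * Y^T unchanged, so s_j can
be chosen to equalize the column norms.

    def rescaleColumns(x: Array[Array[Double]], y: Array[Array[Double]]): Unit = {
      val k = x(0).length
      for (j <- 0 until k) {
        val normX = math.sqrt(x.map(row => row(j) * row(j)).sum)
        val normY = math.sqrt(y.map(row => row(j) * row(j)).sum)
        if (normX > 0.0 && normY > 0.0) {
          val s = math.sqrt(normY / normX)  // both columns end up with norm sqrt(normX * normY)
          x.foreach(row => row(j) *= s)
          y.foreach(row => row(j) /= s)
        }
      }
    }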

For the comparison, you should compute some metrics to verify the convergence.

I don't think initializing Y is necessary if we start with X. However,
since RDDs are lazy, if Y_0 is never used its data is never actually
generated, so the overhead should be small.

Best,
Xiangrui

On Fri, Mar 14, 2014 at 5:52 PM, Michael Allman <ms...@allman.ms> wrote:
> I've been thoroughly investigating this issue over the past couple of days
> and have discovered quite a bit. For one thing, there is definitely (at
> least) one issue/bug in the Spark implementation that leads to incorrect
> results for models generated with rank > 1 or a large number of iterations.
> I will post a bug report with a thorough explanation this weekend or on
> Monday.
>
> I believe I've been able to track down every difference between the Spark
> and Oryx implementations that lead to difference results. I made some
> adjustments to the spark implementation so that, given the same initial
> product/item vectors, the resulting model is identical to the one produced
> by Oryx within a small numerical tolerance. I've verified this for small
> data sets and am working on verifying this with some large data sets.
>
> Aside from those already identified in this thread, another significant
> difference in the Spark implementation is that it begins the factorization
> process by computing the product matrix (Y) from the initial user matrix
> (X). Both of the papers on ALS referred to in this thread begin the process
> by computing the user matrix. I haven't done any testing comparing the
> models generated starting from Y or X, but they are very different. Is there
> a reason Spark begins the iteration by computing Y?
>
> Initializing both X and Y as is done in the Spark implementation seems
> unnecessary unless I'm overlooking some desired side-effect. Only the factor
> matrix which generates the other in the first iteration needs to be
> initialized.
>
> I also found that the product and user RDDs were being rebuilt many times
> over in my tests, even for tiny data sets. By persisting the RDD returned
> from updateFeatures() I was able to avoid a raft of duplicate computations.
> Is there a reason not to do this?
>
> Thanks.
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2704.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Sean Owen <so...@cloudera.com>.
It should be kept in mind that different implementations are rarely
strictly better, and that what works well on one type of data might
not on another. It also bears keeping in mind that several of these
differences just amount to different amounts of regularization, which
need not be a meaningful difference.

#1 is just a question of 'semantics' really and not an algorithmic difference.

#4 is more subtle and I think it is a small positive to use weighted
regularization -- for all types of data. It is not specific to
explicit data since it is based on counts. That said, the effect of
this is mostly "more regularization across the board", and can be
simulated with a higher global lambda.

There might be something to #2. I suspect the QR decomposition is more
accurate and that can matter here. I have no evidence for that though.
(The symmetry wasn't actually an issue in the end no?)

#3 won't affect the result so much as when to stop. That is, stopping
in a principled way is nice, but it can be approximated by just
specifying a suitable fixed number of iterations.

If I were to experiment next I would focus on #2 and maybe #4.
--
Sean Owen | Director, Data Science | London


On Wed, Apr 2, 2014 at 9:06 AM, Michael Allman <ms...@allman.ms> wrote:
> Hi Nick,
>
> I don't have my spark clone in front of me, but OTOH the major differences
> are/were:
>
> 1. Oryx multiplies lambda by alpha.
>
> 2. Oryx uses a different matrix inverse algorithm. It maintains a certain
> symmetry which the Spark algo does not, however I don't think this
> difference has a real impact on the results.
>
> 3. Oryx supports the specification of a convergence threshold for
> termination of the algorithm, based on delta rmse on a subset of the
> training set if I recall correctly. I've been using that as the
> termination criterion instead of a fixed number of iterations.
>
> 4. Oryx uses the weighted regularization scheme you alluded to below,
> multiplying lambda by the number of ratings.
>
> I've patched the spark impl to support (4) but haven't pushed it to my
> clone on github. I think it would be a valuable feature to support
> officially. I'd also like to work on (3) but don't have time now. I've
> only been using Oryx the past couple of weeks.
>
> Cheers,
>
> Michael
>
> On Tue, 1 Apr 2014, Nick Pentreath [via Apache Spark User List] wrote:
>
>> Hi Michael
>> Would you mind setting out exactly what differences you did find between
>> the
>> Spark and Oryx implementations? Would be good to be clear on them, and
>> also
>> see if there are further tricks/enhancements from the Oryx one that can be
>> ported (such as the lambda * numRatings adjustment).
>>
>> N
>>
>>
>> On Sat, Mar 15, 2014 at 2:52 AM, Michael Allman <[hidden email]> wrote:
>>       I've been thoroughly investigating this issue over the past
>>       couple of days
>>       and have discovered quite a bit. For one thing, there is
>>       definitely (at
>>       least) one issue/bug in the Spark implementation that leads to
>>       incorrect
>>       results for models generated with rank > 1 or a large number of
>>       iterations.
>>       I will post a bug report with a thorough explanation this
>>       weekend or on
>>       Monday.
>>
>>       I believe I've been able to track down every difference between
>>       the Spark
>>       and Oryx implementations that lead to difference results. I made
>>       some
>>       adjustments to the spark implementation so that, given the same
>>       initial
>>       product/item vectors, the resulting model is identical to the
>>       one produced
>>       by Oryx within a small numerical tolerance. I've verified this
>>       for small
>>       data sets and am working on verifying this with some large data
>>       sets.
>>
>>       Aside from those already identified in this thread, another
>>       significant
>>       difference in the Spark implementation is that it begins the
>>       factorization
>>       process by computing the product matrix (Y) from the initial
>>       user matrix
>>       (X). Both of the papers on ALS referred to in this thread begin
>>       the process
>>       by computing the user matrix. I haven't done any testing
>>       comparing the
>>       models generated starting from Y or X, but they are very
>>       different. Is there
>>       a reason Spark begins the iteration by computing Y?
>>
>>       Initializing both X and Y as is done in the Spark implementation
>>       seems
>>       unnecessary unless I'm overlooking some desired side-effect.
>>       Only the factor
>>       matrix which generates the other in the first iteration needs to
>>       be
>>       initialized.
>>
>>       I also found that the product and user RDDs were being rebuilt
>>       many times
>>       over in my tests, even for tiny data sets. By persisting the RDD
>>       returned
>>       from updateFeatures() I was able to avoid a raft of duplicate
>>       computations.
>>       Is there a reason not to do this?
>>
>>       Thanks.
>>
>>
>>
>>       --
>>       View this message in
>> context:http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s
>>       -ALS-implementation-tp2567p2704.html
>>       Sent from the Apache Spark User List mailing list archive at
>>       Nabble.com.
>>
>>
>>
>>
>>
>
> ________________________________
> View this message in context: Re: possible bug in Spark's ALS
> implementation...
>
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Nick Pentreath <ni...@gmail.com>.
The heuristic is to multiply by the number of ratings (i.e., the number of
training examples for that user or item).

While the heuristic is used for CF settings, it is actually just
penalizing, as Sean said, "complex" models more, where we have more
data/connections between objects. I would say this can apply in
collaborative filtering in both explicit and implicit settings, and should
be quite generic and applicable to most settings (e.g. it is similar to
per-term learning rates and regularization in linear models).

@Sean what do you think?

We could add a boolean parameter controlling whether to apply this weighted
regularization, with appropriate documentation as to why it's there and what
it's used for.


On Wed, Apr 2, 2014 at 9:19 AM, Debasish Das <de...@gmail.com>wrote:

> I think multiply by ratings is a heuristic that worked on rating related
> problems like netflix dataset or any other ratings datasets but the scope
> of NMF is much more broad than that....
>
> @Sean please correct me in case you don't agree...
>
> Definitely it's good to add all the rating dataset related optimizations
> but it should not be in the core algorithm but say RatingALS which extends
> upon core ALS...
>
>
>
> On Wed, Apr 2, 2014 at 12:06 AM, Michael Allman <ms...@allman.ms> wrote:
>
>> Hi Nick,
>>
>> I don't have my spark clone in front of me, but OTOH the major
>> differences
>> are/were:
>>
>> 1. Oryx multiplies lambda by alpha.
>>
>> 2. Oryx uses a different matrix inverse algorithm. It maintains a certain
>> symmetry which the Spark algo does not, however I don't think this
>> difference has a real impact on the results.
>>
>> 3. Oryx supports the specification of a convergence threshold for
>> termination of the algorithm, based on delta rmse on a subset of the
>> training set if I recall correctly. I've been using that as the
>> termination criterion instead of a fixed number of iterations.
>>
>> 4. Oryx uses the weighted regularization scheme you alluded to below,
>> multiplying lambda by the number of ratings.
>>
>> I've patched the spark impl to support (4) but haven't pushed it to my
>> clone on github. I think it would be a valuable feature to support
>> officially. I'd also like to work on (3) but don't have time now. I've
>> only been using Oryx the past couple of weeks.
>>
>> Cheers,
>>
>> Michael
>>
>> On Tue, 1 Apr 2014, Nick Pentreath [via Apache Spark User List] wrote:
>>
>> > Hi Michael
>> > Would you mind setting out exactly what differences you did find
>> between the
>> > Spark and Oryx implementations? Would be good to be clear on them, and
>> also
>> > see if there are further tricks/enhancements from the Oryx one that can
>> be
>> > ported (such as the lambda * numRatings adjustment).
>> >
>> > N
>> >
>> >
>> > On Sat, Mar 15, 2014 at 2:52 AM, Michael Allman <[hidden email]> wrote:
>> >       I've been thoroughly investigating this issue over the past
>> >       couple of days
>> >       and have discovered quite a bit. For one thing, there is
>> >       definitely (at
>> >       least) one issue/bug in the Spark implementation that leads to
>> >       incorrect
>> >       results for models generated with rank > 1 or a large number of
>> >       iterations.
>> >       I will post a bug report with a thorough explanation this
>> >       weekend or on
>> >       Monday.
>> >
>> >       I believe I've been able to track down every difference between
>> >       the Spark
>> >       and Oryx implementations that lead to difference results. I made
>> >       some
>> >       adjustments to the spark implementation so that, given the same
>> >       initial
>> >       product/item vectors, the resulting model is identical to the
>> >       one produced
>> >       by Oryx within a small numerical tolerance. I've verified this
>> >       for small
>> >       data sets and am working on verifying this with some large data
>> >       sets.
>> >
>> >       Aside from those already identified in this thread, another
>> >       significant
>> >       difference in the Spark implementation is that it begins the
>> >       factorization
>> >       process by computing the product matrix (Y) from the initial
>> >       user matrix
>> >       (X). Both of the papers on ALS referred to in this thread begin
>> >       the process
>> >       by computing the user matrix. I haven't done any testing
>> >       comparing the
>> >       models generated starting from Y or X, but they are very
>> >       different. Is there
>> >       a reason Spark begins the iteration by computing Y?
>> >
>> >       Initializing both X and Y as is done in the Spark implementation
>> >       seems
>> >       unnecessary unless I'm overlooking some desired side-effect.
>> >       Only the factor
>> >       matrix which generates the other in the first iteration needs to
>> >       be
>> >       initialized.
>> >
>> >       I also found that the product and user RDDs were being rebuilt
>> >       many times
>> >       over in my tests, even for tiny data sets. By persisting the RDD
>> >       returned
>> >       from updateFeatures() I was able to avoid a raft of duplicate
>> >       computations.
>> >       Is there a reason not to do this?
>> >
>> >       Thanks.
>> >
>> >
>> >
>> >       --
>> >       View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s
>> >       -ALS-implementation-tp2567p2704.html
>> >       Sent from the Apache Spark User List mailing list archive at
>> >       Nabble.com.
>> >
>> >
>> >
>> >
>> >
>>
>> ------------------------------
>> View this message in context: Re: possible bug in Spark's ALS
>> implementation...<http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p3619.html>
>>
>> Sent from the Apache Spark User List mailing list archive<http://apache-spark-user-list.1001560.n3.nabble.com/>at Nabble.com.
>>
>
>

Re: possible bug in Spark's ALS implementation...

Posted by Debasish Das <de...@gmail.com>.
I think multiplying by the number of ratings is a heuristic that worked on
rating-related problems like the Netflix dataset or other ratings datasets,
but the scope of NMF is much broader than that....

@Sean please correct me in case you don't agree...

It's definitely good to add all the rating-dataset-related optimizations,
but they should not be in the core algorithm; rather in, say, a RatingALS
that extends the core ALS...



On Wed, Apr 2, 2014 at 12:06 AM, Michael Allman <ms...@allman.ms> wrote:

> Hi Nick,
>
> I don't have my spark clone in front of me, but OTOH the major differences
> are/were:
>
> 1. Oryx multiplies lambda by alpha.
>
> 2. Oryx uses a different matrix inverse algorithm. It maintains a certain
> symmetry which the Spark algo does not, however I don't think this
> difference has a real impact on the results.
>
> 3. Oryx supports the specification of a convergence threshold for
> termination of the algorithm, based on delta rmse on a subset of the
> training set if I recall correctly. I've been using that as the
> termination criterion instead of a fixed number of iterations.
>
> 4. Oryx uses the weighted regularization scheme you alluded to below,
> multiplying lambda by the number of ratings.
>
> I've patched the spark impl to support (4) but haven't pushed it to my
> clone on github. I think it would be a valuable feature to support
> officially. I'd also like to work on (3) but don't have time now. I've
> only been using Oryx the past couple of weeks.
>
> Cheers,
>
> Michael
>
> On Tue, 1 Apr 2014, Nick Pentreath [via Apache Spark User List] wrote:
>
> > Hi Michael
> > Would you mind setting out exactly what differences you did find between
> the
> > Spark and Oryx implementations? Would be good to be clear on them, and
> also
> > see if there are further tricks/enhancements from the Oryx one that can
> be
> > ported (such as the lambda * numRatings adjustment).
> >
> > N
> >
> >
> > On Sat, Mar 15, 2014 at 2:52 AM, Michael Allman <[hidden email]> wrote:
> >       I've been thoroughly investigating this issue over the past
> >       couple of days
> >       and have discovered quite a bit. For one thing, there is
> >       definitely (at
> >       least) one issue/bug in the Spark implementation that leads to
> >       incorrect
> >       results for models generated with rank > 1 or a large number of
> >       iterations.
> >       I will post a bug report with a thorough explanation this
> >       weekend or on
> >       Monday.
> >
> >       I believe I've been able to track down every difference between
> >       the Spark
> >       and Oryx implementations that lead to difference results. I made
> >       some
> >       adjustments to the spark implementation so that, given the same
> >       initial
> >       product/item vectors, the resulting model is identical to the
> >       one produced
> >       by Oryx within a small numerical tolerance. I've verified this
> >       for small
> >       data sets and am working on verifying this with some large data
> >       sets.
> >
> >       Aside from those already identified in this thread, another
> >       significant
> >       difference in the Spark implementation is that it begins the
> >       factorization
> >       process by computing the product matrix (Y) from the initial
> >       user matrix
> >       (X). Both of the papers on ALS referred to in this thread begin
> >       the process
> >       by computing the user matrix. I haven't done any testing
> >       comparing the
> >       models generated starting from Y or X, but they are very
> >       different. Is there
> >       a reason Spark begins the iteration by computing Y?
> >
> >       Initializing both X and Y as is done in the Spark implementation
> >       seems
> >       unnecessary unless I'm overlooking some desired side-effect.
> >       Only the factor
> >       matrix which generates the other in the first iteration needs to
> >       be
> >       initialized.
> >
> >       I also found that the product and user RDDs were being rebuilt
> >       many times
> >       over in my tests, even for tiny data sets. By persisting the RDD
> >       returned
> >       from updateFeatures() I was able to avoid a raft of duplicate
> >       computations.
> >       Is there a reason not to do this?
> >
> >       Thanks.
> >
> >
> >
> >       --
> >       View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s
> >       -ALS-implementation-tp2567p2704.html
> >       Sent from the Apache Spark User List mailing list archive at
> >       Nabble.com.
> >
> >
> >
> >
> >
>
> ------------------------------
> View this message in context: Re: possible bug in Spark's ALS
> implementation...<http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p3619.html>
>
> Sent from the Apache Spark User List mailing list archive<http://apache-spark-user-list.1001560.n3.nabble.com/>at Nabble.com.
>

Re: possible bug in Spark's ALS implementation...

Posted by Michael Allman <ms...@allman.ms>.
Hi Nick,

I don't have my Spark clone in front of me, but off the top of my head the
major differences 
are/were:

1. Oryx multiplies lambda by alpha.

2. Oryx uses a different matrix inverse algorithm. It maintains a certain 
symmetry which the Spark algorithm does not; however, I don't think this 
difference has a real impact on the results.

3. Oryx supports the specification of a convergence threshold for 
termination of the algorithm, based on delta RMSE on a subset of the 
training set, if I recall correctly. I've been using that as the 
termination criterion instead of a fixed number of iterations.

4. Oryx uses the weighted regularization scheme you alluded to below, 
multiplying lambda by the number of ratings.

I've patched the Spark implementation to support (4) but haven't pushed it to 
my clone on GitHub. I think it would be a valuable feature to support 
officially. I'd also like to work on (3) but don't have time now. I've 
only been using Oryx for the past couple of weeks.
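
For (3), the kind of loop I have in mind looks roughly like this (a sketch
only; the two function parameters stand in for running one sweep of ALS and
scoring a held-out subset, they are not existing APIs):

    def trainUntilConverged(maxIters: Int, threshold: Double,
                            trainOneIteration: Int => Unit,
                            rmseOnHoldout: () => Double): Int = {
      var prev = Double.MaxValue
      var iter = 0
      var converged = false
      while (iter < maxIters && !converged) {
        trainOneIteration(iter)
        val rmse = rmseOnHoldout()
        converged = math.abs(prev - rmse) < threshold  // stop when delta RMSE is small
        prev = rmse
        iter += 1
      }
      iter
    }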

Cheers,

Michael

On Tue, 1 Apr 2014, Nick Pentreath [via Apache Spark User List] wrote:

> Hi Michael
> Would you mind setting out exactly what differences you did find between the
> Spark and Oryx implementations? Would be good to be clear on them, and also
> see if there are further tricks/enhancements from the Oryx one that can be
> ported (such as the lambda * numRatings adjustment).
> 
> N
> 
> 
> On Sat, Mar 15, 2014 at 2:52 AM, Michael Allman <[hidden email]> wrote:
>       I've been thoroughly investigating this issue over the past
>       couple of days
>       and have discovered quite a bit. For one thing, there is
>       definitely (at
>       least) one issue/bug in the Spark implementation that leads to
>       incorrect
>       results for models generated with rank > 1 or a large number of
>       iterations.
>       I will post a bug report with a thorough explanation this
>       weekend or on
>       Monday.
>
>       I believe I've been able to track down every difference between
>       the Spark
>       and Oryx implementations that lead to difference results. I made
>       some
>       adjustments to the spark implementation so that, given the same
>       initial
>       product/item vectors, the resulting model is identical to the
>       one produced
>       by Oryx within a small numerical tolerance. I've verified this
>       for small
>       data sets and am working on verifying this with some large data
>       sets.
>
>       Aside from those already identified in this thread, another
>       significant
>       difference in the Spark implementation is that it begins the
>       factorization
>       process by computing the product matrix (Y) from the initial
>       user matrix
>       (X). Both of the papers on ALS referred to in this thread begin
>       the process
>       by computing the user matrix. I haven't done any testing
>       comparing the
>       models generated starting from Y or X, but they are very
>       different. Is there
>       a reason Spark begins the iteration by computing Y?
>
>       Initializing both X and Y as is done in the Spark implementation
>       seems
>       unnecessary unless I'm overlooking some desired side-effect.
>       Only the factor
>       matrix which generates the other in the first iteration needs to
>       be
>       initialized.
>
>       I also found that the product and user RDDs were being rebuilt
>       many times
>       over in my tests, even for tiny data sets. By persisting the RDD
>       returned
>       from updateFeatures() I was able to avoid a raft of duplicate
>       computations.
>       Is there a reason not to do this?
>
>       Thanks.
> 
> 
>
>       --
>       View this message in context:http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s
>       -ALS-implementation-tp2567p2704.html
>       Sent from the Apache Spark User List mailing list archive at
>       Nabble.com.
> 
> 
> 
>




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p3619.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Nick Pentreath <ni...@gmail.com>.
Hi Michael

Would you mind setting out exactly what differences you did find between
the Spark and Oryx implementations? Would be good to be clear on them, and
also see if there are further tricks/enhancements from the Oryx one that
can be ported (such as the lambda * numRatings adjustment).

N


On Sat, Mar 15, 2014 at 2:52 AM, Michael Allman <ms...@allman.ms> wrote:

> I've been thoroughly investigating this issue over the past couple of days
> and have discovered quite a bit. For one thing, there is definitely (at
> least) one issue/bug in the Spark implementation that leads to incorrect
> results for models generated with rank > 1 or a large number of iterations.
> I will post a bug report with a thorough explanation this weekend or on
> Monday.
>
> I believe I've been able to track down every difference between the Spark
> and Oryx implementations that lead to difference results. I made some
> adjustments to the spark implementation so that, given the same initial
> product/item vectors, the resulting model is identical to the one produced
> by Oryx within a small numerical tolerance. I've verified this for small
> data sets and am working on verifying this with some large data sets.
>
> Aside from those already identified in this thread, another significant
> difference in the Spark implementation is that it begins the factorization
> process by computing the product matrix (Y) from the initial user matrix
> (X). Both of the papers on ALS referred to in this thread begin the process
> by computing the user matrix. I haven't done any testing comparing the
> models generated starting from Y or X, but they are very different. Is
> there
> a reason Spark begins the iteration by computing Y?
>
> Initializing both X and Y as is done in the Spark implementation seems
> unnecessary unless I'm overlooking some desired side-effect. Only the
> factor
> matrix which generates the other in the first iteration needs to be
> initialized.
>
> I also found that the product and user RDDs were being rebuilt many times
> over in my tests, even for tiny data sets. By persisting the RDD returned
> from updateFeatures() I was able to avoid a raft of duplicate computations.
> Is there a reason not to do this?
>
> Thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2704.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>

Re: possible bug in Spark's ALS implementation...

Posted by Michael Allman <ms...@allman.ms>.
I've been thoroughly investigating this issue over the past couple of days
and have discovered quite a bit. For one thing, there is definitely (at
least) one issue/bug in the Spark implementation that leads to incorrect
results for models generated with rank > 1 or a large number of iterations.
I will post a bug report with a thorough explanation this weekend or on
Monday.

I believe I've been able to track down every difference between the Spark
and Oryx implementations that leads to different results. I made some
adjustments to the Spark implementation so that, given the same initial
product/item vectors, the resulting model is identical to the one produced
by Oryx within a small numerical tolerance. I've verified this for small
data sets and am working on verifying this with some large data sets.

Aside from those already identified in this thread, another significant
difference in the Spark implementation is that it begins the factorization
process by computing the product matrix (Y) from the initial user matrix
(X). Both of the papers on ALS referred to in this thread begin the process
by computing the user matrix. I haven't done any testing comparing the
models generated starting from Y or X, but they are very different. Is there
a reason Spark begins the iteration by computing Y?

Initializing both X and Y as is done in the Spark implementation seems
unnecessary unless I'm overlooking some desired side-effect. Only the factor
matrix which generates the other in the first iteration needs to be
initialized.

I also found that the product and user RDDs were being rebuilt many times
over in my tests, even for tiny data sets. By persisting the RDD returned
from updateFeatures() I was able to avoid a raft of duplicate computations.
Is there a reason not to do this?

Thanks.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2704.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Sean Owen <so...@cloudera.com>.
Ah, thank you, I had actually forgotten about this and this is indeed
probably a difference. This is from the other paper I cited:
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf

It's the "WR" in "ALS-WR" -- weighted regularization. I suppose the
intuition is that you penalize complex explanations of prolific users
and items proportionally more.

The paper claims it helps and I also found it did. That could be the difference.
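
Concretely, the weighted regularization only changes the diagonal term of
each least-squares solve. A rough sketch (illustrative, not Oryx's or
MLlib's actual code):

    // Add lambda * n to the diagonal of the assembled k x k Gram matrix, where
    // n is the number of ratings for the user (or item) being solved for.
    def addWeightedRegularization(gram: Array[Array[Double]],
                                  lambda: Double, n: Int): Array[Array[Double]] = {
      val k = gram.length
      Array.tabulate(k, k) { (i, j) =>
        gram(i)(j) + (if (i == j) lambda * n else 0.0)
      }
    }
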
--
Sean Owen | Director, Data Science | London


On Thu, Mar 13, 2014 at 2:30 AM, Michael Allman <ms...@allman.ms> wrote:
> Hi Sean,
>
> Digging deeper I've found another difference between Oryx's implementation
> and Spark's. Why do you adjust lambda here?
>
> https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java#L491
>
> Cheers,
>
> Michael
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2636.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Michael Allman <ms...@allman.ms>.
Hi Sean,

Digging deeper I've found another difference between Oryx's implementation
and Spark's. Why do you adjust lambda here?

https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java#L491

Cheers,

Michael



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2636.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Michael Allman <ms...@allman.ms>.
Thank you everyone for your feedback. It's been very helpful, and though I
still haven't found the cause of the difference between Spark and Oryx, I
feel I'm making progress.

Xiangrui asked me to create a ticket for this issue. The reason I didn't do
this originally is that it's not yet clear to me whether this is a bug or a
mistake on my part. I'd like to see where this conversation goes and then
file a more clear-cut issue if applicable.

Sean pointed out that Oryx differs in its use of the regularization
parameter lambda. I'm aware of this and have been compensating for this
difference from the start. Also, the handling of negative values is indeed
irrelevant as I have none in my data.

After reviewing Sean's analysis and running some calculations in the
console, I agree that the Spark code does compute YtCuY correctly.

Regarding testing, I'm computing EPR (expected percentile ranking) on a test
set, as outlined in the paper.
I'm training on three weeks of data and testing on the following week. I
recently updated my data sets and rebuilt and tested the new models. The
results were inconclusive in that both models scored about the same.
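
For reference, the metric I mean is the expected percentile ranking from the
paper. A rough in-memory sketch of how I compute it (the input shapes here
are hypothetical, not my actual pipeline):

    // EPR = sum_{u,i} r_ui * rank_ui / sum_{u,i} r_ui, where rank_ui in [0, 1]
    // is the percentile position of item i in user u's ranked list (0 = top).
    def expectedPercentileRank(testRatings: Seq[(Int, Int, Double)],
                               rankedItems: Map[Int, Seq[Int]]): Double = {
      var num = 0.0
      var den = 0.0
      for ((user, item, r) <- testRatings; ranked <- rankedItems.get(user)) {
        val pos = ranked.indexOf(item)
        if (pos >= 0 && ranked.size > 1) {
          num += r * (pos.toDouble / (ranked.size - 1))
          den += r
        }
      }
      if (den > 0) num / den else Double.NaN
    }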

I'm continuing to investigate the source of the wide difference in
recommendations between implementations. I will reply with my findings when
I have something more definitive.

Cheers and thanks again.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2632.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: possible bug in Spark's ALS implementation...

Posted by Sean Owen <so...@cloudera.com>.
On Wed, Mar 12, 2014 at 7:36 AM, Nick Pentreath
<ni...@gmail.com> wrote:
> @Sean, would it be a good idea to look at changing the regularization in
> Spark's ALS to alpha * lambda? What is the thinking behind this? If I
> recall, the Mahout version added something like (# ratings * lambda) as
> regularization in each factor update (for explicit), but implicit it was
> just lambda (I may be wrong here).

I also used a different default alpha than the one suggested in the
paper: 1, instead of 40. But so does MLlib. And if alpha = 1, the
variation I mention here has no effect.

The idea was that alpha "is supposed to" control how much more weight
a known user-item value gets in the factorization. The weight is "1 +
alpha*r" for nonzero r, and of course "1" otherwise, and alpha can
make the difference larger.

But large alpha has the side-effect of making the regularization terms
relatively smaller in the cost function. This dual effect seemed
undesirable. So: multiply the regularization term by alpha too to
disconnect these effects.
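
Putting that in the paper's notation (my paraphrase of the tradeoff), the
cost being minimized is

    \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2
        + \lambda (\sum_u ||x_u||^2 + \sum_i ||y_i||^2),   c_{ui} = 1 + \alpha r_{ui}

As \alpha grows the first term grows but the penalty does not, so the
regularization is relatively weaker; using \lambda \alpha as the effective
regularization keeps the two terms roughly in proportion.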

Other ALS papers like
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
again use a different definition of lambda by stuffing something else
into it. So the absolute value of lambda is already different in
different contexts.

So depending on Michael's settings this could be a red herring but
worth checking. The only other variation was in choosing the random
initial state but that too is the same now in both implementations (at
least in HEAD). The initial state really shouldn't matter so much. I
can't think of other variations.

Michael what was your eval metric?

Re: possible bug in Spark's ALS implementation...

Posted by Nick Pentreath <ni...@gmail.com>.
It would be helpful to know what parameter inputs you are using.

If the regularization schemes are different (by a factor of alpha, which
can often be quite high) this will mean that the same parameter settings
could give very different results. A higher lambda would be required with
Spark's version to be comparable.

When I submitted the PR for this, I verified (on ml-100k, ml-1m and ml-10m
data) that this version gives the same RMSE as Mahout's implicit model, as
well as a separate Spark version that I wrote that was a from-scratch port
of the Mahout algorithm (though I didn't compare vs Myrrix/Oryx). I'm
fairly confident things are correct but if there is a bug let's definitely
find and fix it!

@Sean, would it be a good idea to look at changing the regularization in
Spark's ALS to alpha * lambda? What is the thinking behind this? If I
recall, the Mahout version added something like (# ratings * lambda) as
regularization in each factor update (for explicit), but implicit it was
just lambda (I may be wrong here).



On Wed, Mar 12, 2014 at 4:57 AM, Xiangrui Meng <me...@gmail.com> wrote:

> Line 376 should be correct as it is computing \sum_i (c_i - 1) x_i
> x_i^T, = \sum_i (alpha * r_i) x_i x_i^T. Are you computing some
> metrics to tell which recommendation is better? -Xiangrui
>
> On Tue, Mar 11, 2014 at 6:38 PM, Xiangrui Meng <me...@gmail.com> wrote:
> > Hi Michael,
> >
> > I can help check the current implementation. Would you please go to
> > https://spark-project.atlassian.net/browse/SPARK and create a ticket
> > about this issue with component "MLlib"? Thanks!
> >
> > Best,
> > Xiangrui
> >
> > On Tue, Mar 11, 2014 at 3:18 PM, Michael Allman <ms...@allman.ms> wrote:
> >> Hi,
> >>
> >> I'm implementing a recommender based on the algorithm described in
> >> http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms
> the
> >> basis for Spark's ALS implementation for data sets with implicit
> features.
> >> The data set I'm working with is proprietary and I cannot share it,
> however
> >> I can say that it's based on the same kind of data in the
> paper---relative
> >> viewing time of videos. (Specifically, the "rating" for each video is
> >> defined as total viewing time across all visitors divided by video
> >> duration).
> >>
> >> I'm seeing counterintuitive, sometimes nonsensical recommendations. For
> >> comparison, I've run the training data through Oryx's in-VM
> implementation
> >> of implicit ALS with the same parameters. Oryx uses the same algorithm.
> >> (Source in this file:
> >>
> https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java
> )
> >>
> >> The recommendations made by each system compared to one other are very
> >> different---moreso than I think could be explained by differences in
> initial
> >> state. The recommendations made by the Oryx models look much better,
> >> especially as I increase the number of latent factors and the
> iterations.
> >> The Spark models' recommendations don't improve with increases in either
> >> latent factors or iterations. Sometimes, they get worse.
> >>
> >> Because of the (understandably) highly-optimized and terse style of
> Spark's
> >> ALS implementation, I've had a very hard time following it well enough
> to
> >> debug the issue definitively. However, I have found a section of code
> that
> >> looks incorrect. As described in the paper, part of the implicit ALS
> >> algorithm involves computing a matrix product YtCuY (equation 4 in the
> >> paper). To optimize this computation, this expression is rewritten as
> YtY +
> >> Yt(Cu - I)Y. I believe that's what should be happening here:
> >>
> >>
> https://github.com/apache/incubator-spark/blob/v0.9.0-incubating/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L376
> >>
> >> However, it looks like this code is in fact computing YtY + YtY(Cu - I),
> >> which is the same as YtYCu. If so, that's a bug. Can someone familiar
> with
> >> this code evaluate my claim?
> >>
> >> Cheers,
> >>
> >> Michael
>

Re: possible bug in Spark's ALS implementation...

Posted by Xiangrui Meng <me...@gmail.com>.
Line 376 should be correct, as it is computing \sum_i (c_i - 1) x_i
x_i^T = \sum_i (alpha * r_i) x_i x_i^T. Are you computing some
metrics to tell which recommendation is better? -Xiangrui
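
(Spelling out the identity behind that line: the paper defines
c_i = 1 + alpha * r_i, so c_i - 1 = alpha * r_i, which is exactly the
scalar weight applied to each x_i x_i^T term.)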

On Tue, Mar 11, 2014 at 6:38 PM, Xiangrui Meng <me...@gmail.com> wrote:
> Hi Michael,
>
> I can help check the current implementation. Would you please go to
> https://spark-project.atlassian.net/browse/SPARK and create a ticket
> about this issue with component "MLlib"? Thanks!
>
> Best,
> Xiangrui
>
> On Tue, Mar 11, 2014 at 3:18 PM, Michael Allman <ms...@allman.ms> wrote:
>> Hi,
>>
>> I'm implementing a recommender based on the algorithm described in
>> http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms the
>> basis for Spark's ALS implementation for data sets with implicit features.
>> The data set I'm working with is proprietary and I cannot share it, however
>> I can say that it's based on the same kind of data in the paper---relative
>> viewing time of videos. (Specifically, the "rating" for each video is
>> defined as total viewing time across all visitors divided by video
>> duration).
>>
>> I'm seeing counterintuitive, sometimes nonsensical recommendations. For
>> comparison, I've run the training data through Oryx's in-VM implementation
>> of implicit ALS with the same parameters. Oryx uses the same algorithm.
>> (Source in this file:
>> https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java)
>>
>> The recommendations made by each system compared to one other are very
>> different---moreso than I think could be explained by differences in initial
>> state. The recommendations made by the Oryx models look much better,
>> especially as I increase the number of latent factors and the iterations.
>> The Spark models' recommendations don't improve with increases in either
>> latent factors or iterations. Sometimes, they get worse.
>>
>> Because of the (understandably) highly-optimized and terse style of Spark's
>> ALS implementation, I've had a very hard time following it well enough to
>> debug the issue definitively. However, I have found a section of code that
>> looks incorrect. As described in the paper, part of the implicit ALS
>> algorithm involves computing a matrix product YtCuY (equation 4 in the
>> paper). To optimize this computation, this expression is rewritten as YtY +
>> Yt(Cu - I)Y. I believe that's what should be happening here:
>>
>> https://github.com/apache/incubator-spark/blob/v0.9.0-incubating/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L376
>>
>> However, it looks like this code is in fact computing YtY + YtY(Cu - I),
>> which is the same as YtYCu. If so, that's a bug. Can someone familiar with
>> this code evaluate my claim?
>>
>> Cheers,
>>
>> Michael

Re: possible bug in Spark's ALS implementation...

Posted by Xiangrui Meng <me...@gmail.com>.
Hi Michael,

I can help check the current implementation. Would you please go to
https://spark-project.atlassian.net/browse/SPARK and create a ticket
about this issue with component "MLlib"? Thanks!

Best,
Xiangrui

On Tue, Mar 11, 2014 at 3:18 PM, Michael Allman <ms...@allman.ms> wrote:
> Hi,
>
> I'm implementing a recommender based on the algorithm described in
> http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms the
> basis for Spark's ALS implementation for data sets with implicit features.
> The data set I'm working with is proprietary and I cannot share it, however
> I can say that it's based on the same kind of data in the paper---relative
> viewing time of videos. (Specifically, the "rating" for each video is
> defined as total viewing time across all visitors divided by video
> duration).
>
> I'm seeing counterintuitive, sometimes nonsensical recommendations. For
> comparison, I've run the training data through Oryx's in-VM implementation
> of implicit ALS with the same parameters. Oryx uses the same algorithm.
> (Source in this file:
> https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java)
>
> The recommendations made by each system compared to one other are very
> different---moreso than I think could be explained by differences in initial
> state. The recommendations made by the Oryx models look much better,
> especially as I increase the number of latent factors and the iterations.
> The Spark models' recommendations don't improve with increases in either
> latent factors or iterations. Sometimes, they get worse.
>
> Because of the (understandably) highly-optimized and terse style of Spark's
> ALS implementation, I've had a very hard time following it well enough to
> debug the issue definitively. However, I have found a section of code that
> looks incorrect. As described in the paper, part of the implicit ALS
> algorithm involves computing a matrix product YtCuY (equation 4 in the
> paper). To optimize this computation, this expression is rewritten as YtY +
> Yt(Cu - I)Y. I believe that's what should be happening here:
>
> https://github.com/apache/incubator-spark/blob/v0.9.0-incubating/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L376
>
> However, it looks like this code is in fact computing YtY + YtY(Cu - I),
> which is the same as YtYCu. If so, that's a bug. Can someone familiar with
> this code evaluate my claim?
>
> Cheers,
>
> Michael
