Posted to user@spark.apache.org by Nate Wendt <na...@curalate.com> on 2017/11/30 20:03:34 UTC

[Spark ML]: Implement the Conjugate Gradient method for ALS

The conjugate gradient method has been shown to be very efficient at
solving the least-squares subproblems that arise in matrix factorization:
http://www.benfrederickson.com/fast-implicit-matrix-factorization/.

This post is motivated by:
https://pdfs.semanticscholar.org/bfdf/7af6cf7fd7bb5e6b6db5bbd91be11597eaf0.pdf

Implementing this in Spark could mean a significant speedup in ALS solving,
since the method's asymptotic cost is lower than that of the default
Cholesky solver: for rank f, each CG iteration costs roughly O(f^2) per
user, and only a few iterations are needed, versus the O(f^3) Cholesky
factorization. This has the potential to improve the training phase of
collaborative filtering significantly.
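To make the approach concrete, below is a minimal, self-contained sketch of
the CG inner solve. The dense representation and all names here are my own
illustration, not the actual patch, which works against MLlib's internal
solver structures. Each call replaces one Cholesky solve of the f x f
normal equations A x = b for a single user or item, warm-started from that
user's current factors:

// Sketch only: `a` stands in for the regularized Gram matrix
// (YtY + confidence terms + lambda*I), `b` for the right-hand side, and
// `x0` for the current factor vector, used as a warm start.
def cgSolve(a: Array[Array[Double]], b: Array[Double],
            x0: Array[Double], iters: Int = 3): Array[Double] = {
  val n = b.length
  def matVec(v: Array[Double]): Array[Double] =
    Array.tabulate(n) { i =>
      var s = 0.0; var j = 0
      while (j < n) { s += a(i)(j) * v(j); j += 1 }
      s
    }
  def dot(u: Array[Double], v: Array[Double]): Double = {
    var s = 0.0; var i = 0
    while (i < n) { s += u(i) * v(i); i += 1 }
    s
  }

  val x = x0.clone()
  val ax = matVec(x)
  val r = Array.tabulate(n)(i => b(i) - ax(i)) // residual r = b - A x
  val p = r.clone()                            // first search direction
  var rsOld = dot(r, r)
  var k = 0
  while (k < iters && rsOld > 1e-20) {
    val ap = matVec(p)
    val alpha = rsOld / dot(p, ap)             // exact line-search step
    var i = 0
    while (i < n) { x(i) += alpha * p(i); r(i) -= alpha * ap(i); i += 1 }
    val rsNew = dot(r, r)
    val beta = rsNew / rsOld                   // next conjugate direction
    i = 0
    while (i < n) { p(i) = r(i) + beta * p(i); i += 1 }
    rsOld = rsNew
    k += 1
  }
  x
}

The key point is visible in the loop: the only expensive operation per
iteration is an O(f^2) matrix-vector product, and a handful of iterations
from a warm start is typically enough, per the references above.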

I've opened a JIRA ticket
<https://issues.apache.org/jira/browse/SPARK-22619> but thought I'd reach
out here as well, since I've implemented the algorithm (for implicit
feedback) and demonstrated its correctness, but I am having trouble
actually seeing a performance speedup, likely due to incorrect handling of
RDD persistence/checkpointing. I wasn't sure of the best way to find out
whether there are dev cycles available to collaborate on completing this
solution, but I figure it has the potential to have a big impact within
Spark and MLlib. If there is interest, I can open a pull request with the
functionally correct code I have as of now.
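For reference, this is the pattern I believe an iterative Spark job should
follow, reduced to a hypothetical helper (names and structure are mine, for
illustration only): persist the new RDD, checkpoint periodically to
truncate the lineage, force materialization with an action, and only then
unpersist the parent. My CG branch presumably deviates from this somewhere:

import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper showing one iteration's bookkeeping.
def iterate[T](prev: RDD[T], iter: Int, checkpointInterval: Int)
              (update: RDD[T] => RDD[T]): RDD[T] = {
  val next = update(prev).persist(StorageLevel.MEMORY_AND_DISK)
  if (checkpointInterval > 0 && iter % checkpointInterval == 0)
    next.checkpoint()  // requires sc.setCheckpointDir(...) beforehand
  next.count()         // action forces evaluation before unpersisting
  prev.unpersist()     // safe now that `next` is materialized
  next
}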

Note: we are seeing collaborative filtering training times of over 3 hours
in Spark (on 4 large instances) compared to ~8 minutes on a single machine
running the Implicit library cited above. It would be great to get that
kind of speedup within Spark and potentially benefit from the added
parallelism.

Thanks,
Nathaniel