Posted to user@spark.apache.org by Mikael Ståldal <mi...@magine.com> on 2016/11/02 16:53:57 UTC

Load whole ALS MatrixFactorizationModel into memory

import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel


I build a MatrixFactorizationModel with ALS.trainImplicit(), then I save it
with its save method.

Later, in another process on another machine, I load the model with
MatrixFactorizationModel.load(). Now I want to make a lot of
recommendProducts() calls on the loaded model, and I want them to be quick,
without any I/O. However, they are slow and seem to do I/O each time.

Is there any way to force loading the whole model into memory (that step
can take some time and do I/O) and then be able to do recommendProducts()
on it multiple times, quickly without I/O?
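
Roughly, this is the flow (the path, rank, and other numbers below are just
placeholders; ratings is an RDD[Rating] and sc the SparkContext):

// Train an implicit-feedback model and save it to shared storage.
val model = ALS.trainImplicit(ratings, 10, 10, 0.01, 1.0)
model.save(sc, "hdfs:///models/als")

// Later, in another process on another machine:
val loaded = MatrixFactorizationModel.load(sc, "hdfs:///models/als")

// Each of these calls is slow and seems to hit storage again:
val recs = loaded.recommendProducts(userId, 10)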

-- 

*Mikael Ståldal*
Senior software developer

*Magine TV*
mikael.staldal@magine.com
Grev Turegatan 3  | 114 46 Stockholm, Sweden  |   www.magine.com


Re: Load whole ALS MatrixFactorizationModel into memory

Posted by Sean Owen <so...@cloudera.com>.
You can cause the underlying RDDs in the model to be cached in memory. That
would be necessary but not sufficient to make it go fast; it should at
least get rid of a lot of I/O. I think making recommendations one at a time
is never going to scale to moderate load this way; one request means one
entire job to schedule with multiple tasks. Fine for the occasional query
or smallish data, but not a thousand queries per second. For that I think
you'd have to build some custom scoring infrastructure. At least, that's
what I did, so I would say that.
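
Something along these lines should do it. This is only a rough sketch; the
path, userId, and the local recommend() helper are just illustrative:

val model = MatrixFactorizationModel.load(sc, "hdfs:///models/als")

// Cache the two factor RDDs and force them to materialize once,
// so later recommendProducts() calls read from memory instead of disk.
model.userFeatures.cache()
model.productFeatures.cache()
model.userFeatures.count()
model.productFeatures.count()

// Still one Spark job per request, but no more I/O per call:
val recs = model.recommendProducts(userId, 10)

// If per-request latency really matters, one option is to pull the factors
// out of Spark entirely and score locally (only if they fit in driver memory):
val userVecs = model.userFeatures.collectAsMap()   // Map[Int, Array[Double]]
val products = model.productFeatures.collect()     // Array[(Int, Array[Double])]

def recommend(user: Int, n: Int): Seq[(Int, Double)] = {
  val u = userVecs(user)
  products
    .map { case (pid, p) => (pid, u.zip(p).map { case (a, b) => a * b }.sum) } // dot product
    .sortBy(-_._2)
    .take(n)
    .toSeq
}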

