Posted to user@spark.apache.org by Albert Manyà <al...@eml.cc> on 2014/12/15 17:33:36 UTC

Serialize mllib's MatrixFactorizationModel

Hi all.

I'd like to serialize and later load a model trained using mllib's
ALS.

I've tried using Java serialization with something like:

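    import java.io.{FileOutputStream, ObjectOutputStream}
    import org.apache.spark.mllib.recommendation.ALS
    val model = ALS.trainImplicit(training, rank, numIter, lambda, 1)
    // Write the trained model with plain Java serialization
    val fos = new FileOutputStream("model.bin")
    val oos = new ObjectOutputStream(fos)
    oos.writeObject(model)
    oos.close()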

But when I try to deserialize it using:

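    import java.io.{FileInputStream, ObjectInputStream}
    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
    val fis = new FileInputStream("model.bin")
    val ois = new ObjectInputStream(fis)
    val model = ois.readObject().asInstanceOf[MatrixFactorizationModel]
    ois.close()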

 I get the error:

Exception in thread "main" java.io.IOException: PARSING_ERROR(2)

I've also tried to serialize both of MatrixFactorizationModel's RDDs
(products and users) and later create the MatrixFactorizationModel by
hand, passing the RDDs to the constructor, but I get an error because
the constructor is private:

Error:(58, 17) constructor MatrixFactorizationModel in class
MatrixFactorizationModel cannot be accessed in object RecommendALS
    val model = new MatrixFactorizationModel (8, userFeatures,
    productFeatures)

Any ideas?

Thanks!

-- 
  Albert Manyà
  albertmp@eml.cc

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Serialize mllib's MatrixFactorizationModel

Posted by sourabh chaki <ch...@gmail.com>.
Hi Albert,
There is some discussion going on here:
http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tc20324.html#a20674
I am also looking for a solution to this. But it looks like, until mllib
PMML export is ready, there is no foolproof way to export a trained mllib
model to a different system.

Thanks
Sourabh

On Mon, Dec 15, 2014 at 10:39 PM, Albert Manyà <al...@eml.cc> wrote:
>
> In that case, what is the strategy to train a model in some background
> batch process and make recommendations for some other service in real
> time? Run both processes in the same spark cluster?
>
> Thanks.

Re: Serialize mllib's MatrixFactorizationModel

Posted by Sean Owen <so...@cloudera.com>.
The thing about MatrixFactorizationModel, compared to other models, is
that it is huge. It's not just a few coefficients, but whole RDDs of
coefficients. I think you could save these RDDs of user/product
factors to persistent storage, load them, then recreate the
MatrixFactorizationModel that way. It's a bit manual, but works.
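
A minimal sketch of that manual approach, under some loudly stated assumptions: the user and product factors are small enough to have been collected out of their RDDs into plain maps of ID to factor array, and a prediction is taken to be the dot product of a user vector and a product vector. `FactorStore` and its methods are hypothetical names for illustration only, not part of mllib, and the sketch deliberately uses no Spark API.

```scala
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helper, not part of mllib: persists factor vectors that have
// already been collected out of the RDDs into plain in-memory maps.
object FactorStore {
  type Factors = Map[Int, Array[Double]]

  // Write both factor maps to one file using plain Java serialization.
  def save(path: String, users: Factors, products: Factors): Unit = {
    val oos = new ObjectOutputStream(new FileOutputStream(path))
    try {
      oos.writeObject(users.toArray)
      oos.writeObject(products.toArray)
    } finally oos.close()
  }

  // Read the two factor maps back in the order they were written.
  def load(path: String): (Factors, Factors) = {
    val ois = new ObjectInputStream(new FileInputStream(path))
    try {
      val users = ois.readObject().asInstanceOf[Array[(Int, Array[Double])]].toMap
      val products = ois.readObject().asInstanceOf[Array[(Int, Array[Double])]].toMap
      (users, products)
    } finally ois.close()
  }

  // Assumed scoring rule: dot product of the user and product factor vectors.
  def predict(users: Factors, products: Factors, user: Int, product: Int): Double =
    users(user).zip(products(product)).map { case (u, p) => u * p }.sum
}
```

A separate serving process could call `FactorStore.load` once at startup and then `FactorStore.predict` per request. For factor sets too large for one machine's memory, this in-memory sketch would not apply and the factors would need to stay in distributed storage.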

This is probably why there is no standard PMML representation for this
type of model. It is different from classic regression/classification
models, and too big for XML. So efforts to export/import PMML are not
relevant IMHO.

On Mon, Dec 15, 2014 at 5:09 PM, Albert Manyà <al...@eml.cc> wrote:
> In that case, what is the strategy to train a model in some background
> batch process and make recommendations for some other service in real
> time? Run both processes in the same spark cluster?
>
> Thanks.


Re: Serialize mllib's MatrixFactorizationModel

Posted by Albert Manyà <al...@eml.cc>.
In that case, what is the strategy to train a model in some background
batch process and make recommendations for some other service in real
time? Run both processes in the same spark cluster?

Thanks.

-- 
  Albert Manyà
  albertmp@eml.cc

On Mon, Dec 15, 2014, at 05:58 PM, Sean Owen wrote:
> This class is not going to be serializable, as it contains huge RDDs.
> Even if the right constructor existed the RDDs inside would not
> serialize.


Re: Serialize mllib's MatrixFactorizationModel

Posted by Sean Owen <so...@cloudera.com>.
This class is not going to be serializable, as it contains huge RDDs.
Even if the right constructor existed the RDDs inside would not
serialize.
