Posted to user@spark.apache.org by sourabh <ch...@gmail.com> on 2014/12/04 07:11:19 UTC

MLLIB model export: PMML vs MLLIB serialization

Hi All,
I am training models with Spark MLlib inside our Hadoop cluster, but
prediction happens in a separate real-time, synchronous system (a web
application). I am currently exploring options for exporting the trained
MLlib models from Spark.

   1. *Export model as PMML:* The projects under JPMML: Java PMML API
<https://github.com/jpmml> look quite interesting. The idea is to use JPMML
<https://github.com/jpmml/jpmml> to convert the MLlib model entity to PMML,
and then use the PMML evaluator <https://github.com/jpmml/jpmml-evaluator>
for prediction in a different system. We could also explore the openscoring
REST API <https://github.com/jpmml/openscoring> for model deployment and
prediction.

This could be the standard approach if we need to port models across
different systems. But converting non-linear MLlib models to PMML might be a
complex task. Apart from that, I would need to keep updating my MLlib-to-PMML
conversion code for any new MLlib model or any change in the MLlib entities.
I have not evaluated any of these JPMML projects personally, and I see there
is only a single contributor to them. I am wondering whether enough people
have already started using these projects. Please share any thoughts you
have on this.
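
To make the conversion effort concrete, here is a minimal sketch of what
exporting a linear model to PMML could involve. It is hand-written Java that
builds the XML as a string; it does not use JPMML or MLlib, and the class
name LinearPmmlSketch, the field names x0, x1, ... and y, and the choice of
PMML 4.2 are all assumptions made for illustration. The output has not been
validated against the PMML schema.

```java
/** Hypothetical helper: renders intercept + weights as a PMML RegressionModel. */
final class LinearPmmlSketch {

    static String toPmml(double intercept, double[] weights) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<PMML version=\"4.2\" xmlns=\"http://www.dmg.org/PMML-4_2\">\n");
        sb.append("  <Header description=\"hand-written linear model sketch\"/>\n");

        // Declare one continuous input field per weight, plus the target field y.
        sb.append("  <DataDictionary numberOfFields=\"")
          .append(weights.length + 1).append("\">\n");
        for (int i = 0; i < weights.length; i++) {
            sb.append("    <DataField name=\"x").append(i)
              .append("\" optype=\"continuous\" dataType=\"double\"/>\n");
        }
        sb.append("    <DataField name=\"y\" optype=\"continuous\" dataType=\"double\"/>\n");
        sb.append("  </DataDictionary>\n");

        sb.append("  <RegressionModel functionName=\"regression\">\n");
        sb.append("    <MiningSchema>\n");
        for (int i = 0; i < weights.length; i++) {
            sb.append("      <MiningField name=\"x").append(i).append("\"/>\n");
        }
        sb.append("      <MiningField name=\"y\" usageType=\"target\"/>\n");
        sb.append("    </MiningSchema>\n");

        // The model itself: y = intercept + sum(coefficient_i * x_i).
        sb.append("    <RegressionTable intercept=\"").append(intercept).append("\">\n");
        for (int i = 0; i < weights.length; i++) {
            sb.append("      <NumericPredictor name=\"x").append(i)
              .append("\" coefficient=\"").append(weights[i]).append("\"/>\n");
        }
        sb.append("    </RegressionTable>\n");
        sb.append("  </RegressionModel>\n");
        sb.append("</PMML>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toPmml(0.5, new double[] {2.0, -1.5}));
    }
}
```

A linear model maps onto PMML almost one-to-one, which is why it is the easy
case; tree ensembles and other non-linear models need a separate mapping for
each model type, and that is the recurring effort mentioned above.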

   2. *Export MLlib model in serialized form:* MLlib models can be
serialized using Kryo serialization
<http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAFRXrqdpkfCX41=JyTSmmtt8aNWrSdpJvxE3FmYVZ=uUepejGg@mail.gmail.com%3E>
or normal Java serialization
<http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-td11953.html>.
The same model can then be deserialized by other standalone applications,
which use the MLlib entity for prediction. This blog
<http://blog.knoldus.com/2014/07/21/play-with-spark-building-spark-mllib-in-a-play-spark-application/>
shows an example of how Spark MLlib can be used inside a Play web
application. I expect I can use Spark MLlib in any other JVM-based web
application in the same way(?). Please share if anyone has experience with
this.
  Advantages:
     -> No recurring effort to support a new model or a change in an MLlib
model entity in a future version.
     -> Fewer dependencies on other tools.
  Disadvantages:
     -> The model cannot be ported to a non-JVM system.
     -> A model serialized with one version of an MLlib entity may not be
deserializable with a different version of that entity(?).
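
The serialization route can be sketched with plain Java serialization. The
example below is only an illustration under stated assumptions: LinearModel
is a hypothetical stand-in class, not an MLlib type, and the bytes are kept
in memory instead of being written to HDFS or a file. With a real MLlib
model, the consuming application would also need a compatible MLlib jar on
its classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class ModelRoundTrip {

    /** Hypothetical stand-in for a trained model; not an MLlib class. */
    static final class LinearModel implements Serializable {
        // If the class changes incompatibly and this id changes with it,
        // old serialized bytes fail to load with InvalidClassException.
        private static final long serialVersionUID = 1L;

        private final double[] weights;
        private final double intercept;

        LinearModel(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }

        double predict(double[] features) {
            double sum = intercept;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * features[i];
            }
            return sum;
        }
    }

    /** Training side: turn the model object into bytes. */
    static byte[] serialize(Serializable model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Prediction side: rebuild the model object from bytes. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The model class is missing from the consumer's classpath.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LinearModel trained = new LinearModel(new double[] {0.5, -1.0}, 2.0);
        byte[] exported = serialize(trained);
        LinearModel restored = (LinearModel) deserialize(exported);
        System.out.println(restored.predict(new double[] {4.0, 1.0}));
    }
}
```

The two failure paths in deserialize correspond to the disadvantages listed
above: the consumer must have the same class available, and a class whose
serialized form has changed between versions may refuse to load old bytes.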

I think this is quite a common problem. I am really interested to hear how
you are solving it, which approaches you use, and their pros and cons.

Thanks
Sourabh




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tp20324.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: MLLIB model export: PMML vs MLLIB serialization

Posted by selvinsource <vs...@gmail.com>.
I am going to try to export decision trees next; so far I have focused on
linear models and k-means.

Regards,
Vincenzo





sourabh wrote
> Thanks Vincenzo.
> Are you trying out all the models implemented in MLlib? Actually, I don't
> see decision trees there. Sorry if I missed them. When are you planning to
> merge this into the Spark branch?
> 
> Thanks
> Sourabh







Re: MLLIB model export: PMML vs MLLIB serialization

Posted by sourabh <ch...@gmail.com>.
Thanks Vincenzo.
Are you trying out all the models implemented in MLlib? Actually, I don't
see decision trees there. Sorry if I missed them. When are you planning to
merge this into the Spark branch?

Thanks
Sourabh

On Sun, Dec 14, 2014 at 5:54 PM, selvinsource [via Apache Spark User List] <
ml-node+s1001560n20674h11@n3.nabble.com> wrote:
>
> Hi Sourabh,
>
> Have a look at https://issues.apache.org/jira/browse/SPARK-1406. I am
> looking into exporting models in PMML using JPMML.
>
> Regards,
> Vincenzo





Re: MLLIB model export: PMML vs MLLIB serialization

Posted by selvinsource <vs...@gmail.com>.
Hi Sourabh,

Have a look at https://issues.apache.org/jira/browse/SPARK-1406. I am
looking into exporting models in PMML using JPMML.

Regards,
Vincenzo





Re: MLLIB model export: PMML vs MLLIB serialization

Posted by manish_k <ma...@sigmoidanalytics.com>.
Hi Sourabh,

I came across the same problem as you. One workable solution for me was to
serialize the parts of the model that are needed to recreate it. I
serialize the RDDs in my model to HDFS using saveAsObjectFile, writing to a
directory with a timestamp attached. My other Spark application reads from
the most recently stored directory in HDFS using sc.objectFile and recreates
the recently trained model for prediction.
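
The "latest timestamped directory" convention can be sketched as follows.
This is an illustration only: it uses the local filesystem through
java.nio.file as a stand-in for HDFS, and the class name LatestModelDir and
the directory prefix "model-" are hypothetical. In a Spark job the write
would be rdd.saveAsObjectFile(path), the read would be sc.objectFile(path),
and the listing would go through the Hadoop FileSystem API instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

final class LatestModelDir {

    /** Hypothetical naming scheme: one directory per export, e.g. model-1418567640000. */
    static final String PREFIX = "model-";

    /** Training side: create the directory a new export is written into. */
    static Path newExportDir(Path root, long timestampMillis) {
        try {
            return Files.createDirectories(root.resolve(PREFIX + timestampMillis));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Prediction side: pick the export directory with the largest timestamp. */
    static Optional<Path> latest(Path root) {
        try (Stream<Path> children = Files.list(root)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(p -> p.getFileName().toString().matches(PREFIX + "\\d+"))
                    .max(Comparator.comparingLong(LatestModelDir::timestampOf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses the numeric suffix of a directory name such as model-2000. */
    static long timestampOf(Path dir) {
        return Long.parseLong(dir.getFileName().toString().substring(PREFIX.length()));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("models");
        newExportDir(root, 1000L);
        newExportDir(root, System.currentTimeMillis());
        System.out.println(latest(root).orElseThrow(IllegalStateException::new));
    }
}
```

Comparing the parsed timestamps numerically, instead of sorting the names as
strings, keeps the ordering correct even if the number of digits differs.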

I do not think this is the best solution, but it worked for me. I am also
looking for more efficient approaches to this problem, where a model has to
be exported to some other application.


