Posted to user@spark.apache.org by philippe v <gl...@gmail.com> on 2016/06/23 11:40:26 UTC

Performance issue with spark ml model to make single predictions on server side

Hello, 

I trained a linear regression model with spark-ml. I serialized the model
pipeline with standard Java serialization, then loaded it in a webservice
to compute predictions.

For each request received by the webservice, I create a one-row DataFrame to
compute the prediction.

The problem is that it takes too much time.

Are there any good practices for this kind of task?

I could export the model's coefficients with PMML and do the computation in
pure Java, but I'm keeping that as a last resort.
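(For illustration, the pure-Java fallback I have in mind would be something like the minimal sketch below. The class and method names are invented for this example, and it assumes the coefficients and intercept have already been extracted from the trained linear regression model; a prediction is then just the dot product of the feature vector and the coefficients, plus the intercept.)

```java
// Hypothetical sketch: scoring a linear regression outside Spark,
// given coefficients and an intercept exported from the trained model.
public class LinearScorer {
    private final double[] coefficients;
    private final double intercept;

    public LinearScorer(double[] coefficients, double intercept) {
        this.coefficients = coefficients;
        this.intercept = intercept;
    }

    // Prediction = intercept + dot(coefficients, features).
    public double predict(double[] features) {
        double sum = intercept;
        for (int i = 0; i < coefficients.length; i++) {
            sum += coefficients[i] * features[i];
        }
        return sum;
    }
}
```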

Does anyone have hints for improving performance?

Philippe





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Performance-issue-with-spark-ml-model-to-make-single-predictions-on-server-side-tp27217.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Performance issue with spark ml model to make single predictions on server side

Posted by Nick Pentreath <ni...@gmail.com>.
Currently, spark-ml models and pipelines are only usable in Spark. This
means you must use Spark's machinery (and pull in all its dependencies) to
do model serving. Also, there is currently no fast "predict" method for a
single Vector instance.

So for now, you are best off going with PMML, or exporting your model in
your own custom format and re-loading it for serving. You can also take a
look at PredictionIO (https://prediction.io/) for another serving option,
or TensorFlow Serving (https://tensorflow.github.io/serving/).
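
(As one illustration of the "own custom format" route: a sketch of a trivial
text encoding for the exported parameters, written out by the training job and
parsed back by the serving side. The format here, the intercept followed by
comma-separated coefficients, and the class name are invented for this
example; PMML would be the standardized alternative.)

```java
// Hypothetical minimal custom export format for a linear model:
// "intercept,coef_0,coef_1,..." on a single line.
public class ModelCodec {
    // Serialize intercept and coefficients to one line of text.
    public static String export(double intercept, double[] coefficients) {
        StringBuilder sb = new StringBuilder(Double.toString(intercept));
        for (double c : coefficients) {
            sb.append(',').append(c);
        }
        return sb.toString();
    }

    // Parse the line back; index 0 is the intercept, the rest are coefficients.
    public static double[] load(String line) {
        String[] parts = line.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }
}
```

The serving process then never touches Spark at all: it parses one line at
startup and does a dot product per request.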
