Posted to user@spark.apache.org by Shashidhar Rao <ra...@gmail.com> on 2015/03/21 18:40:35 UTC

Model deployment help

Hi,

Apologies for the generic question.

I am developing predictive models for the first time, and my model will be
deployed in production very soon.

Could somebody help me with model deployment in production? I have read
quite a bit about model deployment and some books on database deployment.

My questions are: how do model updates happen without any downtime when the
current model degenerates, how are others deploying to production servers,
and how widely is PMML currently adopted in production?

Please point me to some good links or forums so that I can learn. Most
books do not cover this extensively, except for 'Mahout in Action', which
explains it in some detail. I have also checked Stack Overflow but have not
found any relevant answers.

What I understand:
1. Build a model using the current training set and test the model.
2. Deploy the model: put it in some location, load it, and predict when a
scoring request comes in.
3. When the model degenerates, build a new model with new data. (Here I am
somewhat confused: is the old data discarded completely, i.e. is the new
model built on purely new data, or on a mix of old and new?)
4. Here I am stuck: how do I update the model without any downtime during
the transition period between the old model and the new one?
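
One way to handle the old-versus-new-data question in step 3 is a sliding window: keep only the most recent N labeled examples, so each retrain uses a mix in which the oldest data gradually ages out, and trigger retraining when rolling accuracy on recent predictions drops below a threshold. The sketch below is a minimal illustration under that assumption; the class name RetrainingPolicy, the window sizes, and the threshold are hypothetical choices rather than an established recipe, and whether to drop old data entirely depends on how quickly the underlying data changes.

```python
from collections import deque


class RetrainingPolicy:
    """Hypothetical policy: bounded training window plus a rolling-accuracy trigger."""

    def __init__(self, window_size, accuracy_threshold, eval_window):
        # Once full, the deque drops its oldest example on each append, so the
        # training set is always the most recent `window_size` examples.
        self.examples = deque(maxlen=window_size)
        # 1 = prediction was correct, 0 = wrong; only the last `eval_window` are kept.
        self.outcomes = deque(maxlen=eval_window)
        self.accuracy_threshold = accuracy_threshold

    def record(self, features, label, predicted):
        """Store a labeled example and whether the deployed model got it right."""
        self.examples.append((features, label))
        self.outcomes.append(1 if predicted == label else 0)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self):
        # Wait until a full evaluation window has been observed.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.accuracy_threshold

    def training_set(self):
        return list(self.examples)


# Usage: the deployed model is right for the first 4 requests, then wrong.
policy = RetrainingPolicy(window_size=5, accuracy_threshold=0.75, eval_window=4)
for i in range(8):
    label = i % 2
    predicted = label if i < 4 else 1 - label
    policy.record([float(i)], label, predicted)
```

Here the last four predictions are all wrong, so should_retrain() returns True, and training_set() holds the five most recent examples (i = 3 to 7); the three oldest have aged out.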

My naive solution would be to build the new model, save it in a new
location, and, once saving is done, update the path in some properties file
or in a database. Is this correct, or are there best practices available?
A database is unlikely in my case.
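
The naive solution above is close to a common pattern: write each model to a new versioned location that is never overwritten, then switch a single small pointer file atomically once saving has finished. The following sketch assumes a local filesystem, pickle for serialization, and a toy linear model stored as a dict; the pointer file name CURRENT, the model-v<N> directory layout, and the function and class names are hypothetical. Python's os.replace renames atomically when source and destination are on the same filesystem, so a reader sees either the old pointer or the new one, never a half-written file.

```python
import json
import os
import pickle
import tempfile


def publish_model(model, base_dir, version):
    """Save `model` under a new versioned directory, then atomically repoint CURRENT."""
    version_dir = os.path.join(base_dir, f"model-v{version}")
    os.makedirs(version_dir)  # raises if this version exists, so a live model is never overwritten
    with open(os.path.join(version_dir, "model.pkl"), "wb") as f:
        pickle.dump(model, f)
    # Write the new pointer to a temporary file in the same directory, then
    # rename it over CURRENT. os.replace is atomic on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=base_dir)
    with os.fdopen(fd, "w") as f:
        json.dump({"path": version_dir}, f)
    os.replace(tmp_path, os.path.join(base_dir, "CURRENT"))
    return version_dir


class ModelServer:
    """Reloads the model only when the CURRENT pointer changes."""

    def __init__(self, base_dir):
        self.base_dir = base_dir
        self.loaded_path = None
        self.model = None

    def _refresh(self):
        with open(os.path.join(self.base_dir, "CURRENT")) as f:
            path = json.load(f)["path"]
        if path != self.loaded_path:
            with open(os.path.join(path, "model.pkl"), "rb") as f:
                new_model = pickle.load(f)
            # Swap references only after the new model loaded successfully;
            # if loading raises, the previously loaded model stays in place.
            self.model, self.loaded_path = new_model, path

    def predict(self, x):
        self._refresh()
        # Toy linear model stored as a dict (hypothetical format).
        return self.model["weight"] * x + self.model["bias"]


# Usage: publish v1, serve, then publish v2 without restarting the server.
base = tempfile.mkdtemp()
publish_model({"weight": 1.0, "bias": 0.0}, base, version=1)
server = ModelServer(base)
before = server.predict(3.0)  # served by model-v1
publish_model({"weight": 2.0, "bias": 1.0}, base, version=2)
after = server.predict(3.0)   # model-v2 is picked up on the next request
```

Re-reading the pointer on every request keeps the sketch simple; a real server would more likely check it on a timer. Keeping the old version directories around also makes rolling back a matter of pointing CURRENT at an earlier path.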

Thanks in advance.

Re: Model deployment help

Posted by Donald Szeto <do...@prediction.io>.
Hi Shashidhar,

Our team at PredictionIO is trying to solve the production deployment of
models. We built a powered-by-Spark framework (also certified on Spark by
Databricks) that allows a user to build models with everything available
from the Spark API, persist the models automatically with versioning, and
deploy them as a REST service using simple CLI commands.

Regarding model degeneration and updates: if a downtime of half a second to
a couple of seconds is acceptable, with PIO one could simply run "pio train"
and "pio deploy" periodically with a cron job. To achieve virtually zero
downtime, a load balancer could be set up in front of two "pio deploy"
instances.
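
The zero-downtime setup with a load balancer in front of two deployed instances works by redeploying one instance at a time while the other keeps serving. The sketch below only simulates that rolling update in memory; Instance, RoundRobinBalancer, and rolling_redeploy are hypothetical names, updating model_version merely stands in for running "pio deploy" on that instance, and none of this is PredictionIO's API.

```python
import itertools


class Instance:
    def __init__(self, name, model_version):
        self.name = name
        self.model_version = model_version
        self.healthy = True  # False while the instance is being redeployed

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is restarting")
        return f"{self.name}:v{self.model_version}:{request}"


class RoundRobinBalancer:
    def __init__(self, instances):
        self.instances = instances
        self._cycle = itertools.cycle(instances)

    def route(self, request):
        # Try each instance at most once, skipping any that are out of rotation.
        for _ in range(len(self.instances)):
            inst = next(self._cycle)
            if inst.healthy:
                return inst.handle(request)
        raise RuntimeError("no healthy instance")


def rolling_redeploy(balancer, new_version, serve_during_restart):
    for inst in balancer.instances:
        inst.healthy = False              # take this instance out of rotation
        serve_during_restart()            # traffic still reaches the other instance
        inst.model_version = new_version  # stands in for redeploying with the new model
        inst.healthy = True               # put it back into rotation


# Usage: every request is answered, first by the old model, then by the new one.
a, b = Instance("a", 1), Instance("b", 1)
lb = RoundRobinBalancer([a, b])
responses = []
rolling_redeploy(lb, 2, lambda: responses.append(lb.route("req")))
responses.append(lb.route("req"))
```

The trade-off is that, for a short period, the two instances serve different model versions, which is usually acceptable for scoring but worth knowing about.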

Porting your current algorithm / model generation to PredictionIO should
just be a copy-and-paste procedure. We would be very grateful for any
feedback that would improve the deployment process.

We do not support PMML at the moment, but definitely are interested in your
use case.

You may get started with the documentation (http://docs.prediction.io/).
You could also visit the engine template gallery
(https://templates.prediction.io/) for quick, ready-to-use examples.
PredictionIO is open source software under APL2 (Apache License 2.0) on
https://github.com/PredictionIO/PredictionIO.

Looking forward to hearing your feedback!


Best Regards,
Donald



-- 
Donald Szeto
PredictionIO