Posted to user@predictionio.apache.org by Ulavapalle Meghamala <ul...@myntra.com> on 2018/08/07 11:13:39 UTC

PredictionIO spark deployment in Production

Hi,

Are there any templates in PredictionIO where "spark" is used even in "pio
deploy" ? How are you handling such cases ? Will you create a spark context
every time you run a prediction ?

I have gone through the documentation here:
http://actionml.com/docs/single_driver_machine, but it only talks about
"pio train". Please point me to any documentation that is available on
"pio deploy" with Spark.

Thanks,
Megha

Re: PredictionIO spark deployment in Production

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Oh, and no, it does not need a new context for every query, only for the
deploy.
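Pat's point can be sketched in a few lines (hypothetical names; a plain class stands in for the expensive SparkContext): the context is built once at deploy time and reused by every query, never rebuilt per request.

```python
# Sketch of "one context per deploy, not per query".
# FakeContext is a stand-in for an expensive resource like a SparkContext.

class FakeContext:
    created = 0  # counts how many contexts were ever built

    def __init__(self):
        FakeContext.created += 1


class PredictionServer:
    def __init__(self):
        # Expensive setup happens once, at deploy time.
        self.context = FakeContext()

    def predict(self, query):
        # Every query reuses the same long-lived context.
        return {"query": query, "context_id": id(self.context)}


server = PredictionServer()          # "pio deploy": one context is created
results = [server.predict(q) for q in ["u1", "u2", "u3"]]

assert FakeContext.created == 1      # still only one context after 3 queries
assert len({r["context_id"] for r in results}) == 1
```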



Re: PredictionIO spark deployment in Production

Posted by Pat Ferrel <pa...@occamsmachete.com>.
The answers to your question illustrate why, IMHO, it is bad to have Spark
required for predictions.

Any of the MLlib ALS recommenders use Spark to predict, and so run Spark
the whole time they are deployed. They can use one machine or use the
entire cluster. This is one case where using the cluster slows down
predictions, since parts of the model may be spread across nodes. Spark is
not designed to scale in this manner for real-time queries, but I believe
those are your options out of the box for the ALS recommenders.

To be both fast and scalable you would load the model entirely into memory
on one machine for fast queries, then spread queries across many identical
machines to scale with load. I don't think any templates do this; it
requires a load balancer at the very least, not to mention custom
deployment code that interferes with using the same machines for training.
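The fast-and-scalable pattern described above can be sketched as follows. The model, server class, and round-robin balancer are all hypothetical stand-ins: each identical server holds the entire model in local memory, so any replica can answer any query.

```python
from itertools import cycle

# Hypothetical precomputed model: user -> ranked item recommendations.
MODEL = {"u1": ["i3", "i7"], "u2": ["i1"]}


class InMemoryServer:
    """Each identical server holds the ENTIRE model in local memory,
    so a query never reaches across the network to other nodes."""

    def __init__(self, model):
        self.model = dict(model)  # full copy, not a partition

    def query(self, user):
        return self.model.get(user, [])


# Scale query load by running identical copies behind a balancer.
servers = [InMemoryServer(MODEL) for _ in range(3)]
balancer = cycle(servers)  # trivial round-robin "load balancer"

answers = [next(balancer).query("u1") for _ in range(6)]
assert all(a == ["i3", "i7"] for a in answers)  # every replica agrees
```

In production the round-robin step would of course be a real load balancer (LB/HAProxy, as mentioned later in this thread) in front of identical PredictionServers.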

The UR loads the model into Elasticsearch for serving independently
scalable queries.

I always advise keeping Spark out of serving for the reasons mentioned
above.



Re: PredictionIO spark deployment in Production

Posted by Ulavapalle Meghamala <ul...@myntra.com>.
Thanks Pat for getting back.

Are there any PredictionIO models/templates which really use Spark in "pio
deploy"? Not just creating a Spark context to launch the 'pio deploy'
driver and then dropping it, but keeping a Spark context running
throughout the Prediction Server life cycle? How does PredictionIO handle
this case? Does it create a new Spark context every time a prediction has
to be made?

Also, in production deployments (where Spark is not really used), how do
you scale the Prediction Server? Do you just deploy the same model on
multiple machines and have an LB/HAProxy handle requests?

Thanks,
Megha




Re: PredictionIO spark deployment in Production

Posted by Pat Ferrel <pa...@occamsmachete.com>.
PIO is designed to use Spark in train and deploy, but the Universal
Recommender removes the need for Spark to make predictions. This, IMO, is
a key to using Spark well: remove it from serving results. PIO creates a
Spark context to launch the `pio deploy` driver, but Spark is never used
and the context is dropped.

The UR also does not need to be re-deployed after each train. It hot-swaps
the new model into use outside of Spark, so if you never shut down the
PredictionServer you never need to re-deploy.

The confusion comes from reading the Apache PIO docs, which may not do
things this way; don't rely on them here. Each template defines its own
requirements. To use the UR, stick with its documentation.

That means Spark is used to "train" only and you never re-deploy. Deploy
once; train periodically.
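The "deploy once, train periodically" cycle rests on the hot swap described above. A minimal sketch, with hypothetical names (in the real UR the swap happens when a newly trained model index becomes live, outside of Spark):

```python
class PredictionServer:
    """Stays up indefinitely; periodic training swaps a new model in
    underneath it, so the server is never re-deployed."""

    def __init__(self, model):
        self.model = model

    def predict(self, user):
        return self.model.get(user, [])

    def hot_swap(self, new_model):
        # Atomic reference swap: in-flight queries finish against the
        # old model, new queries see the new one. No restart needed.
        self.model = new_model


server = PredictionServer({"u1": ["i1"]})   # deployed once
assert server.predict("u1") == ["i1"]

new_model = {"u1": ["i2", "i5"]}            # produced by a periodic `pio train`
server.hot_swap(new_model)                  # model updated, no re-deploy
assert server.predict("u1") == ["i2", "i5"]
```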

