Posted to user@predictionio.apache.org by Mattz <ea...@gmail.com> on 2017/07/14 09:38:15 UTC

Does Universal Recommender need Spark for Serving?

Hello,

Is Spark required only for "PIO TRAIN" or is it needed for serving the
recommendations as well?

I am planning to run PredictionIO on AWS, so I am thinking of running PredictionIO
with the Elasticsearch Service and EMR. I wanted to know if we can use EMR only
during the training phase and then serve the recommendations from another,
smaller instance running PredictionIO that talks to the Elasticsearch
Service. Is this possible?

Please let me know.

Thanks.

Re: Does Universal Recommender need Spark for Serving?

Posted by Pat Ferrel <pa...@occamsmachete.com>.
A Spark cluster is only needed for `pio train`. Spark must be installed on the machine that runs `pio deploy` but is only used for local client APIs and never needs to communicate with the cluster.
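A minimal Python sketch of that split, assuming the standard `pio` CLI behavior where arguments after `--` on `pio train` are forwarded to spark-submit. The helper functions and the host name `spark-master` are hypothetical placeholders. Only the training command is pointed at the cluster; the deploy command carries no master URL.

```python
import shlex
from typing import List


def train_command(spark_master: str) -> List[str]:
    # Arguments after "--" are forwarded by `pio train` to spark-submit,
    # so this is the only place the cluster's master URL appears.
    return ["pio", "train", "--", "--master", spark_master]


def deploy_command(port: int = 8000) -> List[str]:
    # Spark must be installed on the serving machine, but no cluster
    # master URL is passed: serving never talks to the cluster.
    return ["pio", "deploy", "--port", str(port)]


if __name__ == "__main__":
    print(shlex.join(train_command("spark://spark-master:7077")))
    print(shlex.join(deploy_command()))
```

The serving instance can therefore be a small machine with a local Spark install, while the cluster only has to exist while `pio train` is running.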

However, the last time I checked, EMR will not work. EMR was designed for Hadoop MapReduce, and Spark does not use files for intermediate storage; it needs memory, and lots of it. Also remember that the machine that runs `pio train` is the Spark Driver machine and needs nearly the same resources (memory and cores) as a Spark Executor. The only way to run the Driver in EMR is in YARN cluster mode, and the last time I checked this was either impossible or very difficult. So we have never been able to use EMR.
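As a hedged illustration of sizing the Driver like an Executor, the hypothetical helper below builds spark-submit flags that give the `pio train` (Driver) machine the same heap as each Executor. The memory and core values are placeholders, not recommendations.

```python
from typing import List


def train_resource_args(executor_memory: str, executor_cores: int) -> List[str]:
    """Hypothetical helper: spark-submit flags where the Driver gets as much
    memory as each Executor.

    When the Driver runs on the `pio train` machine (client mode), its heap
    must be set with --driver-memory on the command line; setting
    spark.driver.memory inside the application is too late because the
    Driver JVM has already started.
    """
    return [
        "--driver-memory", executor_memory,   # Driver sized like an Executor
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
    ]
```

These flags would follow `--` on the `pio train` command line. In client mode the Driver's cores are simply the cores of the machine running `pio train`, so that machine itself should have roughly as many cores as an Executor.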

For larger installations we (ActionML) do something very similar with scripts in Terraform. You can start all the Spark machines, including a pre-configured `pio train` machine, then train, then stop them when training is done. This ensures you don't pay for Spark when you aren't using it.
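The ActionML Terraform scripts are not shown in this thread, so the following is only a hypothetical Python sketch of the start/train/stop pattern. It assumes a Terraform directory (`tf_dir`) that describes the Spark machines and a `train_cmd` that launches `pio train` on the Driver machine (for example over ssh). `terraform apply -auto-approve` and `terraform destroy -auto-approve` are standard Terraform CLI commands; everything else is illustrative.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run(cmd: Sequence[str], cwd: Optional[str] = None) -> None:
    """Default runner: execute a command and raise CalledProcessError on failure."""
    subprocess.run(list(cmd), cwd=cwd, check=True)


def train_on_ephemeral_cluster(
    tf_dir: str,
    train_cmd: Sequence[str],
    runner: Callable[..., None] = run,
) -> None:
    """Hypothetical sketch: start the Spark machines, train, then always stop them."""
    try:
        # Create the Spark master, workers and pre-configured `pio train` machine.
        runner(["terraform", "apply", "-auto-approve"], cwd=tf_dir)
        # Launch `pio train` on the Driver machine (e.g. an ssh command).
        runner(list(train_cmd))
    finally:
        # Tear everything down, even after a failure, so idle Spark is not billed.
        runner(["terraform", "destroy", "-auto-approve"], cwd=tf_dir)
```

Putting `apply` inside the `try` means a partially created cluster is also destroyed if provisioning fails, and the `finally` guarantees teardown when training raises. The `runner` parameter exists so the sequence can be exercised without real machines.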

