Posted to user@spark.apache.org by Nicolas Long <ni...@gmail.com> on 2016/10/11 15:53:18 UTC

mllib model in production web API

Hi all,

So I have a model which has been stored in S3, and I have a Scala webapp
which, for certain requests, loads the model and transforms submitted data
against it.
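
For concreteness, the per-request flow is roughly the following (the paths
and helper names below are illustrative, not our exact code):

    import org.apache.spark.ml.classification.RandomForestClassificationModel
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Spark runs embedded in the webapp (hence the uberjar mentioned below).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("model-scoring")
      .getOrCreate()

    // Loaded from S3 once at startup and kept around between requests.
    val model = RandomForestClassificationModel.load("s3a://some-bucket/models/rf")

    // Per request: the submitted data becomes a one-row DataFrame containing
    // the model's features column, and is scored with transform().
    def score(input: DataFrame): DataFrame = model.transform(input)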

I'm not sure how to run this quickly on a single instance though. At the
moment Spark is being bundled up with the web app in an uberjar (sbt
assembly).

But the process is quite slow. I'm aiming for responses of < 1 sec so that
the webapp can respond quickly to requests. When I look at the Spark UI I see:

Summary Metrics for 1 Completed Tasks

Metric    Min    25th percentile    Median    75th percentile    Max
Duration    94 ms    94 ms    94 ms    94 ms    94 ms
Scheduler Delay    0 ms    0 ms    0 ms    0 ms    0 ms
Task Deserialization Time    3 s    3 s    3 s    3 s    3 s
GC Time    2 s    2 s    2 s    2 s    2 s
Result Serialization Time    0 ms    0 ms    0 ms    0 ms    0 ms
Getting Result Time    0 ms    0 ms    0 ms    0 ms    0 ms
Peak Execution Memory    0.0 B    0.0 B    0.0 B    0.0 B    0.0 B

I don't really understand why deserialization and GC should take so long
when the models are already loaded. Is this evidence that I am doing
something wrong? And where can I get a better understanding of how Spark
works under the hood here, and of how best to do a standalone/bundled jar
deployment?

Thanks!

Nic

Re: mllib model in production web API

Posted by Aseem Bansal <as...@gmail.com>.
Hi Vincent

I am not sure whether you are asking me or Nicolas. If me, then no, we
didn't. We never used Akka and weren't even aware that it had such
capabilities. We are using the Java API, so we don't have Akka as a
dependency right now.

On Tue, Oct 18, 2016 at 12:47 PM, vincent gromakowski <
vincent.gromakowski@gmail.com> wrote:

> Hi
> Did you try applying the model with akka instead of spark ?
> https://spark-summit.org/eu-2015/events/real-time-anomaly-detection-with-spark-ml-and-akka/

Re: mllib model in production web API

Posted by vincent gromakowski <vi...@gmail.com>.
Hi
Did you try applying the model with Akka instead of Spark?
https://spark-summit.org/eu-2015/events/real-time-anomaly-detection-with-spark-ml-and-akka/
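
Roughly, the idea is to serve an already-converted local model behind a plain
Akka HTTP endpoint instead of running a Spark job per request. A minimal
sketch, assuming a Vector-based predict function and a made-up endpoint and
input format:

    import akka.actor.ActorSystem
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.server.Directives._
    import akka.stream.ActorMaterializer
    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    object PredictServer {
      // `predict` is any local, Vector-based scoring function, e.g. a model
      // already converted to the old mllib API as discussed in this thread.
      def start(predict: Vector => Double): Unit = {
        implicit val system = ActorSystem("prediction")
        implicit val materializer = ActorMaterializer()

        // Placeholder endpoint: the body is a comma-separated feature vector.
        val route = path("predict") {
          post {
            entity(as[String]) { body =>
              val features = Vectors.dense(body.split(",").map(_.trim.toDouble))
              complete(predict(features).toString)
            }
          }
        }

        Http().bindAndHandle(route, "0.0.0.0", 8080)
      }
    }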


Re: mllib model in production web API

Posted by Aseem Bansal <as...@gmail.com>.
@Nicolas

No, ours is different. We required predictions within a 10 ms time frame, so
we needed much lower latency than that.

Every algorithm has some parameters, correct? We took the parameters from the
ml model and used them to create the mllib package's model. The mllib model's
prediction time was much faster than the ml package's transformation. So,
essentially: use Spark's distributed machine learning library to train the
model, save it to S3, load it from S3 in a different system, and then convert
it into the Vector-based API model for the actual predictions.

There were obviously some transformations involved, but we didn't use a
Pipeline for those transformations. Instead, we re-wrote them for the
Vector-based API. I know it's not perfect, but if we had used the
transformations within the pipeline, that would have made us dependent on
Spark's distributed API, and we didn't see how we would really reach our
latency requirements. It would have been much simpler and more DRY if the
PipelineModel had a predict method based on vectors and was not distributed.

As you can guess, it is very much model-specific and more work. If we decide
to use another type of model, we will have to add conversion/transformation
code for that as well. If only Spark exposed a prediction method which is as
fast as the old machine learning package's.
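
As a minimal sketch of that conversion, assuming logistic regression (our
real models differ, and something like Nicolas's random forest needs its own
model-specific conversion code; the S3 path is a placeholder):

    import org.apache.spark.ml.classification.{LogisticRegressionModel => MLModel}
    import org.apache.spark.mllib.classification.{LogisticRegressionModel => MLlibModel}
    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    // Load the model that was trained with spark.ml and saved to S3.
    val mlModel = MLModel.load("s3a://some-bucket/models/lr")

    // Copy its parameters into the old Vector-based mllib model.
    val localModel =
      new MLlibModel(Vectors.fromML(mlModel.coefficients), mlModel.intercept)

    // Score a single request locally, with no Spark job per request.
    def predict(features: Vector): Double = localModel.predict(features)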


Re: mllib model in production web API

Posted by Nicolas Long <ni...@gmail.com>.
Hi Sean and Aseem,

Thanks both. A simple thing which sped things up greatly was simply to load
our SQL (effectively for one record) directly and then convert it to a
DataFrame, rather than using Spark to load it. Sounds stupid, but this took
us from > 5 seconds to ~1 second on a very small instance.
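
Concretely, something like the sketch below (the JDBC URL, table, and schema
are made up; the point is just that the single record is fetched with plain
JDBC and only then turned into a DataFrame):

    import java.sql.DriverManager
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Placeholder schema for the one record we score per request.
    case class Record(id: Long, feature1: Double, feature2: Double)

    def loadOne(spark: SparkSession, id: Long): DataFrame = {
      val conn = DriverManager.getConnection("jdbc:postgresql://db/app", "user", "pass")
      try {
        val stmt = conn.prepareStatement(
          "SELECT id, feature1, feature2 FROM records WHERE id = ?")
        stmt.setLong(1, id)
        val rs = stmt.executeQuery()
        require(rs.next(), s"no record with id $id")
        val rec = Record(rs.getLong("id"), rs.getDouble("feature1"), rs.getDouble("feature2"))
        // Build the one-row DataFrame locally instead of going through spark.read.
        import spark.implicits._
        Seq(rec).toDF()
      } finally conn.close()
    }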

Aseem: can you explain your solution a bit more? I'm not sure I understand
it. At the moment we load our models from S3
(RandomForestClassificationModel.load(..)) and then store them in an object
property so that they persist across requests - this is in Scala. Is this
essentially what you mean?
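
By "object property" I mean something like this sketch, with the model loaded
lazily once per JVM and then reused (placeholder S3 path):

    import org.apache.spark.ml.classification.RandomForestClassificationModel

    object ModelHolder {
      // Loaded on first use and then cached for the lifetime of the JVM.
      lazy val model: RandomForestClassificationModel =
        RandomForestClassificationModel.load("s3a://some-bucket/models/rf")
    }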

Re: mllib model in production web API

Posted by Aseem Bansal <as...@gmail.com>.
Hi

We faced a similar issue. Our solution was to load the model, convert it to
the corresponding mllib model, cache that, and then use it instead of the ml
model.


Re: mllib model in production web API

Posted by Sean Owen <so...@cloudera.com>.
I don't believe it will ever scale to spin up a whole distributed job to
serve one request. You could possibly look at the bits in mllib-local. You
might do well to export to something like PMML, either with Spark's export
or JPMML, and then load it into a web container and score it there, without
Spark (possibly also with JPMML or OpenScoring).
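
A rough sketch of the Spark-side export (only some spark.mllib models
implement the built-in PMML export, e.g. linear models and k-means; a random
forest would need JPMML-SparkML or similar instead, and the path is a
placeholder):

    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // training: RDD[LabeledPoint] prepared elsewhere (placeholder).
    def exportModel(training: RDD[LabeledPoint]): Unit = {
      val model = new LogisticRegressionWithLBFGS().run(training)
      // Write PMML to a local file; a web container can then load it with a
      // PMML evaluator (e.g. JPMML-Evaluator) or serve it via OpenScoring,
      // with no Spark dependency at request time.
      model.toPMML("/tmp/model.pmml")
    }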
