Posted to user@spark.apache.org by Aris Vlasakakis <ar...@vlasakakis.com> on 2014/07/14 22:27:19 UTC

Client application that calls Spark and receives an MLlib *model* Scala Object, not just result

Hello Spark community,

I would like to write an application in Scala that is a model server. It
should have an MLlib Linear Regression model that has already been trained on
some big set of data, and is then able to repeatedly call
myLinearRegressionModel.predict() and return the results.

Now, I want this client application to submit a job to Spark and tell the
Spark cluster to:

1) train its particular MLlib model, which produces a LinearRegressionModel,
and then

2) take the produced Scala
org.apache.spark.mllib.regression.LinearRegressionModel *object*, serialize
that object, and return this serialized object over the wire to my calling
application.

3) My client application receives the serialized Scala (model) object, and
can call .predict() on it over and over.

I am separating the heavy lifting of training the model from doing model
predictions; the client application will only do predictions, using the
MLlib model it received from the Spark application.

The confusion I have is that I only know how to "submit jobs to Spark" by
using the bin/spark-submit script, and then the only output I receive is
stdout (as in, text). I want my Scala application to submit the Spark
model-training job programmatically, and for the Spark application to
return a SERIALIZED MLLIB OBJECT, not just some stdout text!

How can I do this? I think my use case of offloading long-running training
jobs to Spark and using its libraries in another application should be a
pretty common design pattern.
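
To make this concrete, here is a rough sketch of the training half as I
picture it (the HDFS path and CSV parsing are made up, and the SGD settings
are arbitrary):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

val sc = new SparkContext(new SparkConf().setAppName("model-trainer"))

// Parse each CSV line as "label,feature1,feature2,..." (placeholder format).
val training = sc.textFile("hdfs:///path/to/training.csv").map { line =>
  val cols = line.split(',').map(_.toDouble)
  LabeledPoint(cols.head, Vectors.dense(cols.tail))
}.cache()

// Train the linear regression model with 100 iterations of SGD.
val model = LinearRegressionWithSGD.train(training, 100)

// This is the part I am asking about: instead of printing text to stdout,
// I want to hand `model` (serialized) back to the calling application,
// which would then call model.predict(Vectors.dense(...)) over and over.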

Thanks!

-- 
Άρης Βλασακάκης
Aris Vlasakakis

Re: Client application that calls Spark and receives an MLlib *model* Scala Object, not just result

Posted by Aris <ar...@gmail.com>.
Thanks Soumya - I guess the next step from here is to move the MLlib model
from the Spark application, which simply does the training, to the client
application, which simply does the predictions. I will try the Kryo library
to serialize the object and pass it across machines / applications.

Rather than writing it to a file, I will send it over the network - any
thoughts on that?
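
Roughly what I am picturing for the Kryo round trip (just an untested
sketch - I am assuming Kryo needs an Objenesis fallback because the model
class has no no-arg constructor, and the helper names are mine):

import java.io.ByteArrayOutputStream
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.spark.mllib.regression.LinearRegressionModel
import org.objenesis.strategy.StdInstantiatorStrategy

def newKryo(): Kryo = {
  val kryo = new Kryo()
  // Fall back to Objenesis so classes without a no-arg constructor
  // (like LinearRegressionModel) can still be instantiated.
  kryo.setInstantiatorStrategy(new StdInstantiatorStrategy())
  kryo
}

// Spark side: turn the trained model into bytes to send over the socket.
def modelToBytes(model: LinearRegressionModel): Array[Byte] = {
  val baos   = new ByteArrayOutputStream()
  val output = new Output(baos)
  newKryo().writeClassAndObject(output, model)
  output.close()
  baos.toByteArray
}

// Client side: turn the received bytes back into a usable model.
def bytesToModel(bytes: Array[Byte]): LinearRegressionModel = {
  val input = new Input(bytes)
  try newKryo().readClassAndObject(input).asInstanceOf[LinearRegressionModel]
  finally input.close()
}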

Thanks!


On Mon, Jul 14, 2014 at 1:43 PM, Soumya Simanta <so...@gmail.com>
wrote:

> Please look at the following.
>
> https://github.com/ooyala/spark-jobserver
> http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language
> https://github.com/EsotericSoftware/kryo
>
> You can train your model, convert it to PMML, and return that to your
> client, OR
>
> you can train your model and write that model (a serialized object) to the
> file system (local, HDFS, S3, etc.) or a datastore and return a location
> back to the client on a successful write.
>
> On Mon, Jul 14, 2014 at 4:27 PM, Aris Vlasakakis <ar...@vlasakakis.com>
> wrote:
>
>> [original message quoted in full above; trimmed]

Re: Client application that calls Spark and receives an MLlib *model* Scala Object, not just result

Posted by Soumya Simanta <so...@gmail.com>.
Please look at the following.

https://github.com/ooyala/spark-jobserver
http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language
https://github.com/EsotericSoftware/kryo

You can train your model, convert it to PMML, and return that to your
client, OR

you can train your model and write that model (a serialized object) to the
file system (local, HDFS, S3, etc.) or a datastore and return a location
back to the client on a successful write.
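
For the second option, something along these lines (untested sketch -
LinearRegressionModel is java.io.Serializable, and the path and helper
names are just placeholders):

import java.io.{ObjectInputStream, ObjectOutputStream}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.mllib.regression.LinearRegressionModel

// Spark side: persist the trained model to HDFS and return the path
// to the client once the write succeeds.
def saveModel(model: LinearRegressionModel, path: String): String = {
  val fs  = FileSystem.get(new Configuration())
  val out = new ObjectOutputStream(fs.create(new Path(path)))
  try out.writeObject(model) finally out.close()
  path
}

// Client side: resolve the returned path and deserialize the model,
// then call loadModel(path).predict(...) as many times as needed.
def loadModel(path: String): LinearRegressionModel = {
  val fs = FileSystem.get(new Configuration())
  val in = new ObjectInputStream(fs.open(new Path(path)))
  try in.readObject().asInstanceOf[LinearRegressionModel] finally in.close()
}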





On Mon, Jul 14, 2014 at 4:27 PM, Aris Vlasakakis <ar...@vlasakakis.com>
wrote:

> [original message quoted in full above; trimmed]