Posted to user@spark.apache.org by Xi Shen <da...@gmail.com> on 2015/03/07 11:10:45 UTC

How to reuse a ML trained model?

Hi,

I looked at a few ML algorithms in MLlib.

https://spark.apache.org/docs/0.8.1/api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegressionModel

I could not find a way to save the trained model. Does this mean I have to
retrain my model every time? Is there a more economical way to do this?

I am thinking about something like:

model.run(...)
model.save("hdfs://path/to/hdfs")

Then, later, I could do:

val model = Model.createFrom("hdfs://...")
model.predict(vector)

I am new to Spark; maybe there are other ways to persist the model?


Thanks,
David

Re: How to reuse a ML trained model?

Posted by Simon Chan <si...@gmail.com>.
You may also take a look at PredictionIO, which can persist and then deploy
MLlib models as web services.

Simon

On Sunday, March 8, 2015, Sean Owen <so...@cloudera.com> wrote:

> You don't need SparkContext simply to serialize and deserialize objects.
> It is a Java mechanism.

Re: How to reuse a ML trained model?

Posted by Sean Owen <so...@cloudera.com>.
You don't need SparkContext simply to serialize and deserialize objects.
It is a Java mechanism.
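
A minimal sketch of that round trip with plain Java serialization. ToyModel
is a hypothetical stand-in, not a Spark class; the sketch assumes the real
trained model implements java.io.Serializable, as noted earlier in this
thread. The save and load helpers and the temp file path are illustrative
names only.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class ModelRoundTrip {

    // Hypothetical stand-in for a trained model; not part of Spark.
    static final class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;
        final double intercept;

        ToyModel(double[] weights, double intercept) {
            this.weights = weights;
            this.intercept = intercept;
        }

        // Linear score: dot(weights, features) + intercept.
        double predict(double[] features) {
            double sum = intercept;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * features[i];
            }
            return sum;
        }
    }

    // Write any Serializable object to the given file.
    static void save(Serializable model, Path path) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(path))) {
            out.writeObject(model);
        }
    }

    // Read the object back; the caller casts it to the expected type.
    static Object load(Path path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(path))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("model", ".bin");
        save(new ToyModel(new double[] {0.5, -1.0}, 2.0), path);
        ToyModel restored = (ToyModel) load(path);
        System.out.println(restored.predict(new double[] {2.0, 1.0}));
        Files.delete(path);
    }
}
```

Java serialization ties the saved bytes to the class definition, so a file
written this way may fail to load after the model class changes between
library versions. Writing to HDFS instead of a local file would need
streams from the Hadoop FileSystem API, which is not shown here.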
On Mar 8, 2015 10:29 AM, "Xi Shen" <da...@gmail.com> wrote:

> Errr... do you have any suggestions for me before the 1.3 release?
>
> I can't believe there's no ML model serialization method in Spark. I think
> training the models is quite expensive, isn't it?
>
>
> Thanks,
> David

Re: How to reuse a ML trained model?

Posted by Xi Shen <da...@gmail.com>.
Errr... do you have any suggestions for me before the 1.3 release?

I can't believe there's no ML model serialization method in Spark. I think
training the models is quite expensive, isn't it?


Thanks,
David


On Sun, Mar 8, 2015 at 5:14 AM Burak Yavuz <br...@gmail.com> wrote:

> Hi,
>
> Model import/export is available for some of the ML algorithms on the
> current master (and will ship with the 1.3 release).
>
> Burak

Re: How to reuse a ML trained model?

Posted by Burak Yavuz <br...@gmail.com>.
Hi,

Model import/export is available for some of the ML algorithms on the
current master (and will ship with the 1.3 release).

Burak
On Mar 7, 2015 4:17 AM, "Xi Shen" <da...@gmail.com> wrote:

> Wait... it seems SparkContext does not provide a way to save/load object
> files. It can only save/load RDDs. What did I miss here?
>
>
> Thanks,
> David

Re: How to reuse a ML trained model?

Posted by Xi Shen <da...@gmail.com>.
Wait... it seems SparkContext does not provide a way to save/load object
files. It can only save/load RDDs. What did I miss here?


Thanks,
David


On Sat, Mar 7, 2015 at 11:05 PM Xi Shen <da...@gmail.com> wrote:

> Ah~it is serializable. Thanks!

Re: How to reuse a ML trained model?

Posted by Xi Shen <da...@gmail.com>.
Ah~it is serializable. Thanks!


On Sat, Mar 7, 2015 at 10:59 PM Ekrem Aksoy <ek...@gmail.com> wrote:

> You can serialize your trained model to persist it somewhere.
>
> Ekrem Aksoy

Re: How to reuse a ML trained model?

Posted by Ekrem Aksoy <ek...@gmail.com>.
You can serialize your trained model to persist it somewhere.

Ekrem Aksoy
