Posted to user@spark.apache.org by "aka.fe2s" <ak...@gmail.com> on 2016/07/12 14:40:36 UTC

ml and mllib persistence

Why does Spark have individual implementations of the read/write
routines for every model in mllib and ml? (Saveable and MLWritable trait
impls)

Wouldn't a generic implementation via the Java serialization mechanism work? I
would like to use it to store the models in custom storage.

--
Oleksiy

Re: ml and mllib persistence

Posted by Salih Gedik <me...@salih.xyz>.
I see. I could not find any explanation of this. Could you tell me
what sort of portability issue this is? Isn't the JVM supposed to
abstract that away?

Thanks!


On 12.07.2016 20:04, Reynold Xin wrote:
> Platform as a general word, eg language platforms, OS, different JVM 
> versions, different JVM vendors, different Spark versions...
>
> On Tuesday, July 12, 2016, Salih Gedik <me@salih.xyz> wrote:
>
>     Hi Reynold,
>
>     I was wondering if you meant cross language or cross platform? Thanks
>     On 12.07.2016 19:57, Reynold Xin wrote:
>>     Also Java serialization isn't great for cross platform
>>     compatibility.
>>
>>     On Tuesday, July 12, 2016, aka.fe2s <aka.fe2s@gmail.com> wrote:
>>
>>         Okay, I think I found an answer on my question. Some models
>>         (for instance
>>         org.apache.spark.mllib.recommendation.MatrixFactorizationModel)
>>         hold RDDs, so just serializing these objects will not work.
>>
>>         --
>>         Oleksiy Dyagilev
>>
>>         On Tue, Jul 12, 2016 at 5:40 PM, aka.fe2s
>>         <ak...@gmail.com> wrote:
>>
>>             What is the reason Spark has an individual
>>             implementations of read/write routines for every model in
>>             mllib and ml? (Saveable and MLWritable trait impls)
>>
>>             Wouldn't a generic implementation via Java serialization
>>             mechanism work? I would like to use it to store the
>>             models to a custom storage.
>>
>>             --
>>             Oleksiy
>>
>>
>
>     -- 
>     Salih Gedik
>

-- 
Salih Gedik


Re: ml and mllib persistence

Posted by Reynold Xin <rx...@databricks.com>.
I meant platform in the general sense, e.g. language platforms, operating
systems, different JVM versions, different JVM vendors, different Spark
versions...
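
One way to sidestep these compatibility concerns is to write model parameters in an explicit, versioned format instead of relying on the JVM's object layout. The sketch below only illustrates that idea and is not Spark's persistence code: the LinearModelParams class, the formatVersion key, and the choice of java.util.Properties as the on-disk format are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical model class, not part of Spark.
class LinearModelParams {
    static final String FORMAT_VERSION = "1";

    final double intercept;
    final double[] weights;

    LinearModelParams(double intercept, double[] weights) {
        this.intercept = intercept;
        this.weights = weights.clone();
    }

    // Writes named fields as key=value text, independent of JVM class internals.
    String save() {
        Properties props = new Properties();
        props.setProperty("formatVersion", FORMAT_VERSION);
        props.setProperty("intercept", Double.toString(intercept));
        props.setProperty("weights", Arrays.stream(weights)
            .mapToObj(w -> Double.toString(w))
            .collect(Collectors.joining(",")));
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Reads the text back and rejects layouts this reader does not know.
    static LinearModelParams load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String version = props.getProperty("formatVersion");
        if (!FORMAT_VERSION.equals(version)) {
            throw new IllegalArgumentException("Unsupported format version: " + version);
        }
        double intercept = Double.parseDouble(props.getProperty("intercept"));
        String joined = props.getProperty("weights", "");
        double[] weights = joined.isEmpty()
            ? new double[0]
            : Arrays.stream(joined.split(",")).mapToDouble(Double::parseDouble).toArray();
        return new LinearModelParams(intercept, weights);
    }

    public static void main(String[] args) {
        LinearModelParams model = new LinearModelParams(0.5, new double[] {1.0, -2.25});
        LinearModelParams restored = load(model.save());
        System.out.println("weights match: " + Arrays.equals(model.weights, restored.weights));
    }
}
```

Because the output is plain key=value text, a reader written in another language or against a later release can still parse it, and the version check gives a clear failure for layouts it does not recognize.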

On Tuesday, July 12, 2016, Salih Gedik <me...@salih.xyz> wrote:

> Hi Reynold,
>
> I was wondering if you meant cross language or cross platform? Thanks
> On 12.07.2016 19:57, Reynold Xin wrote:
>
> Also Java serialization isn't great for cross platform compatibility.
>
> On Tuesday, July 12, 2016, aka.fe2s <aka.fe2s@gmail.com> wrote:
>
>> Okay, I think I found an answer on my question. Some models (for instance
>> org.apache.spark.mllib.recommendation.MatrixFactorizationModel) hold RDDs,
>> so just serializing these objects will not work.
>>
>> --
>> Oleksiy Dyagilev
>>
>> On Tue, Jul 12, 2016 at 5:40 PM, aka.fe2s <ak...@gmail.com> wrote:
>>
>>> What is the reason Spark has an individual implementations of read/write
>>> routines for every model in mllib and ml? (Saveable and MLWritable trait
>>> impls)
>>>
>>> Wouldn't a generic implementation via Java serialization mechanism work?
>>> I would like to use it to store the models to a custom storage.
>>>
>>> --
>>> Oleksiy
>>>
>>
>>
> --
> Salih Gedik
>
>

Re: ml and mllib persistence

Posted by Salih Gedik <me...@salih.xyz>.
Hi Reynold,

I was wondering whether you meant cross-language or cross-platform? Thanks

On 12.07.2016 19:57, Reynold Xin wrote:
> Also Java serialization isn't great for cross platform compatibility.
>
> On Tuesday, July 12, 2016, aka.fe2s <aka.fe2s@gmail.com> wrote:
>
>     Okay, I think I found an answer on my question. Some models (for
>     instance
>     org.apache.spark.mllib.recommendation.MatrixFactorizationModel)
>     hold RDDs, so just serializing these objects will not work.
>
>     --
>     Oleksiy Dyagilev
>
>     On Tue, Jul 12, 2016 at 5:40 PM, aka.fe2s <aka.fe2s@gmail.com> wrote:
>
>         What is the reason Spark has an individual implementations of
>         read/write routines for every model in mllib and ml? (Saveable
>         and MLWritable trait impls)
>
>         Wouldn't a generic implementation via Java serialization
>         mechanism work? I would like to use it to store the models to
>         a custom storage.
>
>         --
>         Oleksiy
>
>

-- 
Salih Gedik


Re: ml and mllib persistence

Posted by Reynold Xin <rx...@databricks.com>.
Also, Java serialization isn't great for cross-platform compatibility.

On Tuesday, July 12, 2016, aka.fe2s <ak...@gmail.com> wrote:

> Okay, I think I found an answer on my question. Some models (for instance
> org.apache.spark.mllib.recommendation.MatrixFactorizationModel) hold RDDs,
> so just serializing these objects will not work.
>
> --
> Oleksiy Dyagilev
>
> On Tue, Jul 12, 2016 at 5:40 PM, aka.fe2s <aka.fe2s@gmail.com> wrote:
>
>> What is the reason Spark has an individual implementations of read/write
>> routines for every model in mllib and ml? (Saveable and MLWritable trait
>> impls)
>>
>> Wouldn't a generic implementation via Java serialization mechanism work?
>> I would like to use it to store the models to a custom storage.
>>
>> --
>> Oleksiy
>>
>
>

Re: ml and mllib persistence

Posted by "aka.fe2s" <ak...@gmail.com>.
Okay, I think I found an answer to my question. Some models (for instance
org.apache.spark.mllib.recommendation.MatrixFactorizationModel) hold RDDs,
so just serializing these objects will not work.
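
To illustrate why a model that holds a handle to external data does not survive plain Java serialization, here is a minimal sketch. It does not use Spark at all: DatasetHandle and FactorModel are hypothetical stand-ins, and the transient field only mimics state that lives outside the object (as with data spread across a cluster). Under those assumptions, a serialization round trip keeps the scalar field but drops the data the model depends on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

class HandleSerializationSketch {

    // Hypothetical stand-in for an RDD: the data it refers to lives outside the object.
    static class DatasetHandle implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        // transient fields are skipped by Java serialization.
        transient List<double[]> rows;

        DatasetHandle(String name, List<double[]> rows) {
            this.name = name;
            this.rows = rows;
        }
    }

    // Hypothetical model that keeps its factors behind a handle.
    static class FactorModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final int rank;
        final DatasetHandle userFeatures;

        FactorModel(int rank, DatasetHandle userFeatures) {
            this.rank = rank;
            this.userFeatures = userFeatures;
        }
    }

    // Serializes obj to an in-memory byte array and reads it back.
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        DatasetHandle features = new DatasetHandle(
            "userFeatures", Arrays.asList(new double[] {0.1, 0.2}, new double[] {0.3, 0.4}));
        FactorModel copy = roundTrip(new FactorModel(2, features));

        System.out.println("rank preserved: " + (copy.rank == 2));
        System.out.println("feature rows present: " + (copy.userFeatures.rows != null));
    }
}
```

A per-model save routine can instead write the underlying data out explicitly, which is presumably why each model needs its own implementation.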

--
Oleksiy Dyagilev

On Tue, Jul 12, 2016 at 5:40 PM, aka.fe2s <ak...@gmail.com> wrote:

> What is the reason Spark has an individual implementations of read/write
> routines for every model in mllib and ml? (Saveable and MLWritable trait
> impls)
>
> Wouldn't a generic implementation via Java serialization mechanism work? I
> would like to use it to store the models to a custom storage.
>
> --
> Oleksiy
>
