You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-dev@hadoop.apache.org by Alejandro Abdelnur <tu...@cloudera.com> on 2013/05/20 20:04:31 UTC

serializing/deserializing wrapped protobuf generated classes

[I know it has been discussed before the need (or not) of having the
current wrappers hiding the protobuf generated classes].

I order to do things like AMs failover and checkpointing I need to
serialize app IDs, app attempt IDs, containers and/or IDs,  resource
requests, etc.

This means that the current wrapping hides the PB impl, thus hiding the
provided ser/deser capabilities.

I could force-cast a record to ProtoBase (which is private) and then get
the PROTO Message and then do the ser/deser with that.

But this, IMO, is a no no.

Thoughts?

Thanks

-- 
Alejandro

Re: serializing/deserializing wrapped protobuf generated classes

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Sid,

I've opened YARN-710 for this, and posted a patch following Bobby's
suggestion.

Yes, this ought to be part of the public API.

Thanks.



On Tue, May 21, 2013 at 12:48 AM, Siddharth Seth
<se...@gmail.com>wrote:

> I'd agree. Using ProtoBase in application code is not a good idea.
>
> The MR AM currently uses the string form of the IDs to write to the history
> file, and constructs them back from this. This would likely not be
> practical for some of the more complicated records like Container.
>
> I initially thought a serialization/deserialization interface on all
> records would be useful (especially in the context of RM state store), but
> have since been convinced otherwise - YARN does not need to expose it's
> underlying record serialization mechanism to user code - not via the
> individual records itself.  That said, should YARN be providing a utility
> library to help app developers with this functionality. This would end up
> becoming part of the public API, along with the data format. What do others
> think ?
>
> Thanks
> - Sid
>
>
>
> On Mon, May 20, 2013 at 11:04 AM, Alejandro Abdelnur <tucu@cloudera.com
> >wrote:
>
> > [I know it has been discussed before the need (or not) of having the
> > current wrappers hiding the protobuf generated classes].
> >
> > I order to do things like AMs failover and checkpointing I need to
> > serialize app IDs, app attempt IDs, containers and/or IDs,  resource
> > requests, etc.
> >
> > This means that the current wrapping hides the PB impl, thus hiding the
> > provided ser/deser capabilities.
> >
> > I could force-cast a record to ProtoBase (which is private) and then get
> > the PROTO Message and then do the ser/deser with that.
> >
> > But this, IMO, is a no no.
> >
> > Thoughts?
> >
> > Thanks
> >
> > --
> > Alejandro
> >
>



-- 
Alejandro

Re: serializing/deserializing wrapped protobuf generated classes

Posted by Siddharth Seth <se...@gmail.com>.
I'd agree. Using ProtoBase in application code is not a good idea.

The MR AM currently uses the string form of the IDs to write to the history
file, and constructs them back from this. This would likely not be
practical for some of the more complicated records like Container.

I initially thought a serialization/deserialization interface on all
records would be useful (especially in the context of RM state store), but
have since been convinced otherwise - YARN does not need to expose it's
underlying record serialization mechanism to user code - not via the
individual records itself.  That said, should YARN be providing a utility
library to help app developers with this functionality. This would end up
becoming part of the public API, along with the data format. What do others
think ?

Thanks
- Sid



On Mon, May 20, 2013 at 11:04 AM, Alejandro Abdelnur <tu...@cloudera.com>wrote:

> [I know it has been discussed before the need (or not) of having the
> current wrappers hiding the protobuf generated classes].
>
> I order to do things like AMs failover and checkpointing I need to
> serialize app IDs, app attempt IDs, containers and/or IDs,  resource
> requests, etc.
>
> This means that the current wrapping hides the PB impl, thus hiding the
> provided ser/deser capabilities.
>
> I could force-cast a record to ProtoBase (which is private) and then get
> the PROTO Message and then do the ser/deser with that.
>
> But this, IMO, is a no no.
>
> Thoughts?
>
> Thanks
>
> --
> Alejandro
>

Re: serializing/deserializing wrapped protobuf generated classes

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
how about sending everybody on vacations for a few days :)


On Tue, May 21, 2013 at 10:54 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

> I totally agree.  The biggest problem is that some of the data structures
> we do not treat as immutable data.  This make the port a bit too complex
> to use Eclipse or sed to fully accomplish.  Also it touches every part of
> the code.  The up merges were killing me.
>
> --Bobby
>
> On 5/21/13 12:07 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
>
> >Thanks Bobby. Yep, argh, I was thinking something along those lines (on a
> >side note, it is a pity that we hide the PB interfaces/builders).
> >
> >
> >On Tue, May 21, 2013 at 8:32 AM, Robert Evans <ev...@yahoo-inc.com>
> wrote:
> >
> >> I agree with you, and I started down the path of ripping out that
> >> abstraction layer because, at this point, we are only going to support
> >>pro
> >> to buffers, but it turned out to be too much work to get done before the
> >> we wanted to lock down the APIs so I stopped.  I would suggest that you
> >>do
> >> the minimum to keep the abstraction in place, but don't do a lot of work
> >> beyond that.  Perhaps something similar to the RecordFactory API that
> >> would allow you to serialize/deserialize a blob into an instance of a
> >> known class.  Perhaps you could even extend RecordFactory to look
> >> something like
> >>
> >> public interface RecordFactory {
> >>   public <T> T newRecordInstance(Class<T> clazz) throws YarnException;
> >>
> >> public <T> T newRecordInstance(Class<T> clazz, byte[] data) throws
> >> YarnException;
> >>
> >> public <T> byte[] fromRecordInstance(T t) throws YarnException;
> >> }
> >>
> >> --Bobby
> >>
> >>
> >> On 5/21/13 8:54 AM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
> >>
> >> >ping
> >> >
> >> >Alejandro
> >> >(phone typing)
> >> >
> >> >On May 20, 2013, at 11:04, Alejandro Abdelnur <tu...@cloudera.com>
> >>wrote:
> >> >
> >> >> [I know it has been discussed before the need (or not) of having the
> >> >>current wrappers hiding the protobuf generated classes].
> >> >>
> >> >> I order to do things like AMs failover and checkpointing I need to
> >> >>serialize app IDs, app attempt IDs, containers and/or IDs,  resource
> >> >>requests, etc.
> >> >>
> >> >> This means that the current wrapping hides the PB impl, thus hiding
> >>the
> >> >>provided ser/deser capabilities.
> >> >>
> >> >> I could force-cast a record to ProtoBase (which is private) and then
> >> >>get the PROTO Message and then do the ser/deser with that.
> >> >>
> >> >> But this, IMO, is a no no.
> >> >>
> >> >> Thoughts?
> >> >>
> >> >> Thanks
> >> >>
> >> >> --
> >> >> Alejandro
> >>
> >>
> >
> >
> >--
> >Alejandro
>
>


-- 
Alejandro

Re: serializing/deserializing wrapped protobuf generated classes

Posted by Robert Evans <ev...@yahoo-inc.com>.
I totally agree.  The biggest problem is that some of the data structures
we do not treat as immutable data.  This make the port a bit too complex
to use Eclipse or sed to fully accomplish.  Also it touches every part of
the code.  The up merges were killing me.

--Bobby

On 5/21/13 12:07 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:

>Thanks Bobby. Yep, argh, I was thinking something along those lines (on a
>side note, it is a pity that we hide the PB interfaces/builders).
>
>
>On Tue, May 21, 2013 at 8:32 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
>
>> I agree with you, and I started down the path of ripping out that
>> abstraction layer because, at this point, we are only going to support
>>pro
>> to buffers, but it turned out to be too much work to get done before the
>> we wanted to lock down the APIs so I stopped.  I would suggest that you
>>do
>> the minimum to keep the abstraction in place, but don't do a lot of work
>> beyond that.  Perhaps something similar to the RecordFactory API that
>> would allow you to serialize/deserialize a blob into an instance of a
>> known class.  Perhaps you could even extend RecordFactory to look
>> something like
>>
>> public interface RecordFactory {
>>   public <T> T newRecordInstance(Class<T> clazz) throws YarnException;
>>
>> public <T> T newRecordInstance(Class<T> clazz, byte[] data) throws
>> YarnException;
>>
>> public <T> byte[] fromRecordInstance(T t) throws YarnException;
>> }
>>
>> --Bobby
>>
>>
>> On 5/21/13 8:54 AM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
>>
>> >ping
>> >
>> >Alejandro
>> >(phone typing)
>> >
>> >On May 20, 2013, at 11:04, Alejandro Abdelnur <tu...@cloudera.com>
>>wrote:
>> >
>> >> [I know it has been discussed before the need (or not) of having the
>> >>current wrappers hiding the protobuf generated classes].
>> >>
>> >> I order to do things like AMs failover and checkpointing I need to
>> >>serialize app IDs, app attempt IDs, containers and/or IDs,  resource
>> >>requests, etc.
>> >>
>> >> This means that the current wrapping hides the PB impl, thus hiding
>>the
>> >>provided ser/deser capabilities.
>> >>
>> >> I could force-cast a record to ProtoBase (which is private) and then
>> >>get the PROTO Message and then do the ser/deser with that.
>> >>
>> >> But this, IMO, is a no no.
>> >>
>> >> Thoughts?
>> >>
>> >> Thanks
>> >>
>> >> --
>> >> Alejandro
>>
>>
>
>
>-- 
>Alejandro


Re: serializing/deserializing wrapped protobuf generated classes

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Thanks Bobby. Yep, argh, I was thinking something along those lines (on a
side note, it is a pity that we hide the PB interfaces/builders).


On Tue, May 21, 2013 at 8:32 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

> I agree with you, and I started down the path of ripping out that
> abstraction layer because, at this point, we are only going to support pro
> to buffers, but it turned out to be too much work to get done before the
> we wanted to lock down the APIs so I stopped.  I would suggest that you do
> the minimum to keep the abstraction in place, but don't do a lot of work
> beyond that.  Perhaps something similar to the RecordFactory API that
> would allow you to serialize/deserialize a blob into an instance of a
> known class.  Perhaps you could even extend RecordFactory to look
> something like
>
> public interface RecordFactory {
>   public <T> T newRecordInstance(Class<T> clazz) throws YarnException;
>
> public <T> T newRecordInstance(Class<T> clazz, byte[] data) throws
> YarnException;
>
> public <T> byte[] fromRecordInstance(T t) throws YarnException;
> }
>
> --Bobby
>
>
> On 5/21/13 8:54 AM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
>
> >ping
> >
> >Alejandro
> >(phone typing)
> >
> >On May 20, 2013, at 11:04, Alejandro Abdelnur <tu...@cloudera.com> wrote:
> >
> >> [I know it has been discussed before the need (or not) of having the
> >>current wrappers hiding the protobuf generated classes].
> >>
> >> I order to do things like AMs failover and checkpointing I need to
> >>serialize app IDs, app attempt IDs, containers and/or IDs,  resource
> >>requests, etc.
> >>
> >> This means that the current wrapping hides the PB impl, thus hiding the
> >>provided ser/deser capabilities.
> >>
> >> I could force-cast a record to ProtoBase (which is private) and then
> >>get the PROTO Message and then do the ser/deser with that.
> >>
> >> But this, IMO, is a no no.
> >>
> >> Thoughts?
> >>
> >> Thanks
> >>
> >> --
> >> Alejandro
>
>


-- 
Alejandro

Re: serializing/deserializing wrapped protobuf generated classes

Posted by Robert Evans <ev...@yahoo-inc.com>.
I agree with you, and I started down the path of ripping out that
abstraction layer because, at this point, we are only going to support pro
to buffers, but it turned out to be too much work to get done before the
we wanted to lock down the APIs so I stopped.  I would suggest that you do
the minimum to keep the abstraction in place, but don't do a lot of work
beyond that.  Perhaps something similar to the RecordFactory API that
would allow you to serialize/deserialize a blob into an instance of a
known class.  Perhaps you could even extend RecordFactory to look
something like

public interface RecordFactory {
  public <T> T newRecordInstance(Class<T> clazz) throws YarnException;

public <T> T newRecordInstance(Class<T> clazz, byte[] data) throws
YarnException;

public <T> byte[] fromRecordInstance(T t) throws YarnException;
}

--Bobby
 

On 5/21/13 8:54 AM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:

>ping
>
>Alejandro
>(phone typing)
>
>On May 20, 2013, at 11:04, Alejandro Abdelnur <tu...@cloudera.com> wrote:
>
>> [I know it has been discussed before the need (or not) of having the
>>current wrappers hiding the protobuf generated classes].
>> 
>> I order to do things like AMs failover and checkpointing I need to
>>serialize app IDs, app attempt IDs, containers and/or IDs,  resource
>>requests, etc.
>> 
>> This means that the current wrapping hides the PB impl, thus hiding the
>>provided ser/deser capabilities.
>> 
>> I could force-cast a record to ProtoBase (which is private) and then
>>get the PROTO Message and then do the ser/deser with that.
>> 
>> But this, IMO, is a no no.
>> 
>> Thoughts?
>> 
>> Thanks
>> 
>> -- 
>> Alejandro


Re: serializing/deserializing wrapped protobuf generated classes

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
ping

Alejandro
(phone typing)

On May 20, 2013, at 11:04, Alejandro Abdelnur <tu...@cloudera.com> wrote:

> [I know it has been discussed before the need (or not) of having the current wrappers hiding the protobuf generated classes].
> 
> I order to do things like AMs failover and checkpointing I need to serialize app IDs, app attempt IDs, containers and/or IDs,  resource requests, etc.
> 
> This means that the current wrapping hides the PB impl, thus hiding the provided ser/deser capabilities.
> 
> I could force-cast a record to ProtoBase (which is private) and then get the PROTO Message and then do the ser/deser with that.
> 
> But this, IMO, is a no no.
> 
> Thoughts?
> 
> Thanks
> 
> -- 
> Alejandro