Posted to dev@hbase.apache.org by Devaraj Das <dd...@hortonworks.com> on 2012/07/31 02:47:26 UTC

Handling protocol versions

Wondering whether we should retain the VersionedProtocol now that we have protobuf implementation for most (all?) of the protocols. I think we still need the version checks and do them when we need to. Take this case:
1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
2. Protocol Foo evolves over time, and the FooMethod(FooMethodRequest) now has a better implementation called FooMethod_improved(FooMethodRequest).
3. HBase installations have happened with both the protocol implementations.
4. Clients should be able to talk to both old and new servers (and invoke the newer implementation of FooMethod if the protocol implements it).

(4) is possible when the getProtocolVersion is implemented by the protocol at the server. The client could check what the version of the protocol was (assuming VersionedProtocol semantics where the protocol version number is upgraded for such significant changes) and depending on that invoke the appropriate method... 

Having to map version-numbers of protocols to the methods-supported is probably arcane IMO but works.. 

The other approach (that wouldn't require the version#) is to do something like - On the client side, get the protocol methods supported at the server (and cache it) and then look this map up whenever needed to decide which method to invoke.
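
A minimal sketch of that second approach, in Java. The class name SupportedMethodCache, the fetch function standing in for the "what methods do you support" RPC, and the method-name strings are all assumptions for illustration; none of this is existing HBase code.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical client-side cache: server address -> method names that server exposes.
final class SupportedMethodCache {
  private final Map<String, Set<String>> methodsByServer = new ConcurrentHashMap<>();
  // Stand-in for the RPC that would ask a server which methods it supports.
  private final Function<String, Set<String>> fetchFromServer;

  SupportedMethodCache(Function<String, Set<String>> fetchFromServer) {
    this.fetchFromServer = fetchFromServer;
  }

  // Asks the server the first time an address is seen, then answers from the cache.
  boolean supports(String server, String method) {
    return methodsByServer.computeIfAbsent(server, fetchFromServer).contains(method);
  }

  // Prefer the improved method when the server has it; otherwise fall back.
  String chooseFooMethod(String server) {
    return supports(server, "FooMethod_improved") ? "FooMethod_improved" : "FooMethod";
  }
}
```

The cache is keyed by server because old and new servers can coexist in one installation (point 3 above).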

Any thoughts on whether we should invest time in the second approach yet?

Thanks,
Devaraj.

Re: Handling protocol versions

Posted by Devaraj Das <dd...@hortonworks.com>.
On Fri, Dec 28, 2012 at 8:59 AM, Stack <st...@duboce.net> wrote:

> On Fri, Dec 28, 2012 at 1:31 AM, Devaraj Das <dd...@hortonworks.com> wrote:
>
> > Now thinking more about it, if a server implements a method more
> > efficiently, we probably could have new fields in the method argument to
> > indicate the client is willing to accept the new semantics. A new server
> > could detect that (by checking for existence of such a field), and an old
> > server would simply ignore that field. The new server could do a different
> > processing of the request, and the response, although the same message
> > type, might have new fields to capture the response under the new
> > semantics.
> >
> > Over time, the method code might evolve, and might become unmaintainable
> > ... that's the worry. It might make sense to just break up the method into
> > multiple implementations..
> >
> >
> Yes.  Protobufs gives us wiggle-room.
>
>
>
> > I am +1 for getting a PB'ed description of the protocol, the client caching
> > it, and then deciding which method to invoke based on what's supported in
> > the server. This will also address the orthogonal case of the server
> > letting the client know all its capabilities.
> >
>
>
> This is how a client would learn of completely new functionality that has
> been added to the server?
>
> On client setup of proxy, as first request, instead of asking server for
> the version of the protocol it is serving, instead it could ask the server
> for the pb'd description of the protocol [1] and the client could look at
> this to see if the server supported new functionality?
>
> The returned descriptor would be much fatter than a bitmap.
>
>
Bitmap is fine as well if the PB'ed representation is too verbose.


> St.Ack
>
> 1.
>
> https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Descriptors.ServiceDescriptor
>

Re: Handling protocol versions

Posted by Stack <st...@duboce.net>.
On Fri, Dec 28, 2012 at 1:31 AM, Devaraj Das <dd...@hortonworks.com> wrote:

> Now thinking more about it, if a server implements a method more
> efficiently, we probably could have new fields in the method argument to
> indicate the client is willing to accept the new semantics. A new server
> could detect that (by checking for existence of such a field), and an old
> server would simply ignore that field. The new server could do a different
> processing of the request, and the response, although the same message
> type, might have new fields to capture the response under the new
> semantics.
>
> Over time, the method code might evolve, and might become unmaintainable
> ... that's the worry. It might make sense to just break up the method into
> multiple implementations..
>
>
Yes.  Protobufs gives us wiggle-room.



> I am +1 for getting a PB'ed description of the protocol, the client caching
> it, and then deciding which method to invoke based on what's supported in
> the server. This will also address the orthogonal case of the server
> letting the client know all its capabilities.
>


This is how a client would learn of completely new functionality that has
been added to the server?

On client setup of proxy, as first request, instead of asking server for
the version of the protocol it is serving, instead it could ask the server
for the pb'd description of the protocol [1] and the client could look at
this to see if the server supported new functionality?

The returned descriptor would be much fatter than a bitmap.

St.Ack

1.
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Descriptors.ServiceDescriptor

Re: Handling protocol versions

Posted by Devaraj Das <dd...@hortonworks.com>.
Now thinking more about it, if a server implements a method more
efficiently, we probably could have new fields in the method argument to
indicate the client is willing to accept the new semantics. A new server
could detect that (by checking for existence of such a field), and an old
server would simply ignore that field. The new server could do a different
processing of the request, and the response, although the same message
type, might have new fields to capture the response under the new semantics.
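
A rough sketch of the opt-in field idea, in Java. The classes below are hand-written stand-ins for protobuf-generated messages, and every name (acceptsNewSemantics, extra, FooServers) is hypothetical. The old server is modeled as code that simply never reads the new field, which is the assumption being made about how it would behave.

```java
import java.util.Optional;

// Hand-written stand-in for a protobuf request message; names are hypothetical.
final class FooMethodRequest {
  final String row;
  // Optional field added later; an old server never reads it.
  final Optional<Boolean> acceptsNewSemantics;

  FooMethodRequest(String row, Optional<Boolean> acceptsNewSemantics) {
    this.row = row;
    this.acceptsNewSemantics = acceptsNewSemantics;
  }
}

// Same response type for both semantics; the new field is only set by new servers.
final class FooMethodResponse {
  final String value;
  final Optional<String> extra;

  FooMethodResponse(String value, Optional<String> extra) {
    this.value = value;
    this.extra = extra;
  }
}

final class FooServers {
  // Old server: predates the new field, so it ignores it.
  static FooMethodResponse oldServer(FooMethodRequest req) {
    return new FooMethodResponse("old:" + req.row, Optional.empty());
  }

  // New server: takes the improved path only if the client opted in.
  static FooMethodResponse newServer(FooMethodRequest req) {
    if (req.acceptsNewSemantics.orElse(false)) {
      return new FooMethodResponse("new:" + req.row, Optional.of("extra-detail"));
    }
    return oldServer(req);
  }
}
```

Note how newServer already carries two code paths in one method; that is the maintainability worry raised in the next paragraph.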

Over time, the method code might evolve, and might become unmaintainable
... that's the worry. It might make sense to just break up the method into
multiple implementations..

I am +1 for getting a PB'ed description of the protocol, the client caching
it, and then deciding which method to invoke based on what's supported in
the server. This will also address the orthogonal case of the server
letting the client know all its capabilities.

Thoughts?


On Thu, Dec 27, 2012 at 7:11 PM, Stack <st...@duboce.net> wrote:

> On Thu, Dec 27, 2012 at 5:37 PM, Enis Söztutar <en...@gmail.com> wrote:
>
> > I think what Devaraj describes is a valid use case, and I am sure we will
> > need it a few times. However, I suspect each of these might be unique, and
> > we have to deal with how to handle backwards-forwards compat from the
> > client differently (imagine META moving to zk, after 0.96). So we cannot
> > easily generalize, and we may still have to drop support for features
> > gradually.
> >
> >
> I agree.  Just trying to make sure we have some facility in place to help
> us over some of the humps.
>
>
> > If we still keep the version, do we bump it every time a parameter is added
> > to a method, or only when a new method is added? It does not sound very
> > maintainable.
> >
> >
> Version alone won't work.
>
> The 0.94 branch might be version 100.
>
> The 0.96 branch might be 105.
>
> If we want to backport the method that cuts CO2 emissions by 25% but only
> this method, what version do we give 0.94's protocol?  We could make it 101
> but maybe 0.96.3 was 101?  We could give it a version that has not been
> seen before but then it gets a little awkward to manage and understand.
>  Regardless, client would have to keep a dictionary of methods per version
> number, a pain.
>
> The suggestion above was that the server gives off a list of features
> written in shorthand, a bitmap, where bits are set when a feature is added.
>  This way a client can look at the bitmap and see if the CO2 saving feature
> is available in the 0.94 server and if so, use that method.
>
>
>
> > Not knowing much about the recent changes, why don't we go full PB, and
> > define actual rpc methods as services? (as in
> > https://developers.google.com/protocol-buffers/docs/proto#services)
> >
> >
> I thought about it.  It has some nice facility that comes for free.  For
> example, you can get an aforementioned pb'd description of the "protocol"
> and actually use the return to compose an invocation against the server.
>  Nice.  Our 'protocols' actually already implement Service.Interface from
> pb (actually Service.BlockingInterface).  I'm not sure why as it looks to
> complicate things going by a quick examination today (I started stripping
> it out to see what would break).  So it would not take too much to get a
> Stub on clientside and have servers implement the Service.  We could try
> shoehorning our RPC so it implemented the necessary RpcController, etc.
> Interfaces.
>
> But it would seem Service is deprecated with a good while now [1] and folks
> are encouraged to do otherwise because as is, the generated code makes for
> too much "indirection" [1].
>
> I could try playing around some more w/ using Service to learn more about
> this 'indirection'.  We could use the long-hand service descriptor in place
> of the above suggested bitmap figuring what the server provides.
>
> St.Ack
>
> 1.
>
> https://developers.google.com/protocol-buffers/docs/reference/java-generated#service
>

Re: Handling protocol versions

Posted by Stack <st...@duboce.net>.
On Wed, Jan 2, 2013 at 2:34 PM, Elliott Clark <ec...@apache.org> wrote:

> Removing the versioning altogether seems good.  That leads to much less
> coupling between the client and the server.
>
> I would vote to use BlockingInterface (to replace our versioned protocol
> class) everywhere and just write our own rpc/ipc.  Stack walked me through
> some of the code that is needed for using all of the Protobuf Service and
> Protobuf Blocking Channels; That route seems to have lots of its own
> cruft.  So if we're going to have a clean up, we shouldn't start out with
> something knowing the result will be crufty.
>
>
Let me try doing the above (Removing versioning and not going the pb
Service route).  We can't use BlockingInterface to replace
VersionedProtocol... BIs do not have a common ancestor.   Let me play
around....  I'll be back.


> Additionally we should move the exception responses into either the header
> or the body.  As it currently stands having to conditionally cast the next
> message into either a response or an error just seems like we're
> re-implementing protobuf's optional.
>
>
I think this a good idea.  Will try this too.

Thanks E,
St.Ack

Re: Handling protocol versions

Posted by Elliott Clark <ec...@apache.org>.
Removing the versioning altogether seems good.  That leads to much less
coupling between the client and the server.

I would vote to use BlockingInterface (to replace our versioned protocol
class) everywhere and just write our own rpc/ipc.  Stack walked me through
some of the code that is needed for using all of the Protobuf Service and
Protobuf Blocking Channels; That route seems to have lots of its own
cruft.  So if we're going to have a clean up, we shouldn't start out with
something knowing the result will be crufty.

Additionally we should move the exception responses into either the header
or the body.  As it currently stands having to conditionally cast the next
message into either a response or an error just seems like we're
re-implementing protobuf's optional.
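
To illustrate the exception-in-header idea, here is a small Java sketch. ResponseHeader, RpcResponse, RemoteCallException and the field names are assumptions made up for this example, and plain Java classes stand in for the protobuf messages; the point is only that the reader checks one optional field instead of casting the next message.

```java
import java.util.Optional;

// Hypothetical unchecked exception surfaced to the caller on a remote failure.
final class RemoteCallException extends RuntimeException {
  RemoteCallException(String message) {
    super(message);
  }
}

// Hypothetical response header: the error travels as an optional field.
final class ResponseHeader {
  final int callId;
  final Optional<String> exceptionClassName;

  ResponseHeader(int callId, Optional<String> exceptionClassName) {
    this.callId = callId;
    this.exceptionClassName = exceptionClassName;
  }
}

final class RpcResponse<T> {
  final ResponseHeader header;
  final T body; // null when the header carries an exception

  RpcResponse(ResponseHeader header, T body) {
    this.header = header;
    this.body = body;
  }

  // One check on the header decides the outcome; no conditional cast of the body.
  T getOrThrow() {
    if (header.exceptionClassName.isPresent()) {
      throw new RemoteCallException(
          "Call " + header.callId + " failed remotely: " + header.exceptionClassName.get());
    }
    return body;
  }
}
```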




On Tue, Jan 1, 2013 at 4:44 PM, Stack <st...@duboce.net> wrote:

> On Thu, Dec 27, 2012 at 7:11 PM, Stack <st...@duboce.net> wrote:
>
> >  Not knowing much about the recent changes, why don't we go full PB, and
> >> define actual rpc methods as services? (as in
> >> https://developers.google.com/protocol-buffers/docs/proto#services)
> >>
> >>
> > I thought about it.  It has some nice facility that comes for free.  For
> > example, you can get an aforementioned pb'd description of the "protocol"
> > and actually use the return to compose an invocation against the server.
> >  Nice.  Our 'protocols' actually already implement Service.Interface from
> > pb (actually Service.BlockingInterface).  I'm not sure why as it looks to
> > complicate things going by a quick examination today (I started stripping
> > it out to see what would break).  So it would not take too much to get a
> > Stub on clientside and have servers implement the Service.  We could try
> > shoehorning our RPC so it implemented the necessary RpcController, etc.
> > Interfaces.
> >
> > But it would seem Service is deprecated with a good while now [1] and
> > folks are encouraged to do otherwise because as is, the generated code
> > makes for too much "indirection" [1].
> >
> > I could try playing around some more w/ using Service to learn more about
> > this 'indirection'.  We could use the long-hand service descriptor in place
> > of the above suggested bitmap figuring what the server provides.
> >
> >
> I experimented hooking up protobuf Service to our RPC.  I put up a patch
> over on https://issues.apache.org/jira/browse/HBASE-6521 along w/ some
> notes made while messing.
>
> The main 'pro' is that our rpc would get a much needed spring cleaning.
>  Main 'con' is that we would be changing code (smile).  The main TODO is
> making sure no performance degradation (should be none server-side, need to
> make sure same is true client-side).
>
> This experiment has made me change my opinion regards 'versioning'.  Above
> I suggest we remove VersionedProtocol and add in instead a protobuf
> ProtocolDescriptor that would have a 'version' as well as a short and long
> form description of server 'features'.  Now I think we should just punt on
> version/descriptors altogether.  Let's just go the route where a method is
> supported or not.  That methods take a protobuf request and return a
> protobuf response, as has been said already, gives us some wriggle room to
> evolve methods as time goes by.  For protocol migrations that require more
> than this 'vocabulary', let's deal w/ them on a case by case basis (As per Enis
> above).
>
> St.Ack
>

Re: Handling protocol versions

Posted by Stack <st...@duboce.net>.
On Thu, Dec 27, 2012 at 7:11 PM, Stack <st...@duboce.net> wrote:

>  Not knowing much about the recent changes, why don't we go full PB, and
>> define actual rpc methods as services? (as in
>> https://developers.google.com/protocol-buffers/docs/proto#services)
>>
>>
> I thought about it.  It has some nice facility that comes for free.  For
> example, you can get an aforementioned pb'd description of the "protocol"
> and actually use the return to compose an invocation against the server.
>  Nice.  Our 'protocols' actually already implement Service.Interface from
> pb (actually Service.BlockingInterface).  I'm not sure why as it looks to
> complicate things going by a quick examination today (I started stripping
> it out to see what would break).  So it would not take too much to get a
> Stub on clientside and have servers implement the Service.  We could try
> shoehorning our RPC so it implemented the necessary RpcController, etc.
> Interfaces.
>
> But it would seem Service is deprecated with a good while now [1] and
> folks are encouraged to do otherwise because as is, the generated code
> makes for too much "indirection" [1].
>
> I could try playing around some more w/ using Service to learn more about
> this 'indirection'.  We could use the long-hand service descriptor in place
> of the above suggested bitmap figuring what the server provides.
>
>
I experimented hooking up protobuf Service to our RPC.  I put up a patch
over on https://issues.apache.org/jira/browse/HBASE-6521 along w/ some
notes made while messing.

The main 'pro' is that our rpc would get a much needed spring cleaning.
 Main 'con' is that we would be changing code (smile).  The main TODO is
making sure no performance degradation (should be none server-side, need to
make sure same is true client-side).

This experiment has made me change my opinion regards 'versioning'.  Above
I suggest we remove VersionedProtocol and add in instead a protobuf
ProtocolDescriptor that would have a 'version' as well as a short and long
form description of server 'features'.  Now I think we should just punt on
version/descriptors altogether.  Let's just go the route where a method is
supported or not.  That methods take a protobuf request and return a
protobuf response, as has been said already, gives us some wriggle room to
evolve methods as time goes by.  For protocol migrations that require more
than this 'vocabulary', let's deal w/ them on a case by case basis (As per Enis
above).
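
For the "a method is supported or not" route, a client could try the newer method and remember when a server lacks it, as suggested earlier in the thread. A minimal Java sketch follows; FallbackInvoker, UnsupportedMethodException and the "server/method" key format are hypothetical names, not anything in the HBase-6521 patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical exception a client would see when the server lacks a method.
final class UnsupportedMethodException extends RuntimeException {
  UnsupportedMethodException(String method) {
    super("No such method: " + method);
  }
}

final class FallbackInvoker {
  // Methods already found missing, keyed as "server/method".
  private final Set<String> knownMissing = ConcurrentHashMap.newKeySet();

  // Tries the preferred call unless it is already known to be missing on this server.
  <T> T invoke(String server, String method, Supplier<T> preferred, Supplier<T> fallback) {
    String key = server + "/" + method;
    if (!knownMissing.contains(key)) {
      try {
        return preferred.get();
      } catch (UnsupportedMethodException e) {
        knownMissing.add(key); // remember, so later calls skip the failed attempt
      }
    }
    return fallback.get();
  }
}
```

Only the first call against an old server pays for the failed attempt; later calls go straight to the fallback.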

St.Ack

Re: Handling protocol versions

Posted by Stack <st...@duboce.net>.
On Thu, Dec 27, 2012 at 5:37 PM, Enis Söztutar <en...@gmail.com> wrote:

> I think what Devaraj describes is a valid use case, and I am sure we will
> need it a few times. However, I suspect each of these might be unique, and
> we have to deal with how to handle backwards-forwards compat from the
> client differently (imagine META moving to zk, after 0.96). So we cannot
> easily generalize, and we may still have to drop support for features
> gradually.
>
>
I agree.  Just trying to make sure we have some facility in place to help
us over some of the humps.


> If we still keep the version, do we bump it every time a parameter is added
> to a method, or only when a new method is added? It does not sound very
> maintainable.
>
>
Version alone won't work.

The 0.94 branch might be version 100.

The 0.96 branch might be 105.

If we want to backport the method that cuts CO2 emissions by 25% but only
this method, what version do we give 0.94's protocol?  We could make it 101
but maybe 0.96.3 was 101?  We could give it a version that has not been
seen before but then it gets a little awkward to manage and understand.
 Regardless, client would have to keep a dictionary of methods per version
number, a pain.

The suggestion above was that the server gives off a list of features
written in shorthand, a bitmap, where bits are set when a feature is added.
This way a client can look at the bitmap and see if the CO2 saving feature
is available in the 0.94 server and if so, use that method.
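
A small Java sketch of the feature-bitmap idea. The Feature enum, its bit assignments and the ServerCapabilities class are invented for illustration. The assumption is that a feature keeps the same bit on every branch it lands on, so a backport sets one bit without implying anything about the others.

```java
// Hypothetical feature bits; a feature keeps its bit on every branch it is backported to.
enum Feature {
  BASELINE(0),
  CO2_SAVING_METHOD(1),
  SOME_OTHER_FEATURE(2);

  final long mask;

  Feature(int bit) {
    this.mask = 1L << bit;
  }
}

// What a client would hold after asking a server for its capabilities.
final class ServerCapabilities {
  private final long bitmap;

  ServerCapabilities(long bitmap) {
    this.bitmap = bitmap;
  }

  // Builds the bitmap a server would advertise for the given features.
  static ServerCapabilities of(Feature... features) {
    long bits = 0L;
    for (Feature f : features) {
      bits |= f.mask;
    }
    return new ServerCapabilities(bits);
  }

  boolean supports(Feature f) {
    return (bitmap & f.mask) != 0;
  }

  long toBitmap() {
    return bitmap;
  }
}
```

A 0.94 server with only the one backport would advertise ServerCapabilities.of(Feature.BASELINE, Feature.CO2_SAVING_METHOD), and the client checks supports(Feature.CO2_SAVING_METHOD) before picking the method.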



> Not knowing much about the recent changes, why don't we go full PB, and
> define actual rpc methods as services? (as in
> https://developers.google.com/protocol-buffers/docs/proto#services)
>
>
I thought about it.  It has some nice facility that comes for free.  For
example, you can get an aforementioned pb'd description of the "protocol"
and actually use the return to compose an invocation against the server.
 Nice.  Our 'protocols' actually already implement Service.Interface from
pb (actually Service.BlockingInterface).  I'm not sure why as it looks to
complicate things going by a quick examination today (I started stripping
it out to see what would break).  So it would not take too much to get a
Stub on clientside and have servers implement the Service.  We could try
shoehorning our RPC so it implemented the necessary RpcController, etc.
Interfaces.

But it would seem Service is deprecated with a good while now [1] and folks
are encouraged to do otherwise because as is, the generated code makes for
too much "indirection" [1].

I could try playing around some more w/ using Service to learn more about
this 'indirection'.  We could use the long-hand service descriptor in place
of the above suggested bitmap figuring what the server provides.

St.Ack

1.
https://developers.google.com/protocol-buffers/docs/reference/java-generated#service

Re: Handling protocol versions

Posted by Enis Söztutar <en...@gmail.com>.
I think what Devaraj describes is a valid use case, and I am sure we will
need it a few times. However, I suspect each of these might be unique, and
we have to deal with how to handle backwards-forwards compat from the
client differently (imagine META moving to zk, after 0.96). So we cannot
easily generalize, and we may still have to drop support for features
gradually.

If we still keep the version, do we bump it every time a parameter is added
to a method, or only when a new method is added? It does not sound very
maintainable.

Not knowing much about the recent changes, why don't we go full PB, and
define actual rpc methods as services? (as in
https://developers.google.com/protocol-buffers/docs/proto#services)


On Thu, Dec 27, 2012 at 1:13 PM, Jimmy Xiang <jx...@cloudera.com> wrote:

> +1 for removing VersionedProtocol and SignatureProtocol
> +0 for VersionedService/ProtocolDescriptor
>
> If we do have VersionedService/ProtocolDescriptor, it will most likely be
> used in some mixed environment (most likely, new client and mixed versions
> of HBase servers, since old client doesn't know any new feature, old client
> doesn't assume an existing feature will be gone in the future either).
>
> With PB,  I think we are going to support a rolling-upgrade path.  That
> means, some mixed versions of HBase servers can be compatible. For
> enterprise, I think it is not that hard to maintain compatible HBase
> clusters.  So I don't think it is absolutely needed.
>
> Thanks,
> Jimmy
>
> On Thu, Dec 27, 2012 at 12:05 PM, Stack <st...@duboce.net> wrote:
>
> > So, picking up this thread again because I'm working on
> > https://issues.apache.org/jira/browse/HBASE-6521 "Address the
> > handling of multiple versions of a protocol", the original question was
> > two-fold as I read it.
> >
> > 1. Should we keep VersionedProtocol.
> > 2. How does a client figure if a server supports a particular capability
> >
> > On question 1:
> >
> > VersionedProtocol [1] does two things.  It returns the server version of
> > the protocol and separately, a "ProtocolSignature" Writable which allows
> > you to get a 'hash' of the server's protocol method signatures.   There is an
> > implication that the server will give out different versions of the
> > protocol dependent on what version the client volunteers (not the case) and
> > it is implied that the client does something with these method hash
> > signatures.  It doesn't.
> >
> > So, VP is a Writable that returns Writables we don't make use of, implying a
> > functionality unrealized.
> >
> > That's how I read it.  Objections? [3]
> >
> > It sounds like at least ProtocolSignature can go.  If we did want to go the
> > route ProtocolSignature implies, we should probably do the native protobuf
> > thing and make use of ServiceDescriptors, protobuf descriptions of what a
> > protobuf Service exposes [2].
> >
> > That leaves the VPs return of the server protocol version as all that
> > remains 'useful'.
> >
> > But is it? Is version going to be useful going forward?  If we lean on
> > version, clients will have to keep a registry of versions to available
> > methods.  Or ask the server what it has and somehow sort through the return
> > to figure what it can and cannot make sense of by method.  Sounds like a
> > bunch of work.
> >
> > At a minimum, VP will have to be protobuf'd so it is going to have to
> > change.  And we should probably add a bit more info to the return since we
> > are going to the trouble of an RPC anyways.
> >
> > This serves as a lead in to question 2:
> >
> > Protobuf as is helps in the case where an ipc takes an extra parameter or
> > adds extra info to the return; the majority of the evolutions that will be
> > happening in the ipc interface.  But what to do about the scenario Devaraj
> > outlines at the head of the thread where we have shipped a method that
> > causes the server to OOME in production or we add a method to the server
> > that runs ten times faster than the old one?  Or probably more likely, the
> > server has a whole new 'feature' (as Todd calls it) orthogonal to the set
> > the protocol version implies?  How does the client figure the new feature
> > is available?
> >
> > We could have the client try the invocation -- as Jimmy suggests -- and if
> > it fails, register the fail in a client-wide map so we avoid retrying on
> > each invocation (We should just do this anyways).  The client could go back
> > to the server and do the above suggested query of server capabilities and
> > then adjust the call accordingly or since we are doing an ipc setup call
> > anyways, we could have the server return the list of capabilities at this
> > time.  The client could cache what is available or not and just ask the
> > server when convenient for it.
> >
> > Using the bitmap shorthand describing what is available seems like it would
> > be less work to do than implementing protobuf service
> > description/interrogation and then dynamically composing method calls.
> >
> > Proposal:
> >
> > + Remove VersionedProtocol and SignatureProtocol
> > + Instead of VP, add a new Interface called VersionedService or probably
> > better, ProtocolDescriptor, that all RPC Protocols implement.  It has
> > methods (getDescriptor) to return a pb Message that has the server version
> > of the protocol and a bitmap of features the server implements.  This is
> > the call we will make when we set up the ipc proxy.  Clients can cache the
> > result.  Every time we change a Service/Protocol, we set a particular bit
> > in the Service/Protocol bitmap.  This new Interface might also return the
> > long form pb ServiceDescriptors (the pb getDescriptorForType from Service
> > Interface).  It could be useful debugging.
> >
> > What you lot think?
> >
> > St.Ack
> >
> > 1.
> > http://svn.apache.org/viewvc/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/VersionedProtocol.java?view=markup
> > 2.
> > https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Service
> > 3. We have VP and PS because, as I understand it, we once thought that we
> > would support choosing between protocol and protocol versions and that we'd
> > support both protobufs and Writables.  This is no longer wanted.
> >
> > On Fri, Aug 3, 2012 at 11:40 AM, Devaraj Das <dd...@hortonworks.com> wrote:
> >
> > > Responses inline..
> > >
> > > > On Wed, Aug 1, 2012 at 11:04 AM, Todd Lipcon <to...@cloudera.com> wrote:
> > > >> One possibility:
> > > >>
> > > >> During the IPC handshake, we could send the full version string /
> > > >> source checksum. Then, have a client-wide map which caches which
> > > >> methods have been found to be supported or not supported for an
> > > >> individual version. So, we don't need to maintain the mapping
> > > >> ourselves, but we also wouldn't need to do the full retry every time.
> > > >>
> > >
> > > Yeah this is what I was thinking as the alternate to the current approach
> > > of using VersionedProtocol.
> > >
> > > >> A different idea would be to introduce a call like
> > > >> "getServerCapabilities()" which returns a bitmap, and define a bit per
> > > >> time that we add a new feature.
> > > >>
> > > >> The advantage of these approaches vs a single increasing version
> > > >> number is that we sometimes want to backport a new IPC to an older
> > > >> version, but not backport all of the intervening IPCs. Having a bitmap
> > > >> allows us to "pick and choose" on backports without having to pull in
> > > >> a bunch of things we didn't necessarily want.
> > > >>
> > >
> > > Good point.
> > >
> > > >> On Wed, Aug 1, 2012 at 1:41 AM, Stack <st...@duboce.net> wrote:
> > > >>> On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das <ddas@hortonworks.com> wrote:
> > > >>>> Wondering whether we should retain the VersionedProtocol now that
> > > >>>> we have protobuf implementation for most (all?) of the protocols.
> > > >>>> I think we still need the version checks and do them when we need
> > > >>>> to. Take this case:
> > > >>>> 1. Protocol Foo has as one of the methods
> > > >>>> FooMethod(FooMethodRequest).
> > > >>>> 2. Protocol Foo evolves over time, and the
> > > >>>> FooMethod(FooMethodRequest) now has a better implementation called
> > > >>>> FooMethod_improved(FooMethodRequest).
> > > >>>> 3. HBase installations have happened with both the protocol
> > > >>>> implementations.
> > > >>>> 4. Clients should be able to talk to both old and new servers (and
> > > >>>> invoke the newer implementation of FooMethod if the protocol
> > > >>>> implements it).
> > > >>>>
> > > >>>> (4) is possible when the getProtocolVersion is implemented by the
> > > >>>> protocol at the server. The client could check what the version of
> > > >>>> the protocol was (assuming VersionedProtocol semantics where the
> > > >>>> protocol version number is upgraded for such significant changes)
> > > >>>> and depending on that invoke the appropriate method...
> > > >>>>
> > > >>>> Having to map version-numbers of protocols to the methods-supported
> > > >>>> is probably arcane IMO but works..
> > > >>>>
> > > >>>> The other approach (that wouldn't require the version#) is to do
> > > >>>> something like - On the client side, get the protocol methods
> > > >>>> supported at the server (and cache it) and then look this map up
> > > >>>> whenever needed to decide which method to invoke.
> > > >>>>
> > > >>>> Any thoughts on whether we should invest time in the second
> > > >>>> approach yet?
> > > >>>
> > > >>> The VersionedProtocol w/ client being able to interrogate what methods
> > > >>> a server supports strikes me as a facility that will be rarely used if
> > > >>> at all and bringing it along, keeping up the directory of supported
> > > >>> methods, will take a load of work on our part that we'll do less than
> > > >>> perfectly so should it ever be needed, it won't work because we let it
> > > >>> go stale.
> > > >>>
> > >
> > > Yeah, this won't be a common case. It'd (hopefully) be rare. The directory
> > > of methods would be the methods in the protocol-interface at the server
> > > that could be figured by invoking reflection (and hence staleness issue
> > > shouldn't happen).
> > >
> > > >>> What do you reckon?
> > > >>>
> > > >>> The above painted scenario too is a little on the exotic side.  We can
> > > >>> do something like Jimmy suggests in those rare cases we need to add a
> > > >>> new method because there is insufficient wiggle-room w/i the
> > > >>> particular PB method call (If we get into the issue Ted raises where
> > > >>> we'd have to go back to the server twice because there is a third new
> > > >>> method call, we're doing our API wrong).
> > > >>>
> > >
> > > Agree that the exception handling hack can be played here.. In general,
> > > having some solution around this might be really helpful *if* we get some
> > > API wrong (for e.g., indirect implication on memory by the API semantics)
> > > and we need to fix it without breaking compatibility.. In HDFS, listFile
> > > proved to be a memory killer for extremely large directories and people
> > > implemented the iterator version of the same.
> > >
> > > >>> The protocol needs a version though.  We'll be still sending that
> > > >>> 'hrpc' long in the header preamble?  Should we add a version long
> > > >>> after the 'hrpc' long?
> > > >>>
> > >
> > > The version in "hrpc" is the RPC version (as opposed to protocol version).
> > > I think that's orthogonal to this discussion..
> > >
> > > >>> As to a directory of supported methods, do we need this in the
> > > >>> protocol at all?  Can't this be knowledge kept outside of the
> > > >>> on-the-wire back and forth?
> > > >>>
> > > >>> St.Ack
> > > >>
> > > >>
> > >
> > > As I answered above, and as Todd also says, it probably makes sense to
> > > have a client wide cache for protocol<->supported-methods .. and look up
> > > the cache when and if the client needs to decide between different versions
> > > of a method, or picking a new method, based on the server it is talking
> > > to...
> >
>

Re: Handling protocol versions

Posted by Jimmy Xiang <jx...@cloudera.com>.
+1 for removing VersionedProtocol and ProtocolSignature
+0 for VersionedService/ProtocolDescriptor

If we do have VersionedService/ProtocolDescriptor, it will most likely be
used in a mixed environment: a new client talking to mixed versions of
HBase servers. An old client doesn't know about any new feature, and it
doesn't assume an existing feature will go away in the future either.

With PB, I think we are going to support a rolling-upgrade path. That
means some mixed versions of HBase servers can be compatible. For
enterprises, I think it is not that hard to maintain compatible HBase
clusters. So I don't think it is absolutely needed.

Thanks,
Jimmy

On Thu, Dec 27, 2012 at 12:05 PM, Stack <st...@duboce.net> wrote:

> So, picking up this thread again because I'm working on
> https://issues.apache.org/jira/browse/HBASE-6521 "
> Address the handling of multiple versions of a protocol"Address the
> handling of multiple versions of a protocol", the original question was
>  two-fold as I read it.
>
> 1. Should we keep VersionedProtocol.
> 2. How does a client figure if a server supports a particular capability
>
> On question 1:
>
> VersionedProtocol [1] does two things.  It returns the server version of
> the protocol and separately, a "ProtocolSignature" Writable which allows
> you get a 'hash' of the server's protocol method signatures.   There is an
> implication that the server will give out different versions of the
> protocol dependent on what version the client volunteers (not the case) and
> it is implied that the client does something with these method hash
> signatures.  It doesn't.
>
> So, VP is a Writable that returns Writables we don't make use of implying a
> functionality unrealized.
>
> Thats how I read it.  Objections? [3]
>
> It sounds like at least ProtocolSignature can go.  If we did want to go the
> route ProtocolSignature implies, we should probably do the native protobuf
> thing and make use of ServiceDescriptors, protobuf descriptions of what a
> protobuf Service exposes [2].
>
> That leaves the VPs return of the server protocol version as all that
> remains 'useful'.
>
> But is it? Is version going to be useful going forward?  If we lean on
> version, clients will have to keep a registry of versions to available
> methods.  Or ask the server what it has and somehow sort though the return
> to figure what it can and cannot make sense of by method.  Sounds like a
> bunch of work.
>
> At a minimum, VP will have to be protobuf'd so it is going to have to
> change.  And we should probably add a bit more info to the return since we
> are going to the trouble of an RPC anyways.
>
> This serves as a lead in to question 2:
>
> Protobuf as is helps in the case where an ipc takes an extra parameter or
> adds extra info to the return; the majority of the evolutions that will be
> happening in the ipc interface.  But what to do about the scenario Devaraj
> outlines at the head of the thread where we have shipped a method that
> causes the server to OOME in production or we add a method to the server
> that runs ten times faster than the old one?  Or probably more likely, the
> server has a whole new 'feature' (as Todd calls it) orthogonal to the set
> the protocol version implies?  How does the client figure the new feature
> is available?
>
> We could have the client try the invocation -- as Jimmy suggests -- and if
> it fails, register the fail in a client-wide map so we avoid retrying on
> each invocation (We should just do this anyways).  The client could go back
> to the server and do the above suggested query of server capabilities and
> then adjust the call accordingly or since we are doing an ipc setup call
> anyways, we could have the server return the list of capabilities at this
> time.  The client could cache what is available or not and just ask the
> server when convenient for it.
>
> Using the bitmap shorthand describing what is available seems like it would
> be less work to do than implementing protobuf service
> description/interrogation and then dynamically composing method calls.
>
> Proposal:
>
> + Remove VersionedProtocol and SignatureProtocol
> + Instead of VP, add a new Interface called VersionedService or probably
> better, ProtocolDescriptor, that all RPC Protocols implement.  It has
> methods (getDescriptor) to return a pb Message that has the server version
> of the protocol and a bitmap of feature's the server implements.  This is
> the call we will make when we set up the ipc proxy.  Clients can cache the
> result.  Every time we change a Service/Protocol, we set a particular bit
> in the Service/Protocol bitmap.  This new Interface might also return the
> long form pb ServiceDescriptors (the pb getDescriptorForType from Service
> Interface).  It could be useful debugging.
>
> What you lot think?
>
> St.Ack
>
> 1.
>
> http://svn.apache.org/viewvc/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/VersionedProtocol.java?view=markup
> 2.
>
> https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Service
> 3. We have VP and PS because, as I understand it, we once that we would
> support choosing between protocol and protocol versions and that we'd
> support both protobufs and Writables.  This is no longer an wanted.
>
>
> On Fri, Aug 3, 2012 at 11:40 AM, Devaraj Das <dd...@hortonworks.com> wrote:
>
> > Responses inline..
> >
> > > On Wed, Aug 1, 2012 at 11:04 AM, Todd Lipcon <to...@cloudera.com>
> wrote:
> > >> One possibility:
> > >>
> > >> During the IPC handshake, we could send the full version string /
> > >> source checksum. Then, have a client-wide map which caches which
> > >> methods have been found to be supported or not supported for an
> > >> individual version. So, we don't need to maintain the mapping
> > >> ourselves, but we also wouldn't need to do the full retry every time.
> > >>
> >
> > Yeah this is what I was thinking as the alternate to the current approach
> > of using VersionedProtocol.
> >
> > >> A different idea would be to introduce a call like
> > >> "getServerCapabilities()" which returns a bitmap, and define a bit per
> > >> time that we add a new feature.
> > >>
> > >> The advantage of these approaches vs a single increasing version
> > >> number is that we sometimes want to backport a new IPC to an older
> > >> version, but not backport all of the intervening IPCs. Having a bitmap
> > >> allows us to "pick and choose" on backports without having to pull in
> > >> a bunch of things we didn't necessarily want.
> > >>
> >
> > Good point.
> >
> > >> On Wed, Aug 1, 2012 at 1:41 AM, Stack <st...@duboce.net> wrote:
> > >>> On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das <dd...@hortonworks.com>
> > wrote:
> > >>>> Wondering whether we should retain the VersionedProtocol now that we
> > have protobuf implementation for most (all?) of the protocols. I think we
> > still need the version checks and do them when we need to. Take this
> case:
> > >>>> 1. Protocol Foo has as one of the methods
> FooMethod(FooMethodRequest).
> > >>>> 2. Protocol Foo evolves over time, and the
> > FooMethod(FooMethodRequest) now has a better implementation called
> > FooMethod_improved(FooMethodRequest).
> > >>>> 3. HBase installations have happened with both the protocol
> > implementations.
> > >>>> 4. Clients should be able to talk to both old and new servers (and
> > invoke the newer implementation of FooMethod if the protocol implements
> it).
> > >>>>
> > >>>> (4) is possible when the getProtocolVersion is implemented by the
> > protocol at the server. The client could check what the version of the
> > protocol was (assuming VersionedProtocol semantics where the protocol
> > version number is upgraded for such significant changes) and depending on
> > that invoke the appropriate method...
> > >>>>
> > >>>> Having to map version-numbers of protocols to the methods-supported
> > is probably arcane IMO but works..
> > >>>>
> > >>>> The other approach (that wouldn't require the version#) is to do
> > something like - On the client side, get the protocol methods supported
> at
> > the server (and cache it) and then look this map up whenever needed to
> > decide which method to invoke.
> > >>>>
> > >>>> Any thoughts on whether we should invest time in the second approach
> > yet?
> > >>>>
> > >>>
> > >>> The VersionedProtocol w/ client being able to interrogate what
> methods
> > >>> a server supports strikes me as a facility that will be rarely used
> if
> > >>> at all and bringing it along, keeping up the directory of supported
> > >>> methods, will take a load of work on our part that we'll do less than
> > >>> perfectly so should it ever be needed, it won't work because we let
> it
> > >>> go stale.
> > >>>
> >
> > Yeah, this won't be a common case. It'd (hopefully) be rare. The
> directory
> > of methods would be the methods in the protocol-interface at the server
> > that could be figured by invoking reflection (and hence staleness issue
> > shouldn't happen).
> >
> > >>> What do you reckon?
> > >>>
> > >>> The above painted scenario too is a little on the exotic side.  We
> can
> > >>> do something like Jimmy suggests in those rare cases we need to add a
> > >>> new method because there is insufficient wiggle-room w/i the
> > >>> particular PB method call (If we get into the issue Ted raises where
> > >>> we'd have to go back to the server twice because there is a third new
> > >>> method call, we're doing our API wrong).
> > >>>
> >
> > Agree that the exception handling hack can be played here.. In general,
> > having some solution around this might be really helpful *if* we get some
> > API wrong (for e.g., indirect implication on memory by the API semantics)
> > and we need to fix it without breaking compatibility.. In HDFS, listFile
> > proved to be a memory killer for extremely large directories and people
> > implemented the iterator version of the same.
> >
> > >>> The protocol needs a version though.  We'll be still sending that
> > >>> 'hrpc' long in the header preamble?  Should we add a version long
> > >>> after the 'hrpc' long?
> > >>>
> >
> > The version in "hrpc" is the RPC version (as opposed to protocol
> version).
> > I think that's orthogonal to this discussion..
> >
> > >>> As to a directory of supported methods, do we need this in the
> > >>> protocol at all?  Can't this be knowledge kept outside of the
> > >>> on-the-wire back and forth?
> > >>>
> > >>> St.Ack
> > >>
> > >>
> >
> > As I answered above, and as Todd also says, it probably makes sense to
> > have a client wide cache for protocol<->supported-methods .. and look up
> > the cache when and if the client needs to decide between different
> versions
> > of a method, or picking a new method, based on the server it is talking
> > to...
>

Re: Handling protocol versions

Posted by Stack <st...@duboce.net>.
So, picking up this thread again because I'm working on
https://issues.apache.org/jira/browse/HBASE-6521 "Address the handling of
multiple versions of a protocol". The original question was two-fold as I
read it.

1. Should we keep VersionedProtocol?
2. How does a client figure out whether a server supports a particular capability?

On question 1:

VersionedProtocol [1] does two things.  It returns the server version of
the protocol and, separately, a "ProtocolSignature" Writable which allows
you to get a 'hash' of the server's protocol method signatures.  There is an
implication that the server will give out different versions of the
protocol depending on what version the client volunteers (not the case), and
it is implied that the client does something with these method hash
signatures.  It doesn't.

So, VP is a Writable that returns Writables we don't make use of, implying a
functionality that is unrealized.

That's how I read it.  Objections? [3]

It sounds like at least ProtocolSignature can go.  If we did want to go the
route ProtocolSignature implies, we should probably do the native protobuf
thing and make use of ServiceDescriptors, protobuf descriptions of what a
protobuf Service exposes [2].

That leaves VP's return of the server protocol version as all that
remains 'useful'.

But is it? Is version going to be useful going forward?  If we lean on
version, clients will have to keep a registry of versions to available
methods.  Or ask the server what it has and somehow sort through the return
to figure out what it can and cannot make sense of by method.  Sounds like a
bunch of work.

At a minimum, VP will have to be protobuf'd so it is going to have to
change.  And we should probably add a bit more info to the return since we
are going to the trouble of an RPC anyways.

This serves as a lead-in to question 2:

Protobuf as is helps in the case where an ipc takes an extra parameter or
adds extra info to the return; the majority of the evolutions that will be
happening in the ipc interface.  But what to do about the scenario Devaraj
outlines at the head of the thread where we have shipped a method that
causes the server to OOME in production or we add a method to the server
that runs ten times faster than the old one?  Or probably more likely, the
server has a whole new 'feature' (as Todd calls it) orthogonal to the set
the protocol version implies?  How does the client figure the new feature
is available?

We could have the client try the invocation -- as Jimmy suggests -- and if
it fails, register the failure in a client-wide map so we avoid retrying on
each invocation (we should just do this anyway).  The client could go back
to the server, do the above-suggested query of server capabilities, and
then adjust the call accordingly.  Or, since we are doing an ipc setup call
anyway, we could have the server return the list of capabilities at that
time.  The client could cache what is available or not and just ask the
server when convenient for it.
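
A minimal sketch of the try-then-remember idea in the paragraph above, in
plain Java. Everything here is hypothetical and for illustration only: the
FallbackInvoker class, the UnsupportedMethodException type (a stand-in for
whatever error a server would raise for an unknown method), and the server
name are assumptions, not HBase code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch: try the newer call, remember failures per server. */
final class FallbackInvoker {
  /** Assumed stand-in for the error a server raises for an unknown method. */
  static final class UnsupportedMethodException extends RuntimeException {
    UnsupportedMethodException(String method) {
      super("Unknown method: " + method);
    }
  }

  // Client-wide negative cache of "server#method" keys known to be unsupported.
  private final Set<String> unsupported = ConcurrentHashMap.newKeySet();

  <T> T invoke(String server, String method, Supplier<T> preferred, Supplier<T> fallback) {
    String key = server + "#" + method;
    if (unsupported.contains(key)) {
      return fallback.get(); // known miss: skip the wasted round trip
    }
    try {
      return preferred.get();
    } catch (UnsupportedMethodException e) {
      unsupported.add(key); // register the failure once
      return fallback.get();
    }
  }

  boolean isKnownUnsupported(String server, String method) {
    return unsupported.contains(server + "#" + method);
  }

  public static void main(String[] args) {
    FallbackInvoker invoker = new FallbackInvoker();
    int[] attempts = {0};
    // Simulates an old server that does not know the newer method.
    Supplier<String> improved = () -> {
      attempts[0]++;
      throw new UnsupportedMethodException("FooMethod_improved");
    };
    Supplier<String> old = () -> "result from FooMethod";
    System.out.println(invoker.invoke("rs1.example.org", "FooMethod_improved", improved, old));
    System.out.println(invoker.invoke("rs1.example.org", "FooMethod_improved", improved, old));
    System.out.println("attempts at new method: " + attempts[0]);
  }
}
```

The second invocation goes straight to the fallback, so the cost of probing
is paid once per server and method rather than on every call.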

Using the bitmap shorthand describing what is available seems like it would
be less work to do than implementing protobuf service
description/interrogation and then dynamically composing method calls.

Proposal:

+ Remove VersionedProtocol and ProtocolSignature
+ Instead of VP, add a new Interface called VersionedService or, probably
better, ProtocolDescriptor, that all RPC Protocols implement.  It has
methods (getDescriptor) to return a pb Message that holds the server version
of the protocol and a bitmap of the features the server implements.  This is
the call we will make when we set up the ipc proxy.  Clients can cache the
result.  Every time we change a Service/Protocol, we set a particular bit
in the Service/Protocol bitmap.  This new Interface might also return the
long-form pb ServiceDescriptors (the pb getDescriptorForType from the Service
Interface).  It could be useful for debugging.
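
One way the proposed call could look, sketched in plain Java under stated
assumptions. The Descriptor class is a stand-in for the pb Message (no
protobuf dependency here), and the DescriptorCache class, the
FOO_METHOD_IMPROVED bit, and the server names are all hypothetical; only
the ProtocolDescriptor and getDescriptor names come from the proposal.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the proposed descriptor call; not HBase code. */
final class DescriptorSketch {
  /** Plain-Java stand-in for the pb Message the proposal describes. */
  static final class Descriptor {
    final long version;
    final long featureBits;

    Descriptor(long version, long featureBits) {
      this.version = version;
      this.featureBits = featureBits;
    }

    /** True if the given bit position (0..63) is set in the bitmap. */
    boolean supports(int bit) {
      return (featureBits & (1L << bit)) != 0;
    }
  }

  /** What every RPC protocol would implement under the proposal. */
  interface ProtocolDescriptor {
    Descriptor getDescriptor();
  }

  /** Assumed bit assignment for the FooMethod_improved example. */
  static final int FOO_METHOD_IMPROVED = 0;

  /** Client-side cache, filled once per server when the proxy is set up. */
  static final class DescriptorCache {
    private final Map<String, Descriptor> byServer = new ConcurrentHashMap<>();

    Descriptor lookup(String server, ProtocolDescriptor proxy) {
      // The remote call only happens on the first lookup for a server.
      return byServer.computeIfAbsent(server, s -> proxy.getDescriptor());
    }
  }

  public static void main(String[] args) {
    ProtocolDescriptor newServer = () -> new Descriptor(2L, 1L << FOO_METHOD_IMPROVED);
    ProtocolDescriptor oldServer = () -> new Descriptor(1L, 0L);
    DescriptorCache cache = new DescriptorCache();
    for (String name : new String[] {"new-server", "old-server"}) {
      ProtocolDescriptor proxy = name.equals("new-server") ? newServer : oldServer;
      Descriptor d = cache.lookup(name, proxy);
      String method = d.supports(FOO_METHOD_IMPROVED) ? "FooMethod_improved" : "FooMethod";
      System.out.println(name + " (v" + d.version + ") -> " + method);
    }
  }
}
```

A single long caps the bitmap at 64 features; a repeated field or a byte
string in the real message would lift that limit.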

What do you lot think?

St.Ack

1.
http://svn.apache.org/viewvc/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/VersionedProtocol.java?view=markup
2.
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Service
3. We have VP and PS because, as I understand it, we once thought that we
would support choosing between protocol and protocol versions and that we'd
support both protobufs and Writables.  This is no longer wanted.

On Fri, Aug 3, 2012 at 11:40 AM, Devaraj Das <dd...@hortonworks.com> wrote:

> Responses inline..
>
> > On Wed, Aug 1, 2012 at 11:04 AM, Todd Lipcon <to...@cloudera.com> wrote:
> >> One possibility:
> >>
> >> During the IPC handshake, we could send the full version string /
> >> source checksum. Then, have a client-wide map which caches which
> >> methods have been found to be supported or not supported for an
> >> individual version. So, we don't need to maintain the mapping
> >> ourselves, but we also wouldn't need to do the full retry every time.
> >>
>
> Yeah this is what I was thinking as the alternate to the current approach
> of using VersionedProtocol.
>
> >> A different idea would be to introduce a call like
> >> "getServerCapabilities()" which returns a bitmap, and define a bit per
> >> time that we add a new feature.
> >>
> >> The advantage of these approaches vs a single increasing version
> >> number is that we sometimes want to backport a new IPC to an older
> >> version, but not backport all of the intervening IPCs. Having a bitmap
> >> allows us to "pick and choose" on backports without having to pull in
> >> a bunch of things we didn't necessarily want.
> >>
>
> Good point.
>
> >> On Wed, Aug 1, 2012 at 1:41 AM, Stack <st...@duboce.net> wrote:
> >>> On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das <dd...@hortonworks.com>
> wrote:
> >>>> Wondering whether we should retain the VersionedProtocol now that we
> have protobuf implementation for most (all?) of the protocols. I think we
> still need the version checks and do them when we need to. Take this case:
> >>>> 1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
> >>>> 2. Protocol Foo evolves over time, and the
> FooMethod(FooMethodRequest) now has a better implementation called
> FooMethod_improved(FooMethodRequest).
> >>>> 3. HBase installations have happened with both the protocol
> implementations.
> >>>> 4. Clients should be able to talk to both old and new servers (and
> invoke the newer implementation of FooMethod if the protocol implements it).
> >>>>
> >>>> (4) is possible when the getProtocolVersion is implemented by the
> protocol at the server. The client could check what the version of the
> protocol was (assuming VersionedProtocol semantics where the protocol
> version number is upgraded for such significant changes) and depending on
> that invoke the appropriate method...
> >>>>
> >>>> Having to map version-numbers of protocols to the methods-supported
> is probably arcane IMO but works..
> >>>>
> >>>> The other approach (that wouldn't require the version#) is to do
> something like - On the client side, get the protocol methods supported at
> the server (and cache it) and then look this map up whenever needed to
> decide which method to invoke.
> >>>>
> >>>> Any thoughts on whether we should invest time in the second approach
> yet?
> >>>>
> >>>
> >>> The VersionedProtocol w/ client being able to interrogate what methods
> >>> a server supports strikes me as a facility that will be rarely used if
> >>> at all and bringing it along, keeping up the directory of supported
> >>> methods, will take a load of work on our part that we'll do less than
> >>> perfectly so should it ever be needed, it won't work because we let it
> >>> go stale.
> >>>
>
> Yeah, this won't be a common case. It'd (hopefully) be rare. The directory
> of methods would be the methods in the protocol-interface at the server
> that could be figured by invoking reflection (and hence staleness issue
> shouldn't happen).
>
> >>> What do you reckon?
> >>>
> >>> The above painted scenario too is a little on the exotic side.  We can
> >>> do something like Jimmy suggests in those rare cases we need to add a
> >>> new method because there is insufficient wiggle-room w/i the
> >>> particular PB method call (If we get into the issue Ted raises where
> >>> we'd have to go back to the server twice because there is a third new
> >>> method call, we're doing our API wrong).
> >>>
>
> Agree that the exception handling hack can be played here.. In general,
> having some solution around this might be really helpful *if* we get some
> API wrong (for e.g., indirect implication on memory by the API semantics)
> and we need to fix it without breaking compatibility.. In HDFS, listFile
> proved to be a memory killer for extremely large directories and people
> implemented the iterator version of the same.
>
> >>> The protocol needs a version though.  We'll be still sending that
> >>> 'hrpc' long in the header preamble?  Should we add a version long
> >>> after the 'hrpc' long?
> >>>
>
> The version in "hrpc" is the RPC version (as opposed to protocol version).
> I think that's orthogonal to this discussion..
>
> >>> As to a directory of supported methods, do we need this in the
> >>> protocol at all?  Can't this be knowledge kept outside of the
> >>> on-the-wire back and forth?
> >>>
> >>> St.Ack
> >>
> >>
>
> As I answered above, and as Todd also says, it probably makes sense to
> have a client wide cache for protocol<->supported-methods .. and look up
> the cache when and if the client needs to decide between different versions
> of a method, or picking a new method, based on the server it is talking
> to...

Re: Handling protocol versions

Posted by Devaraj Das <dd...@hortonworks.com>.
Responses inline..

> On Wed, Aug 1, 2012 at 11:04 AM, Todd Lipcon <to...@cloudera.com> wrote:
>> One possibility:
>> 
>> During the IPC handshake, we could send the full version string /
>> source checksum. Then, have a client-wide map which caches which
>> methods have been found to be supported or not supported for an
>> individual version. So, we don't need to maintain the mapping
>> ourselves, but we also wouldn't need to do the full retry every time.
>> 

Yeah, this is what I was thinking of as the alternative to the current approach of using VersionedProtocol.

>> A different idea would be to introduce a call like
>> "getServerCapabilities()" which returns a bitmap, and define a bit per
>> time that we add a new feature.
>> 
>> The advantage of these approaches vs a single increasing version
>> number is that we sometimes want to backport a new IPC to an older
>> version, but not backport all of the intervening IPCs. Having a bitmap
>> allows us to "pick and choose" on backports without having to pull in
>> a bunch of things we didn't necessarily want.
>> 

Good point.

>> On Wed, Aug 1, 2012 at 1:41 AM, Stack <st...@duboce.net> wrote:
>>> On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das <dd...@hortonworks.com> wrote:
>>>> Wondering whether we should retain the VersionedProtocol now that we have protobuf implementation for most (all?) of the protocols. I think we still need the version checks and do them when we need to. Take this case:
>>>> 1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
>>>> 2. Protocol Foo evolves over time, and the FooMethod(FooMethodRequest) now has a better implementation called FooMethod_improved(FooMethodRequest).
>>>> 3. HBase installations have happened with both the protocol implementations.
>>>> 4. Clients should be able to talk to both old and new servers (and invoke the newer implementation of FooMethod if the protocol implements it).
>>>> 
>>>> (4) is possible when the getProtocolVersion is implemented by the protocol at the server. The client could check what the version of the protocol was (assuming VersionedProtocol semantics where the protocol version number is upgraded for such significant changes) and depending on that invoke the appropriate method...
>>>> 
>>>> Having to map version-numbers of protocols to the methods-supported is probably arcane IMO but works..
>>>> 
>>>> The other approach (that wouldn't require the version#) is to do something like - On the client side, get the protocol methods supported at the server (and cache it) and then look this map up whenever needed to decide which method to invoke.
>>>> 
>>>> Any thoughts on whether we should invest time in the second approach yet?
>>>> 
>>> 
>>> The VersionedProtocol w/ client being able to interrogate what methods
>>> a server supports strikes me as a facility that will be rarely used if
>>> at all and bringing it along, keeping up the directory of supported
>>> methods, will take a load of work on our part that we'll do less than
>>> perfectly so should it ever be needed, it won't work because we let it
>>> go stale.
>>> 

Yeah, this won't be a common case. It'd (hopefully) be rare. The directory of methods would be the methods in the protocol interface at the server, which could be discovered via reflection (and hence the staleness issue shouldn't arise).
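
A minimal sketch of building such a directory by reflection, so that it
cannot drift from the interface the server actually implements. The
MethodDirectory class and the FooProtocol interface are hypothetical
stand-ins for a real protocol interface.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: derive the method directory from the interface itself. */
final class MethodDirectory {
  /** Assumed stand-in for a server-side protocol interface. */
  interface FooProtocol {
    String fooMethod(String request);
    String fooMethodImproved(String request);
  }

  /** Collects the names of all public methods the interface declares or inherits. */
  static Set<String> supportedMethods(Class<?> protocolInterface) {
    Set<String> names = new TreeSet<>();
    for (Method m : protocolInterface.getMethods()) {
      names.add(m.getName());
    }
    return names;
  }

  public static void main(String[] args) {
    Set<String> directory = supportedMethods(FooProtocol.class);
    System.out.println(directory);
    System.out.println("improved available: " + directory.contains("fooMethodImproved"));
  }
}
```

Names alone do not distinguish overloads; a fuller directory would also
record parameter types.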

>>> What do you reckon?
>>> 
>>> The above painted scenario too is a little on the exotic side.  We can
>>> do something like Jimmy suggests in those rare cases we need to add a
>>> new method because there is insufficient wiggle-room w/i the
>>> particular PB method call (If we get into the issue Ted raises where
>>> we'd have to go back to the server twice because there is a third new
>>> method call, we're doing our API wrong).
>>> 

Agree that the exception-handling hack can be played here. In general, having some solution around this might be really helpful *if* we get some API wrong (e.g., an indirect implication on memory from the API semantics) and we need to fix it without breaking compatibility. In HDFS, listFile proved to be a memory killer for extremely large directories, and people implemented an iterator version of the same.

>>> The protocol needs a version though.  We'll be still sending that
>>> 'hrpc' long in the header preamble?  Should we add a version long
>>> after the 'hrpc' long?
>>> 

The version in "hrpc" is the RPC version (as opposed to the protocol version). I think that's orthogonal to this discussion.

>>> As to a directory of supported methods, do we need this in the
>>> protocol at all?  Can't this be knowledge kept outside of the
>>> on-the-wire back and forth?
>>> 
>>> St.Ack
>> 
>> 

As I answered above, and as Todd also says, it probably makes sense to have a client-wide cache for protocol<->supported-methods, and to look up the cache when and if the client needs to decide between different versions of a method, or to pick a new method, based on the server it is talking to.

Re: Handling protocol versions

Posted by Andrew Purtell <ap...@apache.org>.
I like the idea of "getServerCapabilities()" as a bitset.

   - Andy

On Wed, Aug 1, 2012 at 11:04 AM, Todd Lipcon <to...@cloudera.com> wrote:
> One possibility:
>
> During the IPC handshake, we could send the full version string /
> source checksum. Then, have a client-wide map which caches which
> methods have been found to be supported or not supported for an
> individual version. So, we don't need to maintain the mapping
> ourselves, but we also wouldn't need to do the full retry every time.
>
> A different idea would be to introduce a call like
> "getServerCapabilities()" which returns a bitmap, and define a bit per
> time that we add a new feature.
>
> The advantage of these approaches vs a single increasing version
> number is that we sometimes want to backport a new IPC to an older
> version, but not backport all of the intervening IPCs. Having a bitmap
> allows us to "pick and choose" on backports without having to pull in
> a bunch of things we didn't necessarily want.
>
> -Todd
>
> On Wed, Aug 1, 2012 at 1:41 AM, Stack <st...@duboce.net> wrote:
>> On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das <dd...@hortonworks.com> wrote:
>>> Wondering whether we should retain the VersionedProtocol now that we have protobuf implementation for most (all?) of the protocols. I think we still need the version checks and do them when we need to. Take this case:
>>> 1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
>>> 2. Protocol Foo evolves over time, and the FooMethod(FooMethodRequest) now has a better implementation called FooMethod_improved(FooMethodRequest).
>>> 3. HBase installations have happened with both the protocol implementations.
>>> 4. Clients should be able to talk to both old and new servers (and invoke the newer implementation of FooMethod if the protocol implements it).
>>>
>>> (4) is possible when the getProtocolVersion is implemented by the protocol at the server. The client could check what the version of the protocol was (assuming VersionedProtocol semantics where the protocol version number is upgraded for such significant changes) and depending on that invoke the appropriate method...
>>>
>>> Having to map version-numbers of protocols to the methods-supported is probably arcane IMO but works..
>>>
>>> The other approach (that wouldn't require the version#) is to do something like - On the client side, get the protocol methods supported at the server (and cache it) and then look this map up whenever needed to decide which method to invoke.
>>>
>>> Any thoughts on whether we should invest time in the second approach yet?
>>>
>>
>> The VersionedProtocol w/ client being able to interrogate what methods
>> a server supports strikes me as a facility that will be rarely used if
>> at all and bringing it along, keeping up the directory of supported
>> methods, will take a load of work on our part that we'll do less than
>> perfectly so should it ever be needed, it won't work because we let it
>> go stale.
>>
>> What do you reckon?
>>
>> The above painted scenario too is a little on the exotic side.  We can
>> do something like Jimmy suggests in those rare cases we need to add a
>> new method because there is insufficient wiggle-room w/i the
>> particular PB method call (If we get into the issue Ted raises where
>> we'd have to go back to the server twice because there is a third new
>> method call, we're doing our API wrong).
>>
>> The protocol needs a version though.  We'll be still sending that
>> 'hrpc' long in the header preamble?  Should we add a version long
>> after the 'hrpc' long?
>>
>> As to a directory of supported methods, do we need this in the
>> protocol at all?  Can't this be knowledge kept outside of the
>> on-the-wire back and forth?
>>
>> St.Ack
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)

Re: Handling protocol versions

Posted by Todd Lipcon <to...@cloudera.com>.
One possibility:

During the IPC handshake, we could send the full version string /
source checksum. Then, have a client-wide map which caches which
methods have been found to be supported or not supported for an
individual version. So, we don't need to maintain the mapping
ourselves, but we also wouldn't need to do the full retry every time.

A different idea would be to introduce a call like
"getServerCapabilities()" which returns a bitmap, and define a new bit
each time we add a new feature.

The advantage of these approaches vs a single increasing version
number is that we sometimes want to backport a new IPC to an older
version, but not backport all of the intervening IPCs. Having a bitmap
allows us to "pick and choose" on backports without having to pull in
a bunch of things we didn't necessarily want.
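
To illustrate the backport point, a small sketch using java.util.BitSet.
The CapabilityBits class and the three feature bits are hypothetical; the
sketch only shows that a bitmap can express "has C but not B", which a
single increasing version number cannot.

```java
import java.util.BitSet;

/** Hypothetical sketch of a capability bitmap; bit names are made up. */
final class CapabilityBits {
  // One bit is assigned each time a feature is added.
  static final int FEATURE_A = 0;
  static final int FEATURE_B = 1;
  static final int FEATURE_C = 2;

  /** Builds a capability set with the given bits turned on. */
  static BitSet of(int... bits) {
    BitSet set = new BitSet();
    for (int b : bits) {
      set.set(b);
    }
    return set;
  }

  /** Encodes the set as longs, e.g. for a repeated field on the wire. */
  static long[] encode(BitSet caps) {
    return caps.toLongArray();
  }

  /** Decodes what encode produced. */
  static BitSet decode(long[] wire) {
    return BitSet.valueOf(wire);
  }

  public static void main(String[] args) {
    BitSet trunk = of(FEATURE_A, FEATURE_B, FEATURE_C);
    // An older branch that picked up C as a backport, but not B.
    BitSet branch = of(FEATURE_A, FEATURE_C);

    BitSet seenByClient = decode(encode(branch));
    System.out.println("branch has C: " + seenByClient.get(FEATURE_C));
    System.out.println("branch has B: " + seenByClient.get(FEATURE_B));
    System.out.println("trunk has B: " + trunk.get(FEATURE_B));
  }
}
```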

-Todd

On Wed, Aug 1, 2012 at 1:41 AM, Stack <st...@duboce.net> wrote:
> On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das <dd...@hortonworks.com> wrote:
>> Wondering whether we should retain the VersionedProtocol now that we have protobuf implementation for most (all?) of the protocols. I think we still need the version checks and do them when we need to. Take this case:
>> 1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
>> 2. Protocol Foo evolves over time, and the FooMethod(FooMethodRequest) now has a better implementation called FooMethod_improved(FooMethodRequest).
>> 3. HBase installations have happened with both the protocol implementations.
>> 4. Clients should be able to talk to both old and new servers (and invoke the newer implementation of FooMethod if the protocol implements it).
>>
>> (4) is possible when the getProtocolVersion is implemented by the protocol at the server. The client could check what the version of the protocol was (assuming VersionedProtocol semantics where the protocol version number is upgraded for such significant changes) and depending on that invoke the appropriate method...
>>
>> Having to map version-numbers of protocols to the methods-supported is probably arcane IMO but works..
>>
>> The other approach (that wouldn't require the version#) is to do something like - On the client side, get the protocol methods supported at the server (and cache it) and then look this map up whenever needed to decide which method to invoke.
>>
>> Any thoughts on whether we should invest time in the second approach yet?
>>
>
> VersionedProtocol, with the client able to interrogate which methods
> a server supports, strikes me as a facility that will rarely be used,
> if at all. Bringing it along and keeping up the directory of supported
> methods will take a load of work on our part that we'll do less than
> perfectly, so should it ever be needed, it won't work because we let it
> go stale.
>
> What do you reckon?
>
> The scenario painted above is also a little on the exotic side.  We can
> do something like Jimmy suggests in those rare cases where we need to
> add a new method because there is insufficient wiggle room within the
> particular PB method call. (If we get into the issue Ted raises, where
> we'd have to go back to the server twice because there is a third new
> method call, we're doing our API wrong.)
>
> The protocol needs a version, though.  Will we still be sending that
> 'hrpc' long in the header preamble?  Should we add a version long
> after the 'hrpc' long?
>
> As to a directory of supported methods, do we need this in the
> protocol at all?  Can't this be knowledge kept outside of the
> on-the-wire back and forth?
>
> St.Ack



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Handling protocol versions

Posted by Stack <st...@duboce.net>.
On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das <dd...@hortonworks.com> wrote:
> Wondering whether we should retain the VersionedProtocol now that we have protobuf implementation for most (all?) of the protocols. I think we still need the version checks and do them when we need to. Take this case:
> 1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
> 2. Protocol Foo evolves over time, and the FooMethod(FooMethodRequest) now has a better implementation called FooMethod_improved(FooMethodRequest).
> 3. HBase installations have happened with both the protocol implementations.
> 4. Clients should be able to talk to both old and new servers (and invoke the newer implementation of FooMethod if the protocol implements it).
>
> (4) is possible when the getProtocolVersion is implemented by the protocol at the server. The client could check what the version of the protocol was (assuming VersionedProtocol semantics where the protocol version number is upgraded for such significant changes) and depending on that invoke the appropriate method...
>
> Having to map version-numbers of protocols to the methods-supported is probably arcane IMO but works..
>
> The other approach (that wouldn't require the version#) is to do something like - On the client side, get the protocol methods supported at the server (and cache it) and then look this map up whenever needed to decide which method to invoke.
>
> Any thoughts on whether we should invest time in the second approach yet?
>

VersionedProtocol, with the client able to interrogate which methods
a server supports, strikes me as a facility that will rarely be used,
if at all. Bringing it along and keeping up the directory of supported
methods will take a load of work on our part that we'll do less than
perfectly, so should it ever be needed, it won't work because we let it
go stale.

What do you reckon?

The scenario painted above is also a little on the exotic side.  We can
do something like Jimmy suggests in those rare cases where we need to
add a new method because there is insufficient wiggle room within the
particular PB method call. (If we get into the issue Ted raises, where
we'd have to go back to the server twice because there is a third new
method call, we're doing our API wrong.)

The protocol needs a version, though.  Will we still be sending that
'hrpc' long in the header preamble?  Should we add a version long
after the 'hrpc' long?
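
To make the question concrete, here is a rough Java sketch of a connection preamble that carries a magic marker followed by a version. The field widths are assumptions made for this example only (the four ASCII bytes of "hrpc", then an 8-byte big-endian version); the class and method names are invented, and none of this is taken from the actual HBase wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PreambleDemo {
    // Assumed layout: 4 magic bytes, then an 8-byte big-endian version.
    static final byte[] MAGIC = "hrpc".getBytes(StandardCharsets.US_ASCII);
    static final int LENGTH = MAGIC.length + Long.BYTES;

    /** Client side: build the preamble sent once when the connection opens. */
    static byte[] encode(long version) {
        return ByteBuffer.allocate(LENGTH).put(MAGIC).putLong(version).array();
    }

    /** Server side: validate the magic and return the client's version. */
    static long decode(byte[] preamble) {
        if (preamble.length != LENGTH) {
            throw new IllegalArgumentException("bad preamble length: " + preamble.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(preamble);
        byte[] magic = new byte[MAGIC.length];
        buf.get(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IllegalArgumentException("bad magic");
        }
        return buf.getLong();
    }

    public static void main(String[] args) {
        byte[] wire = encode(2L);
        System.out.println(wire.length + " bytes, version " + decode(wire));
    }
}
```

A version carried this way describes the RPC framing as a whole; it says nothing about which methods an individual protocol supports, which is the separate question raised in the next paragraph.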

As to a directory of supported methods, do we need this in the
protocol at all?  Can't this be knowledge kept outside of the
on-the-wire back and forth?

St.Ack

Re: Handling protocol versions

Posted by Ted Yu <yu...@gmail.com>.
I looked at TestMultipleProtocolServer.java from Hadoop trunk.
It illustrates how VersionedProtocol lets a client talk to servers
running different versions of a protocol.

FYI

On Mon, Jul 30, 2012 at 8:11 PM, Ted Yu <yu...@gmail.com> wrote:

> If v3 of the method emerges, we might need to retry twice, right?
>
> Cheers
>
>
> On Mon, Jul 30, 2012 at 8:09 PM, Jimmy Xiang <jx...@cloudera.com> wrote:
>
>> Another approach is to try the new call first.  If we get an
>> exception such as "unknown method", we fall back to the old method.
>>
>> Thanks,
>> Jimmy
>>
>> On Mon, Jul 30, 2012 at 5:47 PM, Devaraj Das <dd...@hortonworks.com>
>> wrote:
>> > Wondering whether we should retain the VersionedProtocol now that we
>> have protobuf implementation for most (all?) of the protocols. I think we
>> still need the version checks and do them when we need to. Take this case:
>> > 1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
>> > 2. Protocol Foo evolves over time, and the FooMethod(FooMethodRequest)
>> now has a better implementation called FooMethod_improved(FooMethodRequest).
>> > 3. HBase installations have happened with both the protocol
>> implementations.
>> > 4. Clients should be able to talk to both old and new servers (and
>> invoke the newer implementation of FooMethod if the protocol implements it).
>> >
>> > (4) is possible when the getProtocolVersion is implemented by the
>> protocol at the server. The client could check what the version of the
>> protocol was (assuming VersionedProtocol semantics where the protocol
>> version number is upgraded for such significant changes) and depending on
>> that invoke the appropriate method...
>> >
>> > Having to map version-numbers of protocols to the methods-supported is
>> probably arcane IMO but works..
>> >
>> > The other approach (that wouldn't require the version#) is to do
>> something like - On the client side, get the protocol methods supported at
>> the server (and cache it) and then look this map up whenever needed to
>> decide which method to invoke.
>> >
>> > Any thoughts on whether we should invest time in the second approach
>> yet?
>> >
>> > Thanks,
>> > Devaraj.
>>
>
>

Re: Handling protocol versions

Posted by Ted Yu <yu...@gmail.com>.
If v3 of the method emerges, we might need to retry twice, right?

Cheers

On Mon, Jul 30, 2012 at 8:09 PM, Jimmy Xiang <jx...@cloudera.com> wrote:

> Another approach is to try the new call first.  If we get an
> exception such as "unknown method", we fall back to the old method.
>
> Thanks,
> Jimmy
>
> On Mon, Jul 30, 2012 at 5:47 PM, Devaraj Das <dd...@hortonworks.com> wrote:
> > Wondering whether we should retain the VersionedProtocol now that we
> have protobuf implementation for most (all?) of the protocols. I think we
> still need the version checks and do them when we need to. Take this case:
> > 1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
> > 2. Protocol Foo evolves over time, and the FooMethod(FooMethodRequest)
> now has a better implementation called FooMethod_improved(FooMethodRequest).
> > 3. HBase installations have happened with both the protocol
> implementations.
> > 4. Clients should be able to talk to both old and new servers (and
> invoke the newer implementation of FooMethod if the protocol implements it).
> >
> > (4) is possible when the getProtocolVersion is implemented by the
> protocol at the server. The client could check what the version of the
> protocol was (assuming VersionedProtocol semantics where the protocol
> version number is upgraded for such significant changes) and depending on
> that invoke the appropriate method...
> >
> > Having to map version-numbers of protocols to the methods-supported is
> probably arcane IMO but works..
> >
> > The other approach (that wouldn't require the version#) is to do
> something like - On the client side, get the protocol methods supported at
> the server (and cache it) and then look this map up whenever needed to
> decide which method to invoke.
> >
> > Any thoughts on whether we should invest time in the second approach yet?
> >
> > Thanks,
> > Devaraj.
>

Re: Handling protocol versions

Posted by Jimmy Xiang <jx...@cloudera.com>.
Another approach is to try the new call first.  If we get an
exception such as "unknown method", we fall back to the old method.
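
A rough Java sketch of this try-new-then-fall-back idea. The Transport interface, UnknownMethodException, and FallbackClient are hypothetical names invented for the example; they stand in for whatever the real RPC layer provides. The sketch also remembers, per server, which method variant worked, which is an addition beyond the suggestion above: with three variants the extra round trips are then paid only on the first call to a given server.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FallbackDemo {
    /** Hypothetical stand-in for the error an RPC layer raises for an unknown method. */
    static class UnknownMethodException extends RuntimeException {
        UnknownMethodException(String method) {
            super("unknown method: " + method);
        }
    }

    /** Hypothetical transport: invokes a named method on a named server. */
    interface Transport {
        String invoke(String server, String method, String request);
    }

    static class FallbackClient {
        private final Transport transport;
        private final List<String> newestFirst;
        // Per server: index of the newest method variant known to work.
        private final Map<String, Integer> knownGood = new ConcurrentHashMap<>();
        int attempts = 0; // counts round trips, for illustration only

        FallbackClient(Transport transport, List<String> newestFirst) {
            this.transport = transport;
            this.newestFirst = newestFirst;
        }

        String call(String server, String request) {
            // Skip variants this server has already rejected.
            int start = knownGood.getOrDefault(server, 0);
            for (int i = start; i < newestFirst.size(); i++) {
                attempts++;
                try {
                    String response = transport.invoke(server, newestFirst.get(i), request);
                    knownGood.put(server, i);
                    return response;
                } catch (UnknownMethodException e) {
                    // Server is too old for this variant; try the next older one.
                }
            }
            throw new IllegalStateException(server + " supports none of " + newestFirst);
        }
    }

    public static void main(String[] args) {
        // A simulated old server that only knows the original FooMethod.
        Transport oldServer = (server, method, request) -> {
            if (!method.equals("FooMethod")) {
                throw new UnknownMethodException(method);
            }
            return method + ":" + request;
        };
        FallbackClient client = new FallbackClient(
                oldServer, Arrays.asList("FooMethod_v3", "FooMethod_improved", "FooMethod"));

        System.out.println(client.call("rs1", "a")); // three round trips: two rejected, one served
        System.out.println(client.call("rs1", "b")); // one round trip: cached choice
        System.out.println("round trips: " + client.attempts);
    }
}
```

One caveat with the cache: a server that is upgraded later stays pinned to the older method until the entry is cleared, so a real client would need to invalidate it, for example on reconnect.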

Thanks,
Jimmy

On Mon, Jul 30, 2012 at 5:47 PM, Devaraj Das <dd...@hortonworks.com> wrote:
> Wondering whether we should retain the VersionedProtocol now that we have protobuf implementation for most (all?) of the protocols. I think we still need the version checks and do them when we need to. Take this case:
> 1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
> 2. Protocol Foo evolves over time, and the FooMethod(FooMethodRequest) now has a better implementation called FooMethod_improved(FooMethodRequest).
> 3. HBase installations have happened with both the protocol implementations.
> 4. Clients should be able to talk to both old and new servers (and invoke the newer implementation of FooMethod if the protocol implements it).
>
> (4) is possible when the getProtocolVersion is implemented by the protocol at the server. The client could check what the version of the protocol was (assuming VersionedProtocol semantics where the protocol version number is upgraded for such significant changes) and depending on that invoke the appropriate method...
>
> Having to map version-numbers of protocols to the methods-supported is probably arcane IMO but works..
>
> The other approach (that wouldn't require the version#) is to do something like - On the client side, get the protocol methods supported at the server (and cache it) and then look this map up whenever needed to decide which method to invoke.
>
> Any thoughts on whether we should invest time in the second approach yet?
>
> Thanks,
> Devaraj.