Posted to common-dev@hadoop.apache.org by Sanjay Radia <sr...@yahoo-inc.com> on 2008/10/08 19:26:39 UTC

Re: RPC versioning - oops sorry

My filter was saving this thread in my "jira bucket", so I had missed it.
I asked a few questions on the hadoop requirements page earlier today
that you have addressed or are addressing in this thread. Sorry.

sanjay



On Oct 3, 2008, at 9:37 AM, Doug Cutting wrote:

> It has been proposed in the discussions defining Hadoop 1.0 that we
> extend our back-compatibility policy.
>
> http://wiki.apache.org/hadoop/Release1.0Requirements
>
> Currently we only attempt to promise that application code will run
> without change against compatible versions of Hadoop.  If one has
> clusters running different yet compatible versions, then one must use
> a different classpath for each cluster to pick up the appropriate
> version of Hadoop's client libraries.
>
> The proposal is that we extend this, so that a client library from one
> version of Hadoop will operate correctly with other compatible Hadoop
> versions, i.e., one need not alter one's classpath to contain the
> identical version, only a compatible version.
>
> Question 1: Do we need to solve this problem soon, for release 1.0,
> i.e., in order to provide a release whose compatibility lifetime is ~1
> year, instead of the ~4 months of 0.x releases?  This is not clear to
> me.  Can someone provide cases where using the same classpath when
> talking to multiple clusters is critical?
>
> Assuming it is, implementing this requires RPC-level support for
> versioning.  We could add this by switching to an RPC mechanism with
> built-in, automatic versioning support, like Thrift, Etch or Protocol
> Buffers.  But none of these is a drop-in replacement for Hadoop RPC.
> They will probably not initially meet our performance and scalability
> requirements.  Their adoption will also require considerable and
> destabilizing changes to Hadoop.  Finally, it is not today clear which
> of these would be the best candidate.  If we move too soon, we might
> regret our choice and wish to move again later.
>
> So, if we answer yes to (1) above, wishing to provide RPC
> back-compatibility in 1.0, but do not want to hold up a 1.0 release,
> is there an alternative to switching?  Can we provide incremental
> versioning support to Hadoop's existing RPC mechanism that will
> suffice until a clear replacement is available?
>
> Below I suggest a simple versioning style that Hadoop might use to
> permit its RPC protocols to evolve compatibly until an RPC system with
> built-in versioning support is selected.  This is not intended to be a
> long-term solution, but rather something that would permit us to more
> flexibly evolve Hadoop's protocols over the next year or so.
>
> This style assumes a globally increasing Hadoop version number.  For
> example, this might be the Subversion revision of trunk at which a
> change is first introduced.
>
> When an RPC client and server handshake, they exchange version
> numbers.  The lower of their two version numbers is selected as the
> version for the connection.
>
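> As a rough sketch (the names below are illustrative, not existing
> Hadoop API), the negotiation might look something like this, with the
> result remembered per connection so that an RPC.getVersion() helper
> can return it during serialization:
>
>    // Illustrative sketch of the proposed handshake, not actual Hadoop
>    // code.  OUR_VERSION is the globally increasing version number
>    // compiled into this build.
>    public static final int OUR_VERSION = 2;
>
>    public static int negotiateVersion(DataOutput out, DataInput in)
>        throws IOException {
>      out.writeInt(OUR_VERSION);                  // announce our version
>      int peerVersion = in.readInt();             // learn the peer's version
>      return Math.min(OUR_VERSION, peerVersion);  // use the lower of the two
>    }
>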
> Let's walk through an example.  We start with a class that contains
> no versioning information and a single field, 'a':
>
>    public class Foo implements Writable {
>      int a;
>
>      public void write(DataOutput out) throws IOException {
>        out.writeInt(a);
>      }
>
>      public void readFields(DataInput in) throws IOException {
>        a = in.readInt();
>      }
>    }
>
> Now, in version 1, we add a second field, 'b', to this:
>
>    public class Foo implements Writable {
>      int a;
>      float b;                                        // new field
>
>      public void write(DataOutput out) throws IOException {
>        int version = RPC.getVersion(out);
>        out.writeInt(a);
>        if (version >= 1) {                           // peer supports b
>          out.writeFloat(b);                          // send it
>        }
>      }
>
>      public void readFields(DataInput in) throws IOException {
>        int version = RPC.getVersion(in);
>        a = in.readInt();
>        if (version >= 1) {                           // if supports b
>          b = in.readFloat();                         // read it
>        }
>      }
>    }
>
> Next, in version 2, we remove the first field, 'a':
>
>    public class Foo implements Writable {
>      float b;
>
>      public void write(DataOutput out) throws IOException {
>        int version = RPC.getVersion(out);
>        if (version < 2) {                            // peer wants a
>          out.writeInt(0);                            // send it
>        }
>        if (version >= 1) {                           // peer supports b
>          out.writeFloat(b);                          // send it
>        }
>      }
>
>      public void readFields(DataInput in) throws IOException {
>        int version = RPC.getVersion(in);
>        if (version < 2) {                            // peer writes a
>          in.readInt();                               // ignore it
>        }
>        if (version >= 1) {                           // peer sends b
>          b = in.readFloat();                         // read it
>        }
>      }
>    }
>
> Could something like this work?  It would require just some minor
> changes to Hadoop's RPC mechanism, to support the version handshake.
> Beyond that, it could be implemented incrementally as RPC protocols
> evolve.  It would require some vigilance, to make sure that versioning
> logic is added when classes change, but adding automated tests against
> prior versions would identify lapses here.
>
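> For example, a test might force the negotiated version on both ends
> and check that a record written in the old format is still read back
> correctly (setVersion here is a hypothetical test-only hook, the
> counterpart of the getVersion calls above):
>
>    public void testReadsVersion1Format() throws IOException {
>      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
>      DataOutputStream out = new DataOutputStream(bytes);
>      RPC.setVersion(out, 1);                     // pretend the peer is v1
>      new Foo().write(out);                       // write old-format data
>
>      DataInputStream in = new DataInputStream(
>        new ByteArrayInputStream(bytes.toByteArray()));
>      RPC.setVersion(in, 1);
>      new Foo().readFields(in);                   // must parse cleanly
>    }
>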
> This may appear to add a lot of version-related logic, but even with
> automatic versioning, some version-related logic is still required in
> many cases.  In simple cases, one adds a completely new field with a
> default value and is done, with automatic versioning handling much of
> the work.  But in many other cases an existing field is changed and
> the application must translate old values to new, and vice versa.
> These cases still require application logic, even with automatic
> versioning.  So automatic versioning is certainly less intrusive, but
> not as much so as one might first assume.
>
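> For instance (purely illustrative, using the same hypothetical
> RPC.getVersion helper), suppose a field that used to be an int count
> of seconds becomes a long count of milliseconds: whatever the
> serialization system, the Writable still has to convert between the
> two representations by hand.
>
>    public class Bar implements Writable {
>      long millis;                                // was: int seconds
>
>      public void write(DataOutput out) throws IOException {
>        int version = RPC.getVersion(out);
>        if (version < 3) {                        // old peers expect seconds
>          out.writeInt((int)(millis / 1000));     // translate new -> old
>        } else {
>          out.writeLong(millis);
>        }
>      }
>
>      public void readFields(DataInput in) throws IOException {
>        int version = RPC.getVersion(in);
>        if (version < 3) {                        // old peers send seconds
>          millis = in.readInt() * 1000L;          // translate old -> new
>        } else {
>          millis = in.readLong();
>        }
>      }
>    }
>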
> The fundamental question is how soon we need to address inter-version
> RPC compatibility.  If we wish to do it soon, I think we'd be wise to
> consider a solution that's less invasive and that does not force us
> into a potentially premature decision.
>
> Doug
>