Posted to common-dev@hadoop.apache.org by Sanjay Radia <sr...@yahoo-inc.com> on 2008/10/08 19:26:39 UTC
Re: RPC versioning - oops sorry
My filter was saving this thread in my "jira bucket", so I had missed it.
I asked a few questions on the Hadoop requirements page earlier today
that you have addressed or are addressing in this thread. Sorry.
sanjay
On Oct 3, 2008, at 9:37 AM, Doug Cutting wrote:
> It has been proposed in the discussions defining Hadoop 1.0 that we
> extend our back-compatibility policy.
>
> http://wiki.apache.org/hadoop/Release1.0Requirements
>
> Currently we only attempt to promise that application code will run
> without change against compatible versions of Hadoop. If one has
> clusters running different yet compatible versions, then one must use
> a different classpath for each cluster to pick up the appropriate
> version of Hadoop's client libraries.
>
> The proposal is that we extend this, so that a client library from one
> version of Hadoop will operate correctly with other compatible Hadoop
> versions, i.e., one need not alter one's classpath to contain the
> identical version, only a compatible version.
>
> Question 1: Do we need to solve this problem soon, for release 1.0,
> i.e., in order to provide a release whose compatibility lifetime is
> ~1 year, instead of the ~4 months of 0.x releases? This is not clear
> to me. Can someone provide cases where using the same classpath when
> talking to multiple clusters is critical?
>
> Assuming it is, to implement this requires RPC-level support for
> versioning. We could add this by switching to an RPC mechanism with
> built-in, automatic versioning support, like Thrift, Etch or Protocol
> Buffers. But none of these is a drop-in replacement for Hadoop RPC.
> They will probably not initially meet our performance and scalability
> requirements. Their adoption will also require considerable and
> destabilizing changes to Hadoop. Finally, it is not yet clear which
> of these would be the best candidate. If we move too soon, we might
> regret our choice and wish to move again later.
>
> So, if we answer yes to (1) above, wishing to provide RPC
> back-compatibility in 1.0, but do not want to hold up a 1.0 release,
> is there an alternative to switching? Can we provide incremental
> versioning support to Hadoop's existing RPC mechanism that will
> suffice until a clear replacement is available?
>
> Below I suggest a simple versioning style that Hadoop might use to
> permit its RPC protocols to evolve compatibly until an RPC system with
> built-in versioning support is selected. This is not intended to be a
> long-term solution, but rather something that would permit us to more
> flexibly evolve Hadoop's protocols over the next year or so.
>
> This style assumes a globally increasing Hadoop version number. For
> example, this might be the subversion repository version of trunk when
> a change is first introduced.
>
> When an RPC client and server handshake, they exchange version
> numbers. The lower of their two version numbers is selected as the
> version for the connection.
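>
> To make the handshake concrete, here is a minimal sketch of how that
> negotiation might look. Nothing below is existing Hadoop code; the
> class and method names are illustrative only:
>
> public class VersionHandshake {
>   /** The globally increasing version this build of Hadoop speaks. */
>   public static final int LOCAL_VERSION = 2;
>
>   /**
>    * Each side sends its own version and reads the peer's; the
>    * connection then operates at the lower of the two, so both sides
>    * agree on which fields will be written and read.
>    */
>   public static int negotiate(DataOutput out, DataInput in)
>       throws IOException {
>     out.writeInt(LOCAL_VERSION);        // advertise what we support
>     int peerVersion = in.readInt();     // learn what the peer supports
>     return Math.min(LOCAL_VERSION, peerVersion);
>   }
> }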
>
> Let's walk through an example. We start with a class that contains
> no versioning information and a single field, 'a':
>
> public class Foo implements Writable {
>   int a;
>
>   public void write(DataOutput out) throws IOException {
>     out.writeInt(a);
>   }
>
>   public void readFields(DataInput in) throws IOException {
>     a = in.readInt();
>   }
> }
>
> Now, in version 1, we add a second field, 'b' to this:
>
> public class Foo implements Writable {
>   int a;
>   float b;                              // new field
>
>   public void write(DataOutput out) throws IOException {
>     int version = RPC.getVersion(out);
>     out.writeInt(a);
>     if (version >= 1) {                 // peer supports b
>       out.writeFloat(b);                // send it
>     }
>   }
>
>   public void readFields(DataInput in) throws IOException {
>     int version = RPC.getVersion(in);
>     a = in.readInt();
>     if (version >= 1) {                 // peer supports b
>       b = in.readFloat();               // read it
>     }
>   }
> }
>
> Next, in version 2, we remove the first field, 'a':
>
> public class Foo implements Writable {
>   float b;
>
>   public void write(DataOutput out) throws IOException {
>     int version = RPC.getVersion(out);
>     if (version < 2) {                  // peer wants a
>       out.writeInt(0);                  // send it
>     }
>     if (version >= 1) {
>       out.writeFloat(b);
>     }
>   }
>
>   public void readFields(DataInput in) throws IOException {
>     int version = RPC.getVersion(in);
>     if (version < 2) {                  // peer writes a
>       in.readInt();                     // ignore it
>     }
>     if (version >= 1) {
>       b = in.readFloat();
>     }
>   }
> }
>
> Could something like this work? It would require just some minor
> changes to Hadoop's RPC mechanism, to support the version handshake.
> Beyond that, it could be implemented incrementally as RPC protocols
> evolve. It would require some vigilance, to make sure that versioning
> logic is added when classes change, but adding automated tests against
> prior versions would identify lapses here.
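>
> As a sketch of what such a test might look like (RPC.setVersion() is
> assumed here as the counterpart of the RPC.getVersion() used above,
> and the harness details are illustrative), one could serialize Foo at
> an older version and check that current code still reads it:
>
> public class TestFooVersioning extends TestCase {
>   public void testReadVersionZero() throws Exception {
>     // Serialize Foo as a version-0 peer would see it...
>     ByteArrayOutputStream buf = new ByteArrayOutputStream();
>     DataOutputStream out = new DataOutputStream(buf);
>     RPC.setVersion(out, 0);             // pretend the peer is at version 0
>     new Foo().write(out);
>
>     // ...and check that current code can still read that older form.
>     DataInputStream in = new DataInputStream(
>         new ByteArrayInputStream(buf.toByteArray()));
>     RPC.setVersion(in, 0);
>     new Foo().readFields(in);
>   }
> }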
>
> This may appear to add a lot of version-related logic, but even with
> automatic versioning, in many cases some version-related logic is
> still required. In simple cases, one adds a completely new field with
> a default value and is done, with automatic versioning handling much
> of the work. But in many other cases an existing field is changed and
> the application must translate old values to new, and vice versa.
> These cases still require application logic, even with automatic
> versioning. So automatic versioning is certainly less intrusive, but
> not as much less as one might first assume.
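>
> For example, suppose a timeout field that used to hold seconds is
> changed to hold milliseconds in version 3. In the style above, that
> translation would look something like this (a hypothetical
> illustration, not a real Hadoop class), and the same conversion logic
> would be needed no matter how the fields themselves are versioned:
>
> public class Task implements Writable {
>   long timeoutMillis;
>
>   public void write(DataOutput out) throws IOException {
>     int version = RPC.getVersion(out);
>     // older peers expect seconds, newer peers expect milliseconds
>     out.writeLong(version < 3 ? timeoutMillis / 1000 : timeoutMillis);
>   }
>
>   public void readFields(DataInput in) throws IOException {
>     int version = RPC.getVersion(in);
>     long timeout = in.readLong();
>     if (version < 3) {                  // older peers sent seconds
>       timeout *= 1000;                  // translate to milliseconds
>     }
>     timeoutMillis = timeout;
>   }
> }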
>
> The fundamental question is how soon we need to address inter-version
> RPC compatibility. If we wish to do it soon, I think we'd be wise to
> consider a solution that's less invasive and that does not force us
> into a potentially premature decision.
>
> Doug
>