Posted to common-dev@hadoop.apache.org by Alan Burlison <Al...@oracle.com> on 2015/05/13 14:02:31 UTC

Protocol Buffers version

The current version of Protocol Buffers is 2.6.1 but the current version 
required by Hadoop is 2.5.0. Is there any reason for this, or should I 
log a JIRA to get it updated?

-- 
Alan Burlison
--

Re: Protocol Buffers version

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Mon, Jun 15, 2015 at 8:57 AM, Andrew Purtell <ap...@apache.org> wrote:
> I can't answer the original question but can point out the protostuff (
> https://github.com/protostuff/protostuff) folks have been responsive and
> friendly in the past when we (HBase) were curious about swapping in their
> stuff. Two significant benefits of protostuff, IMHO, are ASL 2 licensing and
> that everything is implemented in Java, including the compiler.

Big +1 to protostuff from community, licensing and implementation perspectives.

Thanks,
Roman.

Re: Protocol Buffers version

Posted by Andrew Purtell <ap...@apache.org>.
I can't answer the original question but can point out the protostuff (
https://github.com/protostuff/protostuff) folks have been responsive and
friendly in the past when we (HBase) were curious about swapping in their
stuff. Two significant benefits of protostuff, IMHO, are ASL 2 licensing and
that everything is implemented in Java, including the compiler.


On Mon, Jun 15, 2015 at 8:49 AM, Sean Busbey <bu...@cloudera.com> wrote:

> Anyone have a read on how the protobuf folks would feel about that? Apache
> has a history of not accepting projects that are non-amicable forks.
>
> On Mon, Jun 15, 2015 at 9:24 AM, Allen Wittenauer <aw...@altiscale.com>
> wrote:
>
> >
> > On Jun 12, 2015, at 1:03 PM, Alan Burlison <Al...@oracle.com>
> > wrote:
> >
> > > On 14/05/2015 18:41, Chris Nauroth wrote:
> > >
> > >> As a reminder though, the community probably would want to see a strong
> > >> justification for the upgrade in terms of features or performance or
> > >> something else.  Right now, I'm not seeing a significant benefit for us
> > >> based on my reading of their release notes.  I think it's worthwhile to
> > >> figure this out first.  Otherwise, there is a risk that any testing work
> > >> turns out to be a wasted effort.
> > >
> > > One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
> >
> >
> >         That's a pretty good reason.
> >
> >         Some of us had a discussion at Summit about effectively forking
> > protobuf and making it an Apache TLP.  This would give us a chance to get
> > out from under Google's blind spot, guarantee better compatibility across
> > the ecosystem, etc, etc.
> >
> >         It is sounding more and more like that's really what needs to
> > happen.
>
>
>
>
> --
> Sean
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Protocol Buffers version

Posted by Sean Busbey <bu...@cloudera.com>.
Anyone have a read on how the protobuf folks would feel about that? Apache
has a history of not accepting projects that are non-amicable forks.

On Mon, Jun 15, 2015 at 9:24 AM, Allen Wittenauer <aw...@altiscale.com> wrote:

>
> On Jun 12, 2015, at 1:03 PM, Alan Burlison <Al...@oracle.com>
> wrote:
>
> > On 14/05/2015 18:41, Chris Nauroth wrote:
> >
> >> As a reminder though, the community probably would want to see a strong
> >> justification for the upgrade in terms of features or performance or
> >> something else.  Right now, I'm not seeing a significant benefit for us
> >> based on my reading of their release notes.  I think it's worthwhile to
> >> figure this out first.  Otherwise, there is a risk that any testing work
> >> turns out to be a wasted effort.
> >
> > One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
>
>
>         That's a pretty good reason.
>
>         Some of us had a discussion at Summit about effectively forking
> protobuf and making it an Apache TLP.  This would give us a chance to get
> out from under Google's blind spot, guarantee better compatibility across
> the ecosystem, etc, etc.
>
>         It is sounding more and more like that's really what needs to
> happen.




-- 
Sean

Re: Protocol Buffers version

Posted by Alan Burlison <Al...@oracle.com>.
On 16/06/2015 10:54, Steve Loughran wrote:

>>>> One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
>
> to be ruthless, that's not enough reason to upgrade branch-2, due to the transitive pain it makes all the way down.

I completely get your point, however we are faced with two pretty 
equally unpalatable options: either fork PB 2.5.0 and add support for 
Solaris SPARC, or switch to 2.6.1.

Although, as I've found out, even though 2.6.1 claims to support Solaris 
SPARC, it doesn't: it needs a patch (albeit a small one) to get it to 
work :-/ From what I can gather, cross-platform support in PB breaks 
fairly regularly.

-- 
Alan Burlison
--

Re: Protocol Buffers version

Posted by Allen Wittenauer <aw...@altiscale.com>.
On Jun 16, 2015, at 2:54 AM, Steve Loughran <st...@hortonworks.com> wrote:

>>>> 
>>>> One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
> 
> to be ruthless, that's not enough reason to upgrade branch-2, due to the transitive pain it makes all the way down.

	Not in branch-2, but certainly in trunk.  

Re: Protocol Buffers version

Posted by Steve Loughran <st...@hortonworks.com>.
> On 15 Jun 2015, at 22:31, Colin P. McCabe <cm...@apache.org> wrote:
> 
> On Mon, Jun 15, 2015 at 7:24 AM, Allen Wittenauer <aw...@altiscale.com> wrote:
>> 
>> On Jun 12, 2015, at 1:03 PM, Alan Burlison <Al...@oracle.com> wrote:
>> 
>>> On 14/05/2015 18:41, Chris Nauroth wrote:
>>> 
>>>> As a reminder though, the community probably would want to see a strong
>>>> justification for the upgrade in terms of features or performance or
>>>> something else.  Right now, I'm not seeing a significant benefit for us
>>>> based on my reading of their release notes.  I think it's worthwhile to
>>>> figure this out first.  Otherwise, there is a risk that any testing work
>>>> turns out to be a wasted effort.
>>> 
>>> One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.

To be ruthless, that's not enough reason to upgrade branch-2, due to the transitive pain it causes all the way down.

>> 
>> 
>>        That's a pretty good reason.
>> 
>>        Some of us had a discussion at Summit about effectively forking protobuf and making it an Apache TLP.  This would give us a chance to get out from under Google's blind spot, guarantee better compatibility across the ecosystem, etc, etc.
>> 
>>        It is sounding more and more like that's really what needs to happen.
> 
> I agree that it would be nice if the protobuf project avoided making
> backwards-incompatible API changes within a minor release.  But in
> practice, we have had the same issues with Jackson, Guava, jets3t, and
> other dependencies.  Nearly every important Hadoop dependency has made
> backwards-incompatible API changes within a minor release of the
> dependency... and that's one reason we are using such old versions of
> everything.  I don't think PB deserves to be singled out as much as it
> has been.

I think it does deserve it, as it was such an all-or-nothing change. Guava, well, we may keep it at 11.0, but we've made sure there are no classes used which aren't in the latest versions. Even where we depend on artifacts which need later versions (curator-2.7.1) we've addressed the version problem by verifying that you can actually rebuild curator with guava<-11.0 with everything working (curator-x-discovery doesn't compile, but we don't use that). So we know that unless a bit of curator uses reflection, we can run it against 11.x. And if someone wants to use a later version of Guava + hadoop-common, they can swap it in and Hadoop will still work. Which is important, as on Java 8u45+ you do need a recent Guava.

In contrast, protobuf needed a co-ordinated update across everything: every project which had checked in its generated protobuf files had to rebuild and check them in, which guaranteed they could no longer work with protobuf 2.4.

Jackson? Its broken-ness wasn't so obvious: if we'd known, I wouldn't have let it be updated. It's now on the risk list and I don't see us updating it for a long time.

>  I think the work going on now to implement CLASSPATH
> isolation in Hadoop will really be beneficial here because we will be
> able to upgrade without worrying about these problems.


+1

Re: Protocol Buffers version

Posted by "Colin P. McCabe" <cm...@apache.org>.
On Mon, Jun 15, 2015 at 7:24 AM, Allen Wittenauer <aw...@altiscale.com> wrote:
>
> On Jun 12, 2015, at 1:03 PM, Alan Burlison <Al...@oracle.com> wrote:
>
>> On 14/05/2015 18:41, Chris Nauroth wrote:
>>
>>> As a reminder though, the community probably would want to see a strong
>>> justification for the upgrade in terms of features or performance or
>>> something else.  Right now, I'm not seeing a significant benefit for us
>>> based on my reading of their release notes.  I think it's worthwhile to
>>> figure this out first.  Otherwise, there is a risk that any testing work
>>> turns out to be a wasted effort.
>>
>> One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
>
>
>         That's a pretty good reason.
>
>         Some of us had a discussion at Summit about effectively forking protobuf and making it an Apache TLP.  This would give us a chance to get out from under Google's blind spot, guarantee better compatibility across the ecosystem, etc, etc.
>
>         It is sounding more and more like that's really what needs to happen.

I agree that it would be nice if the protobuf project avoided making
backwards-incompatible API changes within a minor release.  But in
practice, we have had the same issues with Jackson, Guava, jets3t, and
other dependencies.  Nearly every important Hadoop dependency has made
backwards-incompatible API changes within a minor release of the
dependency... and that's one reason we are using such old versions of
everything.  I don't think PB deserves to be singled out as much as it
has been.  I think the work going on now to implement CLASSPATH
isolation in Hadoop will really be beneficial here because we will be
able to upgrade without worrying about these problems.

cheers,
Colin

Re: Protocol Buffers version

Posted by Allen Wittenauer <aw...@altiscale.com>.
On Jun 12, 2015, at 1:03 PM, Alan Burlison <Al...@oracle.com> wrote:

> On 14/05/2015 18:41, Chris Nauroth wrote:
> 
>> As a reminder though, the community probably would want to see a strong
>> justification for the upgrade in terms of features or performance or
>> something else.  Right now, I'm not seeing a significant benefit for us
>> based on my reading of their release notes.  I think it's worthwhile to
>> figure this out first.  Otherwise, there is a risk that any testing work
>> turns out to be a wasted effort.
> 
> One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.


	That's a pretty good reason.

	Some of us had a discussion at Summit about effectively forking protobuf and making it an Apache TLP.  This would give us a chance to get out from under Google's blind spot, guarantee better compatibility across the ecosystem, etc, etc.

	It is sounding more and more like that's really what needs to happen.

Re: Protocol Buffers version

Posted by Alan Burlison <Al...@oracle.com>.
On 14/05/2015 18:41, Chris Nauroth wrote:

> As a reminder though, the community probably would want to see a strong
> justification for the upgrade in terms of features or performance or
> something else.  Right now, I'm not seeing a significant benefit for us
> based on my reading of their release notes.  I think it's worthwhile to
> figure this out first.  Otherwise, there is a risk that any testing work
> turns out to be a wasted effort.

One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.

-- 
Alan Burlison
--

Re: Protocol Buffers version

Posted by Chris Nauroth <cn...@hortonworks.com>.
Thanks for that link, Alan.  That looks like a useful site!

Ideally, the Protocol Buffers project would give a clear statement about
wire compatibility between 2.5.0 and 2.6.1.  Unfortunately, I can't find
that anywhere.  If it's not documented, then it's probably worth following
up on the Protocol Buffers support lists to ask them.

One thing we could try is starting up a mix of Hadoop processes using
2.5.0 and 2.6.1 to see how it goes.  We've made a commitment to both
forward and backward compatibility within Hadoop 2.x, so we'd need a 2.5.0
client to be able to talk to a 2.6.1 server, and we'd need a 2.6.1 client
to be able to talk to a 2.5.0 server.  Even if this appears to go well, I
wouldn't consider it a substitute for a formal statement of the
compatibility policy from the Protocol Buffers project.  Otherwise, there
might be some subtle lurking issue that we miss in our initial testing.
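A mixed-version experiment like this leans on the fact that the wire encoding is fixed by the protobuf spec rather than by the library release that generated the classes. As an illustration (plain Python, no protobuf library needed; the message and value are the canonical example from the encoding documentation), the bytes for a varint field are fully determined by the field number and value:

```python
# Hand-encode the protobuf wire format for the canonical example message:
#   message Test1 { optional int32 a = 1; }   with a = 150
# Each field is a tag varint ((field_number << 3) | wire_type) followed by
# its payload; varints are little-endian, 7 bits per byte, MSB = "more".

def encode_varint(value):
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_varint_field(field_number, value):
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)

# The well-known result from the encoding docs: 08 96 01
assert encode_varint_field(1, 150) == b"\x08\x96\x01"
```

Any conforming 2.5.0 or 2.6.1 library has to produce those same three bytes for this message, which is why mixed-version testing is really probing semantics and edge cases rather than the basic encoding.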

As a reminder though, the community probably would want to see a strong
justification for the upgrade in terms of features or performance or
something else.  Right now, I'm not seeing a significant benefit for us
based on my reading of their release notes.  I think it's worthwhile to
figure this out first.  Otherwise, there is a risk that any testing work
turns out to be a wasted effort.

--Chris Nauroth




On 5/14/15, 7:23 AM, "Alan Burlison" <Al...@oracle.com> wrote:

>On 13/05/2015 17:13, Chris Nauroth wrote:
>
>> It was important to complete this upgrade before Hadoop 2.x came out of
>> beta.  After that, we committed to a policy of backwards-compatibility
>> within the 2.x release line.  I can't find a statement about whether or
>> not Protocol Buffers 2.6.1 is backwards-compatible with 2.5.0 (both at
>> compile time and on the wire).  Do you know the answer?  If it's
>> backwards-incompatible, then we wouldn't be able to do this upgrade within
>> Hadoop 2.x, though we could consider it for 3.x (trunk).
>
>I'm not sure about the wire format, what's the best way of checking for
>wire format issues?
>
>http://upstream-tracker.org/versions/protobuf.html suggests there are
>some source-level issues which will require investigation.
>
>> In general, we upgrade dependencies when a new release offers a compelling
>> benefit, not solely to keep up with the latest.  In the case of 2.5.0,
>> there was a performance benefit.  Looking at the release notes for 2.6.0
>> and 2.6.1, I don't see anything particularly compelling.  (That's just my
>> opinion though, and others might disagree.)
>
>I think bundling or forking is the only practical option. I was looking
>to see if we could provide ProtocolBuffers as an installable option on
>our platform; if it's a version-compatibility nightmare as you say,
>that's going to be difficult, as we really don't want to have to provide
>multiple versions.
>
>> BTW, if anyone is curious, it's possible to try a custom build right now
>> linked against 2.6.1.  You'd pass -Dprotobuf.version=2.6.1 and
>> -Dprotoc.path=<path to protoc 2.6.1 binary> when you run the mvn command.
>
>Once I have fixed all the other source portability issues I'll circle
>back around and take a look at this.
>
>-- 
>Alan Burlison
>--


Re: Protocol Buffers version

Posted by Steve Loughran <st...@hortonworks.com>.
> On 19 May 2015, at 17:59, Colin P. McCabe <cm...@apache.org> wrote:
> 
> I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
> handled a lot better by Google.  Specifically, since it was an
> API-breaking upgrade, it should have been a major version bump for the
> Java library version.  I also feel that removing the download links
> for the old versions of the native libraries was careless, and
> certainly burned some of our Hadoop users.
> 
> However, I don't see any reason to believe that protobuf 2.6 will not
> be wire-compatible with earlier versions.  Google has actually been
> pretty good about preserving wire-compatibility... just not about API
> compatibility.  If we want to get a formal statement from the project,
> we can, but I would be pretty shocked if they decided to change the
> protocol in a backwards-incompatible way in a minor version release.

That's what they have done well: wire formats don't break (though you have the freedom to break them by adding new non-optional fields).

Of course, they then have the standard service problems of (a) downgrading if optional fields are omitted and (b) maintaining semantics over time. They just have that at a bigger scale than the rest of us.
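Part of why wire formats survive version skew is the decoding rule that unknown fields are skipped rather than rejected, which is what lets an old client read a message from a newer server. A minimal sketch of that rule (plain Python, varint-only fields; not the real protobuf parser):

```python
def decode_varint(buf, i):
    """Decode a base-128 varint starting at index i; return (value, next_index)."""
    shift = result = 0
    while True:
        b = buf[i]
        i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def parse_known(buf, known_fields):
    """Parse a message of varint fields, silently skipping unknown field numbers."""
    i, out = 0, {}
    while i < len(buf):
        tag, i = decode_varint(buf, i)
        field_number = tag >> 3          # low 3 bits are the wire type
        value, i = decode_varint(buf, i)  # wire type 0 (varint) assumed here
        if field_number in known_fields:
            out[field_number] = value
        # An unknown field number is skipped, not an error: an old reader
        # can still consume a message from a newer writer that added fields.
    return out

# b"\x08\x96\x01" is field 1 = 150; b"\x10\x01" is an "unknown" field 2 = 1
assert parse_known(b"\x08\x96\x01\x10\x01", known_fields={1}) == {1: 150}
```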

The 2.4/2.5 switch showed the trouble of using code from a company capable of doing a whole-stack rebuild overnight. They can update a dependency (protobuf.jar, guava.jar) and have it picked up in the binaries. We don't have that luxury.

> 
> I do think there are some potential issues for our users of bumping
> the library version in a minor Hadoop release.  Until we implement
> full dependency isolation for Hadoop, there may be some disruptions to
> end-users from changing Java dependency versions.  Similarly, users
> will need to install a new native protobuf library version as well.
> So I think we should bump the protobuf versions in Hadoop 3.0, but not
> in 2.x.

+1, though I do fear that the more things we put off until "3.0", the bigger that switch becomes, and so the harder the adoption.

FWIW, one area I do find hard with protobuf is trying to set message fields through reflection. That is, I want code that will link against, say, the Hadoop 2.6 binaries but, if the extra fields for a 2.7 message are present, use them. Deep down in the internals protobuf should let me do this, but not at the Java API level.

Re: Protocol Buffers version

Posted by Sangjin Lee <sj...@gmail.com>.
I pushed it out to a github fork:
https://github.com/sjlee/protobuf/tree/2.5.0-incompatibility

We haven't observed any compatibility issues other than these.

On Tue, May 19, 2015 at 10:05 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> Thanks, Sangjin.  I'd be interested in taking a peek at a personal GitHub
> repo or even just a patch file of those changes.  If there were
> incompatibilities, then that doesn't bode well for an upgrade to 2.6.
>
> --Chris Nauroth
>
>
>
>
> On 5/19/15, 8:40 PM, "Sangjin Lee" <sj...@apache.org> wrote:
>
> >When we moved to Hadoop 2.4, the associated protobuf upgrade (2.4.1 ->
> >2.5.0) proved to be one of the bigger problems. In our case, most of our
> >users were using protobuf 2.4.x or earlier.
> >
> >We identified a couple of places where the backward compatibility was
> >broken, and patched for those issues. We've been running with that patched
> >version of protobuf 2.5.0 since. I can push out those changes to github or
> >something if others are interested FWIW.
> >
> >Regards,
> >Sangjin
> >
> >On Tue, May 19, 2015 at 9:59 AM, Colin P. McCabe <cm...@apache.org>
> >wrote:
> >
> >> I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
> >> handled a lot better by Google.  Specifically, since it was an
> >> API-breaking upgrade, it should have been a major version bump for the
> >> Java library version.  I also feel that removing the download links
> >> for the old versions of the native libraries was careless, and
> >> certainly burned some of our Hadoop users.
> >>
> >> However, I don't see any reason to believe that protobuf 2.6 will not
> >> be wire-compatible with earlier versions.  Google has actually been
> >> pretty good about preserving wire-compatibility... just not about API
> >> compatibility.  If we want to get a formal statement from the project,
> >> we can, but I would be pretty shocked if they decided to change the
> >> protocol in a backwards-incompatible way in a minor version release.
> >>
> >> I do think there are some potential issues for our users of bumping
> >> the library version in a minor Hadoop release.  Until we implement
> >> full dependency isolation for Hadoop, there may be some disruptions to
> >> end-users from changing Java dependency versions.  Similarly, users
> >> will need to install a new native protobuf library version as well.
> >> So I think we should bump the protobuf versions in Hadoop 3.0, but not
> >> in 2.x.
> >>
> >> cheers,
> >> Colin
> >>
> >> On Fri, May 15, 2015 at 4:55 AM, Alan Burlison <Al...@oracle.com>
> >> wrote:
> >> > On 15/05/2015 09:44, Steve Loughran wrote:
> >> >
> >> >> Now: why do you want to use a later version of protobuf.jar? Is it
> >> >> because "it is there"? Or is there a tangible need?
> >> >
> >> >
> >> > No, it's because I'm looking at this from a platform perspective: We have
> >> > other consumers of ProtoBuf beside Hadoop and we'd obviously like to
> >> > minimise the versions of PB that we ship, and preferably just ship the
> >> > latest version. The fact that PB seems to often be incompatible across
> >> > releases is an issue as it makes upgrading and dropping older versions
> >> > problematic.
> >> >
> >> > --
> >> > Alan Burlison
> >> > --
> >>
>
>

Re: Protocol Buffers version

Posted by Chris Nauroth <cn...@hortonworks.com>.
Thanks, Sangjin.  I'd be interested in taking a peek at a personal GitHub
repo or even just a patch file of those changes.  If there were
incompatibilities, then that doesn't bode well for an upgrade to 2.6.

--Chris Nauroth




On 5/19/15, 8:40 PM, "Sangjin Lee" <sj...@apache.org> wrote:

>When we moved to Hadoop 2.4, the associated protobuf upgrade (2.4.1 ->
>2.5.0) proved to be one of the bigger problems. In our case, most of our
>users were using protobuf 2.4.x or earlier.
>
>We identified a couple of places where the backward compatibility was
>broken, and patched for those issues. We've been running with that patched
>version of protobuf 2.5.0 since. I can push out those changes to github or
>something if others are interested FWIW.
>
>Regards,
>Sangjin
>
>On Tue, May 19, 2015 at 9:59 AM, Colin P. McCabe <cm...@apache.org>
>wrote:
>
>> I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
>> handled a lot better by Google.  Specifically, since it was an
>> API-breaking upgrade, it should have been a major version bump for the
>> Java library version.  I also feel that removing the download links
>> for the old versions of the native libraries was careless, and
>> certainly burned some of our Hadoop users.
>>
>> However, I don't see any reason to believe that protobuf 2.6 will not
>> be wire-compatible with earlier versions.  Google has actually been
>> pretty good about preserving wire-compatibility... just not about API
>> compatibility.  If we want to get a formal statement from the project,
>> we can, but I would be pretty shocked if they decided to change the
>> protocol in a backwards-incompatible way in a minor version release.
>>
>> I do think there are some potential issues for our users of bumping
>> the library version in a minor Hadoop release.  Until we implement
>> full dependency isolation for Hadoop, there may be some disruptions to
>> end-users from changing Java dependency versions.  Similarly, users
>> will need to install a new native protobuf library version as well.
>> So I think we should bump the protobuf versions in Hadoop 3.0, but not
>> in 2.x.
>>
>> cheers,
>> Colin
>>
>> On Fri, May 15, 2015 at 4:55 AM, Alan Burlison <Al...@oracle.com>
>> wrote:
>> > On 15/05/2015 09:44, Steve Loughran wrote:
>> >
>> >> Now: why do you want to use a later version of protobuf.jar? Is it
>> >> because "it is there"? Or is there a tangible need?
>> >
>> >
>> > No, it's because I'm looking at this from a platform perspective: We have
>> > other consumers of ProtoBuf beside Hadoop and we'd obviously like to
>> > minimise the versions of PB that we ship, and preferably just ship the
>> > latest version. The fact that PB seems to often be incompatible across
>> > releases is an issue as it makes upgrading and dropping older versions
>> > problematic.
>> >
>> > --
>> > Alan Burlison
>> > --
>>


Re: Protocol Buffers version

Posted by Sangjin Lee <sj...@apache.org>.
When we moved to Hadoop 2.4, the associated protobuf upgrade (2.4.1 ->
2.5.0) proved to be one of the bigger problems. In our case, most of our
users were using protobuf 2.4.x or earlier.

We identified a couple of places where the backward compatibility was
broken, and patched for those issues. We've been running with that patched
version of protobuf 2.5.0 since. I can push out those changes to github or
something if others are interested FWIW.

Regards,
Sangjin

On Tue, May 19, 2015 at 9:59 AM, Colin P. McCabe <cm...@apache.org> wrote:

> I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
> handled a lot better by Google.  Specifically, since it was an
> API-breaking upgrade, it should have been a major version bump for the
> Java library version.  I also feel that removing the download links
> for the old versions of the native libraries was careless, and
> certainly burned some of our Hadoop users.
>
> However, I don't see any reason to believe that protobuf 2.6 will not
> be wire-compatible with earlier versions.  Google has actually been
> pretty good about preserving wire-compatibility... just not about API
> compatibility.  If we want to get a formal statement from the project,
> we can, but I would be pretty shocked if they decided to change the
> protocol in a backwards-incompatible way in a minor version release.
>
> I do think there are some potential issues for our users of bumping
> the library version in a minor Hadoop release.  Until we implement
> full dependency isolation for Hadoop, there may be some disruptions to
> end-users from changing Java dependency versions.  Similarly, users
> will need to install a new native protobuf library version as well.
> So I think we should bump the protobuf versions in Hadoop 3.0, but not
> in 2.x.
>
> cheers,
> Colin
>
> On Fri, May 15, 2015 at 4:55 AM, Alan Burlison <Al...@oracle.com>
> wrote:
> > On 15/05/2015 09:44, Steve Loughran wrote:
> >
> >> Now: why do you want to use a later version of protobuf.jar? Is it
> >> because "it is there"? Or is there a tangible need?
> >
> >
> > No, it's because I'm looking at this from a platform perspective: We have
> > other consumers of ProtoBuf beside Hadoop and we'd obviously like to
> > minimise the versions of PB that we ship, and preferably just ship the
> > latest version. The fact that PB seems to often be incompatible across
> > releases is an issue as it makes upgrading and dropping older versions
> > problematic.
> >
> > --
> > Alan Burlison
> > --
>

Re: Protocol Buffers version

Posted by "Colin P. McCabe" <cm...@apache.org>.
I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
handled a lot better by Google.  Specifically, since it was an
API-breaking upgrade, it should have been a major version bump for the
Java library version.  I also feel that removing the download links
for the old versions of the native libraries was careless, and
certainly burned some of our Hadoop users.

However, I don't see any reason to believe that protobuf 2.6 will not
be wire-compatible with earlier versions.  Google has actually been
pretty good about preserving wire-compatibility... just not about API
compatibility.  If we want to get a formal statement from the project,
we can, but I would be pretty shocked if they decided to change the
protocol in a backwards-incompatible way in a minor version release.

I do think there are some potential issues for our users of bumping
the library version in a minor Hadoop release.  Until we implement
full dependency isolation for Hadoop, there may be some disruptions to
end-users from changing Java dependency versions.  Similarly, users
will need to install a new native protobuf library version as well.
So I think we should bump the protobuf versions in Hadoop 3.0, but not
in 2.x.

cheers,
Colin

On Fri, May 15, 2015 at 4:55 AM, Alan Burlison <Al...@oracle.com> wrote:
> On 15/05/2015 09:44, Steve Loughran wrote:
>
>> Now: why do you want to use a later version of protobuf.jar? Is it
>> because "it is there"? Or is there a tangible need?
>
>
> No, it's because I'm looking at this from a platform perspective: We have
> other consumers of ProtoBuf beside Hadoop and we'd obviously like to
> minimise the versions of PB that we ship, and preferably just ship the
> latest version. The fact that PB seems to often be incompatible across
> releases is an issue as it makes upgrading and dropping older versions
> problematic.
>
> --
> Alan Burlison
> --

Re: Protocol Buffers version

Posted by Alan Burlison <Al...@oracle.com>.
On 15/05/2015 09:44, Steve Loughran wrote:

> Now: why do you want to use a later version of protobuf.jar? Is it
> because "it is there"? Or is there a tangible need?

No, it's because I'm looking at this from a platform perspective: We 
have other consumers of ProtoBuf beside Hadoop and we'd obviously like 
to minimise the versions of PB that we ship, and preferably just ship 
the latest version. The fact that PB seems to often be incompatible 
across releases is an issue as it makes upgrading and dropping older 
versions problematic.

-- 
Alan Burlison
--

Re: Protocol Buffers version

Posted by Steve Loughran <st...@hortonworks.com>.
On 14 May 2015, at 15:23, Alan Burlison <Al...@oracle.com> wrote:

I think bundling or forking is the only practical option. I was looking to see if we could provide ProtocolBuffers as an installable option on our platform; if it's a version-compatibility nightmare as you say, that's going to be difficult, as we really don't want to have to provide multiple versions.

The problem Hadoop has is that its code, especially the HDFS client code, is used in a lot of other applications, and they all end up having to be in sync at the Java level. Hopefully the protobuf wire format is compatible (that is the whole point of the format, after all), but we know from experience that at the JAR level it isn't. Having to rebuild every single .proto-derived Java class and then switch across the entire dependency tree was the upgrade path there, with about a month where getting the trunk versions of two apps to link was pretty hit and miss.
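The regeneration step behind that upgrade path is mechanical but has to happen in every downstream project. A hedged sketch of what each one had to do (the .proto file name and output directory here are illustrative):

```shell
# Regenerate the Java classes from the .proto sources with the new compiler,
# then recompile and re-release the project. File names are illustrative.
protoc --version   # confirm the new protoc is the one on PATH
protoc --java_out=src/main/java src/main/proto/ClientNamenodeProtocol.proto
```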

I think everyone came out burned from that:
- scared and unwilling to repeat the experience
- not believing any further Google assertions of library compatibility (see also: guava)

What to do?

  1.  Leave it alone and it slowly ages; when an upgrade happens it can be more traumatic. But until that time: nothing breaks.
  2.  Upgrade regularly and you can dramatically break things, so people don't upgrade Hadoop itself: they stick with old versions (with issues already fixed in the later releases), they keep on requesting backported fixes into the "working" branch, and you end up with two branches of your code to maintain.
  3.  Fork and you take on the maintenance costs of your forked library forever; it will implicitly age, and there's the opportunity cost of that work, i.e. better things to waste your time on.
  4.  Rip out protobuf entirely and switch to something else (thrift) that has better stability, tag the proto channels as deprecated, etc, etc. You'd better trust the successor's stability and security features before going to that effort.

Hadoop 2.x has defaulted to option (1).

Now: why do you want to use a later version of protobuf.jar? Is it because "it is there"? Or is there a tangible need?

-steve

Re: Protocol Buffers version

Posted by Alan Burlison <Al...@oracle.com>.
On 13/05/2015 17:13, Chris Nauroth wrote:

> It was important to complete this upgrade before Hadoop 2.x came out of
> beta.  After that, we committed to a policy of backwards-compatibility
> within the 2.x release line.  I can't find a statement about whether or
> not Protocol Buffers 2.6.1 is backwards-compatible with 2.5.0 (both at
> compile time and on the wire).  Do you know the answer?  If it's
> backwards-incompatible, then we wouldn't be able to do this upgrade within
> Hadoop 2.x, though we could consider it for 3.x (trunk).

I'm not sure about the wire format, what's the best way of checking for 
wire format issues?

http://upstream-tracker.org/versions/protobuf.html suggests there are 
some source-level issues which will require investigation.

> In general, we upgrade dependencies when a new release offers a compelling
> benefit, not solely to keep up with the latest.  In the case of 2.5.0,
> there was a performance benefit.  Looking at the release notes for 2.6.0
> and 2.6.1, I don't see anything particularly compelling.  (That's just my
> opinion though, and others might disagree.)

I think bundling or forking is the only practical option. I was looking 
to see if we could provide Protocol Buffers as an installable option on 
our platform; if it's a version-compatibility nightmare as you say, 
that's going to be difficult, as we really don't want to have to provide 
multiple versions.

> BTW, if anyone is curious, it's possible to try a custom build right now
> linked against 2.6.1.  You'd pass -Dprotobuf.version=2.6.1 and
> -Dprotoc.path=<path to protoc 2.6.1 binary> when you run the mvn command.

Once I have fixed all the other source portability issues I'll circle 
back around and take a look at this.

-- 
Alan Burlison
--

Re: Protocol Buffers version

Posted by Chris Nauroth <cn...@hortonworks.com>.
Some additional details...

A few years ago, we moved from Protocol Buffers 2.4.1 to 2.5.0.  There
were some challenges with that upgrade, because 2.5.0 was not
backwards-compatible with 2.4.1.  We needed to coordinate carefully with
projects downstream of Hadoop that receive our protobuf classes through
transitive dependency.  Here are a few issues with more background:

https://issues.apache.org/jira/browse/HADOOP-9845

https://issues.apache.org/jira/browse/HBASE-8165

https://issues.apache.org/jira/browse/HIVE-5112

It was important to complete this upgrade before Hadoop 2.x came out of
beta.  After that, we committed to a policy of backwards-compatibility
within the 2.x release line.  I can't find a statement about whether or
not Protocol Buffers 2.6.1 is backwards-compatible with 2.5.0 (both at
compile time and on the wire).  Do you know the answer?  If it's
backwards-incompatible, then we wouldn't be able to do this upgrade within
Hadoop 2.x, though we could consider it for 3.x (trunk).

In general, we upgrade dependencies when a new release offers a compelling
benefit, not solely to keep up with the latest.  In the case of 2.5.0,
there was a performance benefit.  Looking at the release notes for 2.6.0
and 2.6.1, I don't see anything particularly compelling.  (That's just my
opinion though, and others might disagree.)

https://github.com/google/protobuf/blob/master/CHANGES.txt

BTW, if anyone is curious, it's possible to try a custom build right now
linked against 2.6.1.  You'd pass -Dprotobuf.version=2.6.1 and
-Dprotoc.path=<path to protoc 2.6.1 binary> when you run the mvn command.
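For reference, a sketch of what that custom build invocation might look like in full. The protoc install location here is an assumption; adjust it for your environment, and note that the matching protoc 2.6.1 binary must already be installed:

```shell
# Sketch: build Hadoop against Protocol Buffers 2.6.1 instead of the
# default 2.5.0. /usr/local/bin/protoc is an assumed install location
# for a protoc 2.6.1 binary -- adjust the path for your system.
mvn clean install -DskipTests \
    -Dprotobuf.version=2.6.1 \
    -Dprotoc.path=/usr/local/bin/protoc
```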


--Chris Nauroth




On 5/13/15, 8:59 AM, "Allen Wittenauer" <aw...@altiscale.com> wrote:

>
>On May 13, 2015, at 5:02 AM, Alan Burlison <Al...@oracle.com>
>wrote:
>
>> The current version of Protocol Buffers is 2.6.1 but the current
>>version required by Hadoop is 2.5.0. Is there any reason for this, or
>>should I log a JIRA to get it updated?
>
>	The story of protocol buffers is part of a shameful past where Hadoop
>trusted Google.  That was a terrible mistake, judging by the last time
>the project upgraded.  2.4->2.5 required some source-level,
>non-backward-compatible, and completely-avoidable-but-G-made-us-do-it-anyway
>surgery to make it work. It also ended up being a flag day for every
>single developer who worked not only with Hadoop but with all of the
>downstream projects as well.  Big disaster.
>
>	The fact that when Google shut down Google Code, they didn't even tag
>previous releases in the GitHub source tree without a significant amount
>of pressure from the open source community just added insult to injury.
>As a result, I believe the collective opinion is to just flat out avoid
>adding any more Google bits into the system.
>
>	See also: guava, which suffers from the same shortsightedness.
>
>	At some point, we'll either upgrade, switch to a different protocol
>serialization format, or fork protobuf.
>


Re: Protocol Buffers version

Posted by Allen Wittenauer <aw...@altiscale.com>.
On May 13, 2015, at 5:02 AM, Alan Burlison <Al...@oracle.com> wrote:

> The current version of Protocol Buffers is 2.6.1 but the current version required by Hadoop is 2.5.0. Is there any reason for this, or should I log a JIRA to get it updated?

	The story of protocol buffers is part of a shameful past where Hadoop trusted Google.  That was a terrible mistake, judging by the last time the project upgraded.  2.4->2.5 required some source-level, non-backward-compatible, and completely-avoidable-but-G-made-us-do-it-anyway surgery to make it work. It also ended up being a flag day for every single developer who worked not only with Hadoop but with all of the downstream projects as well.  Big disaster.

	The fact that when Google shut down Google Code, they didn't even tag previous releases in the GitHub source tree without a significant amount of pressure from the open source community just added insult to injury.  As a result, I believe the collective opinion is to just flat out avoid adding any more Google bits into the system.

	See also: guava, which suffers from the same shortsightedness. 

	At some point, we'll either upgrade, switch to a different protocol serialization format, or fork protobuf.