Posted to dev@avro.apache.org by "Driesprong, Fokko" <fo...@driesprong.frl> on 2020/03/02 08:59:52 UTC

Re: [DISCUSS] version numbers and where changes should land

So, as I understand it, the whole 1.x series should be binary compatible:
anything written with Java 1.x should be readable with Python 1.x.
We've been working on extending the integration tests as well.

Not all languages support all the features, for example, many languages
lack support for logical types. In this case, a datetime would be just read
as an integer, so there is a fallback scenario.
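
To illustrate the fallback: the timestamp-millis logical type annotates a
long, so a reader without logical type support simply returns the raw
integer. The following is a minimal sketch in plain Python, not Avro library
code; the function name and the sample value are made up for illustration.

```python
from datetime import datetime, timezone


def decode_timestamp_millis(raw: int) -> datetime:
    """Interpret a raw long as milliseconds since the Unix epoch (UTC)."""
    seconds, millis = divmod(raw, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=millis * 1000
    )


# Hypothetical value that a reader without logical type support would hand back.
raw_value = 1583139592000

print(raw_value)                                       # the fallback: a plain integer
print(decode_timestamp_millis(raw_value).isoformat())  # what a logical-type-aware reader could produce
```

Either way the bytes on disk are the same; only the in-memory representation
differs between the two readers.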

For 1.9 we broke some of the APIs because, at the time, the decision was
that removing Jackson from the public API was required to move from
Codehaus Jackson (1.x) to the FasterXML one (2.x). The public API shouldn't
have exposed these methods in the first place.

I wouldn't be in favor of switching to 10.x (dropping the 1. in front of
it). What's the added value in this? I'm just afraid that changing this
would confuse a lot of downstream users.

Also, there was a similar discussion on the Spark dev list; I think Michael
has some valid points here:
https://mail-archives.apache.org/mod_mbox/spark-dev/202002.mbox/browser

Maybe it would be good to formalize our policy and put it on the website.

Cheers, Fokko Driesprong

On Fri, Feb 28, 2020 at 5:53 PM Sean Busbey <bu...@apache.org> wrote:

> Counterpoint on independently versioning the various languages. Do we
> know if Python Avro X works with Java Avro Y as it is? It seems like
> we already get surprised pretty often when they don't.
>
> If we stop including the "data compatibility version" or whatever
> we're calling the first number, we'll need to get more formal on
> versioning the specification and having libraries plainly label which
> specification(s) they comport to.
>
> At the very least it seems like we'd make the _easy_ path easier for
> the languages that are well maintained. Sure, it'll be a burden on those
> languages that aren't well maintained, but it seems like those are
> already in that position?
>
> On Thu, Feb 27, 2020 at 9:13 AM Ismaël Mejía <ie...@gmail.com> wrote:
> >
> > Bringing my comment from the JIRA ticket here for discussion:
> >
> > > "One argument against semantic versioning is the fact that Avro
> > > supports 9 language APIs, so if, let's say, C++ breaks its backwards
> > > compatibility, should we move the version number up for every single
> > > language? Sounds like a burden, and in particular not an easy task to
> > > track, since we do not have proper validation of breaking changes in
> > > place for every language at this point.
> > > ... (even if we separate release numbers per language) that seems
> > > like a lot of work for probably a similar output, because then users
> > > will wonder: wait, is Python Avro 3.1.0 compatible with Java Avro
> > > 5.2.0? And they probably will be, for the binary format."
> >
> > Also, there is the case of the interop tests: how will those act in this
> > case? We will need a compatibility matrix. Again, I am not sure it is
> > the best approach; it looks like a lot of work for not much in return.
> >
> >
> >
> > On Thu, Feb 27, 2020 at 12:21 PM Ryan Skraba <ry...@skraba.com> wrote:
> >
> > > Hello!  Resurrecting -- I think this was the last thread bringing up
> > > this issue!
> > >
> > > Since we've talked about releasing 1.10.x in May, and it's a nice
> > > round number... what do you think about
> > >
> > > 1) finally dropping the prefix for the "specification version" and
> > > calling it Avro 10.x
> > >
> > > 2) committing to semantic versioning for future releases
> > >
> > > I can see this being a hugely positive move for aligning with the
> > > expectations of developers and projects... but it leads to a lot of
> > > questions about releasing all the artifacts together.
> > >
> > > There's already a JIRA: https://issues.apache.org/jira/browse/AVRO-2687
> > >
> > > Ryan
> > >
> > > On Fri, Sep 13, 2019 at 12:00 PM Driesprong, Fokko <fo...@driesprong.frl> wrote:
> > > >
> > > > Thanks Sean for bringing this up.
> > > >
> > > > For the 1.9 branch there were some incompatible changes in the API
> > > > with respect to 1.8.2. We've removed Jackson
> > > > <https://github.com/apache/avro/pull/135> and Netty from the public
> > > > API. This is actually breaking some of the builds
> > > > <https://github.com/apache/incubator-iceberg/pull/297>, so,
> > > > unfortunately, it isn't compatible, hence the major version bump.
> > > >
> > > > The 1.9.x branch still supports the Joda time library but defaults
> > > > to jsr310, and is still compatible (I believe). For 1.10 the plan is
> > > > to completely remove Joda from the codebase, since it is officially
> > > > deprecated in favor of Java 8 time (jsr310). A lot of this is just
> > > > changes to the Java API of Avro, mostly involving the LogicalTypes,
> > > > so the actual format is still compatible (as it should be).
> > > >
> > > > I agree with you, Sean, that a lot of the changes targeted for 1.10
> > > > could be cherry-picked back to the 1.9 branch. If someone is willing
> > > > to do this, I would be grateful. However, maintaining a lot of
> > > > different branches is quite time-consuming in terms of release
> > > > management of the different versions. For Apache Avro 1.9.0 we
> > > > actually had some blocking regression bugs, hence the 1.9.1 release.
> > > >
> > > > Personally, I don't have a big objection to bumping the major version
> > > > if there are breaking changes to one of the APIs. But a big +1 on
> > > > having a standardized approach to versioning; this also includes a
> > > > clearer approach to documenting the upgrade process and a better
> > > > changelog. I've added summaries of the releases at the GitHub
> > > > releases page: https://github.com/apache/avro/releases but I think
> > > > having this on the Avro website might be more appropriate.
> > > >
> > > > Cheers, Fokko Driesprong
> > > >
> > > >
> > > >
> > > > On Wed, Sep 11, 2019 at 6:17 PM Ryan Blue <rblue@netflix.com.invalid> wrote:
> > > >
> > > > > > What would it look like if we *did* have to make an incompatible
> > > > > > data format change after adopting "conventional" library version
> > > > > > strings?
> > > > >
> > > > > Let's call these format v1 and v2. The library must produce v1 by
> > > > > default, so it's a matter of having support for writing v2. When
> > > > > the default changes to v2, that behavior change would require a
> > > > > major version increase to signal changes to compatibility. I think
> > > > > we would also want clear documentation for each version that shows
> > > > > which versions of the format it can read and write, and which it
> > > > > will use by default. A table on the site would work.
> > > > >
> > > > > On Tue, Sep 10, 2019 at 2:51 PM Sean Busbey <bu...@apache.org> wrote:
> > > > >
> > > > > > What would it look like if we *did* have to make an incompatible
> > > > > > data format change after adopting "conventional" library version
> > > > > > strings?
> > > > > >
> > > > > > What if we version the specification independently from the
> > > > > > libraries and then have the docs for the libraries claim spec
> > > > > > version compatibility?
> > > > > >
> > > > > > On Tue, Sep 10, 2019 at 3:55 PM Ryan Blue <rblue@netflix.com.invalid> wrote:
> > > > > > >
> > > > > > > +1 for changing the version strings to follow a more standard
> > > > > > > convention. We don't have any breaking format changes, so I
> > > > > > > think it is expected that the format compatibility version
> > > > > > > won't change.
> > > > > > >
> > > > > > > On Tue, Sep 10, 2019 at 7:28 AM Sean Busbey <busbey@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi folks!
> > > > > > > >
> > > > > > > > Historically, Avro version numbers have had the form:
> > > > > > > >
> > > > > > > > <data compatibility> . <major library version> . <minor library version>
> > > > > > > >
> > > > > > > > That is, the first number says whether or not we expect data
> > > > > > > > serialization to be compatible, and the second says whether
> > > > > > > > we expect some library to be backwards incompatible, however
> > > > > > > > that's defined for the library's language. For example, in
> > > > > > > > the Java library, this is when we make changes to public
> > > > > > > > method signatures such that folks can't just swap out jar
> > > > > > > > files of our implementation.
> > > > > > > >
> > > > > > > > While getting myself up to speed on the state of our release
> > > > > > > > lines, I noticed we already have the 1.9 release line in a
> > > > > > > > branch, with master set up for the next major library
> > > > > > > > version. JIRA shows ~46 issues that are in 1.10 but not in a
> > > > > > > > 1.9 release[1].
> > > > > > > >
> > > > > > > > I haven't looked at all of them yet, but the few I sampled
> > > > > > > > don't seem to require a major version increment.
> > > > > > > >
> > > > > > > > I looked around our site and I also can't find anywhere that
> > > > > > > > we've documented our version strings. I know I've been in
> > > > > > > > discussions in other communities where our version strings
> > > > > > > > have been surprising, e.g. folks had assumed they could do a
> > > > > > > > low-effort upgrade from 1.7 to 1.8, only to find that there
> > > > > > > > were documented incompatibilities and behavior changes.
> > > > > > > >
> > > > > > > > Are we actively planning on rolling out 1.10? (like, do we
> > > > > > > > have a goal date?)
> > > > > > > >
> > > > > > > > I know that when 1.9 went out we EOLed 1.7 and 1.8 in part
> > > > > > > > due to the overhead of trying to maintain multiple release
> > > > > > > > lines (especially once that had so much baggage) while we're
> > > > > > > > trying to reestablish good habits on release cadence. How
> > > > > > > > many major versions are we planning to keep going once 1.10
> > > > > > > > is ready?
> > > > > > > >
> > > > > > > > What do folks think about starting a CONTRIBUTING.md with
> > > > > > > > some of these expectations? Is there a better place to track
> > > > > > > > it?
> > > > > > > >
> > > > > > > > [1] : https://s.apache.org/71yqv
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Ryan Blue
> > > > > > > Software Engineer
> > > > > > > Netflix
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Ryan Blue
> > > > > Software Engineer
> > > > > Netflix
> > > > >
> > >
>

Re: [DISCUSS] version numbers and where changes should land

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
+1 for incrementing the major number for library compatibility changes and
decoupling the release versions across languages.

I agree with the points that Sean made. This is confusing for users, and
needlessly so because there is only one binary format and I don't think
there are major pushes to introduce a breaking change.

On Mon, Mar 2, 2020 at 7:48 AM Sean Busbey <bu...@apache.org> wrote:

> > So, as I understand it. The whole 1.x version should be binary compatible.
>
> The term "binary compatible" is overloaded, thanks to our
> participation in the Java ecosystem. Every data file written in Avro
> 1.y is intended to be readable by every other Avro 1.y release,
> regardless of language. That is true even if we know that there are
> cases where errors in various language libraries have prevented
> success.
>
> In Java ecosystem parlance the term "binary compatible" also refers to
> the ability to use a Java library in place of another without needing
> to recompile any code that refers to said library. It is definitely
> not the case that every Avro 1.y Java version has been binary
> compatible in this sense (in fact just the opposite).
>
> > For 1.9 we broke some of the API's because, at the time, the decision was
> > that removing Jackson from the public API was required to move from
> > Codehaus jackson (1.x) to the Fasterxml one (2.x). The public API shouldn't
> > have exposed these methods in the first place.
>
> I think this is confusing the two issues of "compatibility of
> serialized bytes" and "compatibility of language APIs". This is part
> of why I think we should stop relying on the first version number to
> indicate "compatibility of serialized bytes".
>
> > I wouldn't be in favor of switching to 10.x (dropping the 1. in front of
> > it). What's the added value in this? I'm just afraid of changing this,
> > would confuse a lot of downstream users.
>
> The big advantage is that literally everywhere else in the software
> ecosystems I've seen, the first number in a version string is either
> "major version" or "marketing version", usually the former. Folks
> expect that if that version number hasn't changed then they should be
> able to "easily" upgrade to use the newer library. In Avro that
> plainly isn't true. I can think of multiple cases where other ASF
> projects have gotten surprised that going from e.g. Java libraries for
> Avro 1.7 to Avro 1.8 was a major version bump.
>
> I agree that going from 1.x.y to 10.y.z might be confusing due to the
> large number jump. I think going to "2.y.z" would clearly indicate
> folks need to pay attention to a version difference because the
> first number changed. When we have their attention we can explain that
> we're using it as major version from now on.
>
>
> > Also, a similar discussion was on the Spark devlist, I think Michael has
> > some valid points here:
> > https://mail-archives.apache.org/mod_mbox/spark-dev/202002.mbox/browser
>
> This is a month of email from dev@spark. Could you link to a specific
> thread on lists.apache.org or provide a subject line?
>


-- 
Ryan Blue
Software Engineer
Netflix

Re: [DISCUSS] version numbers and where changes should land

Posted by Sean Busbey <bu...@apache.org>.
> So, as I understand it. The whole 1.x version should be binary compatible.

The term "binary compatible" is overloaded, thanks to our
participation in the Java ecosystem. Every data file written in Avro
1.y is intended to be readable by every other Avro 1.y release,
regardless of language. That is true even if we know that there are
cases where errors in various language libraries have prevented
success.

In Java ecosystem parlance the term "binary compatible" also refers to
the ability to use a Java library in place of another without needing
to recompile any code that refers to said library. It is definitely
not the case that every Avro 1.y Java version has been binary
compatible in this sense (in fact just the opposite).

> For 1.9 we broke some of the API's because, at the time, the decision was
> that removing Jackson from the public API was required to move from
> Codehaus jackson (1.x) to the Fasterxml one (2.x). The public API shouldn't
> have exposed these methods in the first place.

I think this is confusing the two issues of "compatibility of
serialized bytes" and "compatibility of language APIs". This is part
of why I think we should stop relying on the first version number to
indicate "compatibility of serialized bytes".

> I wouldn't be in favor of switching to 10.x (dropping the 1. in front of
> it). What's the added value in this? I'm just afraid of changing this,
> would confuse a lot of downstream users.

The big advantage is that literally everywhere else in the software
ecosystems I've seen, the first number in a version string is either
"major version" or "marketing version", usually the former. Folks
expect that if that version number hasn't changed then they should be
able to "easily" upgrade to use the newer library. In Avro that
plainly isn't true. I can think of multiple cases where other ASF
projects have gotten surprised that going from e.g. Java libraries for
Avro 1.7 to Avro 1.8 was a major version bump.

I agree that going from 1.x.y to 10.y.z might be confusing due to the
large number jump. I think going to "2.y.z" would clearly indicate
folks need to pay attention to a version difference because the
first number changed. When we have their attention we can explain that
we're using it as major version from now on.
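
The mismatch in expectations can be sketched as follows. This is only an
illustration in plain Python: the function names, the scheme labels
("historical" and "conventional") and the sample version strings are all
hypothetical, and nothing here is part of any Avro library or tooling. It
compares how the same version change reads under the historical Avro scheme
(<data compatibility>.<major library version>.<minor library version>) and
under the conventional reading, where only the first number signals a break.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a three-part version string such as "1.9.1" into integers."""
    first, second, third = (int(part) for part in version.split("."))
    return first, second, third


def signals_breaking_library_change(old: str, new: str, scheme: str) -> bool:
    """Return True if the version change signals a possible breaking library change."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    if scheme == "historical":
        # <data compatibility>.<major library>.<minor library>:
        # a change in either of the first two numbers may break callers.
        return old_parts[:2] != new_parts[:2]
    if scheme == "conventional":
        # <major>.<minor>.<patch>: only the first number signals a break.
        return old_parts[0] != new_parts[0]
    raise ValueError(f"unknown scheme: {scheme!r}")


# Illustrative version strings only: the same upgrade, read two ways.
for scheme in ("historical", "conventional"):
    print(scheme, signals_breaking_library_change("1.7.0", "1.8.0", scheme))
```

A user applying the conventional reading sees no warning sign in a 1.7 to
1.8 upgrade, while the historical scheme treats it as a major library bump.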


> Also, a similar discussion was on the Spark devlist, I think Michael has
> some valid points here:
> https://mail-archives.apache.org/mod_mbox/spark-dev/202002.mbox/browser

This is a month of email from dev@spark. Could you link to a specific
thread on lists.apache.org or provide a subject line?

On Mon, Mar 2, 2020 at 3:00 AM Driesprong, Fokko <fo...@driesprong.frl> wrote:
>
> So, as I understand it. The whole 1.x version should be binary compatible.
> So anything is written with Java 1.x should be readable with Python 1.x.
> We've been working on extending the integration tests as well.
>
> Not all languages support all the features, for example, many languages
> lack support for logical types. In this case, a datetime would be just read
> as an integer, so there is a fallback scenario.
>
> For 1.9 we broke some of the API's because, at the time, the decision was
> that removing Jackson from the public API was required to move from
> Codehaus jackson (1.x) to the Fasterxml one (2.x). The public API shouldn't
> have exposed these methods in the first place.
>
> I wouldn't be in favor of switching to 10.x (dropping the 1. in front of
> it). What's the added value in this? I'm just afraid of changing this,
> would confuse a lot of downstream users.
>
> Also, a similar discussion was on the Spark devlist, I think Michael has
> some valid points here:
> https://mail-archives.apache.org/mod_mbox/spark-dev/202002.mbox/browser
>
> Maybe it is good to formalize our policy, and put it on the website.
>
> Cheers, Fokko Driesprong
>
> On Fri, Feb 28, 2020 at 5:53 PM Sean Busbey <bu...@apache.org> wrote:
>
> > Counterpoint on independently versioning the various languages. Do we
> > know if Python Avro X works with Java Avro Y as it is? It seems like
> > we already get surprised pretty often when they don't.
> >
> > If we stop including the "data compatibility version" or whatever
> > we're calling the first number, we'll need to get more formal on
> > versioning the specification and having libraries plainly label which
> > specification(s) they comport to.
> >
> > At the very least it seems like we'd make the _easy_ path easier for
> > the languages that are well maintained. Sure it'll be burden on those
> > languages that aren't well maintained, but it seems like those are
> > already in that position?
> >
> > On Thu, Feb 27, 2020 at 9:13 AM Ismaël Mejía <ie...@gmail.com> wrote:
> > >
> > > Bringing my comment from the JIRA ticket here for discussion:
> > >
> > > > "One argument against semantic versioning is the fact that Avro
> > supports
> > > 9 language APIs, so if let's say C++ breaks its backwards compatibility
> > > should we move the version number up for every single language? Sounds
> > like
> > > a burden and in particular a not easy task to track since we do not have
> > > proper validation of breaking changes in place for every language at this
> > > point.
> > > > ... (even if we separate release numbers per language) that seems like
> > a
> > > lot of work for probably a similar output because then users will doubt,
> > > wait is Python Avro 3.1.0 compatible with Java Avro 5.2.0? and they will
> > > probably be for the binary format."
> > >
> > > Also there is the case of interop tests, how will those act in this case.
> > > We will need a compatibility matrix, again I am not sure if it is the
> > best
> > > approach, looks like lots of work for not much in return.
> > >
> > >
> > >
> > > On Thu, Feb 27, 2020 at 12:21 PM Ryan Skraba <ry...@skraba.com> wrote:
> > >
> > > > > Hello!  Resurrecting -- I think this was the last thread bringing up
> > > > > this issue!
> > > >
> > > > Since we've talked about releasing 1.10.x in May, and it's a nice
> > > > round number... what do you think about
> > > >
> > > > 1) finally dropping the prefix for the "specification version" and
> > > > calling it Avro 10.x
> > > >
> > > > 2) committing to semantic versioning for future releases
> > > >
> > > > I can see this being a hugely positive move for aligning with the
> > > > expectations of developers and projects... but it leads to a lot of
> > > > questions about releasing all the artifacts together.
> > > >
> > > > > There's already a JIRA: https://issues.apache.org/jira/browse/AVRO-2687
> > > >
> > > > Ryan
> > > >
> > > > On Fri, Sep 13, 2019 at 12:00 PM Driesprong, Fokko
> > <fo...@driesprong.frl>
> > > > wrote:
> > > > >
> > > > > Thanks Sean for bringing this up.
> > > > >
> > > > > > For the 1.9 branch there were some incompatible changes in the API
> > > > > > with respect to 1.8.2. We've removed Jackson
> > > > > > <https://github.com/apache/avro/pull/135> and Netty from the public
> > > > > > API. This is actually breaking some of the builds
> > > > > > <https://github.com/apache/incubator-iceberg/pull/297>, so,
> > > > > > unfortunately, it isn't compatible, hence the major version bump.
> > > > >
> > > > > > The 1.9.x branch still supports the Joda time library but defaults to
> > > > > > jsr310, and is still compatible (I believe). For 1.10 the plan is to
> > > > > > completely remove Joda from the codebase, since it is officially
> > > > > > deprecated in favor of Java 8 time (jsr310). A lot of this is just
> > > > > > changes to the Java API of Avro, mostly involving the LogicalTypes,
> > > > > > so the actual format is still compatible (as it should be).
> > > > >
> > > > > > I agree with you, Sean, that a lot of the changes that are targeted
> > > > > > for 1.10 could be cherry-picked back to the 1.9 branch. If someone is
> > > > > > willing to do this, I would be grateful. However, maintaining a lot of
> > > > > > different branches is quite time-consuming in terms of release
> > > > > > management. For Apache Avro 1.9.0 we actually had some blocking
> > > > > > regression bugs, hence the 1.9.1 release.
> > > > >
> > > > > > Personally, I don't have a big objection to bumping the major version
> > > > > > if there are breaking changes to one of the APIs. But a big +1 on
> > > > > > having a standardized approach to versioning; this also includes a
> > > > > > clearer approach to documenting the upgrade process and a better
> > > > > > changelog. I've added summaries of the releases at the GitHub
> > > > > > releases page: https://github.com/apache/avro/releases but I think
> > > > > > having this on the Avro website might be more appropriate.
> > > > >
> > > > > Cheers, Fokko Driesprong
> > > > >
> > > > >
> > > > >
> > > > > > On Wed, Sep 11, 2019 at 18:17, Ryan Blue <rblue@netflix.com.invalid> wrote:
> > > > >
> > > > > > > > What would it look like if we *did* have to make an incompatible
> > > > > > > data format change after adopting "conventional" library version
> > > > > > > strings?
> > > > > >
> > > > > > > Let's call these format v1 and v2. The library must produce v1 by
> > > > > > > default, so it's a matter of having support for writing v2. When the
> > > > > > > default changes to v2, then that behavior change would require a
> > > > > > > major version increase to signal changes to compatibility. I think we
> > > > > > > would also want clear documentation for each version that shows what
> > > > > > > versions of the format it can read, write, and what it will use by
> > > > > > > default. A table on the site would work.
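
A rough sketch of what such a table could look like as data, in Python. The library version strings ("10.0", "10.1", "11.0"), the format numbers, and the helper name can_exchange are all made up for illustration; nothing here reflects an actual Avro release plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatSupport:
    """Which data format versions one library release handles."""
    reads: frozenset      # format versions the release can read
    writes: frozenset     # format versions the release can write
    default_write: int    # format version written when nothing is configured


# Hypothetical releases: 10.1 adds opt-in support for format v2,
# and 11.0 (a major bump) switches the default to v2.
SUPPORT = {
    "10.0": FormatSupport(frozenset({1}), frozenset({1}), 1),
    "10.1": FormatSupport(frozenset({1, 2}), frozenset({1, 2}), 1),
    "11.0": FormatSupport(frozenset({1, 2}), frozenset({1, 2}), 2),
}


def can_exchange(writer: str, reader: str) -> bool:
    """True if data written by `writer` with default settings is readable by `reader`."""
    return SUPPORT[writer].default_write in SUPPORT[reader].reads


if __name__ == "__main__":
    for writer in SUPPORT:
        for reader in SUPPORT:
            print(f"{writer} -> {reader}: {can_exchange(writer, reader)}")
```

In this made-up table the only failing pair is a default 11.0 writer with a 10.0 reader, which is the case the major version increase is meant to signal.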
> > > > > >
> > > > > > On Tue, Sep 10, 2019 at 2:51 PM Sean Busbey <bu...@apache.org>
> > wrote:
> > > > > >
> > > > > > > > What would it look like if we *did* have to make an incompatible
> > > > > > > > data format change after adopting "conventional" library version
> > > > > > > > strings?
> > > > > > > >
> > > > > > > > What if we version the specification independently from the
> > > > > > > > libraries and then have the docs for the libraries claim spec
> > > > > > > > version compatibility?
> > > > > > >
> > > > > > > On Tue, Sep 10, 2019 at 3:55 PM Ryan Blue
> > <rblue@netflix.com.invalid
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 for changing the version strings to follow a more standard
> > > > > > > > > convention. We don't have any breaking format changes, so I think
> > > > > > > > > it is expected that the format compatibility version won't change.
> > > > > > > >
> > > > > > > > > On Tue, Sep 10, 2019 at 7:28 AM Sean Busbey <busbey@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hi folks!
> > > > > > > > >
> > > > > > > > > > Historically, Avro version numbers have had the form:
> > > > > > > > > >
> > > > > > > > > > <data compatibility> . <major library version> . <minor library version>
> > > > > > > > > >
> > > > > > > > > > That is, the first number says whether or not we expect data
> > > > > > > > > > serialization to be compatible, and the second says whether we
> > > > > > > > > > expect some library to be backwards incompatible, however that's
> > > > > > > > > > defined for the library's language. For example, in the Java
> > > > > > > > > > library, that is when we make changes to public method signatures
> > > > > > > > > > such that folks can't just swap out jar files of our
> > > > > > > > > > implementation.
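
To make the scheme above concrete, here is a small Python sketch that splits a version string into those three fields. The names AvroVersion, parse_avro_version, same_data_format and drop_in_upgrade are hypothetical, and the "drop-in" rule is only one reading of the description above (same first two numbers, minor not going backwards), not a documented Avro policy.

```python
from typing import NamedTuple


class AvroVersion(NamedTuple):
    data_compat: int      # first number: data serialization compatibility
    library_major: int    # second number: library API may break when this changes
    library_minor: int    # third number: compatible library changes


def parse_avro_version(text: str) -> AvroVersion:
    """Parse a string such as '1.9.2' into its three numeric fields."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated numbers, got {text!r}")
    return AvroVersion(*(int(p) for p in parts))


def same_data_format(a: AvroVersion, b: AvroVersion) -> bool:
    """Serialized data is expected to be compatible when the first number matches."""
    return a.data_compat == b.data_compat


def drop_in_upgrade(old: AvroVersion, new: AvroVersion) -> bool:
    """Assumed rule: only the last number may change for a low-effort upgrade."""
    return (
        old.data_compat == new.data_compat
        and old.library_major == new.library_major
        and new.library_minor >= old.library_minor
    )


if __name__ == "__main__":
    old, new = parse_avro_version("1.7.7"), parse_avro_version("1.8.2")
    print(same_data_format(old, new), drop_in_upgrade(old, new))
```

Under this reading, 1.7.7 and 1.8.2 share a data format but are not a drop-in swap, which matches the 1.7 to 1.8 surprise mentioned further down.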
> > > > > > > > >
> > > > > > > > > > While getting myself up to speed on the state of our release
> > > > > > > > > > lines, I noticed we already have the 1.9 release line in a branch,
> > > > > > > > > > with master set up for the next major library version. JIRA shows
> > > > > > > > > > ~46 issues that are in 1.10 but not in a 1.9 release[1].
> > > > > > > > > >
> > > > > > > > > > I haven't looked at all of them yet, but the few I sampled don't
> > > > > > > > > > seem to require a major version increment.
> > > > > > > > >
> > > > > > > > > > I looked around our site and I also can't find anywhere that we've
> > > > > > > > > > documented our version strings. I know I've been in discussions in
> > > > > > > > > > other communities where our version strings have been surprising,
> > > > > > > > > > e.g. folks had assumed they could do a low-effort upgrade from 1.7
> > > > > > > > > > to 1.8, only to find that there were documented incompatibilities
> > > > > > > > > > and behavior changes.
> > > > > > > > >
> > > > > > > > > > Are we actively planning on rolling out 1.10? (like, do we have a
> > > > > > > > > > goal date?)
> > > > > > > > >
> > > > > > > > > > I know that when 1.9 went out we EOLed 1.7 and 1.8, in part due to
> > > > > > > > > > the overhead of trying to maintain multiple release lines
> > > > > > > > > > (especially once that had so much baggage) while we're trying to
> > > > > > > > > > reestablish good habits on release cadence. How many major versions
> > > > > > > > > > are we planning to keep going once 1.10 is ready?
> > > > > > > > >
> > > > > > > > > > What do folks think about starting a CONTRIBUTING.md with some of
> > > > > > > > > > these expectations? Is there a better place to track it?
> > > > > > > > >
> > > > > > > > > [1] : https://s.apache.org/71yqv
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Ryan Blue
> > > > > > > > Software Engineer
> > > > > > > > Netflix
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Ryan Blue
> > > > > > Software Engineer
> > > > > > Netflix
> > > > > >
> > > >
> >