Posted to hdfs-dev@hadoop.apache.org by Arun C Murthy <ac...@hortonworks.com> on 2013/01/29 21:56:24 UTC

Release numbering for branch-2 releases

Folks,

There have been some discussions about incompatible changes in the hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and a few other jiras. Frankly, I'm surprised about some of them, since the 'alpha' moniker was precisely to harden apis by changing them if necessary, borne out by the fact that every single release in the hadoop-2 chain has had incompatible changes. This happened because we were releasing early, moving fast and breaking things. Furthermore, we'll have more in future as we move towards stability of hadoop-2, similar to HDFS-4362, HDFS-4364 et al in HDFS and YARN-142 (api changes) for YARN.

So, rather than debate more, I had a brief chat with Suresh and Todd. Todd suggested calling the next release hadoop-2.1.0-alpha to indicate the incompatibility a little better. This makes sense to me, as long as we are clear that we won't make any further *feature* releases in the hadoop-2.0.x series (obviously we might be forced to do security/bug-fix releases).

Going forward, I'd like to start locking down apis/protocols for a 'beta' release. This way we'll have one *final* opportunity post hadoop-2.1.0-alpha to make incompatible changes if necessary and we can call it hadoop-2.2.0-beta.

Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible changes. This will allow us to go on to a hadoop-2.3.0 as a GA release. This forces us to make a real effort to lock down for hadoop-2.2.0-beta.

In summary:
# I plan to now release hadoop-2.1.0-alpha (this week).
# We make a real effort to lock down apis/protocols and release hadoop-2.2.0-beta, say in March.
# Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.

I'll start a separate thread on 'locking protocols' w.r.t client-protocols v/s internal protocols (to facilitate rolling upgrades etc.), let's discuss this one separately.

Makes sense? Thoughts?

thanks,
Arun

PS: Between hadoop-2.2.0-beta and hadoop-2.3.0 we *might* be forced to make some incompatible changes due to *unforeseen circumstances*, but no more gratuitous changes are allowed.
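
The numbering scheme proposed above can be sketched in a few lines of Python. This is only an illustration: the name pattern is inferred from the release names mentioned in this post, and `Release`, `parse_release` and `is_feature_release` are hypothetical helpers, not part of any Hadoop tooling.

```python
import re
from typing import NamedTuple


class Release(NamedTuple):
    major: int
    minor: int
    patch: int
    label: str  # "alpha", "beta", or "" for an unlabelled (GA/stable) release


# Assumed naming pattern, based on names such as hadoop-2.1.0-alpha.
_PATTERN = re.compile(r"^hadoop-(\d+)\.(\d+)\.(\d+)(?:-(alpha|beta))?$")


def parse_release(name: str) -> Release:
    """Split a release name into its numeric parts and optional label."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised release name: {name!r}")
    major, minor, patch, label = match.groups()
    return Release(int(major), int(minor), int(patch), label or "")


def is_feature_release(old: Release, new: Release) -> bool:
    """True when the major or minor number changes, i.e. not a patch-only bump."""
    return (new.major, new.minor) != (old.major, old.minor)


# The three releases from the summary, deliberately listed out of order.
plan = ["hadoop-2.3.0", "hadoop-2.1.0-alpha", "hadoop-2.2.0-beta"]
ordered = sorted(plan, key=lambda name: parse_release(name)[:3])
print(ordered)
```

Sorting on the numeric triple alone is enough here because each stage (alpha, beta, stable) gets its own minor number in the proposal, so the label never has to break a tie.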

Re: Release numbering for branch-2 releases

Posted by Eli Collins <el...@cloudera.com>.

We also need to spell out what's permissible *before* GA.  The
alpha/beta labels, as I understand them, are not green lights to break
anything as long as it's not API compatibility.  The API compatibility
story has been somewhat fuzzy as well, e.g. MR2 requires users to recompile
all their Hadoop 1.x jobs (ouch).  We've been working on stabilizing 2.x for a
while now and we need to start slating some changes to 3.x if we want to
get a 2.x GA release out soon.  To do that we have to consider issues for
end users (and downstream projects) upgrading from 0.23 releases and older
2.0.x releases, aside from just API compatibility, in terms of what's
permissible in the releases between now and GA.

Thanks,
Eli

On Wed, Jan 30, 2013 at 5:10 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> The discussions in HADOOP-9151 were related to wire-compatibility. I think
> we all agree that breaking API compatibility is not allowed without
> deprecating them first in a prior major release - this is something we have
> followed since hadoop-0.1.
>
> I agree we need to spell out what changes we can and cannot do *after* we
> go GA, for e.g.:
> # Clearly incompatible *API* changes are *not* allowed in hadoop-2 post-GA.
> # Do we allow incompatible changes on Client-Server protocols? I would say
> *no*.
> # Do we allow incompatible changes on internal-server protocols (for e.g.
> NN-DN or NN-NN in HA setup or RM-NM in YARN) to ensure we support
> rolling-upgrades? I would like to not allow this, but I do not know how
> feasible this is. An option is to allow these changes between minor
> releases i.e. between hadoop-2.10 and hadoop-2.11.
> # Do we allow changes which force an HDFS metadata upgrade in a minor
> upgrade i.e. hadoop-2.20 to hadoop-2.21?
> # Clearly *no* incompatible changes (API/client-server/server-server)
> are allowed in a patch release i.e. hadoop-2.20.0 and hadoop-2.20.1
> have to be compatible in all respects.
>
> What else am I missing?
>
> I'll make sure we update our Roadmap wiki and other docs post this
> discussion.
>
> thanks,
> Arun
>
>
>
> On Jan 30, 2013, at 4:21 PM, Eli Collins wrote:
>
> > Thanks for bringing this up Arun.  One of the issues is that we
> > haven't been clear about what type of compatibility breakages are
> > allowed, and which are not.  For example, renaming FileSystem#open is
> > incompatible, and not OK, regardless of the alpha/beta tag.  Breaking
> > server/server APIs is OK pre-GA but probably not post GA, at least
> > in a point release, or required for a security fix, etc.
> > Configuration, data format, environment variable changes, etc. can all
> > be similarly incompatible. The issue we had in HADOOP-9151 was someone
> > claimed it is not an incompatible change because it doesn't break API
> > compatibility even though it breaks wire compatibility. So let's be
> > clear about the types of incompatibility we are or are not permitting.
> > For example, will it be OK to merge a change before 2.2.0-beta that
> > requires an HDFS metadata upgrade? Or breaks client server wire
> > compatibility?  I've been assuming that changing an API annotated
> > Public/Stable still requires multiple major releases (one to deprecate
> > and one to remove), does the alpha label change that? To some people
> > the "alpha", "beta" label implies instability in terms of
> > quality/features, while to others it means unstable APIs (and to some
> > both) so it would be good to spell that out. In short, agree that we
> > really need to figure out what changes are permitted in what releases,
> > and we should update the docs accordingly (there's a start here:
> > http://wiki.apache.org/hadoop/Roadmap).
> >
> > Note that the 2.0.0 alpha release vote thread was clear that we
> > thought we were all in agreement that we'd like to keep client/server
> > compatible post 2.0 - and there was no push back. We pulled a number
> > of jiras into the 2.0 release explicitly so that we could preserve
> > client/server compatibility going forward.  Here's the relevant part
> > of the thread as a refresher: http://s.apache.org/gQ
> >
> > "2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
> > envelope in branch-2, but didn't make it into this rc. So, that would
> > mean that future alphas would not be protocol-compatible with this
> > alpha. Per a discussion a few weeks ago, I think we all were in
> > agreement that, if possible, we'd like all 2.x to be compatible for
> > client-server communication, at least (even if we don't support
> > cross-version for the intra-cluster protocols)"
> >
> > Thanks,
> > Eli
> >
> > On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy <ac...@hortonworks.com>
> wrote:
> >> Folks,
> >>
> >> There has been some discussions about incompatible changes in the
> hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and
> few other jiras. Frankly, I'm surprised about some of them since the
> 'alpha' moniker was precisely to harden apis by changing them if necessary,
> borne out by the fact that every  single release in hadoop-2 chain has had
> incompatible changes. This happened since we were releasing early, moving
> fast and breaking things. Furthermore, we'll have more in future as move
> towards stability of hadoop-2 similar to HDFS-4362, HDFS-4364 et al in HDFS
> and YARN-142 (api changes) for YARN.
> >>
> >> So, rather than debate more, I had a brief chat with Suresh and Todd.
> Todd suggested calling the next release as hadoop-2.1.0-alpha to indicate
> the incompatibility a little better. This makes sense to me, as long as we
> are clear that we won't make any further *feature* releases in hadoop-2.0.x
> series (obviously we might be forced to do security/bug-fix release).
> >>
> >> Going forward, I'd like to start locking down apis/protocols for a
> 'beta' release. This way we'll have one *final* opportunity post
> hadoop-2.1.0-alpha to make incompatible changes if necessary and we can
> call it hadoop-2.2.0-beta.
> >>
> >> Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible
> changes. This will allow us to go on to a hadoop-2.3.0 as a GA release.
> This forces us to do a real effort on making sure we lock down for
> hadoop-2.2.0-beta.
> >>
> >> In summary:
> >> # I plan to now release hadoop-2.1.0-alpha (this week).
> >> # We make a real effort to lock down apis/protocols and release
> hadoop-2.2.0-beta, say in March.
> >> # Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.
> >>
> >> I'll start a separate thread on 'locking protocols' w.r.t
> client-protocols v/s internal protocols (to facilitate rolling upgrades
> etc.), let's discuss this one separately.
> >>
> >> Makes sense? Thoughts?
> >>
> >> thanks,
> >> Arun
> >>
> >> PS:  Between hadoop-2.2.0-beta and hadoop-2.3.0 we *might* be forced to
> make some incompatible changes due to *unforeseen circumstances*, but no
> more gratuitous changes are allowed.
> >>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
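
The post-GA rules in Arun's quoted list can be written down as a small lookup table, which makes the still-open items easy to see. This is a sketch of the positions as stated in that message, not an agreed policy; the enum names and the `verdict` helper are invented for illustration.

```python
from enum import Enum


class Change(Enum):
    API = "API"
    CLIENT_SERVER = "client-server protocol"
    SERVER_SERVER = "internal server protocol"
    METADATA_UPGRADE = "HDFS metadata upgrade"


class Bump(Enum):
    MINOR = "minor"  # e.g. hadoop-2.10 -> hadoop-2.11
    PATCH = "patch"  # e.g. hadoop-2.20.0 -> hadoop-2.20.1


class Verdict(Enum):
    NOT_ALLOWED = "not allowed"
    PROPOSED_NO = "proposed: not allowed"
    POSSIBLE = "possibly allowed"
    OPEN = "open question"


# Positions for incompatible changes between minor releases, post-GA.
MINOR_POLICY = {
    Change.API: Verdict.NOT_ALLOWED,        # "*not* allowed in hadoop-2 post-GA"
    Change.CLIENT_SERVER: Verdict.PROPOSED_NO,  # "I would say *no*"
    Change.SERVER_SERVER: Verdict.POSSIBLE,     # "an option is to allow these"
    Change.METADATA_UPGRADE: Verdict.OPEN,      # posed as a question
}


def verdict(change: Change, bump: Bump) -> Verdict:
    """Look up the stated position for an incompatible change of this kind."""
    if bump is Bump.PATCH:
        # Assumption: "compatible in all respects" is read as covering
        # metadata upgrades too, although the message lists only three kinds.
        return Verdict.NOT_ALLOWED
    return MINOR_POLICY[change]


for change in Change:
    print(f"{change.value:26} minor: {verdict(change, Bump.MINOR).value}")
```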

Re: Release numbering for branch-2 releases

Posted by Chris Embree <ce...@gmail.com>.

Hi Arun, et al.,

I hope you don't mind a non-contributor butting in here.  I'm currently a
Hadoop administrator and former application developer (non-hadoop).

Regarding GA release changes, I think Arun has got a lot of good ideas here.

I think it's better to add new features via new flags, parameters, etc.
and deprecate "abandoned" or "bad" defaults, values, etc.  At the rate Hadoop
is changing, I think you could deprecate in GA 0.30 and change defaults in
GA 0.40.
As a user, that would allow me to upgrade to a new GA version without
significant changes to my config.  As we are ready to introduce new
features, we could add the required changes to configs.

Please, no changes that require me to "migrate" data between dot releases.
I fully expect that applications that run on CentOS 6.2 will run on 6.3 with
no problems.  CentOS 5.6 to 6.3 is another matter, as expected.

As it stands, we are deployed on Hadoop 1.x in prod and plan to test 2.x for
several months before upgrading.

I know you guys are excited about all of the cool improvements you're
making.  Just try to remember Hadoop adoption is growing by leaps and
bounds; breaking things for the sake of "better" is not always good for the
project. :)

Just my $0.02
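
The deprecate-first approach suggested above might look roughly like this for a single config setting. It is a minimal sketch, not how Hadoop's own configuration handling works; the key names, the default value and `get_setting` are all made up for the example.

```python
import warnings

# Hypothetical key names, invented for illustration only.
DEPRECATED_KEYS = {"example.old.buffer.size": "example.buffer.size"}
DEFAULTS = {"example.buffer.size": "4096"}


def get_setting(config: dict, key: str) -> str:
    """Return the value for `key`, still honouring a deprecated alias."""
    if key in config:
        return config[key]
    for old, new in DEPRECATED_KEYS.items():
        if new == key and old in config:
            # The old key keeps working for now, but the user is told to move.
            warnings.warn(
                f"{old} is deprecated; use {new} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return config[old]
    # Neither key is set, so fall back to the shipped default.
    return DEFAULTS[key]


# An existing config that only uses the old key keeps its behaviour.
old_style = {"example.old.buffer.size": "8192"}
print(get_setting(old_style, "example.buffer.size"))
```

Under this scheme the warning would ship in one GA release, and the alias (or the default) would only change in a later one, so an upgrade between the two does not require touching the config.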

On Wed, Jan 30, 2013 at 8:10 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> The discussions in HADOOP-9151 were related to wire-compatibility. I think
> we all agree that breaking API compatibility is not allowed without
> deprecating them first in a prior major release - this is something we have
> followed since hadoop-0.1.
>
> I agree we need to spell out what changes we can and cannot do *after* we
> go GA, for e.g.:
> # Clearly incompatible *API* changes are *not* allowed in hadoop-2 post-GA.
> # Do we allow incompatible changes on Client-Server protocols? I would say
> *no*.
> # Do we allow incompatible changes on internal-server protocols (for e.g.
> NN-DN or NN-NN in HA setup or RM-NM in YARN) to ensure we support
> rolling-upgrades? I would like to not allow this, but I do not know how
> feasible this is. An option is to allow these changes between minor
> releases i.e. between hadoop-2.10 and hadoop-2.11.
> # Do we allow changes which force an HDFS metadata upgrade in a minor
> upgrade i.e. hadoop-2.20 to hadoop-2.21?
> # Clearly *no* incompatible changes (API/client-server/server-server)
> are allowed in a patch release i.e. hadoop-2.20.0 and hadoop-2.20.1
> have to be compatible in all respects.
>
> What else am I missing?
>
> I'll make sure we update our Roadmap wiki and other docs post this
> discussion.
>
> thanks,
> Arun
>
>
>
> On Jan 30, 2013, at 4:21 PM, Eli Collins wrote:
>
> > Thanks for bringing this up Arun.  One of the issues is that we
> > haven't been clear about what type of compatibility breakages are
> > allowed, and which are not.  For example, renaming FileSystem#open is
> > incompatible, and not OK, regardless of the alpha/beta tag.  Breaking
> > server/server APIs is OK pre-GA but probably not post GA, at least
> > in a point release, or required for a security fix, etc.
> > Configuration, data format, environment variable changes, etc. can all
> > be similarly incompatible. The issue we had in HADOOP-9151 was someone
> > claimed it is not an incompatible change because it doesn't break API
> > compatibility even though it breaks wire compatibility. So let's be
> > clear about the types of incompatibility we are or are not permitting.
> > For example, will it be OK to merge a change before 2.2.0-beta that
> > requires an HDFS metadata upgrade? Or breaks client server wire
> > compatibility?  I've been assuming that changing an API annotated
> > Public/Stable still requires multiple major releases (one to deprecate
> > and one to remove), does the alpha label change that? To some people
> > the "alpha", "beta" label implies instability in terms of
> > quality/features, while to others it means unstable APIs (and to some
> > both) so it would be good to spell that out. In short, agree that we
> > really need to figure out what changes are permitted in what releases,
> > and we should update the docs accordingly (there's a start here:
> > http://wiki.apache.org/hadoop/Roadmap).
> >
> > Note that the 2.0.0 alpha release vote thread was clear that we
> > thought we were all in agreement that we'd like to keep client/server
> > compatible post 2.0 - and there was no push back. We pulled a number
> > of jiras into the 2.0 release explicitly so that we could preserve
> > client/server compatibility going forward.  Here's the relevant part
> > of the thread as a refresher: http://s.apache.org/gQ
> >
> > "2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
> > envelope in branch-2, but didn't make it into this rc. So, that would
> > mean that future alphas would not be protocol-compatible with this
> > alpha. Per a discussion a few weeks ago, I think we all were in
> > agreement that, if possible, we'd like all 2.x to be compatible for
> > client-server communication, at least (even if we don't support
> > cross-version for the intra-cluster protocols)"
> >
> > Thanks,
> > Eli
> >
> > On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy <ac...@hortonworks.com>
> wrote:
> >> Folks,
> >>
> >> There has been some discussions about incompatible changes in the
> hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and
> few other jiras. Frankly, I'm surprised about some of them since the
> 'alpha' moniker was precisely to harden apis by changing them if necessary,
> borne out by the fact that every  single release in hadoop-2 chain has had
> incompatible changes. This happened since we were releasing early, moving
> fast and breaking things. Furthermore, we'll have more in future as move
> towards stability of hadoop-2 similar to HDFS-4362, HDFS-4364 et al in HDFS
> and YARN-142 (api changes) for YARN.
> >>
> >> So, rather than debate more, I had a brief chat with Suresh and Todd.
> Todd suggested calling the next release as hadoop-2.1.0-alpha to indicate
> the incompatibility a little better. This makes sense to me, as long as we
> are clear that we won't make any further *feature* releases in hadoop-2.0.x
> series (obviously we might be forced to do security/bug-fix release).
> >>
> >> Going forward, I'd like to start locking down apis/protocols for a
> 'beta' release. This way we'll have one *final* opportunity post
> hadoop-2.1.0-alpha to make incompatible changes if necessary and we can
> call it hadoop-2.2.0-beta.
> >>
> >> Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible
> changes. This will allow us to go on to a hadoop-2.3.0 as a GA release.
> This forces us to do a real effort on making sure we lock down for
> hadoop-2.2.0-beta.
> >>
> >> In summary:
> >> # I plan to now release hadoop-2.1.0-alpha (this week).
> >> # We make a real effort to lock down apis/protocols and release
> hadoop-2.2.0-beta, say in March.
> >> # Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.
> >>
> >> I'll start a separate thread on 'locking protocols' w.r.t
> client-protocols v/s internal protocols (to facilitate rolling upgrades
> etc.), let's discuss this one separately.
> >>
> >> Makes sense? Thoughts?
> >>
> >> thanks,
> >> Arun
> >>
> >> PS:  Between hadoop-2.2.0-beta and hadoop-2.3.0 we *might* be forced to
> make some incompatible changes due to *unforeseen circumstances*, but no
> more gratuitous changes are allowed.
> >>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

> > claimed it is not an incompatible change because it doesn't break API
> > compatibility even though it breaks wire compatibility. So let's be
> > clear about the types of incompatibility we are or are not permitting.
> > For example, will it be OK to merge a change before 2.2.0-beta that
> > requires an HDFS metadata upgrade? Or breaks client server wire
> > compatibility?  I've been assuming that changing an API annotated
> > Public/Stable still requires multiple major releases (one to deprecate
> > and one to remove), does the alpha label change that? To some people
> > the "alpha", "beta" label implies instability in terms of
> > quality/features, while to others it means unstable APIs (and to some
> > both) so it would be good to spell that out. In short, agree that we
> > really need to figure out what changes are permitted in what releases,
> > and we should update the docs accordingly (there's a start here:
> > http://wiki.apache.org/hadoop/Roadmap).
> >
> > Note that the 2.0.0 alpha release vote thread was clear that we
> > thought were all in agreement that we'd like to keep client/server
> > compatible post 2.0 - and there was no push back. We pulled a number
> > of jiras into the 2.0 release explicitly so that we could preserve
> > client/server compatibility going forward.  Here's the relevant part
> > of the thread as a refresher: http://s.apache.org/gQ
> >
> > "2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
> > envelope in branch-2, but didn't make it into this rc. So, that would
> > mean that future alphas would not be protocol-compatible with this
> > alpha. Per a discussion a few weeks ago, I think we all were in
> > agreement that, if possible, we'd like all 2.x to be compatible for
> > client-server communication, at least (even if we don't support
> > cross-version for the intra-cluster protocols)"
> >
> > Thanks,
> > Eli
> >
> > On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy <ac...@hortonworks.com>
> wrote:
> >> Folks,
> >>
> >> There has been some discussions about incompatible changes in the
> hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and
> few other jiras. Frankly, I'm surprised about some of them since the
> 'alpha' moniker was precisely to harden apis by changing them if necessary,
> borne out by the fact that every  single release in hadoop-2 chain has had
> incompatible changes. This happened since we were releasing early, moving
> fast and breaking things. Furthermore, we'll have more in future as move
> towards stability of hadoop-2 similar to HDFS-4362, HDFS-4364 et al in HDFS
> and YARN-142 (api changes) for YARN.
> >>
> >> So, rather than debate more, I had a brief chat with Suresh and Todd.
> Todd suggested calling the next release as hadoop-2.1.0-alpha to indicate
> the incompatibility a little better. This makes sense to me, as long as we
> are clear that we won't make any further *feature* releases in hadoop-2.0.x
> series (obviously we might be forced to do security/bug-fix release).
> >>
> >> Going forward, I'd like to start locking down apis/protocols for a
> 'beta' release. This way we'll have one *final* opportunity post
> hadoop-2.1.0-alpha to make incompatible changes if necessary and we can
> call it hadoop-2.2.0-beta.
> >>
> >> Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible
> changes. This will allow us to go on to a hadoop-2.3.0 as a GA release.
> This forces us to do a real effort on making sure we lock down for
> hadoop-2.2.0-beta.
> >>
> >> In summary:
> >> # I plan to now release hadoop-2.1.0-alpha (this week).
> >> # We make a real effort to lock down apis/protocols and release
> hadoop-2.2.0-beta, say in March.
> >> # Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.
> >>
> >> I'll start a separate thread on 'locking protocols' w.r.t
> client-protocols v/s internal protocols (to facilitate rolling upgrades
> etc.), let's discuss this one separately.
> >>
> >> Makes sense? Thoughts?
> >>
> >> thanks,
> >> Arun
> >>
> >> PS:  Between hadoop-2.2.0-beta and hadoop-2.3.0 we *might* be forced to
> make some incompatible changes due to *unforeseen circumstances*, but no
> more gratuitous changes are allowed.
> >>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: Release numbering for branch-2 releases

Posted by Arun C Murthy <ac...@hortonworks.com>.

The discussions in HADOOP-9151 were related to wire-compatibility. I think we all agree that breaking API compatibility is not allowed without first deprecating the APIs in a prior major release - this is something we have followed since hadoop-0.1.

I agree we need to spell out what changes we can and cannot do *after* we go GA, e.g.:
# Clearly incompatible *API* changes are *not* allowed in hadoop-2 post-GA.
# Do we allow incompatible changes on Client-Server protocols? I would say *no*.
# Do we allow incompatible changes on internal-server protocols (e.g. NN-DN, NN-NN in an HA setup, or RM-NM in YARN), or do we disallow them to ensure we support rolling upgrades? I would like to not allow this, but I do not know how feasible this is. An option is to allow these changes between minor releases, i.e. between hadoop-2.10 and hadoop-2.11.
# Do we allow changes which force an HDFS metadata upgrade in a minor upgrade, i.e. hadoop-2.20 to hadoop-2.21?
# Clearly *no* incompatible changes (API/client-server/server-server) are allowed in a patch release, i.e. hadoop-2.20.0 and hadoop-2.20.1 have to be compatible in all respects.
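
One way to read the list above is as a lookup table keyed by upgrade kind and change type. The following is only a rough sketch in Python of that reading, not an agreed policy: the names (Kind, Verdict, POLICY, verdict) are made up for illustration, and the two-part versions such as hadoop-2.10 are assumed to mean 2.10.0. Items phrased as questions in the list stay marked as open.

```python
from enum import Enum


class Kind(Enum):
    MAJOR = "major"
    MINOR = "minor"
    PATCH = "patch"


class Verdict(Enum):
    NOT_ALLOWED = "not allowed"
    PROPOSED_NO = "proposed: not allowed"
    OPEN = "open question"
    UNSPECIFIED = "not covered by the list"


def parse_version(name):
    """Turn 'hadoop-2.1.0-alpha' or 'hadoop-2.10' into (major, minor, patch)."""
    if name.startswith("hadoop-"):
        name = name[len("hadoop-"):]
    core = name.split("-")[0]            # drop a trailing label such as 'alpha'
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))      # assumption: 'hadoop-2.10' means 2.10.0
    return tuple(parts[:3])


def upgrade_kind(old, new):
    """Classify an upgrade by the first version component that differs."""
    o, n = parse_version(old), parse_version(new)
    if o[0] != n[0]:
        return Kind.MAJOR
    if o[1] != n[1]:
        return Kind.MINOR
    return Kind.PATCH


# Hypothetical encoding of the post-GA list for hadoop-2.
POLICY = {
    (Kind.PATCH, "api"): Verdict.NOT_ALLOWED,
    (Kind.PATCH, "client-server"): Verdict.NOT_ALLOWED,
    (Kind.PATCH, "server-server"): Verdict.NOT_ALLOWED,
    (Kind.MINOR, "api"): Verdict.NOT_ALLOWED,
    (Kind.MINOR, "client-server"): Verdict.PROPOSED_NO,  # "I would say *no*"
    (Kind.MINOR, "server-server"): Verdict.OPEN,         # maybe allowed between minors
    (Kind.MINOR, "hdfs-metadata"): Verdict.OPEN,
}


def verdict(old, new, change):
    """Look up what the list says about an incompatible change of this type."""
    return POLICY.get((upgrade_kind(old, new), change), Verdict.UNSPECIFIED)
```

For example, verdict("hadoop-2.20.0", "hadoop-2.20.1", "server-server") yields Verdict.NOT_ALLOWED, while verdict("hadoop-2.10", "hadoop-2.11", "server-server") yields Verdict.OPEN. Anything the list does not mention, such as major-version upgrades, comes back as Verdict.UNSPECIFIED rather than being guessed.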

What else am I missing?

I'll make sure we update our Roadmap wiki and other docs post this discussion.

thanks,
Arun



On Jan 30, 2013, at 4:21 PM, Eli Collins wrote:

> Thanks for bringing this up Arun.  One of the issues is that we
> haven't been clear about what type of compatibility breakages are
> allowed, and which are not.  For example, renaming FileSystem#open is
> incompatible, and not OK, regardless of the alpha/beta tag.  Breaking
> a server/server APIs is OK pre-GA but probably not post GA, at least
> in a point release, or required for a security fix, etc.
> Configuration, data format, environment variable, changes etc can all
> be similarly incompatible. The issue we had in HADOOP-9151 was someone
> claimed it is not an incompatible change because it doesn't break API
> compatibility even though it breaks wire compatibility. So let's be
> clear about the types of incompatibility we are or are not permitting.
> For example, will it be OK to merge a change before 2.2.0-beta that
> requires an HDFS metadata upgrade? Or breaks client server wire
> compatibility?  I've been assuming that changing an API annotated
> Public/Stable still requires multiple major releases (one to deprecate
> and one to remove), does the alpha label change that? To some people
> the "alpha", "beta" label implies instability in terms of
> quality/features, while to others it means unstable APIs (and to some
> both) so it would be good to spell that out. In short, agree that we
> really need to figure out what changes are permitted in what releases,
> and we should update the docs accordingly (there's a start here:
> http://wiki.apache.org/hadoop/Roadmap).
> 
> Note that the 2.0.0 alpha release vote thread was clear that we
> thought were all in agreement that we'd like to keep client/server
> compatible post 2.0 - and there was no push back. We pulled a number
> of jiras into the 2.0 release explicitly so that we could preserve
> client/server compatibility going forward.  Here's the relevant part
> of the thread as a refresher: http://s.apache.org/gQ
> 
> "2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
> envelope in branch-2, but didn't make it into this rc. So, that would
> mean that future alphas would not be protocol-compatible with this
> alpha. Per a discussion a few weeks ago, I think we all were in
> agreement that, if possible, we'd like all 2.x to be compatible for
> client-server communication, at least (even if we don't support
> cross-version for the intra-cluster protocols)"
> 
> Thanks,
> Eli
> 
> On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
>> Folks,
>> 
>> There has been some discussions about incompatible changes in the hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and few other jiras. Frankly, I'm surprised about some of them since the 'alpha' moniker was precisely to harden apis by changing them if necessary, borne out by the fact that every  single release in hadoop-2 chain has had incompatible changes. This happened since we were releasing early, moving fast and breaking things. Furthermore, we'll have more in future as move towards stability of hadoop-2 similar to HDFS-4362, HDFS-4364 et al in HDFS and YARN-142 (api changes) for YARN.
>> 
>> So, rather than debate more, I had a brief chat with Suresh and Todd. Todd suggested calling the next release as hadoop-2.1.0-alpha to indicate the incompatibility a little better. This makes sense to me, as long as we are clear that we won't make any further *feature* releases in hadoop-2.0.x series (obviously we might be forced to do security/bug-fix release).
>> 
>> Going forward, I'd like to start locking down apis/protocols for a 'beta' release. This way we'll have one *final* opportunity post hadoop-2.1.0-alpha to make incompatible changes if necessary and we can call it hadoop-2.2.0-beta.
>> 
>> Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible changes. This will allow us to go on to a hadoop-2.3.0 as a GA release. This forces us to do a real effort on making sure we lock down for hadoop-2.2.0-beta.
>> 
>> In summary:
>> # I plan to now release hadoop-2.1.0-alpha (this week).
>> # We make a real effort to lock down apis/protocols and release hadoop-2.2.0-beta, say in March.
>> # Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.
>> 
>> I'll start a separate thread on 'locking protocols' w.r.t client-protocols v/s internal protocols (to facilitate rolling upgrades etc.), let's discuss this one separately.
>> 
>> Makes sense? Thoughts?
>> 
>> thanks,
>> Arun
>> 
>> PS:  Between hadoop-2.2.0-beta and hadoop-2.3.0 we *might* be forced to make some incompatible changes due to *unforeseen circumstances*, but no more gratuitous changes are allowed.
>> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: Release numbering for branch-2 releases

Posted by Eli Collins <el...@cloudera.com>.

Thanks for bringing this up Arun.  One of the issues is that we
haven't been clear about what type of compatibility breakages are
allowed, and which are not.  For example, renaming FileSystem#open is
incompatible, and not OK, regardless of the alpha/beta tag.  Breaking
server/server APIs is OK pre-GA but probably not post-GA, at least
in a point release, unless required for a security fix, etc.
Configuration, data format, and environment variable changes, etc. can all
be similarly incompatible. The issue we had in HADOOP-9151 was someone
claimed it is not an incompatible change because it doesn't break API
compatibility even though it breaks wire compatibility. So let's be
clear about the types of incompatibility we are or are not permitting.
For example, will it be OK to merge a change before 2.2.0-beta that
requires an HDFS metadata upgrade? Or breaks client-server wire
compatibility?  I've been assuming that changing an API annotated
Public/Stable still requires multiple major releases (one to deprecate
and one to remove), does the alpha label change that? To some people
the "alpha", "beta" label implies instability in terms of
quality/features, while to others it means unstable APIs (and to some
both) so it would be good to spell that out. In short, agree that we
really need to figure out what changes are permitted in what releases,
and we should update the docs accordingly (there's a start here:
http://wiki.apache.org/hadoop/Roadmap).

Note that the 2.0.0 alpha release vote thread was clear that we
thought we were all in agreement that we'd like to keep client/server
compatible post 2.0 - and there was no push back. We pulled a number
of jiras into the 2.0 release explicitly so that we could preserve
client/server compatibility going forward.  Here's the relevant part
of the thread as a refresher: http://s.apache.org/gQ

"2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
envelope in branch-2, but didn't make it into this rc. So, that would
mean that future alphas would not be protocol-compatible with this
alpha. Per a discussion a few weeks ago, I think we all were in
agreement that, if possible, we'd like all 2.x to be compatible for
client-server communication, at least (even if we don't support
cross-version for the intra-cluster protocols)"

Thanks,
Eli

On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Folks,
>
>  There has been some discussions about incompatible changes in the hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and few other jiras. Frankly, I'm surprised about some of them since the 'alpha' moniker was precisely to harden apis by changing them if necessary, borne out by the fact that every  single release in hadoop-2 chain has had incompatible changes. This happened since we were releasing early, moving fast and breaking things. Furthermore, we'll have more in future as move towards stability of hadoop-2 similar to HDFS-4362, HDFS-4364 et al in HDFS and YARN-142 (api changes) for YARN.
>
>  So, rather than debate more, I had a brief chat with Suresh and Todd. Todd suggested calling the next release as hadoop-2.1.0-alpha to indicate the incompatibility a little better. This makes sense to me, as long as we are clear that we won't make any further *feature* releases in hadoop-2.0.x series (obviously we might be forced to do security/bug-fix release).
>
>  Going forward, I'd like to start locking down apis/protocols for a 'beta' release. This way we'll have one *final* opportunity post hadoop-2.1.0-alpha to make incompatible changes if necessary and we can call it hadoop-2.2.0-beta.
>
>  Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible changes. This will allow us to go on to a hadoop-2.3.0 as a GA release. This forces us to do a real effort on making sure we lock down for hadoop-2.2.0-beta.
>
>  In summary:
>  # I plan to now release hadoop-2.1.0-alpha (this week).
>  # We make a real effort to lock down apis/protocols and release hadoop-2.2.0-beta, say in March.
>  # Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.
>
>  I'll start a separate thread on 'locking protocols' w.r.t client-protocols v/s internal protocols (to facilitate rolling upgrades etc.), let's discuss this one separately.
>
>  Makes sense? Thoughts?
>
> thanks,
> Arun
>
> PS:  Between hadoop-2.2.0-beta and hadoop-2.3.0 we *might* be forced to make some incompatible changes due to *unforeseen circumstances*, but no more gratuitous changes are allowed.
>

Re: Release numbering for branch-2 releases

Posted by Andrew Purtell <ap...@apache.org>.

On Fri, Feb 1, 2013 at 2:34 AM, Tom White <to...@cloudera.com> wrote:

> Possibly the reason for Stack's consternation is that this is a
> Hadoop-specific versioning scheme, rather than a standard one like
> Semantic Versioning (http://semver.org/) which is more widely
> understood.

If I can offer an alternate and likely more accurate divination, I think
it's the idea of having API incompatibility (also protocol incompatibility)
with each 2.x release.

The preference I believe is for API incompatibilities /
protocol incompatibilities to trigger a major release increment rather than
be rolled into the 2.x branch. Alternatively, I think I can anticipate the
concerns, but have you considered introducing feature flags into the RPC
protocols? Protobuf is a tagged format, by design readers can deal with
missing or unexpected optional fields as long as sender and receiver can
negotiate a lingua franca (via feature flags, is one way).
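
To make the feature-flag idea concrete, here is a minimal sketch in Python. It does not use Protobuf or any Hadoop RPC code: the flag names, the `negotiate` helper and the dict-based `decode` are all hypothetical, chosen only to show how two peers could settle on a common feature set and how a reader of a tagged format can skip fields it does not know.

```python
from enum import IntFlag


class Feature(IntFlag):
    """Hypothetical capability bits a peer could advertise in a handshake."""
    BASE_RPC = 1
    SASL_ENVELOPE = 2
    ENCRYPTED_TRANSFER = 4


def negotiate(client: Feature, server: Feature) -> Feature:
    """Return the features both sides understand (the common subset)."""
    return client & server


def decode(message: dict, known_fields: set) -> dict:
    """Mimic a tagged format: keep known fields, silently skip unknown ones."""
    return {k: v for k, v in message.items() if k in known_fields}


# An older client talking to a newer server (assumed feature sets).
old_client = Feature.BASE_RPC | Feature.SASL_ENVELOPE
new_server = Feature.BASE_RPC | Feature.SASL_ENVELOPE | Feature.ENCRYPTED_TRANSFER
common = negotiate(old_client, new_server)
```

With this shape the newer side only uses a feature when its bit is present in `common`, so an older peer never receives something it cannot handle.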

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Release numbering for branch-2 releases

Posted by Stack <st...@duboce.net>.

On Mon, Feb 4, 2013 at 2:14 PM, Suresh Srinivas <su...@hortonworks.com> wrote:

> On Mon, Feb 4, 2013 at 1:07 PM, Owen O'Malley <om...@apache.org> wrote:
>
> > I think that using "-(alpha,beta)" tags on the release versions is a really
> > bad idea.
>
>
> Why? Can you please share some reasons?
>
>
We already had a means for denoting 'alpha' software -- release candidates
-- and 'beta'; early versions of a major release were installed with
trepidation by all but the clueless.

We also had a place for API changes and wire format revamps; they were done
in the next major version, not between point releases (caveat unintended
mess-ups).

The -alpha and -beta designations muddy hard-won understanding of what the
numbers mean.



> I actually think alpha and beta and stable/GA are much better way to set
> the expectation
> of the quality of a release. This has been practiced in software release
> cycle for a long time.
>

Not in hadoop though, not until these 2.0ings.



> Having an option to release alpha is good for releasing early and getting
> feedback from
> people who can try it out and at the same time warning other not so
> adventurous users on
> quality expectation.
>
>
Let's call it a snapshot instead because alpha is damaged (IMO).

Thanks Suresh,
St.Ack

Re: Release numbering for branch-2 releases

Posted by Steve Loughran <st...@hortonworks.com>.

disclaimer, personal opinions only, I just can't be bothered to subscribe
with @apache.org right now.

On 4 February 2013 14:36, Todd Lipcon <to...@cloudera.com> wrote:

> - Quality/completeness: for example, missing docs, buggy UIs, difficult
> setup/install, etc
>

par for the course. Have you ever used Linux?

> - Safety: for example, potential bugs which may risk data loss
>

Anything that threatens data loss is a blocker, at least for data you care
about.

> - Stability: for example, potential bugs which may risk uptime
>

Less critical for most people, though it can cost lots of $$.

> - End-user API compatibility: will user-facing APIs change in this version?
> (affecting those who write MR jobs)
>

> - Framework-developer API compatibility: will YARN-internal APIs change in
> this version? (affecting those who write non-MR YARN frameworks)
>

Things aren't stable in 2.x there yet, YARN-117 is on my todo list, and
without that I consider it broken. The ASF haven't shipped a non-alpha
version of this -and I don't think anyone else has made any stability
claims either. That includes CDH 4.x, where YARN was a "play if you want"
feature. Or "wide-alpha", as I viewed it.

> - Binary compatibility: can I continue to use my application (or YARN)
> framework compiled against an old version with this version, without a
> recompile?
>

This is one thing Computer Science has never addressed fully. The whole of
the entire computing stack has to be considered "best-effort". If there is
one thing we can do here it is hooking up the entire set of OSS apps to the
nightly build, in a nice DAG including things like Cascading, Spring Data
&c, the way Apache Gump did to act as the regression test for Ant (before
Maven broke it)
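
One way to picture the "nice DAG" of downstream projects is as a topological build order. The sketch below is in Python and uses only the standard library `graphlib` module (Python 3.9+). The dependency edges and the `build_order` helper are illustrative assumptions, not a real Gump or Hadoop build configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency edges: each project maps to the projects it builds on.
downstream = {
    "hadoop": set(),
    "cascading": {"hadoop"},
    "spring-data": {"hadoop"},
    "app-using-cascading": {"cascading"},
}


def build_order(graph):
    """Return projects ordered so each one is built after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(downstream)
print(order)
```

A nightly run would build and test the projects in that order, so a break introduced upstream surfaces in whichever downstream project first fails.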

> - Intra-cluster wire compatibility: can I rolling-upgrade from A to B?
>

The presence of the 2.0.2 alpha stuff in the field complicates things. I
know you want upgrades, I'm sure others do too, but if that became an
approved version, there's the conflict with the "-1 version supported" rule
of wire compatibility -does it get changed?

> - Client-server wire compatibility: can I use old clients to talk to an
> upgraded cluster?
>

IMO we should move clients off the intra-cluster protocol, get them on
WebHDFS, the hcat job APIs, and have a hard split between public and
private. That includes distcp. As webhdfs is in 1.x+ that's the one to care
about.

>
> Depending on the user's expectations and needs, different factors above may
> be significantly more or less important. And different portions of the
> software may have different levels of stability in each of the areas. As
> I've mentioned in previous threads, my experiences supporting production
> Hadoop 1.x and Hadoop 2.x HDFS clusters has led me to believe that 2.x,
> while being "alpha" is significantly less prone to data loss bugs than 1.x
> in Hadoop.

I hope you are right -it's where everything is going.

> But, with some of the changes in the proposed 2.0.3-alpha, it
> wouldn't be wire-protocol-stable.
>
>
I don't know of anyone who wanted that, anyone who said "let's create chaos
and confusion", it was just a consequence of fixing things against an alpha
release.

> How can we best devise a scheme that explains the various factors above in
> a more detailed way than one big red warning sticker? What of the above
> factors does the community think would be implied by "GA?"
>

Let's see

 $ ant -version
> Apache Ant(TM) version 1.9.0alpha compiled on November 12 2012

Yes, Ant says "anything you build locally is an alpha release".

In that context,  it's no different from -SNAPSHOT except it's easier to
field bugreps against, because they are at least replicable; things
downstream can be updated to work with the alpha and test it.

I view beta as the transition to "feature complete: bugs and regression
only", with some triage, "patches that don't cause visible regressions"

Shipping is pretty much bugs only, with serious triage -only the widely
visible things happen after that. Critical integrity and performance merit
new updates.

Security fixes: out of band emergency updates. This is a good reason for
leaving security out of anything: a simpler support model. Unlike Oracle I
don't think security plugins should have side effects other than fix the
security hole.

Maven complicates things as you can't ever undeclare a release there -not
even for security reasons. It's why ops-managed RPM and deb updates are
preferred by ops groups for rolling out new binaries of any form to a pool
of boxes -at the expense of the application having control of its classpath
(ant has some special classpath setup to support OS-based installations,
BTW).

The way I've always viewed alpha and beta tags in apache projects is this:

   - you don't care about regressions of behaviour from features that
   weren't in the previous full release
   - the way you field all bug reports is say "is it gone from the latest
   release on that branch?" (*)

The big change in Hadoop is the filesystem: nobody wants to lose their
data, so you do need a story to help people migrate from alpha to next
alpha, beta to next beta. What I don't see being needed is

   1. Support for upgrades from 2.x.x-alpha to anything 3.x-
   2. Freezing changes to the semantics of the user level APIs that weren't
   in the previous version.

I don't want to gratuitously break anything. It's just that releasing stuff
with the alpha tag doesn't mean "here is something that is stable and
supported by having its own branch maintained", it's "please play with this
and tell us what didn't work".

-Steve

Re: Release numbering for branch-2 releases

Posted by Todd Lipcon <to...@cloudera.com>.

On Mon, Feb 4, 2013 at 2:14 PM, Suresh Srinivas <su...@hortonworks.com> wrote:

>
> Why? Can you please share some reasons?
>
> I actually think alpha and beta and stable/GA are much better way to set
> the expectation
> of the quality of a release. This has been practiced in software release
> cycle for a long time.
> Having an option to release alpha is good for releasing early and getting
> feedback from
> people who can try it out and at the same time warning other not so
> adventurous users on
> quality expectation.
>
>
My issue with the current scheme is that there is little definition as to
what alpha/beta/stable means. We're trying to boil down a complex issue
into a simple tag which doesn't well capture the various subtleties. For
example, different people may variously use the terms to describe:

- Quality/completeness: for example, missing docs, buggy UIs, difficult
setup/install, etc
- Safety: for example, potential bugs which may risk data loss
- Stability: for example, potential bugs which may risk uptime
- End-user API compatibility: will user-facing APIs change in this version?
(affecting those who write MR jobs)
- Framework-developer API compatibility: will YARN-internal APIs change in
this version? (affecting those who write non-MR YARN frameworks)
- Binary compatibility: can I continue to use my application (or YARN)
framework compiled against an old version with this version, without a
recompile?
- Intra-cluster wire compatibility: can I rolling-upgrade from A to B?
- Client-server wire compatibility: can I use old clients to talk to an
upgraded cluster?
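
As a rough illustration of what a more detailed label than a single alpha/beta tag could look like, here is a minimal Python sketch that records one flag per compatibility dimension from the list above (the quality, safety and stability items are left out for brevity). The `CompatibilityProfile` class, its field names and the sample values are hypothetical placeholders, not a statement about any real release.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CompatibilityProfile:
    """One flag per dimension; True means the guarantee holds."""
    end_user_api: bool
    framework_api: bool
    binary: bool
    intra_cluster_wire: bool
    client_server_wire: bool


def broken_guarantees(promised, actual):
    """Names of dimensions that were promised but are not met."""
    return [f.name for f in fields(promised)
            if getattr(promised, f.name) and not getattr(actual, f.name)]


# Placeholder values for illustration only.
promised = CompatibilityProfile(True, False, True, False, True)
actual = CompatibilityProfile(True, False, True, False, False)
print(broken_guarantees(promised, actual))
```

A release note built from such a profile could state each dimension separately instead of folding them all into one label.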

Depending on the user's expectations and needs, different factors above may
be significantly more or less important. And different portions of the
software may have different levels of stability in each of the areas. As
I've mentioned in previous threads, my experiences supporting production
Hadoop 1.x and Hadoop 2.x HDFS clusters has led me to believe that 2.x,
while being "alpha" is significantly less prone to data loss bugs than 1.x
in Hadoop. But, with some of the changes in the proposed 2.0.3-alpha, it
wouldn't be wire-protocol-stable.

How can we best devise a scheme that explains the various factors above in
a more detailed way than one big red warning sticker? What of the above
factors does the community think would be implied by "GA?"

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Release numbering for branch-2 releases

Posted by Todd Lipcon <to...@cloudera.com>.

On Mon, Feb 4, 2013 at 2:14 PM, Suresh Srinivas <su...@hortonworks.com> wrote:

>
> Why? Can you please share some reasons?
>
> I actually think alpha and beta and stable/GA are much better way to set
> the expectation
> of the quality of a release. This has been practiced in software release
> cycle for a long time.
> Having an option to release alpha is good for releasing early and getting
> feedback from
> people who can try it out and at the same time warning other not so
> adventurous users on
> quality expectation.
>
>
My issue with the current scheme is that there is little definition as to
what alpha/beta/stable means. We're trying to boil down a complex issue
into a simple tag which doesn't well capture the various subtleties. For
example, different people may variously use the terms to describe:

- Quality/completeness: for example, missing docs, buggy UIs, difficult
setup/install, etc
- Safety: for example, potential bugs which may risk data loss
- Stability: for example, potential bugs which may risk uptime
- End-user API compatibility: will user-facing APIs change in this version?
(affecting those who write MR jobs)
- Framework-developer API compatibility: will YARN-internal APIs change in
this version? (affecting those who write non-MR YARN frameworks)
- Binary compatibility: can I continue to use my application (or YARN
framework) compiled against an old version with this version, without a
recompile?
- Intra-cluster wire compatibility: can I rolling-upgrade from A to B?
- Client-server wire compatibility: can I use old clients to talk to an
upgraded cluster?

Depending on the user's expectations and needs, different factors above may
be significantly more or less important. And different portions of the
software may have different levels of stability in each of the areas. As
I've mentioned in previous threads, my experiences supporting production
Hadoop 1.x and Hadoop 2.x HDFS clusters have led me to believe that 2.x,
while being "alpha", is significantly less prone to data loss bugs than 1.x
in Hadoop. But, with some of the changes in the proposed 2.0.3-alpha, it
wouldn't be wire-protocol-stable.

How can we best devise a scheme that explains the various factors above in
a more detailed way than one big red warning sticker? Which of the above
factors does the community think would be implied by "GA"?
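
One way to picture "a more detailed way than one big red warning sticker" is a per-release profile that answers each of the factors listed above separately. The following is only a minimal sketch in Python, not anything the project ships: the class names, the "unstated" default and the version string "x.y.z" are assumptions made for illustration, and the yes/no values are placeholders rather than statements about any real release.

```python
from dataclasses import dataclass, field
from enum import Enum


class Dimension(Enum):
    """The factors from the list above, one member per bullet."""
    QUALITY = "quality/completeness"
    SAFETY = "safety (data loss)"
    STABILITY = "stability (uptime)"
    END_USER_API = "end-user API compatibility"
    FRAMEWORK_API = "framework-developer API compatibility"
    BINARY = "binary compatibility"
    INTRA_CLUSTER_WIRE = "intra-cluster wire compatibility"
    CLIENT_SERVER_WIRE = "client-server wire compatibility"


@dataclass
class ReleaseProfile:
    """Hypothetical record of what a release promises, per dimension."""
    version: str
    guarantees: dict = field(default_factory=dict)  # Dimension -> bool

    def status(self, dimension):
        # A dimension nobody has made a statement about is reported as
        # "unstated" instead of being folded into a single tag.
        value = self.guarantees.get(dimension)
        if value is None:
            return "unstated"
        return "yes" if value else "no"

    def report(self):
        return [f"{d.value}: {self.status(d)}" for d in Dimension]


# Placeholder values for an imaginary release, purely to show the shape.
profile = ReleaseProfile(
    "x.y.z",
    {Dimension.END_USER_API: True, Dimension.INTRA_CLUSTER_WIRE: False},
)
for line in profile.report():
    print(line)
```

The point of the sketch is that "alpha", "beta" and "GA" would then be shorthand for a particular set of answers, and the community would still have to agree on which answers "GA" requires.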

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Release numbering for branch-2 releases

Posted by Suresh Srinivas <su...@hortonworks.com>.

On Mon, Feb 4, 2013 at 1:07 PM, Owen O'Malley <om...@apache.org> wrote:

> I think that using "-(alpha,beta)" tags on the release versions is a really
> bad idea.

Why? Can you please share some reasons?

I actually think alpha and beta and stable/GA are a much better way to set
the expectation of the quality of a release. This has been practiced in
software release cycles for a long time.
Having an option to release alpha is good for releasing early and getting
feedback from people who can try it out, and at the same time warning other,
not so adventurous users about the quality to expect.

Or do you propose that any release that is not marked stable (currently 1.x)
is implicitly alpha/beta?

> All releases should follow the strictly numeric
> (Major.Minor.Patch) pattern that we've used for all of the releases except
> the 2.0.x ones.
>
> -- Owen
>
>
> On Mon, Feb 4, 2013 at 11:53 AM, Stack <st...@duboce.net> wrote:
>
> > On Mon, Feb 4, 2013 at 10:46 AM, Arun C Murthy <ac...@hortonworks.com>
> > wrote:
> >
> > > Would it be better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 a
> > > stable release? This way we just have one series (2.0.x) which is not
> > > suitable for general consumption.
> > >
> > >
> >
> > That contains the versioning damage to the 2.0.x set.  This is an
> > improvement over the original proposal where we let the versioning mayhem
> > run out to 2.3.
> >
> > Thanks Arun,
> > St.Ack
> >
>

-- 
http://hortonworks.com/download/

Re: Release numbering for branch-2 releases

Posted by Owen O'Malley <om...@apache.org>.

I think that using "-(alpha,beta)" tags on the release versions is a really
bad idea. All releases should follow the strictly numeric
(Major.Minor.Patch) pattern that we've used for all of the releases except
the 2.0.x ones.

-- Owen

On Mon, Feb 4, 2013 at 11:53 AM, Stack <st...@duboce.net> wrote:

> On Mon, Feb 4, 2013 at 10:46 AM, Arun C Murthy <ac...@hortonworks.com>
> wrote:
>
> > Would it be better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 a
> > stable release? This way we just have one series (2.0.x) which is not
> > suitable for general consumption.
> >
> >
>
> That contains the versioning damage to the 2.0.x set.  This is an
> improvement over the original proposal where we let the versioning mayhem
> run out to 2.3.
>
> Thanks Arun,
> St.Ack
>

Re: Release numbering for branch-2 releases

Posted by Stack <st...@duboce.net>.

On Mon, Feb 4, 2013 at 10:46 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Would it be better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 a
> stable release? This way we just have one series (2.0.x) which is not
> suitable for general consumption.
>
>

That contains the versioning damage to the 2.0.x set.  This is an
improvement over the original proposal where we let the versioning mayhem
run out to 2.3.

Thanks Arun,
St.Ack

Re: Release numbering for branch-2 releases

Posted by Suresh Srinivas <su...@hortonworks.com>.

On Mon, Feb 4, 2013 at 10:46 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

>
> On Feb 1, 2013, at 2:34 AM, Tom White wrote:
> > Whereas Arun is proposing
> >
> >  2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha, 2.1.0-alpha, 2.2.0-beta, 2.3.0
> >
> > and the casual observer might expect there to be a stable 2.0.1 (say)
> > on seeing the existence of 2.0.2-alpha.
> >
> > The first three of these are already released, so I don't think we
> > could switch to the Semantic Versioning scheme at this stage. We could
> > for release 3 though.
> >
>
> I agree that would have been slightly better; unfortunately it's too late
> now - a new versioning scheme would be even more confusing!
>
> Would it be better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 a
> stable release? This way we just have one series (2.0.x) which is not
> suitable for general consumption.
>
> I'm ok either way, but I want to just make a decision and move on to
> making the release asap; I'd appreciate a quick resolution.
>

+1 for 2.0.3-alpha. 2.0.3-alpha has been the release number that we have
been working on for a while. I am surprised to see the feedback that it is
confusing.

Let's constructively move forward and make a decision and send the release
out quickly. Arun, my suggestion is to call for a release vote.

Regards,
Suresh




-- 
http://hortonworks.com/download/

Re: Release numbering for branch-2 releases

Posted by Arun C Murthy <ac...@hortonworks.com>.

On Feb 1, 2013, at 2:34 AM, Tom White wrote:
> Whereas Arun is proposing
> 
>  2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha, 2.1.0-alpha, 2.2.0-beta, 2.3.0
> 
> and the casual observer might expect there to be a stable 2.0.1 (say)
> on seeing the existence of 2.0.2-alpha.
> 
> The first three of these are already released, so I don't think we
> could switch to the Semantic Versioning scheme at this stage. We could
> for release 3 though.
> 

I agree that would have been slightly better; unfortunately it's too late now - a new versioning scheme would be even more confusing!

Would it be better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 a stable release? This way we just have one series (2.0.x) which is not suitable for general consumption.

I'm ok either way, but I want to just make a decision and move on to making the release asap; I'd appreciate a quick resolution.

thanks,
Arun

Re: Release numbering for branch-2 releases

Posted by Andrew Purtell <ap...@apache.org>.

On Fri, Feb 1, 2013 at 2:34 AM, Tom White <to...@cloudera.com> wrote:

> Possibly the reason for Stack's consternation is that this is a
> Hadoop-specific versioning scheme, rather than a standard one like
> Semantic Versioning (http://semver.org/) which is more widely
> understood.

If I can offer an alternate and likely more accurate divination, I think
it's the idea of having API incompatibility (also protocol incompatibility)
with each 2.x release.

The preference, I believe, is for API incompatibilities /
protocol incompatibilities to trigger a major release increment rather than
be rolled into the 2.x branch. Alternatively, and I think I can anticipate
the concerns, have you considered introducing feature flags into the RPC
protocols? Protobuf is a tagged format; by design, readers can deal with
missing or unexpected optional fields as long as sender and receiver can
negotiate a lingua franca (feature flags are one way).
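
As a rough illustration of the feature-flag idea, the sketch below models messages as plain Python dicts rather than real protobuf messages. Every name in it (the "base" and "checksum_type" flags, the field names, the helper functions) is hypothetical and not taken from Hadoop's RPC layer; it only shows the two halves of the argument: peers agree on the flags both understand, and a reader skips fields it does not know.

```python
def negotiate(client_features, server_features):
    """Return the feature flags that both peers understand."""
    return frozenset(client_features) & frozenset(server_features)


def build_request(path, negotiated):
    """Sender side: add an optional field only if its flag was negotiated."""
    request = {"path": path}
    if "checksum_type" in negotiated:  # hypothetical optional feature
        request["checksum_type"] = "CRC32C"
    return request


def read_request(request, known_fields):
    """Receiver side: keep known fields and ignore unexpected ones,
    mimicking how a tagged format lets old readers skip new optional fields."""
    return {name: value for name, value in request.items() if name in known_fields}


# An old server that predates the optional field, and a newer client.
old_server = {"features": {"base"}, "fields": {"path"}}
new_client_features = {"base", "checksum_type"}

negotiated = negotiate(new_client_features, old_server["features"])
request = build_request("/user/example/file", negotiated)
print(sorted(negotiated), request)
```

Negotiation keeps the sender from relying on behaviour the other side lacks, while the tolerant reader covers the case where an unknown optional field arrives anyway.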

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Release numbering for branch-2 releases

Posted by Arun C Murthy <ac...@hortonworks.com>.

On Feb 1, 2013, at 2:34 AM, Tom White wrote:

> Possibly the reason for Stack's consternation is that this is a
> Hadoop-specific versioning scheme, rather than a standard one like
> Semantic Versioning (http://semver.org/) which is more widely
> understood.
> 
> With that scheme we would have something like
> 
>  2.0.0-alpha, 2.0.0-alpha.1, 2.0.0-alpha.2, 2.0.0-alpha.3, 2.0.0-beta, 2.0.0
> 
> so that the alpha and beta tags all precede the 2.0.0 GA release,
> which is the one that we make compatibility promises for.
> 
> Whereas Arun is proposing
> 
>  2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha, 2.1.0-alpha, 2.2.0-beta, 2.3.0
> 

Would it be better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 a stable release?

I'm ok either way, but I want to just make a decision and move on to making the release asap; I'd appreciate a quick resolution.

thanks,
Arun

Re: Release numbering for branch-2 releases

Posted by Tom White <to...@cloudera.com>.

Possibly the reason for Stack's consternation is that this is a
Hadoop-specific versioning scheme, rather than a standard one like
Semantic Versioning (http://semver.org/) which is more widely
understood.

With that scheme we would have something like

  2.0.0-alpha, 2.0.0-alpha.1, 2.0.0-alpha.2, 2.0.0-alpha.3, 2.0.0-beta, 2.0.0

so that the alpha and beta tags all precede the 2.0.0 GA release,
which is the one that we make compatibility promises for.

Whereas Arun is proposing

  2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha, 2.1.0-alpha, 2.2.0-beta, 2.3.0

and the casual observer might expect there to be a stable 2.0.1 (say)
on seeing the existence of 2.0.2-alpha.

The first three of these are already released, so I don't think we
could switch to the Semantic Versioning scheme at this stage. We could
for release 3 though.
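
To make the ordering argument concrete, here is a small Python sketch of the precedence rules as I understand them from semver.org: the numeric core is compared first, a pre-release sorts before the matching normal version, and pre-release identifiers are compared field by field. The function names are my own, and build metadata ("+...") is not handled.

```python
from functools import cmp_to_key


def parse(version):
    """Split 'X.Y.Z[-pre.release]' into a numeric core and pre-release fields."""
    core, _, pre = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch), (pre.split(".") if pre else [])


def compare_identifiers(a, b):
    """Numeric identifiers compare as numbers and rank below alphanumeric ones."""
    if a.isdigit() and b.isdigit():
        return (int(a) > int(b)) - (int(a) < int(b))
    if a.isdigit():
        return -1
    if b.isdigit():
        return 1
    return (a > b) - (a < b)


def compare(v1, v2):
    """Return -1, 0 or 1 according to Semantic Versioning precedence."""
    core1, pre1 = parse(v1)
    core2, pre2 = parse(v2)
    if core1 != core2:
        return -1 if core1 < core2 else 1
    if not pre1 or not pre2:
        # A version without a pre-release tag outranks one with a tag.
        return bool(pre2) - bool(pre1)
    for a, b in zip(pre1, pre2):
        result = compare_identifiers(a, b)
        if result:
            return result
    # All shared fields are equal: the longer pre-release ranks higher.
    return (len(pre1) > len(pre2)) - (len(pre1) < len(pre2))


semver_style = ["2.0.0", "2.0.0-beta", "2.0.0-alpha.2", "2.0.0-alpha",
                "2.0.0-alpha.3", "2.0.0-alpha.1"]
print(sorted(semver_style, key=cmp_to_key(compare)))
print(compare("2.0.2-alpha", "2.0.1"))
```

Under these rules every alpha and beta tag sorts before 2.0.0, whereas 2.0.2-alpha sorts after a hypothetical 2.0.1, which is why a casual observer could expect a stable 2.0.1 to exist.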

Tom

On Thu, Jan 31, 2013 at 8:12 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Stack,
>
> On Jan 30, 2013, at 9:25 PM, Stack wrote:
>
>> I find the above opaque and written in a cryptic language that I might grok
>> if I spent a day or two running over cited issues trying to make some
>> distillation of the esoterica debated therein.  If you want feedback from
>> other than the cognoscenti, I would suggest a better summation of what all
>> is involved.
>
>
> I apologize if there were too many technical details.
>
> The simplified version is that hadoop-2 isn't baked as it stands today, and is not viable to be supported by this community in a stable manner. In particular, it is due to the move to PB for HDFS protocols and the freshly minted YARN apis/protocols. As a result, we have been forced to make (incompatible) changes in every hadoop-2 release so far (2.0.0, 2.0.2 etc.). Since we released the previous bits we have found security issues, bugs and other issues which will cause long-term maintenance harm (details are in the HADOOP/HDFS/YARN jiras in the original email).
>
> My aim, as the RM, is to try to nudge (nay, force) all contributors to spend time over the next couple of months focussing on fixing known issues and to look for other surprises - this way I hope to ensure we do not have further incompatible changes for downstream projects and we can support hadoop-2 for at least a couple of years. I hope this makes sense to you. I don't think turning around and calling these 3.x or 4.x makes things better since no amount of numbering lipstick will make the software better or viable for the long term for both users and other projects. Worse, it will force HBase and other projects to deal with *even more* major Hadoop releases... which seems like a royal pita.
>
> I hope that clarifies things. Thanks Stack.
>
> Arun
>

Re: Release numbering for branch-2 releases

Posted by Stack <st...@duboce.net>.

On Thu, Jan 31, 2013 at 12:12 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> I apologize if there were too many technical details.
>
> The simplified version is that hadoop-2 isn't baked as it stands today,
> and is not viable to be supported by this community in a stable manner. In
> particular, it is due to the move to PB for HDFS protocols and the freshly
> minted YARN apis/protocols. As a result, we have been forced to make
> (incompatible) changes in every hadoop-2 release so far (2.0.0, 2.0.2
> etc.). Since we released the previous bits we have found security issues,
> bugs and other issues which will cause long-term maintenance harm (details
> are in the HADOOP/HDFS/YARN jiras in the original email).
>
> My aim, as the RM, is to try to nudge (nay, force) all contributors to spend
> time over the next couple of months focussing on fixing known issues and to
> look for other surprises - this way I hope to ensure we do not have further
> incompatible changes for downstream projects and we can support hadoop-2
> for at least a couple of years. I hope this makes sense to you. I don't
> think turning around and calling these 3.x or 4.x makes things better since
> no amount of numbering lipstick will make the software better or viable for
> the long-term for both users and other projects. Worse, it will force HBase
> and other projects to deal with *even more* major Hadoop releases... which
> seems like a royal pita.
>
> I hope that clarifies things. Thanks Stack.
>

Tom above puts his finger on the problem I am having.  It seems that the
'hadoop versioning' is arbitrary, flouts convention, and on top of that is
without a discernible pattern (2.0.0 is actually going to be 2.3.0?).  It
is also tantalizing as it holds out the promise of a 2.0.0 or a 2.1.0,
etc., but seemingly these will never ship.

Above you call 3.x and 4.x 'numbering lipstick' -- nice one! -- but to
this 'casual observer', IMO, it would be more like calling a spade a spade; i.e.
3.x.x, a major version change, has API and possibly wire protocol changes
in it.

Thank you for taking the time to dumb it all down for me Arun,
St.Ack

Re: Release numbering for branch-2 releases

Posted by Tom White <to...@cloudera.com>.

Possibly the reason for Stack's consternation is that this is a
Hadoop-specific versioning scheme, rather than a standard one like
Semantic Versioning (http://semver.org/) which is more widely
understood.

With that scheme we would have something like

  2.0.0-alpha, 2.0.0-alpha.1, 2.0.0-alpha.2, 2.0.0-alpha.3, 2.0.0-beta, 2.0.0

so that the alpha and beta tags all precede the 2.0.0 GA release,
which is the one that we make compatibility promises for.

Whereas Arun is proposing

  2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha, 2.1.0-alpha, 2.2.0-beta, 2.3.0

and the casual observer might expect there to be a stable 2.0.1 (say)
on seeing the existence of 2.0.2-alpha.
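
As a rough illustration of the two orderings above, here is a minimal Python sketch of the Semantic Versioning precedence rules. The helper names (parse, compare_identifiers, compare) are made up for this example and are not part of Hadoop or of any semver library; it only handles the simple X.Y.Z[-tag] strings used in this thread.

```python
from functools import cmp_to_key


def parse(version):
    """Split 'X.Y.Z[-pre.release]' into ((X, Y, Z), [identifiers])."""
    core, _, pre = version.partition("-")
    numbers = tuple(int(part) for part in core.split("."))
    identifiers = pre.split(".") if pre else []
    return numbers, identifiers


def compare_identifiers(a, b):
    """Compare two pre-release identifiers the way semver describes."""
    a_num, b_num = a.isdigit(), b.isdigit()
    if a_num and b_num:
        return (int(a) > int(b)) - (int(a) < int(b))
    if a_num != b_num:
        # Numeric identifiers rank below alphanumeric ones.
        return -1 if a_num else 1
    return (a > b) - (a < b)


def compare(v1, v2):
    """Return -1, 0 or 1 according to semver precedence."""
    n1, p1 = parse(v1)
    n2, p2 = parse(v2)
    if n1 != n2:
        return -1 if n1 < n2 else 1
    # A version without a pre-release tag outranks the same version with one.
    if not p1 or not p2:
        return (not p1) - (not p2)
    for a, b in zip(p1, p2):
        result = compare_identifiers(a, b)
        if result:
            return result
    # All shared identifiers are equal: the longer tag list ranks higher.
    return (len(p1) > len(p2)) - (len(p1) < len(p2))


semver_style = ["2.0.0", "2.0.0-beta", "2.0.0-alpha.3",
                "2.0.0-alpha.1", "2.0.0-alpha", "2.0.0-alpha.2"]
print(sorted(semver_style, key=cmp_to_key(compare)))

# Under these rules 2.0.2-alpha reads as a pre-release of a 2.0.2 GA.
print(compare("2.0.2-alpha", "2.0.2"))
```

Read this way, a tag such as 2.0.2-alpha sorts just before a plain 2.0.2, which is why a casual observer might expect a stable 2.0.x release to follow it.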

The first three of these are already released, so I don't think we
could switch to the Semantic Versioning scheme at this stage. We could
for release 3 though.

Tom

On Thu, Jan 31, 2013 at 8:12 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Stack,
>
> On Jan 30, 2013, at 9:25 PM, Stack wrote:
>
>> I find the above opaque and written in a cryptic language that I might grok
>> if I spent a day or two running over cited issues trying to make some
>> distillation of the esoterica debated therein.  If you want feedback from
>> other than the cognoscenti, I would suggest a better summation of what all
>> is involved.
>
>
> I apologize if there were too many technical details.
>
> The simplified version is that hadoop-2 isn't baked as it stands today, and is not viable to be supported by this community in a stable manner. In particular, it is due to the move to PB for HDFS protocols and the freshly minted YARN apis/protocols. As a result, we have been forced to make (incompatible) changes in every hadoop-2 release so far (2.0.0, 2.0.2 etc.). Since we released the previous bits we have found security issues, bugs and other issues which will cause long-term maintenance harm (details are in the HADOOP/HDFS/YARN jiras in the original email).
>
> My aim, as the RM, is to try to nudge (nay, force) all contributors to spend time over the next couple of months focussing on fixing known issues and to look for other surprises - this way I hope to ensure we do not have further incompatible changes for downstream projects and we can support hadoop-2 for at least a couple of years. I hope this makes sense to you. I don't think turning around and calling these 3.x or 4.x makes things better since no amount of numbering lipstick will make the software better or viable for the long-term for both users and other projects. Worse, it will force HBase and other projects to deal with *even more* major Hadoop releases... which seems like a royal pita.
>
> I hope that clarifies things. Thanks Stack.
>
> Arun
>

Re: Release numbering for branch-2 releases

Posted by Arun C Murthy <ac...@hortonworks.com>.

Stack,

On Jan 30, 2013, at 9:25 PM, Stack wrote:

> I find the above opaque and written in a cryptic language that I might grok
> if I spent a day or two running over cited issues trying to make some
> distillation of the esoterica debated therein.  If you want feedback from
> other than the cognoscenti, I would suggest a better summation of what all
> is involved.  

I apologize if there were too many technical details.

The simplified version is that hadoop-2 isn't baked as it stands today, and is not viable to be supported by this community in a stable manner. In particular, it is due to the move to PB for HDFS protocols and the freshly minted YARN apis/protocols. As a result, we have been forced to make (incompatible) changes in every hadoop-2 release so far (2.0.0, 2.0.2 etc.). Since we released the previous bits we have found security issues, bugs and other issues which will cause long-term maintenance harm (details are in the HADOOP/HDFS/YARN jiras in the original email).

My aim, as the RM, is to try to nudge (nay, force) all contributors to spend time over the next couple of months focussing on fixing known issues and to look for other surprises - this way I hope to ensure we do not have further incompatible changes for downstream projects and we can support hadoop-2 for at least a couple of years. I hope this makes sense to you. I don't think turning around and calling these 3.x or 4.x makes things better since no amount of numbering lipstick will make the software better or viable for the long-term for both users and other projects. Worse, it will force HBase and other projects to deal with *even more* major Hadoop releases... which seems like a royal pita.

I hope that clarifies things. Thanks Stack.

Arun

Re: Release numbering for branch-2 releases

Posted by Stack <st...@duboce.net>.

On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Folks,
>
>  There have been some discussions about incompatible changes in the
> hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and
> a few other jiras. Frankly, I'm surprised about some of them since the
> 'alpha' moniker was precisely to harden apis by changing them if necessary,
> borne out by the fact that every single release in hadoop-2 chain has had
> incompatible changes. This happened since we were releasing early, moving
> fast and breaking things. Furthermore, we'll have more in future as we move
> towards stability of hadoop-2 similar to HDFS-4362, HDFS-4364 et al in HDFS
> and YARN-142 (api changes) for YARN.
>
>  So, rather than debate more, I had a brief chat with Suresh and Todd.
> Todd suggested calling the next release as hadoop-2.1.0-alpha to indicate
> the incompatibility a little better. This makes sense to me, as long as we
> are clear that we won't make any further *feature* releases in hadoop-2.0.x
> series (obviously we might be forced to do security/bug-fix release).
>
>  Going forward, I'd like to start locking down apis/protocols for a 'beta'
> release. This way we'll have one *final* opportunity post
> hadoop-2.1.0-alpha to make incompatible changes if necessary and we can
> call it hadoop-2.2.0-beta.
>
>  Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible
> changes. This will allow us to go on to a hadoop-2.3.0 as a GA release.
> This forces us to do a real effort on making sure we lock down for
> hadoop-2.2.0-beta.
>
>  In summary:
>  # I plan to now release hadoop-2.1.0-alpha (this week).
>  # We make a real effort to lock down apis/protocols and release
> hadoop-2.2.0-beta, say in March.
>  # Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.
>
>  I'll start a separate thread on 'locking protocols' w.r.t
> client-protocols v/s internal protocols (to facilitate rolling upgrades
> etc.), let's discuss this one separately.
>
>  Makes sense?



No.

I find the above opaque and written in a cryptic language that I might grok
if I spent a day or two running over cited issues trying to make some
distillation of the esoterica debated therein.  If you want feedback from
other than the cognoscenti, I would suggest a better summation of what all
is involved.  I think jargon is fine for arcane technical discussion but it
seems we are talking basic hadoop versioning here and if I am following at
all, we are talking about possibly breaking API (?) and even wire protocol
inside a major version: i.e. between 2.0.x to 2.3.x say (give or take an
-alpha or -beta suffix thrown in here and there).  Does this have to be?
Can't we do API changes and wire protocol change off in hadoop 3.x and
4.x, etc.?  As is, how is a little ol' downstream project like the one I
work on supposed to cope w/ this plethora of 2.X.X-{alpha,beta,?} with
each new 2.x possibly a whole new 'experience'?

Thanks Arun,
St.Ack

Re: Release numbering for branch-2 releases

Posted by Eli Collins <el...@cloudera.com>.

Thanks for bringing this up Arun.  One of the issues is that we
haven't been clear about what type of compatibility breakages are
allowed, and which are not.  For example, renaming FileSystem#open is
incompatible, and not OK, regardless of the alpha/beta tag.  Breaking
server/server APIs is OK pre-GA but probably not post GA, at least
in a point release, or required for a security fix, etc.
Configuration, data format, and environment variable changes, etc. can all
be similarly incompatible. The issue we had in HADOOP-9151 was someone
claimed it is not an incompatible change because it doesn't break API
compatibility even though it breaks wire compatibility. So let's be
clear about the types of incompatibility we are or are not permitting.
For example, will it be OK to merge a change before 2.2.0-beta that
requires an HDFS metadata upgrade? Or breaks client/server wire
compatibility?  I've been assuming that changing an API annotated
Public/Stable still requires multiple major releases (one to deprecate
and one to remove), does the alpha label change that? To some people
the "alpha", "beta" label implies instability in terms of
quality/features, while to others it means unstable APIs (and to some
both) so it would be good to spell that out. In short, agree that we
really need to figure out what changes are permitted in what releases,
and we should update the docs accordingly (there's a start here:
http://wiki.apache.org/hadoop/Roadmap).
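
One way to make "what is permitted when" explicit is to write it down as a small lookup table. The Python sketch below is only an illustration: the Change and Phase names and the is_allowed helper are hypothetical, and the only entries filled in are the ones stated in this message. Everything else is deliberately left undecided, since those are the open questions raised here.

```python
from enum import Enum


class Change(Enum):
    PUBLIC_API = "rename/remove a public API such as FileSystem#open"
    SERVER_SERVER = "server/server API"
    CLIENT_SERVER_WIRE = "client/server wire protocol"
    HDFS_METADATA = "change requiring an HDFS metadata upgrade"


class Phase(Enum):
    PRE_GA = "alpha/beta"
    POST_GA = "GA point release"


UNDECIDED = None  # no agreed answer yet

# Only the cases spelled out in the message above are recorded.
POLICY = {
    (Change.PUBLIC_API, Phase.PRE_GA): False,     # not OK regardless of tag
    (Change.SERVER_SERVER, Phase.PRE_GA): True,   # OK pre-GA
    (Change.SERVER_SERVER, Phase.POST_GA): False, # "probably not" post GA
}


def is_allowed(change, phase):
    """Return True, False, or UNDECIDED for a change type in a phase."""
    return POLICY.get((change, phase), UNDECIDED)


for change in Change:
    for phase in Phase:
        print(f"{change.name:20} {phase.name:8} {is_allowed(change, phase)}")
```

A table like this, once agreed, could go straight into the docs; the rows printed as None are the ones that still need a decision.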

Note that the 2.0.0 alpha release vote thread was clear that we
thought we were all in agreement that we'd like to keep client/server
compatible post 2.0 - and there was no push back. We pulled a number
of jiras into the 2.0 release explicitly so that we could preserve
client/server compatibility going forward.  Here's the relevant part
of the thread as a refresher: http://s.apache.org/gQ

"2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
envelope in branch-2, but didn't make it into this rc. So, that would
mean that future alphas would not be protocol-compatible with this
alpha. Per a discussion a few weeks ago, I think we all were in
agreement that, if possible, we'd like all 2.x to be compatible for
client-server communication, at least (even if we don't support
cross-version for the intra-cluster protocols)"

Thanks,
Eli

On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Folks,
>
>  There have been some discussions about incompatible changes in the hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and a few other jiras. Frankly, I'm surprised about some of them since the 'alpha' moniker was precisely to harden apis by changing them if necessary, borne out by the fact that every single release in hadoop-2 chain has had incompatible changes. This happened since we were releasing early, moving fast and breaking things. Furthermore, we'll have more in future as we move towards stability of hadoop-2 similar to HDFS-4362, HDFS-4364 et al in HDFS and YARN-142 (api changes) for YARN.
>
>  So, rather than debate more, I had a brief chat with Suresh and Todd. Todd suggested calling the next release as hadoop-2.1.0-alpha to indicate the incompatibility a little better. This makes sense to me, as long as we are clear that we won't make any further *feature* releases in hadoop-2.0.x series (obviously we might be forced to do security/bug-fix release).
>
>  Going forward, I'd like to start locking down apis/protocols for a 'beta' release. This way we'll have one *final* opportunity post hadoop-2.1.0-alpha to make incompatible changes if necessary and we can call it hadoop-2.2.0-beta.
>
>  Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible changes. This will allow us to go on to a hadoop-2.3.0 as a GA release. This forces us to do a real effort on making sure we lock down for hadoop-2.2.0-beta.
>
>  In summary:
>  # I plan to now release hadoop-2.1.0-alpha (this week).
>  # We make a real effort to lock down apis/protocols and release hadoop-2.2.0-beta, say in March.
>  # Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.
>
>  I'll start a separate thread on 'locking protocols' w.r.t client-protocols v/s internal protocols (to facilitate rolling upgrades etc.), let's discuss this one separately.
>
>  Makes sense? Thoughts?
>
> thanks,
> Arun
>
> PS:  Between hadoop-2.2.0-beta and hadoop-2.3.0 we *might* be forced to make some incompatible changes due to *unforeseen circumstances*, but no more gratuitous changes are allowed.
>
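
The numbering plan quoted above (2.1.0-alpha, then 2.2.0-beta, then 2.3.0 as GA), together with Suresh's point elsewhere in this thread that features should not land in point releases, can be sketched as a small set of checks. This is only an illustrative sketch: the names Release, parse, may_break_compatibility and may_add_features are hypothetical and are not part of Hadoop or its release tooling, and the rules encode one reading of the proposal rather than an agreed policy.

```python
import re
from typing import NamedTuple, Optional


class Release(NamedTuple):
    major: int
    minor: int
    patch: int
    tag: Optional[str]  # "alpha", "beta", or None for a GA release


# Names follow the pattern used in the thread, e.g. "hadoop-2.1.0-alpha".
_NAME = re.compile(r"^hadoop-(\d+)\.(\d+)\.(\d+)(?:-(alpha|beta))?$")


def parse(name: str) -> Release:
    """Split a release name into numeric components and an optional tag."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    major, minor, patch, tag = match.groups()
    return Release(int(major), int(minor), int(patch), tag)


def may_break_compatibility(release: Release) -> bool:
    # Assumed reading of the proposal: alpha and beta releases may still
    # carry incompatible changes; a GA release should not (barring the
    # "unforeseen circumstances" mentioned in the PS).
    return release.tag in ("alpha", "beta")


def may_add_features(previous: Release, candidate: Release) -> bool:
    # Assumed reading of Suresh's comment: features belong in a major or
    # minor bump; point releases are for security and critical bug fixes.
    return (candidate.major, candidate.minor) > (previous.major, previous.minor)


if __name__ == "__main__":
    for name in ("hadoop-2.1.0-alpha", "hadoop-2.2.0-beta", "hadoop-2.3.0"):
        print(name, may_break_compatibility(parse(name)))
```

Under these assumptions, moving from 2.0.2-alpha to 2.1.0-alpha is a feature release, while a 2.0.3-alpha would be limited to fixes.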

Re: Release numbering for branch-2 releases

Posted by Stack <st...@duboce.net>.

On Fri, Feb 1, 2013 at 3:03 AM, Tom White <to...@cloudera.com> wrote:

> On Wed, Jan 30, 2013 at 11:32 PM, Vinod Kumar Vavilapalli
> <vi...@hortonworks.com> wrote:
> > I still have a list of pending API/protocol cleanup in YARN that need to be
> > in before we even attempt supporting compatibility further down the road.

YARN requires changing HDFS/MapReduce API/wire-protocol?  Can't it be done
in hadoop 3.x?

> > Just caught up with the discussion on the referred JIRAs. I can clearly see
> > how a single release with an umbrella alpha/beta tag is causing tensions
> > *only* because we have a single project and product. More reinforcement for
> > my proclivity towards separate releases and by extension towards the
> > projects' split.
>
> Good point. There's nothing to stop us doing separate releases of
> sub-project components now. Doing so might help us find
> incompatibilities between the different components in a release line
> (2.x at the moment).

I like the sound of this.  So, if HDFS, say, went unscathed by the higher
level API and wire-protocol machinations, it could make its way out to a
2.0.0 (or 2.0.4) absent the -beta/-alpha tails?

That'd help us downstreamers (As is, just trying to explain our now
out-of-date hadoop dependency is a couple of pages of the hbase reference
guide [1] -- and we haven't started in on how you'd run against hadoop2).

Thanks,
St.Ack
1. http://hbase.apache.org/book.html#hadoop

Re: Release numbering for branch-2 releases

Posted by Tom White <to...@cloudera.com>.

On Wed, Jan 30, 2013 at 11:32 PM, Vinod Kumar Vavilapalli
<vi...@hortonworks.com> wrote:
> I still have a list of pending API/protocol cleanup in YARN that need to be
> in before we even attempt supporting compatibility further down the road.

To let others track these it would be useful if they were tagged in
JIRA with a label (e.g. apichange).

> There's no way we can support wire compatibility with the APIs in the state
> that they are in now. So, +1 for a beta sometime in March.
>
> There are some early adopters, I am particularly speaking of YARN, who have
> been instrumental in helping iron out the alpha software, some with very
> large clusters and end-user base. These users will continue to be affected
> by these API/protocol changes, but the alpha tag was clearly meant to
> clarify this. I think we should graciously send out a note (on general@)
> about an impending beta from which everyone can expect a high degree of
> compatibility.
>
> Just caught up with the discussion on the referred JIRAs. I can clearly see
> how a single release with an umbrella alpha/beta tag is causing tensions
> *only* because we have a single project and product. More reinforcement for
> my proclivity towards separate releases and by extension towards the
> projects' split.

Good point. There's nothing to stop us doing separate releases of
sub-project components now. Doing so might help us find
incompatibilities between the different components in a release line
(2.x at the moment).

>
> Thanks,
> +Vinod
>
>
>
> On Tue, Jan 29, 2013 at 2:40 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
>
>> Thanks Suresh. Adding back other *-dev lists.
>>
>> On Jan 29, 2013, at 1:58 PM, Suresh Srinivas wrote:
>>
>> > +1 for a release with all the changes that are committed. That way it
>> > carries all the important bug fixes.
>> >
>> >> So, rather than debate more, I had a brief chat with Suresh and Todd. Todd
>> >> suggested calling the next release as hadoop-2.1.0-alpha to indicate the
>> >> incompatibility a little better. This makes sense to me, as long as we are
>> >> clear that we won't make any further *feature* releases in hadoop-2.0.x
>> >> series (obviously we might be forced to do security/bug-fix release).
>> >
>> > We have been incorrectly using point releases to introduce features. Given
>> > there are many features in this release, calling it 2.1.0 instead of 2.0.3
>> > makes sense. As you said, I am okay with the proposed plan as long as we do
>> > not lapse back to introducing new features in point releases meant for
>> > critical bugs.
>>
>>
>>
>
>
> --
> +Vinod
> Hortonworks Inc.
> http://hortonworks.com/
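
Tom's suggestion of tagging the pending API changes with a shared JIRA label lends itself to a saved search. The sketch below only builds a JQL string and a search URL; it does not contact any server. The label apichange comes from the message above. The project keys, the "unresolved only" filter, and the use of the REST search endpoint on the ASF JIRA instance are assumptions made for illustration, and label_query and search_url are hypothetical helper names.

```python
from urllib.parse import urlencode

# Assumed endpoint: the standard JIRA REST search resource on the ASF instance.
JIRA_SEARCH = "https://issues.apache.org/jira/rest/api/2/search"


def label_query(label, projects, unresolved_only=True):
    """Build a JQL string selecting issues that carry the given label."""
    jql = f"project in ({', '.join(projects)}) AND labels = {label}"
    if unresolved_only:
        # Restrict to issues that are still open (an assumption, not
        # something the thread asked for).
        jql += " AND resolution = Unresolved"
    return jql


def search_url(jql, max_results=50):
    """Return a URL that runs the JQL query; nothing is fetched here."""
    return JIRA_SEARCH + "?" + urlencode({"jql": jql, "maxResults": max_results})


if __name__ == "__main__":
    query = label_query("apichange", ["HADOOP", "HDFS", "YARN", "MAPREDUCE"])
    print(query)
    print(search_url(query))
```

The same JQL string can be pasted into the JIRA issue navigator directly, which is probably how most people on the list would use it.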

Re: Release numbering for branch-2 releases

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

I still have a list of pending API/protocol cleanup in YARN that need to be
in before we even attempt supporting compatibility further down the road.
There's no way we can support wire compatibility with the APIs in the state
that they are in now. So, +1 for a beta sometime in March.

There are some early adopters, I am particularly speaking of YARN, who have
been instrumental in helping iron out the alpha software, some with very
large clusters and end-user base. These users will continue to be affected
by these API/protocol changes, but the alpha tag was clearly meant to
clarify this. I think we should graciously send out a note (on general@)
about an impending beta from which everyone can expect a high degree of
compatibility.

Just caught up with the discussion on the referred JIRAs. I can clearly see
how a single release with an umbrella alpha/beta tag is causing tensions
*only* because we have a single project and product. More reinforcement for
my proclivity towards separate releases and by extension towards the
projects' split.

Thanks,
+Vinod

On Tue, Jan 29, 2013 at 2:40 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Thanks Suresh. Adding back other *-dev lists.
>
> On Jan 29, 2013, at 1:58 PM, Suresh Srinivas wrote:
>
> > +1 for a release with all the changes that are committed. That way it
> > carries all the important bug fixes.
> >
> >> So, rather than debate more, I had a brief chat with Suresh and Todd. Todd
> >> suggested calling the next release as hadoop-2.1.0-alpha to indicate the
> >> incompatibility a little better. This makes sense to me, as long as we are
> >> clear that we won't make any further *feature* releases in hadoop-2.0.x
> >> series (obviously we might be forced to do security/bug-fix release).
> >
> > We have been incorrectly using point releases to introduce features. Given
> > there are many features in this release, calling it 2.1.0 instead of 2.0.3
> > makes sense. As you said, I am okay with the proposed plan as long as we do
> > not lapse back to introducing new features in point releases meant for
> > critical bugs.
>
>
>

-- 
+Vinod
Hortonworks Inc.
http://hortonworks.com/

Re: Release numbering for branch-2 releases

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

I still have a list of pending API/protocol cleanup in YARN that need to be
in before we even attempt supporting compatibility further down the road.
There's no way we can support wire compatibility with the APIs in the state
that they are in now. So, +1 for a beta sometime in March.

There are some early adopters, I am particularly speaking of YARN, who have
been instrumental in helping ironing out the alpha software, some with very
large clusters and end-user base. These users will continue to be affected
with these API/protocol changes, but the alpha tag was clearly meant to
clarify this. I think we should graciously send out a note (on general@)
about an impending beta from where everyone can except a high degree of
compatibility.

Just caught up with the discussion on the referred JIRAs. I can clearly see
how a single release with an umbrella alpha/beta tag is causing tensions
*only* because we have a single project and product. More reinforcement for
my proclivity towards separate releases and by extension towards the
projects' split.

Thanks,
+Vinod

On Tue, Jan 29, 2013 at 2:40 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Thanks Suresh. Adding back other *-dev lists.
>
> On Jan 29, 2013, at 1:58 PM, Suresh Srinivas wrote:
>
> > +1 for a release with all the changes that are committed. That way it
> > carries all the important bug fixes.
> >
> >
> > So, rather than debate more, I had a brief chat with Suresh and Todd.
> Todd
> >> suggested calling the next release as hadoop-2.1.0-alpha to indicate the
> >> incompatibility a little better. This makes sense to me, as long as we
> are
> >> clear that we won't make any further *feature* releases in hadoop-2.0.x
> >> series (obviously we might be forced to do security/bug-fix release).
> >>
> >
> >
> > We have been incorrectly using point releases to introduce features.
> Given
> > there are many features in this release, calling it 2.1.0 instead of
> 2.0.3
> > makes sense. As you said, I am okay with the proposed plan as long as we
> do
> > not lapse back to introducing new features in point releases meant for
> > critical bugs.
>
>
>

-- 
+Vinod
Hortonworks Inc.
http://hortonworks.com/

Re: Release numbering for branch-2 releases

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

I still have a list of pending API/protocol cleanup in YARN that need to be
in before we even attempt supporting compatibility further down the road.
There's no way we can support wire compatibility with the APIs in the state
that they are in now. So, +1 for a beta sometime in March.

There are some early adopters, I am particularly speaking of YARN, who have
been instrumental in helping ironing out the alpha software, some with very
large clusters and end-user base. These users will continue to be affected
with these API/protocol changes, but the alpha tag was clearly meant to
clarify this. I think we should graciously send out a note (on general@)
about an impending beta from where everyone can except a high degree of
compatibility.

Just caught up with the discussion on the referred JIRAs. I can clearly see
how a single release with an umbrella alpha/beta tag is causing tensions
*only* because we have a single project and product. More reinforcement for
my proclivity towards separate releases and by extension towards the
projects' split.

Thanks,
+Vinod

On Tue, Jan 29, 2013 at 2:40 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Thanks Suresh. Adding back other *-dev lists.
>
> On Jan 29, 2013, at 1:58 PM, Suresh Srinivas wrote:
>
> > +1 for a release with all the changes that are committed. That way it
> > carries all the important bug fixes.
> >
> > > So, rather than debate more, I had a brief chat with Suresh and Todd.
> > > Todd suggested calling the next release as hadoop-2.1.0-alpha to
> > > indicate the incompatibility a little better. This makes sense to me,
> > > as long as we are clear that we won't make any further *feature*
> > > releases in hadoop-2.0.x series (obviously we might be forced to do
> > > security/bug-fix release).
> >
> > We have been incorrectly using point releases to introduce features.
> > Given there are many features in this release, calling it 2.1.0 instead
> > of 2.0.3 makes sense. As you said, I am okay with the proposed plan as
> > long as we do not lapse back to introducing new features in point
> > releases meant for critical bugs.

-- 
+Vinod
Hortonworks Inc.
http://hortonworks.com/

Re: Release numbering for branch-2 releases

Posted by Arun C Murthy <ac...@hortonworks.com>.

Thanks Suresh. Adding back other *-dev lists.

On Jan 29, 2013, at 1:58 PM, Suresh Srinivas wrote:

> +1 for a release with all the changes that are committed. That way it
> carries all the important bug fixes.
>
> > So, rather than debate more, I had a brief chat with Suresh and Todd.
> > Todd suggested calling the next release as hadoop-2.1.0-alpha to
> > indicate the incompatibility a little better. This makes sense to me,
> > as long as we are clear that we won't make any further *feature*
> > releases in hadoop-2.0.x series (obviously we might be forced to do
> > security/bug-fix release).
>
> We have been incorrectly using point releases to introduce features. Given
> there are many features in this release, calling it 2.1.0 instead of 2.0.3
> makes sense. As you said, I am okay with the proposed plan as long as we do
> not lapse back to introducing new features in point releases meant for
> critical bugs.

Re: Release numbering for branch-2 releases

Posted by Suresh Srinivas <su...@hortonworks.com>.

+1 for a release with all the changes that are committed. That way it
carries all the important bug fixes.

> So, rather than debate more, I had a brief chat with Suresh and Todd. Todd
> suggested calling the next release as hadoop-2.1.0-alpha to indicate the
> incompatibility a little better. This makes sense to me, as long as we are
> clear that we won't make any further *feature* releases in hadoop-2.0.x
> series (obviously we might be forced to do security/bug-fix release).

We have been incorrectly using point releases to introduce features. Given
there are many features in this release, calling it 2.1.0 instead of 2.0.3
makes sense. As you said, I am okay with the proposed plan as long as we do
not lapse back to introducing new features in point releases meant for
critical bugs.

Re: Release numbering for branch-2 releases

Posted by Stack <st...@duboce.net>.

On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Folks,
>
>  There has been some discussions about incompatible changes in the
> hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and
> few other jiras. Frankly, I'm surprised about some of them since the
> 'alpha' moniker was precisely to harden apis by changing them if necessary,
> borne out by the fact that every  single release in hadoop-2 chain has had
> incompatible changes. This happened since we were releasing early, moving
> fast and breaking things. Furthermore, we'll have more in future as move
> towards stability of hadoop-2 similar to HDFS-4362, HDFS-4364 et al in HDFS
> and YARN-142 (api changes) for YARN.
>
>  So, rather than debate more, I had a brief chat with Suresh and Todd.
> Todd suggested calling the next release as hadoop-2.1.0-alpha to indicate
> the incompatibility a little better. This makes sense to me, as long as we
> are clear that we won't make any further *feature* releases in hadoop-2.0.x
> series (obviously we might be forced to do security/bug-fix release).
>
>  Going forward, I'd like to start locking down apis/protocols for a 'beta'
> release. This way we'll have one *final* opportunity post
> hadoop-2.1.0-alpha to make incompatible changes if necessary and we can
> call it hadoop-2.2.0-beta.
>
>  Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible
> changes. This will allow us to go on to a hadoop-2.3.0 as a GA release.
> This forces us to do a real effort on making sure we lock down for
> hadoop-2.2.0-beta.
>
>  In summary:
>  # I plan to now release hadoop-2.1.0-alpha (this week).
>  # We make a real effort to lock down apis/protocols and release
> hadoop-2.2.0-beta, say in March.
>  # Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.
>
>  I'll start a separate thread on 'locking protocols' w.r.t
> client-protocols v/s internal protocols (to facilitate rolling upgrades
> etc.), let's discuss this one separately.
>
>  Makes sense?



No.

I find the above opaque and written in a cryptic language that I might grok
if I spent a day or two running over cited issues trying to make some
distillation of the esoterica debated therein.  If you want feedback from
other than the cognoscenti, I would suggest a better summation of what all
is involved.  I think jargon is fine for arcane technical discussion but it
seems we are talking basic hadoop versioning here and if I am following at
all, we are talking about possibly breaking API (?) and even wire protocol
inside a major version: i.e. between 2.0.x and 2.3.x say (give or take an
-alpha or -beta suffix thrown in here and there).  Does this have to be?
Can't we do API changes and wire protocol change off in hadoop 3.x and
4.x, etc.?  As is, how is a little ol' downstream project like the one I
work on supposed to cope w/ this plethora of 2.X.X-{alpha,beta,?} with
each new 2.x possibly a whole new 'experience'?

Thanks Arun,
St.Ack