Posted to common-dev@hadoop.apache.org by Andrew Wang <an...@cloudera.com> on 2015/03/03 00:19:53 UTC

Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out that will
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite) so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.

Best,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Totally agreed. I just left a comment there on the current state and what is needed. As of now, I think the big (and only?) changes are flipping the default classloader for tasks and splitting the HDFS jar.

Thanks,
+Vinod

On Mar 3, 2015, at 9:02 AM, Steve Loughran <st...@hortonworks.com> wrote:


I want to understand a lot more about the classpath isolation (HADOOP-11656) proposal: specifically, what is proposed, and does it have to be tagged as incompatible? That's a bigger change than just setting javac.version=8 in the POM, though given what a fundamental problem it addresses, I'm in favour of doing something there.


On 3 March 2015 at 08:05:46, Andrew Wang (andrew.wang@cloudera.com) wrote:

I view branch-3 as essentially the same size as our recent 2.x releases,
with the exception of incompatible changes like classpath isolation and
JDK8 target version. These, while perhaps not revolutionary, are still
incompatible, and require a major version bump.

Re: Looking to a Hadoop 3 release

Posted by Allen Wittenauer <aw...@altiscale.com>.

Between:

* removing -finalize
* breaking HDFS browsing
* changing du’s output (in the 2.7 branch)
* changing various names of metrics (either intentionally or otherwise)
* changing the JDK release

… and probably lots of other stuff in branch-2 I haven’t seen/know about, our best course of action is to:

$ git rm hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md

At least this way we as caretakers don’t come across as hypocrites. It’s pretty clear the direction has shown we only care about API compatibility and the rest is ignored when it isn’t “convenient”. [The next time someone tells you that Hadoop is hard to operate, I want you to think about this email.] (1)

Making 2.7 build with JDK7 led to the *exact* situation I figured it would: now we have a precedent where we just say to the community “You know those guarantees? Yeah, you might as well ignore them because we’re going to change the core component any damn time we feel like it.”

We haven’t made a release branch off of trunk since branch-0.23. If anyone thinks that’s healthy, there is some beach property in Alberta you might be interested in as well. Our release cycle came to a screeching halt after 0.20 and we’ve never recovered.

However, I offer an alternative.

This same circular argument comes up all the time: (2)

* There aren’t enough changes in trunk to make a new branch.
* We can’t upgrade/change component X because there is no plan to make a new major release.

To quote Frozen: Let It Go

We’re probably at the point where there aren’t likely to be very many more earth shattering changes to the Hadoop code base. The community has decided instead to push these types of changes as separate projects via incubator to avoid the committer paralysis that this community suffers.

Because of this, I don’t think the “enough changes” argument works anymore. Instead, we need to pick a new metric to build a cadence to force regular updates. I’d offer that the “every two years” JDK EOL sets the perfect cadence, matched by many other enterprise and OSS software, and gives us an opportunity to reflect in the version number that the critical component of our software has changed.

This cadence allows for people to plan appropriately and know what our roadmap and direction actually is. Folks are more likely to build “real” solutions rather than make compromises that suffer in quality in the name of compatibility simply because they don’t know when their work will actually show up. We’ll have a normal, regular opportunity to update dependencies (regardless of the state of HADOOP-11656).

Now, if you’ll excuse me, I have more contributors' patches to go through.

(1) FWIW, I made the decision not to worry about backward compatibility in the shell code rewrite when I realized that the jsvc log and pid file names were poorly chosen to allow for certain capabilities. Did anyone actually touch them from outside the software? Probably not. But it is still effectively an interface, so off to trunk it went.

(2) … and that’s before we even get to the “Version numbers are cheap” arguments that were made during the Great Renames of 0.20 and 0.23.

Re: Looking to a Hadoop 3 release

Posted by sanjay Radia <sa...@gmail.com>.

> On Mar 3, 2015, at 9:36 AM, Karthik Kambatla <ka...@cloudera.com> wrote:
> 
> If we preserve API compat and try to preserve wire compat, I don't see the
> harm in bumping the major release.

If we preserve compatibility, then there is no need to bump the major number.

> It allows us to include several
> fixes/features in trunk in a release. If we are not actively thinking of a
> way to release items in trunk, why even have it?

What are the fixes and features in trunk that you would like to see get out quickly?
Can these be backported easily to branch-2?

sanjay

Re: Looking to a Hadoop 3 release

Posted by Karthik Kambatla <ka...@cloudera.com>.

I am surprised classpath-isolation is being called a minor issue. We have
been hearing users complain about Hadoop leaking its dependencies into the
classpath for a while now, Guava being the culprit often. Not being able to
upgrade our dependencies without affecting users has started to hamper our
development too, e.g. the Guava conflict when upgrading Curator.

If we preserve API compat and try to preserve wire compat, I don't see the
harm in bumping the major release. It allows us to include several
fixes/features in trunk in a release. If we are not actively thinking of a
way to release items in trunk, why even have it?

If there are any disadvantages to doing a major release, I would like to
know. Maybe we could arrive at a plan to accomplish it without those
problems.

Thanks
Karthik

On Tue, Mar 3, 2015 at 9:02 AM, Steve Loughran <st...@hortonworks.com>
wrote:

>
>  I want to understand a lot more about the classpath isolation
> (HADOOP-11656) proposal, specifically, what is proposed and does it have to
> be tagged as incompatible? That's a bigger change than just setting
> javac.version=8 in the POM, though given what a fundamental problem it
> addresses, I'm in favour of doing something there.
>
> On 3 March 2015 at 08:05:46, Andrew Wang (andrew.wang@cloudera.com) wrote:
>
> I view branch-3 as essentially the same size as our recent 2.x releases,
> with the exception of incompatible changes like classpath isolation and
> JDK8 target version. These, while perhaps not revolutionary, are still
> incompatible, and require a major version bump.
>
>

-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

Re: Looking to a Hadoop 3 release

Posted by Steve Loughran <st...@hortonworks.com>.

I want to understand a lot more about the classpath isolation (HADOOP-11656) proposal: specifically, what is proposed, and does it have to be tagged as incompatible? That's a bigger change than just setting javac.version=8 in the POM, though given what a fundamental problem it addresses, I'm in favour of doing something there.


On 3 March 2015 at 08:05:46, Andrew Wang (andrew.wang@cloudera.com) wrote:

I view branch-3 as essentially the same size as our recent 2.x releases,
with the exception of incompatible changes like classpath isolation and
JDK8 target version. These, while perhaps not revolutionary, are still
incompatible, and require a major version bump.

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Junping, thanks for your response,

I view branch-3 as essentially the same size as our recent 2.x releases,
with the exception of incompatible changes like classpath isolation and
JDK8 target version. These, while perhaps not revolutionary, are still
incompatible, and require a major version bump.

I don't see a forking of the community effort, since backports should flow
pretty easily from branch-3 to branch-2 the same way they currently can
flow from branch-2 to branch-2.6. It's just an extra git commit, not like
what we had to deal with in the branch-1 days with a custom backport.

Hopefully that addresses your concerns.

Thanks,
Andrew

On Tue, Mar 3, 2015 at 6:12 AM, Junping Du <jd...@hortonworks.com> wrote:

> Thanks all for good discussions here.
> +1 on supporting Java 8 ASAP. In addition, I agree that we should
> separate this effort from cutting Hadoop 3.
> IMO, Hadoop is still very cool today, and we should only consider Hadoop 3
> once we have a revolutionary feature (like YARN for 2.0) that deserves to
> break fundamental compatibility. Otherwise it may just cause more
> distraction for the community effort.
> Just 2 cents.
>
> Thanks,
>
> Junping
> ________________________________________
> From: Akira AJISAKA <aj...@oss.nttdata.co.jp>
> Sent: Tuesday, March 03, 2015 12:04 PM
> To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> Thanks Andrew for bringing this up.
> +1, mostly looks fine, but I don't think now is the time to cut branch-3.
>
>  > classpath isolation
>
> IMHO, classpath isolation is a good thing to do.
> We should pay down the technical debt ASAP. I'm willing to help.
>
> I'm thinking we can cut branch-3 and release 3.0 alpha
> after HADOOP-11656 is fixed. That is, I'd like to mark
> this issue as a blocker for 3.0.
> I suspect that even if we cut branch-3 now, trunk and
> branch-3 would be the same for a while. That seems useless.
>
>  > JDK8
>
> As Steve suggested, JDK8 can be in both trunk and branch-2.
> +1 for moving to JDK8 ASAP.
>
>  > maintaining 2.x
>
> From the user side, there is currently little merit in upgrading to 3.x.
> The more important question is how long 2.x will be maintained.
> Therefore we should consider when to stop backporting
> new features to 2.x, and when to stop maintaining 2.x.
> I'd like to maintain 2.x as long as possible, at least
> one year after 3.x GA release.
>
> * Other issue
>
> What's the current status of HDFS symlinks?
> If HADOOP-10019 requires some incompatible changes,
> I'd like to include them in 3.x.
>
> Regards,
> Akira
>
> On 3/2/15 15:19, Andrew Wang wrote:
> > Hi devs,
> >
> > It's been a year and a half since 2.x went GA, and I think we're about
> > due for a 3.x release.
> > Notably, there are two incompatible changes I'd like to call out that
> > will have a tremendous positive impact for our users.
> >
> > First, classpath isolation being done at HADOOP-11656, which has been a
> > long-standing request from many downstreams and Hadoop users.
> >
> > Second, bumping the source and target JDK version to JDK8 (related to
> > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> > months from now). In the past, we've had issues with our dependencies
> > discontinuing support for old JDKs, so this will future-proof us.
> >
> > Between the two, we'll also have quite an opportunity to clean up and
> > upgrade our dependencies, another common user and developer request.
> >
> > I'd like to propose that we start rolling a monthly-ish series of
> > 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> > other cat herding responsibilities. There are already quite a few changes
> > slated for 3.0 besides the above (for instance the shell script rewrite),
> > so there's already value in a 3.0 alpha, and the more time we give
> > downstreams to integrate, the better.
> >
> > This opens up discussion about inclusion of other changes, but I'm hoping
> > to freeze incompatible changes after maybe two alphas, do a beta (with no
> > further incompat changes allowed), and then finally a 3.x GA. For those
> > keeping track, that means a 3.x GA in about four months.
> >
> > I would also like to stress though that this is not intended to be a big
> > bang release. For instance, it would be great if we could maintain wire
> > compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> > branch-2 and branch-3 similar also makes backports easier, since we're
> > likely maintaining 2.x for a while yet.
> >
> > Please let me know any comments / concerns related to the above. If
> > people are friendly to the idea, I'd like to cut a branch-3 and start
> > working on the first alpha.
> >
> > Best,
> > Andrew
> >
>
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Junping, thanks for your response,

I view branch-3 as essentially the same size as our recent 2.x releases,
with the exception of incompatible changes like classpath isolation and
JDK8 target version. These, while perhaps not revolutionary, are still
incompatible, and require a major version bump.

I don't see a forking of the community effort, since backports should flow
pretty easily from branch-3 to branch-2 the same way they currently can
flow from branch-2 to branch-2.6. It's just an extra git commit, not like
what we had to deal with in the branch-1 days with a custom backport.

Hopefully that addresses your concerns.

Thanks,
Andrew

On Tue, Mar 3, 2015 at 6:12 AM, Junping Du <jd...@hortonworks.com> wrote:

> Thanks all for good discussions here.
> +1 on supporting Java 8 ASAP. In addition, I agree that we should
> separating this effort with cutting down Hadoop 3.
> IMO, Hadoop is still very cool today, and we should only consider Hadoop 3
> until we have revolutionary feature (like YARN for 2.0) which deserve to
> break fundamental compatibilities. Or it may just cause more distractions
> for community effort.
> Just 2 cents.
>
> Thanks,
>
> Junping
> ________________________________________
> From: Akira AJISAKA <aj...@oss.nttdata.co.jp>
> Sent: Tuesday, March 03, 2015 12:04 PM
> To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> Thanks Andrew for bringing this up.
> +1 mostly looks fine but I'm thinking it's not now to cut branch-3.
>
>  > classpath isolation
>
> IMHO, classpath isolation is a good thing to do.
> We should pay down the technical dept ASAP. I'm willing to help.
>
> I'm thinking we can cut branch-3 and release 3.0 alpha
> after HADOOP-11656 is fixed. That is, I'd like to mark
> this issue as a blocker for 3.0.
> I wonder that even if we cut branch-3 now, trunk and
> branch-3 would be the same for a while. That seems useless.
>
>  > JDK8
>
> As Steve suggested, JDK8 can be in both trunk and branch-2.
> +1 for moving to JDK8 ASAP.
>
>  > maintaining 2.x
>
> For user side, now there is little merit to upgrade to 3.x.
> More important thing is how long 2.x will be maintained.
> Therefore we should consider when to stop backporting
> new features to 2.x, and when to stop maintaining 2.x.
> I'd like to maintain 2.x as long as possible, at least
> one year after 3.x GA release.
>
> * Other issue
>
> What's the current status of HDFS symlink?
> If HADOOP-10019 requires some incompatible changes,
> I'd like to include in 3.x.
>
> Regards,
> Akira
>
> On 3/2/15 15:19, Andrew Wang wrote:
> > Hi devs,
> >
> > It's been a year and a half since 2.x went GA, and I think we're about
> due
> > for a 3.x release.
> > Notably, there are two incompatible changes I'd like to call out, that
> will
> > have a tremendous positive impact for our users.
> >
> > First, classpath isolation being done at HADOOP-11656, which has been a
> > long-standing request from many downstreams and Hadoop users.
> >
> > Second, bumping the source and target JDK version to JDK8 (related to
> > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> > months from now). In the past, we've had issues with our dependencies
> > discontinuing support for old JDKs, so this will future-proof us.
> >
> > Between the two, we'll also have quite an opportunity to clean up and
> > upgrade our dependencies, another common user and developer request.
> >
> > I'd like to propose that we start rolling a series of monthly-ish series
> of
> > 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> > other cat herding responsibilities. There are already quite a few changes
> > slated for 3.0 besides the above (for instance the shell script rewrite)
> so
> > there's already value in a 3.0 alpha, and the more time we give
> downstreams
> > to integrate, the better.
> >
> > This opens up discussion about inclusion of other changes, but I'm hoping
> > to freeze incompatible changes after maybe two alphas, do a beta (with no
> > further incompat changes allowed), and then finally a 3.x GA. For those
> > keeping track, that means a 3.x GA in about four months.
> >
> > I would also like to stress though that this is not intended to be a big
> > bang release. For instance, it would be great if we could maintain wire
> > compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> > branch-2 and branch-3 similar also makes backports easier, since we're
> > likely maintaining 2.x for a while yet.
> >
> > Please let me know any comments / concerns related to the above. If
> people
> > are friendly to the idea, I'd like to cut a branch-3 and start working on
> > the first alpha.
> >
> > Best,
> > Andrew
> >
>
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Junping, thanks for your response,

I view branch-3 as essentially the same size as our recent 2.x releases,
with the exception of incompatible changes like classpath isolation and
JDK8 target version. These, while perhaps not revolutionary, are still
incompatible, and require a major version bump.

I don't see a forking of the community effort, since backports should flow
pretty easily from branch-3 to branch-2 the same way they currently can
flow from branch-2 to branch-2.6. It's just an extra git commit, not like
what we had to deal with in the branch-1 days with a custom backport.

Hopefully that addresses your concerns.

Thanks,
Andrew

On Tue, Mar 3, 2015 at 6:12 AM, Junping Du <jd...@hortonworks.com> wrote:

> Thanks all for the good discussion here.
> +1 on supporting Java 8 ASAP. In addition, I agree that we should
> separate this effort from cutting Hadoop 3.
> IMO, Hadoop is still very cool today, and we should only consider Hadoop 3
> once we have a revolutionary feature (like YARN for 2.0) that deserves to
> break fundamental compatibility. Otherwise it may just distract the
> community's effort.
> Just my 2 cents.
>
> Thanks,
>
> Junping
> ________________________________________
> From: Akira AJISAKA <aj...@oss.nttdata.co.jp>
> Sent: Tuesday, March 03, 2015 12:04 PM
> To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> Thanks Andrew for bringing this up.
> +1, it mostly looks fine, but I don't think now is the time to cut branch-3.
>
>  > classpath isolation
>
> IMHO, classpath isolation is a good thing to do.
> We should pay down the technical debt ASAP. I'm willing to help.
>
> I'm thinking we can cut branch-3 and release 3.0 alpha
> after HADOOP-11656 is fixed. That is, I'd like to mark
> this issue as a blocker for 3.0.
> I suspect that even if we cut branch-3 now, trunk and
> branch-3 would be the same for a while. That seems useless.
>
>  > JDK8
>
> As Steve suggested, JDK8 can be in both trunk and branch-2.
> +1 for moving to JDK8 ASAP.
>
>  > maintaining 2.x
>
> For users, there is currently little merit in upgrading to 3.x.
> The more important question is how long 2.x will be maintained.
> Therefore we should consider when to stop backporting
> new features to 2.x, and when to stop maintaining 2.x.
> I'd like to maintain 2.x as long as possible, at least
> one year after 3.x GA release.
>
> * Other issue
>
> What's the current status of HDFS symlinks?
> If HADOOP-10019 requires some incompatible changes,
> I'd like to include them in 3.x.
>
> Regards,
> Akira
>
> On 3/2/15 15:19, Andrew Wang wrote:
> > Hi devs,
> >
> > It's been a year and a half since 2.x went GA, and I think we're about due
> > for a 3.x release.
> > Notably, there are two incompatible changes I'd like to call out that will
> > have a tremendous positive impact for our users.
> >
> > First, classpath isolation being done at HADOOP-11656, which has been a
> > long-standing request from many downstreams and Hadoop users.
> >
> > Second, bumping the source and target JDK version to JDK8 (related to
> > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> > months from now). In the past, we've had issues with our dependencies
> > discontinuing support for old JDKs, so this will future-proof us.
> >
> > Between the two, we'll also have quite an opportunity to clean up and
> > upgrade our dependencies, another common user and developer request.
> >
> > I'd like to propose that we start rolling a monthly-ish series of
> > 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> > other cat herding responsibilities. There are already quite a few changes
> > slated for 3.0 besides the above (for instance the shell script rewrite) so
> > there's already value in a 3.0 alpha, and the more time we give downstreams
> > to integrate, the better.
> >
> > This opens up discussion about inclusion of other changes, but I'm hoping
> > to freeze incompatible changes after maybe two alphas, do a beta (with no
> > further incompat changes allowed), and then finally a 3.x GA. For those
> > keeping track, that means a 3.x GA in about four months.
> >
> > I would also like to stress though that this is not intended to be a big
> > bang release. For instance, it would be great if we could maintain wire
> > compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> > branch-2 and branch-3 similar also makes backports easier, since we're
> > likely maintaining 2.x for a while yet.
> >
> > Please let me know any comments / concerns related to the above. If people
> > are friendly to the idea, I'd like to cut a branch-3 and start working on
> > the first alpha.
> >
> > Best,
> > Andrew
> >
>
>

Re: Looking to a Hadoop 3 release

Posted by Junping Du <jd...@hortonworks.com>.

Thanks all for the good discussion here.
+1 on supporting Java 8 ASAP. In addition, I agree that we should separate this effort from cutting Hadoop 3.
IMO, Hadoop is still very cool today, and we should only consider Hadoop 3 once we have a revolutionary feature (like YARN for 2.0) that deserves to break fundamental compatibility. Otherwise it may just distract the community's effort.
Just my 2 cents.

Thanks,

Junping
________________________________________
From: Akira AJISAKA <aj...@oss.nttdata.co.jp>
Sent: Tuesday, March 03, 2015 12:04 PM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Thanks Andrew for bringing this up.
+1, it mostly looks fine, but I don't think now is the time to cut branch-3.

 > classpath isolation

IMHO, classpath isolation is a good thing to do.
We should pay down the technical debt ASAP. I'm willing to help.

I'm thinking we can cut branch-3 and release 3.0 alpha
after HADOOP-11656 is fixed. That is, I'd like to mark
this issue as a blocker for 3.0.
I suspect that even if we cut branch-3 now, trunk and
branch-3 would be the same for a while. That seems useless.

 > JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2.
+1 for moving to JDK8 ASAP.

 > maintaining 2.x

For users, there is currently little merit in upgrading to 3.x.
The more important question is how long 2.x will be maintained.
Therefore we should consider when to stop backporting
new features to 2.x, and when to stop maintaining 2.x.
I'd like to maintain 2.x as long as possible, at least
one year after 3.x GA release.

* Other issue

What's the current status of HDFS symlinks?
If HADOOP-10019 requires some incompatible changes,
I'd like to include them in 3.x.

Regards,
Akira

On 3/2/15 15:19, Andrew Wang wrote:
> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Akira, thanks for responding,

On Tue, Mar 3, 2015 at 4:04 AM, Akira AJISAKA <aj...@oss.nttdata.co.jp> wrote:

> Thanks Andrew for bringing this up.
> +1, it mostly looks fine, but I don't think now is the time to cut branch-3.
>
> > classpath isolation
>
> IMHO, classpath isolation is a good thing to do.
> We should pay down the technical debt ASAP. I'm willing to help.
>
> I'm thinking we can cut branch-3 and release 3.0 alpha
> after HADOOP-11656 is fixed. That is, I'd like to mark
> this issue as a blocker for 3.0.
> I suspect that even if we cut branch-3 now, trunk and
> branch-3 would be the same for a while. That seems useless.
>

I'm willing to wait a bit here, but I think even what we have now is worth
kicking the tires, and either the JDK8 target version or classpath
isolation would make it even more compelling.

If you're worried about backport overheads, Konst's proposal of releasing
directly from trunk might be appealing. Needs some more examination though.


> > JDK8
>
> As Steve suggested, JDK8 can be in both trunk and branch-2.
> +1 for moving to JDK8 ASAP.
>

We can make sure branch-2 runs well under JDK8, but I'm against doing a
target version bump to JDK8 like we're planning to do for JDK7 in a minor
release. As I described in my reply to Arun, that was a special
circumstance, and JDK target version bumps really are deserving of a new
major release.
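For readers wondering why a target version bump counts as incompatible: the javac target is recorded in every class file's major version (51 for Java 7, 52 for Java 8), and a JVM refuses to load classes newer than it supports, failing with UnsupportedClassVersionError. The hypothetical helper below is a minimal sketch that reads that field from a compiled class, one way to check what a build actually targets. ClassFileVersion and its methods are illustrative names, not part of Hadoop's tooling.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: reads the class file header.
final class ClassFileVersion {

    private static final int MAGIC = 0xCAFEBABE;

    private ClassFileVersion() {}

    /** Returns the major version from a class file header (u4 magic, u2 minor, u2 major). */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != MAGIC) {
            throw new IOException("not a class file: bad magic number");
        }
        data.readUnsignedShort(); // minor version, ignored here
        return data.readUnsignedShort();
    }

    /** Maps a major version to its Java feature release; valid for Java 5 (49) and later. */
    static int javaRelease(int major) {
        if (major < 49) {
            throw new IllegalArgumentException("pre-Java 5 class file: " + major);
        }
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileVersion path/to/Some.class");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int major = majorVersion(in);
            System.out.println(args[0] + ": major " + major + " (Java " + javaRelease(major) + ")");
        }
    }
}
```

Pointed at a class compiled with `-target 1.8`, it should report major version 52, which a JDK7 runtime will not load. That is why a target bump breaks users still on the older JDK even when no API changes.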


> > maintaining 2.x
>
> For users, there is currently little merit in upgrading to 3.x.
> The more important question is how long 2.x will be maintained.
> Therefore we should consider when to stop backporting
> new features to 2.x, and when to stop maintaining 2.x.
> I'd like to maintain 2.x as long as possible, at least
> one year after 3.x GA release.
>

The value in releasing alphas right now is not so much for end users, but
for downstream projects which need time to integrate. I don't expect
end-users to really jump on 3.x until the downstreams have also rolled new
releases based on 3.x.

The community decides when support for 2.x ends. I personally plan to
keep backporting for a while after 3.x GA is released.
If backports to branch-2 tail off, it just takes one committer with the
interest to keep maintaining it. This has been common in HBase; for
instance, Lars H maintained 0.92 for a long time because he had the
interest.


> * Other issue
>
> What's the current status of HDFS symlinks?
> If HADOOP-10019 requires some incompatible changes,
> I'd like to include them in 3.x.
>

There are still a lot of unresolved compatibility and security issues,
especially with cross-filesystem symlinks. We tabled this work before, and
frankly I'm not sure these issues will ever be satisfactorily resolved.
Even today, there are plenty of Unix apps that don't handle symlinks
correctly, and we still lack equivalents of more secure syscalls like
openat() in the first place.
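Some context on the openat() point. On a local filesystem, openat() resolves a name relative to an already-open directory descriptor, so renaming or replacing the directory's path after it was opened cannot redirect the lookup. Java exposes a similar facility through java.nio.file.SecureDirectoryStream on platforms that support it (typically Linux). The minimal sketch below reads a file relative to an open directory and refuses to follow a symlink at the final path component. DirRelativeRead and readRelative are hypothetical names, and this shows only the local-filesystem analogue, not anything HDFS provides.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.SecureDirectoryStream;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of openat()-style access on a local filesystem; HDFS has no such API.
final class DirRelativeRead {

    private DirRelativeRead() {}

    /**
     * Reads {@code name} relative to the open directory {@code dir}, failing with an
     * IOException if the final component is a symlink. Returns null when the platform
     * does not offer SecureDirectoryStream.
     */
    static String readRelative(Path dir, String name) throws IOException {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            if (!(stream instanceof SecureDirectoryStream)) {
                return null; // no openat()-style operations available on this platform
            }
            SecureDirectoryStream<Path> secure = (SecureDirectoryStream<Path>) stream;
            Set<OpenOption> options = new HashSet<>(
                    Arrays.<OpenOption>asList(StandardOpenOption.READ, LinkOption.NOFOLLOW_LINKS));
            Path relative = dir.getFileSystem().getPath(name); // resolved against the open directory
            try (SeekableByteChannel channel = secure.newByteChannel(relative, options)) {
                ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
                while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
                    // keep reading until the buffer is full or EOF is reached
                }
                return new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

The null return is a deliberate design choice. Where the platform cannot provide directory-relative opens, the caller has to decide what to do, rather than silently falling back to a racy path-based open.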

Thanks,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Junping Du <jd...@hortonworks.com>.

Thanks all for good discussions here.
+1 on supporting Java 8 ASAP. In addition, I agree that we should separating this effort with cutting down Hadoop 3. 
IMO, Hadoop is still very cool today, and we should only consider Hadoop 3 until we have revolutionary feature (like YARN for 2.0) which deserve to break fundamental compatibilities. Or it may just cause more distractions for community effort.
Just 2 cents.

Thanks,

Junping
________________________________________
From: Akira AJISAKA <aj...@oss.nttdata.co.jp>
Sent: Tuesday, March 03, 2015 12:04 PM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Thanks Andrew for bringing this up.
+1 mostly looks fine but I'm thinking it's not now to cut branch-3.

 > classpath isolation

IMHO, classpath isolation is a good thing to do.
We should pay down the technical dept ASAP. I'm willing to help.

I'm thinking we can cut branch-3 and release 3.0 alpha
after HADOOP-11656 is fixed. That is, I'd like to mark
this issue as a blocker for 3.0.
I wonder that even if we cut branch-3 now, trunk and
branch-3 would be the same for a while. That seems useless.

 > JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2.
+1 for moving to JDK8 ASAP.

 > maintaining 2.x

For user side, now there is little merit to upgrade to 3.x.
More important thing is how long 2.x will be maintained.
Therefore we should consider when to stop backporting
new features to 2.x, and when to stop maintaining 2.x.
I'd like to maintain 2.x as long as possible, at least
one year after 3.x GA release.

* Other issue

What's the current status of HDFS symlink?
If HADOOP-10019 requires some incompatible changes,
I'd like to include in 3.x.

Regards,
Akira

On 3/2/15 15:19, Andrew Wang wrote:
> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Akira, thanks for responding,

On Tue, Mar 3, 2015 at 4:04 AM, Akira AJISAKA <aj...@oss.nttdata.co.jp>
wrote:

> Thanks Andrew for bringing this up.
> +1 mostly looks fine but I'm thinking it's not now to cut branch-3.
>
> > classpath isolation
>
> IMHO, classpath isolation is a good thing to do.
> We should pay down the technical dept ASAP. I'm willing to help.
>
> I'm thinking we can cut branch-3 and release 3.0 alpha
> after HADOOP-11656 is fixed. That is, I'd like to mark
> this issue as a blocker for 3.0.
> I wonder that even if we cut branch-3 now, trunk and
> branch-3 would be the same for a while. That seems useless.
>
> I'm willing to wait a bit here, but I think even what we have now is worth
kicking the tires, and either the JDK8 target version or classpath
isolation would make it even more compelling.

If you're worried about backport overheads, Konst's proposal of releasing
directly from trunk might be appealing. Needs some more examination though.


> > JDK8
>
> As Steve suggested, JDK8 can be in both trunk and branch-2.
> +1 for moving to JDK8 ASAP.
>
> We can make sure branch-2 runs well under JDK8, but I'm against doing a
target version bump to JDK8 like we're planning to do for JDK7 in a minor
release. As I described in my reply to Arun, that was a special
circumstance, and JDK target version bumps really are deserving of a new
major release.


> > maintaining 2.x
>
> For user side, now there is little merit to upgrade to 3.x.
> More important thing is how long 2.x will be maintained.
> Therefore we should consider when to stop backporting
> new features to 2.x, and when to stop maintaining 2.x.
> I'd like to maintain 2.x as long as possible, at least
> one year after 3.x GA release.
>
> The value in releasing alphas right now is not so much for end users, but
for downstream projects which need time to integrate. I don't expect
end-users to really jump on 3.x until the downstreams have also rolled new
releases based on 3.x.

Determining when support for 2.x is over is done by the community. I
personally plan to keep backporting for a while after 3.x GA is released.
If backports to branch-2 tail off, it just takes one committer with the
interest to keep maintaining it. This has been a common thing in HBase for
instance, Lars H maintained 0.92 for a long time because he had the
interest.


> * Other issue
>
> What's the current status of HDFS symlink?
> If HADOOP-10019 requires some incompatible changes,
> I'd like to include in 3.x.
>
> There are still a lot of unresolved compatibility and security issues,
especially with cross-filesystem symlinks. We tabled this work before, and
frankly I'm not sure these issues will ever be satisfactorily resolved.
Even today, there are plenty of Unix apps that don't handle symlinks
correctly, and we still lack equivalents of more secure syscalls like
openat() in the first place.

Thanks,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Akira, thanks for responding,

On Tue, Mar 3, 2015 at 4:04 AM, Akira AJISAKA <aj...@oss.nttdata.co.jp>
wrote:

> Thanks Andrew for bringing this up.
> +1 mostly looks fine but I'm thinking it's not now to cut branch-3.
>
> > classpath isolation
>
> IMHO, classpath isolation is a good thing to do.
> We should pay down the technical dept ASAP. I'm willing to help.
>
> I'm thinking we can cut branch-3 and release 3.0 alpha
> after HADOOP-11656 is fixed. That is, I'd like to mark
> this issue as a blocker for 3.0.
> I wonder that even if we cut branch-3 now, trunk and
> branch-3 would be the same for a while. That seems useless.
>
> I'm willing to wait a bit here, but I think even what we have now is worth
kicking the tires, and either the JDK8 target version or classpath
isolation would make it even more compelling.

If you're worried about backport overheads, Konst's proposal of releasing
directly from trunk might be appealing. Needs some more examination though.


> > JDK8
>
> As Steve suggested, JDK8 can be in both trunk and branch-2.
> +1 for moving to JDK8 ASAP.
>
> We can make sure branch-2 runs well under JDK8, but I'm against doing a
target version bump to JDK8 like we're planning to do for JDK7 in a minor
release. As I described in my reply to Arun, that was a special
circumstance, and JDK target version bumps really are deserving of a new
major release.


> > maintaining 2.x
>
> For user side, now there is little merit to upgrade to 3.x.
> More important thing is how long 2.x will be maintained.
> Therefore we should consider when to stop backporting
> new features to 2.x, and when to stop maintaining 2.x.
> I'd like to maintain 2.x as long as possible, at least
> one year after 3.x GA release.
>
> The value in releasing alphas right now is not so much for end users, but
for downstream projects which need time to integrate. I don't expect
end-users to really jump on 3.x until the downstreams have also rolled new
releases based on 3.x.

Determining when support for 2.x is over is done by the community. I
personally plan to keep backporting for a while after 3.x GA is released.
If backports to branch-2 tail off, it just takes one committer with the
interest to keep maintaining it. This has been a common thing in HBase for
instance, Lars H maintained 0.92 for a long time because he had the
interest.


> * Other issue
>
> What's the current status of HDFS symlink?
> If HADOOP-10019 requires some incompatible changes,
> I'd like to include in 3.x.
>
> There are still a lot of unresolved compatibility and security issues,
especially with cross-filesystem symlinks. We tabled this work before, and
frankly I'm not sure these issues will ever be satisfactorily resolved.
Even today, there are plenty of Unix apps that don't handle symlinks
correctly, and we still lack equivalents of more secure syscalls like
openat() in the first place.

Thanks,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Junping Du <jd...@hortonworks.com>.

Thanks all for good discussions here.
+1 on supporting Java 8 ASAP. In addition, I agree that we should separating this effort with cutting down Hadoop 3. 
IMO, Hadoop is still very cool today, and we should only consider Hadoop 3 until we have revolutionary feature (like YARN for 2.0) which deserve to break fundamental compatibilities. Or it may just cause more distractions for community effort.
Just 2 cents.

Thanks,

Junping
________________________________________
From: Akira AJISAKA <aj...@oss.nttdata.co.jp>
Sent: Tuesday, March 03, 2015 12:04 PM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Thanks Andrew for bringing this up.
+1 mostly looks fine but I'm thinking it's not now to cut branch-3.

 > classpath isolation

IMHO, classpath isolation is a good thing to do.
We should pay down the technical dept ASAP. I'm willing to help.

I'm thinking we can cut branch-3 and release 3.0 alpha
after HADOOP-11656 is fixed. That is, I'd like to mark
this issue as a blocker for 3.0.
I wonder that even if we cut branch-3 now, trunk and
branch-3 would be the same for a while. That seems useless.

 > JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2.
+1 for moving to JDK8 ASAP.

 > maintaining 2.x

For user side, now there is little merit to upgrade to 3.x.
More important thing is how long 2.x will be maintained.
Therefore we should consider when to stop backporting
new features to 2.x, and when to stop maintaining 2.x.
I'd like to maintain 2.x as long as possible, at least
one year after 3.x GA release.

* Other issue

What's the current status of HDFS symlink?
If HADOOP-10019 requires some incompatible changes,
I'd like to include in 3.x.

Regards,
Akira

On 3/2/15 15:19, Andrew Wang wrote:
> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Akira, thanks for responding,

On Tue, Mar 3, 2015 at 4:04 AM, Akira AJISAKA <aj...@oss.nttdata.co.jp>
wrote:

> Thanks Andrew for bringing this up.
> +1 mostly looks fine but I'm thinking it's not now to cut branch-3.
>
> > classpath isolation
>
> IMHO, classpath isolation is a good thing to do.
> We should pay down the technical dept ASAP. I'm willing to help.
>
> I'm thinking we can cut branch-3 and release 3.0 alpha
> after HADOOP-11656 is fixed. That is, I'd like to mark
> this issue as a blocker for 3.0.
> I wonder that even if we cut branch-3 now, trunk and
> branch-3 would be the same for a while. That seems useless.
>
> I'm willing to wait a bit here, but I think even what we have now is worth
kicking the tires, and either the JDK8 target version or classpath
isolation would make it even more compelling.

If you're worried about backport overheads, Konst's proposal of releasing
directly from trunk might be appealing. Needs some more examination though.


> > JDK8
>
> As Steve suggested, JDK8 can be in both trunk and branch-2.
> +1 for moving to JDK8 ASAP.
>
> We can make sure branch-2 runs well under JDK8, but I'm against doing a
target version bump to JDK8 like we're planning to do for JDK7 in a minor
release. As I described in my reply to Arun, that was a special
circumstance, and JDK target version bumps really are deserving of a new
major release.


> > maintaining 2.x
>
> For user side, now there is little merit to upgrade to 3.x.
> More important thing is how long 2.x will be maintained.
> Therefore we should consider when to stop backporting
> new features to 2.x, and when to stop maintaining 2.x.
> I'd like to maintain 2.x as long as possible, at least
> one year after 3.x GA release.
>
> The value in releasing alphas right now is not so much for end users, but
for downstream projects which need time to integrate. I don't expect
end-users to really jump on 3.x until the downstreams have also rolled new
releases based on 3.x.

Determining when support for 2.x is over is done by the community. I
personally plan to keep backporting for a while after 3.x GA is released.
If backports to branch-2 tail off, it just takes one committer with the
interest to keep maintaining it. This has been common in HBase; for
instance, Lars H maintained 0.94 for a long time because he had the
interest.


> * Other issue
>
> What's the current status of HDFS symlinks?
> If HADOOP-10019 requires some incompatible changes,
> I'd like to include them in 3.x.
>

There are still a lot of unresolved compatibility and security issues,
especially with cross-filesystem symlinks. We tabled this work before, and
frankly I'm not sure these issues will ever be satisfactorily resolved.
Even today, there are plenty of Unix apps that don't handle symlinks
correctly, and we still lack equivalents of more secure syscalls like
openat() in the first place.
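To make the point about applications mishandling symlinks concrete, here is a hypothetical Java sketch (local filesystem, not HDFS) of the most common mistake: a check that silently follows a link. `LinkOption.NOFOLLOW_LINKS` avoids following the final path component, but like a plain `lstat()` it does not remove the race between checking a path and using it; closing that gap needs descriptor-relative calls such as `openat()` (Java's closest analogue is `java.nio.file.SecureDirectoryStream`, where the platform supports it). Creating symlinks may need extra privileges on Windows.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

// Hypothetical illustration of a common symlink-handling mistake.
public class SymlinkPitfall {

    // Naive check: follows symlinks, so a link that points at a directory
    // is reported as a directory and code may then descend into the target.
    static boolean naiveIsDirectory(Path p) {
        return Files.isDirectory(p);
    }

    // Link-aware check: inspects the path itself without following a final
    // symlink. It still does not close the check-then-use race, which is what
    // openat()-style, descriptor-relative syscalls exist to prevent.
    static boolean isRealDirectory(Path p) {
        return Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("symlink-demo");
        Path target = Files.createDirectory(base.resolve("target"));
        Path link = Files.createSymbolicLink(base.resolve("link"), target);
        try {
            System.out.println("naive:      " + naiveIsDirectory(link)); // true
            System.out.println("link-aware: " + isRealDirectory(link));  // false
            System.out.println("resolves to " + link.toRealPath());
        } finally {
            // Clean up in reverse order of creation.
            Files.delete(link);
            Files.delete(target);
            Files.delete(base);
        }
    }
}
```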

Thanks,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Junping Du <jd...@hortonworks.com>.

Thanks all for good discussions here.
+1 on supporting Java 8 ASAP. In addition, I agree that we should separate this effort from cutting Hadoop 3.
IMO, Hadoop is still very cool today, and we should only consider Hadoop 3 once we have a revolutionary feature (like YARN for 2.0) that justifies breaking fundamental compatibility. Otherwise, it may just distract from the community's efforts.
Just my 2 cents.

Thanks,

Junping

Re: Looking to a Hadoop 3 release

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.

Thanks Andrew for bringing this up.
+1, it mostly looks fine, but I don't think now is the time to cut branch-3.

 > classpath isolation

IMHO, classpath isolation is a good thing to do.
We should pay down the technical debt ASAP. I'm willing to help.

I'm thinking we can cut branch-3 and release 3.0 alpha
after HADOOP-11656 is fixed. That is, I'd like to mark
this issue as a blocker for 3.0.
I suspect that even if we cut branch-3 now, trunk and
branch-3 would be the same for a while. That seems useless.

 > JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2.
+1 for moving to JDK8 ASAP.

 > maintaining 2.x

From the user side, there is currently little merit in upgrading to 3.x.
The more important question is how long 2.x will be maintained.
Therefore we should consider when to stop backporting
new features to 2.x, and when to stop maintaining 2.x.
I'd like to maintain 2.x as long as possible, at least
one year after the 3.x GA release.

* Other issue

What's the current status of HDFS symlinks?
If HADOOP-10019 requires some incompatible changes,
I'd like to include them in 3.x.

Regards,
Akira


Re: Looking to a Hadoop 3 release

Posted by Steve Loughran <st...@hortonworks.com>.

I'm +1 for migrating to Java 8 as soon as possible.

That's branch-2 & trunk, as having them on the same language level makes cherry-picking changes off trunk possible. That's particularly the case for Java 8, as it is the first major change to the language since Java 5.
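Why a shared language level matters for cherry-picking can be sketched with one function written both ways. This is a hypothetical example, not Hadoop code: the Java 8 version uses a lambda and the Stream API, which javac rejects at `-source 1.7` (and `java.util.stream` does not exist in the Java 7 class library), so a trunk patch written this way would have to be rewritten in the loop style before it could land on a JDK7-level branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical example: the same logic written at two language levels.
public class LanguageLevels {

    // Java 8 only: lambdas plus the Stream API. A branch compiled at
    // -source 1.7 cannot accept this method unchanged.
    static List<String> upperNonEmptyJava8(List<String> names) {
        return names.stream()
                .filter(s -> !s.isEmpty())
                .map(s -> s.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Java 7 compatible equivalent: an explicit loop, no lambdas or streams.
    static List<String> upperNonEmptyJava7(List<String> names) {
        List<String> out = new ArrayList<String>();
        for (String s : names) {
            if (!s.isEmpty()) {
                out.add(s.toUpperCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> modules = Arrays.asList("hdfs", "", "yarn");
        System.out.println(upperNonEmptyJava8(modules)); // [HDFS, YARN]
        System.out.println(upperNonEmptyJava7(modules)); // [HDFS, YARN]
    }
}
```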

W.r.t. shipping trunk as 3.x, it's going to take longer than planned. Hopefully not as long as the 2.x release process, but you never know. That means I expect some more Hadoop 2 releases this year. We need to make the jump there too: get 2.7 out the door and include in it a roadmap for when the Java 8+ only switch happens across the codebase.


-Steve


ps. For anyone who wants a pure Java 8 build today, set -Djavac.version=1.8 on the Maven command line. Last time I tried, there were some (minor) bits of YARN that wouldn't compile...




On 2 March 2015 at 18:31:00, Arun Murthy (acm@hortonworks.com) wrote:

Andrew,

Thanks for bringing up this discussion.

I'm a little puzzled, as I feel like we are rehashing the same discussion from last year, where we agreed on a different course of action w.r.t. the switch to JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost, particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount.

Now, breaking compatibility is perfectly fine over time where there is sufficient benefit, e.g. HDFS HA or YARN in hadoop-2 (vs. hadoop-1).

However, I'm struggling to quantify the benefit of hadoop-3 for users relative to the cost of the breakage.

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such; you are definitely welcome to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways.

Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit.

Thoughts?

thanks,
Arun
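The "new default classloader" mentioned above refers to the usual workaround for dependency conflicts: a child-first (parent-last) class loader that searches the application's own jars before delegating to the framework's classpath, so a job can bring its own version of a library. Hadoop 2.x's `org.apache.hadoop.util.ApplicationClassLoader` works along these lines; the class below is a hypothetical, simplified sketch of the technique, not that implementation, and its rule for which classes always come from the parent (`java.` and `javax.` prefixes) is an assumption.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a child-first ("parent-last") class loader; it is not
// Hadoop's ApplicationClassLoader, just an illustration of the technique.
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] appJars, ClassLoader parent) {
        super(appJars, parent);
    }

    // JDK classes must always come from the parent chain. A real
    // implementation would make this list configurable.
    private static boolean isSystemClass(String name) {
        return name.startsWith("java.") || name.startsWith("javax.");
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    // Look in the application's own jars first, so its copy of a
                    // library wins over the version on the framework classpath.
                    c = findClass(name);
                } catch (ClassNotFoundException notBundled) {
                    // Not shipped by the application; fall through to the parent.
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // ordinary parent-first lookup
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        URL[] appJars = new URL[0]; // hypothetical: a job's own jars would go here
        try (ChildFirstClassLoader loader =
                 new ChildFirstClassLoader(appJars, ChildFirstClassLoader.class.getClassLoader())) {
            // With no application jars, every lookup falls back to the parent.
            System.out.println(loader.loadClass("java.lang.String") == String.class); // true
        }
    }
}
```

The trade-off is that classes loaded on both sides of the boundary are distinct types, so any interface shared between the framework and user code has to be loaded only by the parent.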


Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Agreed. The difference between a 3.0 GA release and a parallel 2.x release line is just JDK8 + a different classpath (potentially isolated), which doesn't sound like a big enough delta to warrant the license to break compat.

Thanks,
+Vinod

On Mar 2, 2015, at 6:30 PM, Arun Murthy <ac...@hortonworks.com> wrote:

> Andrew,
> 
> Thanks for bringing up this discussion.
> 
> I'm a little puzzled, as I feel like we are rehashing the same discussion from last year, where we agreed on a different course of action w.r.t. the switch to JDK7.
>
> IAC, breaking compatibility for hadoop-3 is a pretty big cost, particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount.
>
> Now, breaking compatibility is perfectly fine over time where there is sufficient benefit, e.g. HDFS HA or YARN in hadoop-2 (vs. hadoop-1).
>
> However, I'm struggling to quantify the benefit of hadoop-3 for users relative to the cost of the breakage.
>
> Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?
>
> We could just do JDK8 in hadoop-2.10 or some such; you are definitely welcome to run the RM role for that release.
> 
> Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways. 
> 
> Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.
> 
> Overall, my biggest concern is the compatibility story vis-a-vis the benefit. 
> 
> Thoughts?
> 
> thanks,
> Arun

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Agreed. The difference between a 3.0 GA release and a parallel 2.x release line is just JDK8 + a different classpath (potentially isolated) - doesn't sound like a big enough delta warranting the license to break compat.

Thanks,
+Vinod

On Mar 2, 2015, at 6:30 PM, Arun Murthy <ac...@hortonworks.com> wrote:

> Andrew,
> 
> Thanks for bringing up this discussion.
> 
> I'm a little puzzled for I feel like we are rehashing the same discussion from last year - where we agreed on a different course of action w.r.t switch to JDK7.
> 
> IAC, breaking compatibility for hadoop-3 is a pretty big cost - particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount. 
> 
> Now, breaking compatibility is perfectly fine over time where there is sufficient benefit e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1). 
> 
> However, I'm struggling to quantify the benefit of hadoop-3 for users for the cost of the breakage.
> 
> Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?
> 
> We could just do JDK8 in hadoop-2.10 or some such, you are definitely welcome to run the RM role for that release.
> 
> Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways. 
> 
> Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.
> 
> Overall, my biggest concern is the compatibility story vis-a-vis the benefit. 
> 
> Thoughts?
> 
> thanks,
> Arun
> 
> ________________________________________
> From: Andrew Wang <an...@cloudera.com>
> Sent: Monday, March 02, 2015 3:19 PM
> To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Looking to a Hadoop 3 release
> 
> Hi devs,
> 
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
> 
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
> 
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
> 
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
> 
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
> 
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
> 
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
> 
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
> 
> Best,
> Andrew

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Moving to JDK8 involves several steps:
 (1) Get Hadoop apps to be able to run on JDK8 and use JDK8 language features. This is already possible with the decoupling of apps from the platform.
 (2) Get the platform to run on JDK8. This can be done so that we can run Hadoop on both JDK8 and JDK7 without any compatibility issues. This in itself is a huge move, what with potential GC behavior changes, native library compat, etc.
 (3) Get the platform to use JDK8 language features. As much as I love the new stuff in JDK8, I'm willing to postpone usage of the language features in the platform until JDK8 is already in full force.

So, how about we do (1) + (2) for now, get JDK8 going, and then come around to making the decision to drop support for JDK7? This is no different from what we did for the adoption of JDK7. For a while (2 or 3 releases?), we were able to run on both JDK6 and JDK7, and we are phasing out JDK6 only now that most of the community has stopped using it.

Thanks,
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang <an...@cloudera.com> wrote:
>> Given that we already agreed to put in JDK7 in 2.7, and that the
>> classpath is a fairly minor irritant given some existing solutions (e.g. a
>> new default classloader), how do you quantify the benefit for users?
>>
> I looked at our thread on this topic from last time, and we (meaning at
> least myself and Tucu) agreed to a one-time exception to the JDK7 bump in
> 2.x for practical reasons. We waited for so long that we had some assurance
> JDK6 was on the outs. Multiple distros also already had bumped their min
> version to JDK7. This is not true this time around. Bumping the JDK version
> is hugely impactful on the end user, and my email on the earlier thread
> still reflects my thoughts on JDK compatibility:
> 
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E
> 
>> .....

> Right now, the incompatible changes would be JDK8, classpath isolation, and
> whatever is already in trunk. I can audit these existing trunk changes when
> branch-3 is cut.

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

On Mar 6, 2015, at 5:20 PM, Chris Douglas <cd...@apache.org> wrote:

> On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
> <vi...@hortonworks.com> wrote:
>> I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.
> 
> This is a useful exercise, but not a prerequisite to releasing 3.0.0
> as an alpha off of trunk, right? Andrew summarized the operating
> assumptions for anyone working on it: rolling upgrades still work,
> wire compat is preserved, breaking changes may get rolled back when
> branch-3 is in beta (so be very conservative, notify others loudly).
> This applies to branches merged to trunk, also.

Not a prerequisite for alpha releases, yes. But it will be for a 'GA' release, because after that we will be back to restricting incompatible changes on 3.x line and we have to say no to features that need API breakage after that. If others feel there are features that warrant incompatibility, we should hear about them for inclusion in such a 3.x release. Till now, the operating assumption was to not break anything as much as possible. If we are opening the window on incompatibilities in 3.x, might as well get everyone to think about stuff that they want.

>> +1 to Jason's comments in general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.
> 
> We'll have this discussion again. We don't need to reach consensus on
> the roadmap, just that each artifact reflects the output of the
> project.

Agreed. I wasn't requesting us to reach a consensus on the roadmap. Just requesting others to put their wish list up.

>> Irrespective of that, here is my proposal in the interim:
>> - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
>> - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.
> 
> +1 for 2.x, but again I don't understand the sequencing. -C

There isn't. I was saying "Irrespective of that".

Thanks,
+Vinod

Re: Looking to a Hadoop 3 release

Posted by Eric Yang <er...@gmail.com>.

Some lessons learned during 2.x: WebHDFS, HDFS ACLs, QJM HA, and rolling
upgrade are great features.  MapReduce 1.x uses resources more efficiently;
YARN containers have rigid constraints, and applications get killed
prematurely.  When a node runs a lot of containers, YARN takes a significant
amount of system resources.  Running existing daemon-based applications on
top of YARN without code changes is impossible.  It is difficult to pinpoint
where services will run, so extra client-to-server routing code needs to be
written for the application.  Hence, the existing MapReduce approach of
spawning parallelized workloads and writing the results to a durable file
system is better.  A client-serving service doesn't need to track state; it
can read from HDFS.  Hence some level of HA for an external serving service
can be achieved without YARN.  Slider provides a better interface for
exposing APIs to deploy applications.

It would be nice to support the following in 3.x:

- JDK 8
- Upgrade to the most recent version of Jetty; most hang or busy-CPU
problems come from Jetty 6.1.x being incompatible with JDK 7's NIO design.
- Improve default security: there is a gap between the Default Container
Executor and the Linux Container Executor.  It would be nicer if the default
security used the Linux Container Executor, to ensure developers remember to
run with doAs when designing services to run on top of Hadoop.
- Since 3.x is a major release number change, there may be some API
breakage initially in order to gain new functionality.  Backward-compatible
patches can be added over time.
- Reduce YARN framework resource usage.
- Improve usability of the YARN UI.  Drilling down from an application to a
container and then back to the application view is almost unusable.
- Smarter strategy for container placement.  Some call this anti-affinity
support for YARN, but there are only a few types to support.  The identified
ones are: shared, silo, and dedicated.  In shared, containers can co-locate
on the same node.  In silo, only one container of a given type can run per
node.  Dedicated reserves the entire node for this workload.
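
The three placement types in the last item can be made concrete with a small first-fit sketch. This is a hypothetical illustration, not a YARN API. The names `PlacementSketch`, `Placement`, `Request`, `Node`, `canPlace`, and `place` are all invented. The rule that a node holding a dedicated container refuses everything else is an assumption about what "reserve the entire node" means.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the shared / silo / dedicated placement types; not a YARN API.
public class PlacementSketch {

    enum Placement { SHARED, SILO, DEDICATED }

    /** A container request: a workload type plus the placement policy it asks for. */
    static final class Request {
        final String type;
        final Placement placement;

        Request(String type, Placement placement) {
            this.type = type;
            this.placement = placement;
        }
    }

    /** A node tracks which requests are already running on it. */
    static final class Node {
        final String name;
        final List<Request> running = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        /** Assumption: a node running any DEDICATED container is reserved for it alone. */
        boolean isReserved() {
            return running.stream().anyMatch(r -> r.placement == Placement.DEDICATED);
        }
    }

    /** Returns true if the request may be placed on the node under its policy. */
    static boolean canPlace(Node node, Request req) {
        if (node.isReserved()) {
            return false; // a dedicated node accepts nothing else
        }
        switch (req.placement) {
            case SHARED:
                return true; // co-location with other containers is allowed
            case SILO:
                // at most one container of this type per node
                return node.running.stream().noneMatch(r -> r.type.equals(req.type));
            case DEDICATED:
                return node.running.isEmpty(); // needs the whole node
            default:
                throw new IllegalArgumentException("unknown placement " + req.placement);
        }
    }

    /** First-fit: place the request on the first node that accepts it, or return null. */
    static Node place(List<Node> nodes, Request req) {
        for (Node n : nodes) {
            if (canPlace(n, req)) {
                n.running.add(req);
                return n;
            }
        }
        return null;
    }
}
```

First-fit is used only to keep the sketch short. A real scheduler would also weigh locality, capacity, and queue limits before choosing a node.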


regards,
Eric

On Fri, Mar 6, 2015 at 5:20 PM, Chris Douglas <cd...@apache.org> wrote:

> On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
> <vi...@hortonworks.com> wrote:
> > I'd encourage everyone to post their wish list on the Roadmap wiki that
> > *warrants* making incompatible changes forcing us to go 3.x.
>
> This is a useful exercise, but not a prerequisite to releasing 3.0.0
> as an alpha off of trunk, right? Andrew summarized the operating
> assumptions for anyone working on it: rolling upgrades still work,
> wire compat is preserved, breaking changes may get rolled back when
> branch-3 is in beta (so be very conservative, notify others loudly).
> This applies to branches merged to trunk, also.
>
> > +1 to Jason's comments in general. We can keep rolling alphas that
> > downstream can pick up, but I'd also like us to clarify the exit criterion
> > for a GA release of 3.0 and its relation to the life of 2.x if we are going
> > this route. This brings us back to the roadmap discussion, and a collective
> > agreement about a logical step at a future point in time where we say we
> > have enough incompatible features in 3.x that we can stop putting more of
> > them and start stabilizing it.
>
> We'll have this discussion again. We don't need to reach consensus on
> the roadmap, just that each artifact reflects the output of the
> project.
>
> > Irrespective of that, here is my proposal in the interim:
> >  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before
> >    for at least two releases in branch-2: say 2.8 and 2.9 before we
> >    consider taking up the gauntlet on 3.0.
> >  - Continue working on the classpath isolation effort and try making it
> >    as compatible as is possible for users to opt in and migrate easily.
>
> +1 for 2.x, but again I don't understand the sequencing. -C
>
> > On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID> wrote:
> >
> >> I'm OK with a 3.0.0 release as long as we are minimizing the pain of
> >> maintaining yet another release line and are conscious of the
> >> incompatibilities going into that release line.
> >> For the former, I would really rather not see a branch-3 cut so soon.
> >> It's yet another line onto which to cherry-pick, and I don't see why we
> >> need to add this overhead at such an early phase.  We should only create
> >> branch-3 when there's an incompatible change that the community wants and
> >> it should _not_ go into the next major release (i.e.: it's for Hadoop
> >> 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk
> >> in the interim.  IMHO we need to stop treating trunk as a place to exile
> >> patches.
> >>
> >> For the latter, I think as a community we need to evaluate the benefits
> >> of breaking compatibility against the costs of migrating.  Each time we
> >> break compatibility we create a hurdle for people to jump when they move
> >> to the new release, and we should make those hurdles worth their time.
> >> For example, wire compatibility has been mentioned as part of this.  Any
> >> feature that breaks wire compatibility had better be absolutely amazing,
> >> as it creates a huge hurdle for people to jump.
> >> To summarize:
> >> +1 for a community-discussed roadmap of what we're breaking in Hadoop 3
> >> and why it's worth it for users
> >> -1 for creating branch-3 now; we can release from trunk until the next
> >> incompatibility for Hadoop 4 arrives
> >> +1 for baking classpath isolation as opt-in on 2.x and eventually
> >> default on in 3.0
> >> Jason
> >>
> >> From: Andrew Wang <an...@cloudera.com>
> >> To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
> >> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>
> >> Sent: Wednesday, March 4, 2015 12:15 PM
> >> Subject: Re: Looking to a Hadoop 3 release
> >>
> >> Let's not dismiss this quite so handily.
> >>
> >> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> >> could make classpath isolation opt-in via configuration, what we really
> >> want longer term is to have it on by default (or just always on). Stack in
> >> particular points out the practical difficulties in using an opt-in method
> >> in 2.x from a downstream project perspective. It's not pretty.
> >>
> >> The plan that both Sean and Jason propose (which I support) is to have an
> >> opt-in solution in 2.x, bake it there, then turn it on by default
> >> (incompatible) in a new major release. I think this lines up well with my
> >> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> >> to help with 2.x release management if that would help with testing this
> >> feature.
> >>
> >> Even setting aside classpath isolation, a new major release is still
> >> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> >> historically the voice of the user in our community, just highlighted it
> >> as a major compatibility issue, and Tucu and I have also expressed our
> >> very strong concerns about bumping this in a minor release. 2.7's bump is
> >> a unique exception, but this is not something to be cited as precedent or
> >> policy.
> >>
> >> Where does this resistance to a new major release stem from? As I've
> >> described from the beginning, this will look basically like a 2.x release,
> >> except for the inclusion of classpath isolation by default and target
> >> version JDK8. I've expressed my desire to maintain API and wire
> >> compatibility, and we can audit the set of incompatible changes in trunk
> >> to ensure this. My proposal for doing alpha and beta releases leading up
> >> to GA also gives downstreams a nice amount of time for testing and
> >> validation.
> >>
> >> Regards,
> >> Andrew
> >>
> >>
> >>
> >> On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
> >>
> >>> Awesome, looks like we can just do this in a compatible manner - nothing
> >>> else on the list seems like it warrants a (premature) major release.
> >>>
> >>> Thanks Vinod.
> >>>
> >>> Arun
> >>>
> >>> ________________________________________
> >>> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
> >>> Sent: Tuesday, March 03, 2015 2:30 PM
> >>> To: common-dev@hadoop.apache.org
> >>> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> >>> Subject: Re: Looking to a Hadoop 3 release
> >>>
> >>> I started pitching in more on that JIRA.
> >>>
> >>> To add, I think we can and should strive for doing this in a compatible
> >>> manner, whatever the approach. Marking and calling it incompatible before
> >>> we see a proposal/patch seems premature to me. Commented the same on JIRA:
> >>> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
> >>>
> >>> Thanks
> >>> +Vinod
> >>>
> >>> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
> >>>
> >>> Regarding classpath isolation, based on what I hear from our customers,
> >>> it's still a big problem (even after the MR classloader work). The latest
> >>> Jackson version bump was quite painful for our downstream projects, and
> >>> the HDFS client still leaks a lot of dependencies. Would welcome more
> >>> discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
> >>> chimed in.
> >>>
> >>>
> >>
> >>
> >
>
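
Andrew and Arun both point to class loading as one way to keep Hadoop's own dependencies (the Jackson bump, for example) from clashing with an application's. A common generic technique is a child-first ("parent-last") class loader. It looks in the application's own jars before the framework's, except for JDK and framework classes, which must stay shared. The sketch below is a hypothetical illustration of that technique. It assumes the caller supplies the list of "system" package prefixes. It is not the code of HADOOP-11656 or of Hadoop's MR job classloader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

/**
 * Hypothetical sketch of child-first ("parent-last") class loading, one common way
 * to isolate user dependencies from the framework's own. Not Hadoop's implementation.
 */
public class ChildFirstClassLoader extends URLClassLoader {

    // Assumed caller-supplied package prefixes that must always come from the parent.
    private final List<String> systemPrefixes;

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent, List<String> systemPrefixes) {
        super(urls, parent);
        this.systemPrefixes = systemPrefixes;
    }

    private boolean isSystemClass(String name) {
        for (String prefix : systemPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    // Look in the application's own jars first, so its dependency versions win.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the application's jars: fall through to the parent.
                }
            }
            if (c == null) {
                // System classes, and anything the application jars lack, come from the parent.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

The choice of which prefixes count as "system" is the crux of this design. With too few, framework classes get loaded twice and stop being interchangeable. With too many, the application loses control of its own dependency versions.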

Re: Looking to a Hadoop 3 release

Posted by Eric Yang <er...@gmail.com>.

Some lesson learn during 2.x.  WebHDFS, HDFS ACL, QJM HA, rolling upgrade
are great features.  Mapreduce 1.x uses resources more efficiently,
containers have rigid constraint, and applications get killed prematurely.
When a node has a lot of containers, YARN takes significant amount of
system resources.  Existing daemon based application to run on top of YARN
without code change is impossible.  It is difficult to pinpoint where
services will run.  Extra routing of client to server code needs to be
written for the application.  Hence, the existing map reduce approach to
spawn off parallelized work load and output result in durable file system
is better.  Client serving service doesn't need to track states but read
from hdfs.  Hence some level of HA for external serving service can achieve
without YARN.  Slider provides a better interface for exposing API to
deploy applications.

It would be nice to support the following in 3.x:

- JDK 8
- Upgrade to most recent version of Jetty, most hang problems or busy cpu
problems comes from Jetty 6.1.x being incompatible with JDK 7 in NIO design.
- Improve default security, there is a gap where Default Container Executor
vs Linux Container Executor.  It would be nicer if default security uses
Linux Container Executor to ensure developer remember to run with doAs when
designing services to run on top of Hadoop.
- Since 3.x is a major release number change.  There maybe backward
compatible API breakage initially in order to gain new functionality.  The
backward compatible patches can be added over time.
- Reduce YARN framework resource usage
- Improve usability of YARN UI.  Drill down from application to container
then back to application view is almost unusable.
- Smarter strategy for containers placement.  Some call this anti-affinity
support for YARN, but there is only a few types to support.  The identified
ones are: shared, silo, and dedicated.  In shared, containers can co-locate
on same node.  In silo, where same type of container can only spawn one per
node.  Dedicated will reserve the entire node for this workload.


regards,
Eric

On Fri, Mar 6, 2015 at 5:20 PM, Chris Douglas <cd...@apache.org> wrote:

> On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
> <vi...@hortonworks.com> wrote:
> > I'd encourage everyone to post their wish list on the Roadmap wiki that
> > *warrants* making incompatible changes forcing us to go 3.x.
>
> This is a useful exercise, but not a prerequisite to releasing 3.0.0
> as an alpha off of trunk, right? Andrew summarized the operating
> assumptions for anyone working on it: rolling upgrades still work,
> wire compat is preserved, breaking changes may get rolled back when
> branch-3 is in beta (so be very conservative, notify others loudly).
> This applies to branches merged to trunk, also.
>
> > +1 to Jason's comments in general. We can keep rolling alphas that
> > downstream can pick up, but I'd also like us to clarify the exit criterion
> > for a GA release of 3.0 and its relation to the life of 2.x if we are going
> > this route. This brings us back to the roadmap discussion, and a collective
> > agreement about a logical step at a future point in time where we say we
> > have enough incompatible features in 3.x that we can stop putting more of
> > them and start stabilizing it.
>
> We'll have this discussion again. We don't need to reach consensus on
> the roadmap, just that each artifact reflects the output of the
> project.
>
> > Irrespective of that, here is my proposal in the interim:
> >  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before
> > for at least two releases in branch-2: say 2.8 and 2.9 before we consider
> > taking up the gauntlet on 3.0.
> >  - Continue working on the classpath isolation effort and try making it
> > as compatible as is possible for users to opt in and migrate easily.
>
> +1 for 2.x, but again I don't understand the sequencing. -C
>
> > On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID>
> > wrote:
> >
> >> I'm OK with a 3.0.0 release as long as we are minimizing the pain of
> >> maintaining yet another release line and conscious of the
> >> incompatibilities going into that release line.
> >> For the former, I would really rather not see a branch-3 cut so soon.
> >> It's yet another line onto which to cherry-pick, and I don't see why we
> >> need to add this overhead at such an early phase.  We should only create
> >> branch-3 when there's an incompatible change that the community wants and
> >> it should _not_ go into the next major release (i.e.: it's for Hadoop
> >> 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk
> >> in the interim.  IMHO we need to stop treating trunk as a place to exile
> >> patches.
> >>
> >> For the latter, I think as a community we need to evaluate the benefits
> >> of breaking compatibility against the costs of migrating.  Each time we
> >> break compatibility we create a hurdle for people to jump when they move
> >> to the new release, and we should make those hurdles worth their time.
> >> For example, wire-compatibility has been mentioned as part of this.  Any
> >> feature that breaks wire compatibility better be absolutely amazing, as it
> >> creates a huge hurdle for people to jump.
> >> To summarize: +1 for a community-discussed roadmap of what we're
> >> breaking in Hadoop 3 and why it's worth it for users
> >> -1 for creating branch-3 now, we can release from trunk until the next
> >> incompatibility for Hadoop 4 arrives
> >> +1 for baking classpath isolation as opt-in on 2.x and eventually
> >> default on in 3.0
> >> Jason
> >> From: Andrew Wang <an...@cloudera.com>
> >> To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
> >> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>;
> >> "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> >> "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>
> >> Sent: Wednesday, March 4, 2015 12:15 PM
> >> Subject: Re: Looking to a Hadoop 3 release
> >>
> >> Let's not dismiss this quite so handily.
> >>
> >> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> >> could make classpath isolation opt-in via configuration, what we really
> >> want longer term is to have it on by default (or just always on). Stack in
> >> particular points out the practical difficulties in using an opt-in method
> >> in 2.x from a downstream project perspective. It's not pretty.
> >>
> >> The plan that both Sean and Jason propose (which I support) is to have an
> >> opt-in solution in 2.x, bake it there, then turn it on by default
> >> (incompatible) in a new major release. I think this lines up well with my
> >> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> >> to help with 2.x release management if that would help with testing this
> >> feature.
> >>
> >> Even setting aside classpath isolation, a new major release is still
> >> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> >> historically the voice of the user in our community, just highlighted it as
> >> a major compatibility issue, and myself and Tucu have also expressed our
> >> very strong concerns about bumping this in a minor release. 2.7's bump is a
> >> unique exception, but this is not something to be cited as precedent or
> >> policy.
> >>
> >> Where does this resistance to a new major release stem from? As I've
> >> described from the beginning, this will look basically like a 2.x release,
> >> except for the inclusion of classpath isolation by default and target
> >> version JDK8. I've expressed my desire to maintain API and wire
> >> compatibility, and we can audit the set of incompatible changes in trunk to
> >> ensure this. My proposal for doing alpha and beta releases leading up to GA
> >> also gives downstreams a nice amount of time for testing and validation.
> >>
> >> Regards,
> >> Andrew
> >>
> >>
> >>
> >> On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com>
> >> wrote:
> >>
> >>> Awesome, looks like we can just do this in a compatible manner - nothing
> >>> else on the list seems like it warrants a (premature) major release.
> >>>
> >>> Thanks Vinod.
> >>>
> >>> Arun
> >>>
> >>> ________________________________________
> >>> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
> >>> Sent: Tuesday, March 03, 2015 2:30 PM
> >>> To: common-dev@hadoop.apache.org
> >>> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> >>> yarn-dev@hadoop.apache.org
> >>> Subject: Re: Looking to a Hadoop 3 release
> >>>
> >>> I started pitching in more on that JIRA.
> >>>
> >>> To add, I think we can and should strive for doing this in a compatible
> >>> manner, whatever the approach. Marking and calling it incompatible before
> >>> we see proposal/patch seems premature to me. Commented the same on JIRA:
> >>> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
> >>> .
> >>>
> >>> Thanks
> >>> +Vinod
> >>>
> >>> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
> >>>
> >>> Regarding classpath isolation, based on what I hear from our customers,
> >>> it's still a big problem (even after the MR classloader work). The latest
> >>> Jackson version bump was quite painful for our downstream projects, and the
> >>> HDFS client still leaks a lot of dependencies. Would welcome more
> >>> discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
> >>> chimed in.
> >>>
> >>>
> >>
> >>
> >
>

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

On Mar 6, 2015, at 5:20 PM, Chris Douglas <cd...@apache.org> wrote:

> On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
> <vi...@hortonworks.com> wrote:
>> I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.
> 
> This is a useful exercise, but not a prerequisite to releasing 3.0.0
> as an alpha off of trunk, right? Andrew summarized the operating
> assumptions for anyone working on it: rolling upgrades still work,
> wire compat is preserved, breaking changes may get rolled back when
> branch-3 is in beta (so be very conservative, notify others loudly).
> This applies to branches merged to trunk, also.

Not a prerequisite for alpha releases, yes. But it will be for a 'GA' release, because after that we will be back to restricting incompatible changes on the 3.x line, and we will have to say no to features that need API breakage. If others feel there are features that warrant incompatibility, we should hear about them for inclusion in such a 3.x release. Until now, the operating assumption was to avoid breaking anything as far as possible. If we are opening the window on incompatibilities in 3.x, we might as well get everyone to think about what they want.

>> +1 to Jason's comments in general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.
> 
> We'll have this discussion again. We don't need to reach consensus on
> the roadmap, just that each artifact reflects the output of the
> project.

Agreed. I wasn't asking us to reach consensus on the roadmap, just asking others to put their wish lists up.

>> Irrespective of that, here is my proposal in the interim:
>> - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
>> - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.
> 
> +1 for 2.x, but again I don't understand the sequencing. -C

There isn't any; I was saying "Irrespective of that".

Thanks,
+Vinod

Re: Looking to a Hadoop 3 release

Posted by Chris Douglas <cd...@apache.org>.

On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
<vi...@hortonworks.com> wrote:
> I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.

This is a useful exercise, but not a prerequisite to releasing 3.0.0
as an alpha off of trunk, right? Andrew summarized the operating
assumptions for anyone working on it: rolling upgrades still work,
wire compat is preserved, breaking changes may get rolled back when
branch-3 is in beta (so be very conservative, notify others loudly).
This applies to branches merged to trunk, also.

> +1 to Jason's comments in general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.

We'll have this discussion again. We don't need to reach consensus on
the roadmap, just that each artifact reflects the output of the
project.

> Irrespective of that, here is my proposal in the interim:
>  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
>  - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.

+1 for 2.x, but again I don't understand the sequencing. -C

> On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID> wrote:
>
>> I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and conscious of the incompatibilities going into that release line.
>> For the former, I would really rather not see a branch-3 cut so soon.  It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase.  We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk in the interim.  IMHO we need to stop treating trunk as a place to exile patches.
>>
>> For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating.  Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time.  For example, wire-compatibility has been mentioned as part of this.  Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump.
>> To summarize: +1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
>> -1 for creating branch-3 now, we can release from trunk until the next incompatibility for Hadoop 4 arrives
>> +1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
>> Jason
>>      From: Andrew Wang <an...@cloudera.com>
>> To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
>> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>
>> Sent: Wednesday, March 4, 2015 12:15 PM
>> Subject: Re: Looking to a Hadoop 3 release
>>
>> Let's not dismiss this quite so handily.
>>
>> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
>> could make classpath isolation opt-in via configuration, what we really
>> want longer term is to have it on by default (or just always on). Stack in
>> particular points out the practical difficulties in using an opt-in method
>> in 2.x from a downstream project perspective. It's not pretty.
>>
>> The plan that both Sean and Jason propose (which I support) is to have an
>> opt-in solution in 2.x, bake it there, then turn it on by default
>> (incompatible) in a new major release. I think this lines up well with my
>> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
>> to help with 2.x release management if that would help with testing this
>> feature.
>>
>> Even setting aside classpath isolation, a new major release is still
>> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
>> historically the voice of the user in our community, just highlighted it as
>> a major compatibility issue, and myself and Tucu have also expressed our
>> very strong concerns about bumping this in a minor release. 2.7's bump is a
>> unique exception, but this is not something to be cited as precedent or
>> policy.
>>
>> Where does this resistance to a new major release stem from? As I've
>> described from the beginning, this will look basically like a 2.x release,
>> except for the inclusion of classpath isolation by default and target
>> version JDK8. I've expressed my desire to maintain API and wire
>> compatibility, and we can audit the set of incompatible changes in trunk to
>> ensure this. My proposal for doing alpha and beta releases leading up to GA
>> also gives downstreams a nice amount of time for testing and validation.
>>
>> Regards,
>> Andrew
>>
>>
>>
>> On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
>>
>>> Awesome, looks like we can just do this in a compatible manner - nothing
>>> else on the list seems like it warrants a (premature) major release.
>>>
>>> Thanks Vinod.
>>>
>>> Arun
>>>
>>> ________________________________________
>>> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
>>> Sent: Tuesday, March 03, 2015 2:30 PM
>>> To: common-dev@hadoop.apache.org
>>> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
>>> yarn-dev@hadoop.apache.org
>>> Subject: Re: Looking to a Hadoop 3 release
>>>
>>> I started pitching in more on that JIRA.
>>>
>>> To add, I think we can and should strive for doing this in a compatible
>>> manner, whatever the approach. Marking and calling it incompatible before
>>> we see proposal/patch seems premature to me. Commented the same on JIRA:
>>> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
>>> .
>>>
>>> Thanks
>>> +Vinod
>>>
>>> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>>>
>>> Regarding classpath isolation, based on what I hear from our customers,
>>> it's still a big problem (even after the MR classloader work). The latest
>>> Jackson version bump was quite painful for our downstream projects, and the
>>> HDFS client still leaks a lot of dependencies. Would welcome more
>>> discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
>>> chimed in.
>>>
>>>
>>
>>
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hey Vinod,

I'm roughly okay with that plan. One question, though: why gate JDK8 on
2.8 and 2.9? Based on the status of HADOOP-11090, it sounds like branch-2
already runs okay on JDK8. Our past experience moving from JDK6 to JDK7 was
also very smooth, except for JUnit test-ordering issues.

As an additional data point, Cloudera has already validated CDH5 on JDK8 and
supports it as a runtime:

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_req_supported_versions.html?scroll=concept_pdd_kzf_vp_unique_1

Best,
Andrew

On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
<vinodkv@hortonworks.com> wrote:

> I'd encourage everyone to post their wish list on the Roadmap wiki that
> *warrants* making incompatible changes forcing us to go 3.x.
>
> +1 to Jason's comments in general. We can keep rolling alphas that
> downstream can pick up, but I'd also like us to clarify the exit criterion
> for a GA release of 3.0 and its relation to the life of 2.x if we are going
> this route. This brings us back to the roadmap discussion, and a collective
> agreement about a logical step at a future point in time where we say we
> have enough incompatible features in 3.x that we can stop putting more of
> them and start stabilizing it.
>
> Irrespective of that, here is my proposal in the interim:
>  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before
> for at least two releases in branch-2: say 2.8 and 2.9 before we consider
> taking up the gauntlet on 3.0.
>  - Continue working on the classpath isolation effort and try making it as
> compatible as is possible for users to opt in and migrate easily.
>
> Thanks,
> +Vinod
>
> On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID>
> wrote:
>
> > I'm OK with a 3.0.0 release as long as we are minimizing the pain of
> > maintaining yet another release line and conscious of the
> > incompatibilities going into that release line.
> > For the former, I would really rather not see a branch-3 cut so soon.
> > It's yet another line onto which to cherry-pick, and I don't see why we
> > need to add this overhead at such an early phase.  We should only create
> > branch-3 when there's an incompatible change that the community wants and
> > it should _not_ go into the next major release (i.e.: it's for Hadoop
> > 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk
> > in the interim.  IMHO we need to stop treating trunk as a place to exile
> > patches.
> >
> > For the latter, I think as a community we need to evaluate the benefits
> > of breaking compatibility against the costs of migrating.  Each time we
> > break compatibility we create a hurdle for people to jump when they move
> > to the new release, and we should make those hurdles worth their time.
> > For example, wire-compatibility has been mentioned as part of this.  Any
> > feature that breaks wire compatibility better be absolutely amazing, as it
> > creates a huge hurdle for people to jump.
> > To summarize: +1 for a community-discussed roadmap of what we're breaking
> > in Hadoop 3 and why it's worth it for users
> > -1 for creating branch-3 now, we can release from trunk until the next
> > incompatibility for Hadoop 4 arrives
> > +1 for baking classpath isolation as opt-in on 2.x and eventually
> > default on in 3.0
> > Jason
> > From: Andrew Wang <an...@cloudera.com>
> > To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
> > Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>;
> > "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> > "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>
> > Sent: Wednesday, March 4, 2015 12:15 PM
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Let's not dismiss this quite so handily.
> >
> > Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> > could make classpath isolation opt-in via configuration, what we really
> > want longer term is to have it on by default (or just always on). Stack in
> > particular points out the practical difficulties in using an opt-in method
> > in 2.x from a downstream project perspective. It's not pretty.
> >
> > The plan that both Sean and Jason propose (which I support) is to have an
> > opt-in solution in 2.x, bake it there, then turn it on by default
> > (incompatible) in a new major release. I think this lines up well with my
> > proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> > to help with 2.x release management if that would help with testing this
> > feature.
> >
> > Even setting aside classpath isolation, a new major release is still
> > justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> > historically the voice of the user in our community, just highlighted it as
> > a major compatibility issue, and Tucu and I have also expressed our
> > very strong concerns about bumping this in a minor release. 2.7's bump is a
> > unique exception, but this is not something to be cited as precedent or
> > policy.
> >
> > Where does this resistance to a new major release stem from? As I've
> > described from the beginning, this will look basically like a 2.x release,
> > except for the inclusion of classpath isolation by default and target
> > version JDK8. I've expressed my desire to maintain API and wire
> > compatibility, and we can audit the set of incompatible changes in trunk to
> > ensure this. My proposal for doing alpha and beta releases leading up to GA
> > also gives downstreams a nice amount of time for testing and validation.
> >
> > Regards,
> > Andrew
> >
> >
> >
> > On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
> >
> >> Awesome, looks like we can just do this in a compatible manner - nothing
> >> else on the list seems like it warrants a (premature) major release.
> >>
> >> Thanks Vinod.
> >>
> >> Arun
> >>
> >> ________________________________________
> >> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
> >> Sent: Tuesday, March 03, 2015 2:30 PM
> >> To: common-dev@hadoop.apache.org
> >> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> >> yarn-dev@hadoop.apache.org
> >> Subject: Re: Looking to a Hadoop 3 release
> >>
> >> I started pitching in more on that JIRA.
> >>
> >> To add, I think we can and should strive for doing this in a compatible
> >> manner, whatever the approach. Marking and calling it incompatible before
> >> we see a proposal/patch seems premature to me. Commented the same on JIRA:
> >> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
> >>
> >> Thanks
> >> +Vinod
> >>
> >> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
> >>
> >> Regarding classpath isolation, based on what I hear from our customers,
> >> it's still a big problem (even after the MR classloader work). The latest
> >> Jackson version bump was quite painful for our downstream projects, and the
> >> HDFS client still leaks a lot of dependencies. Would welcome more
> >> discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
> >> chimed in.
> >>
> >>
> >
> >
>
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hey Vinod,

I'm roughly okay with that plan. One question though: why gate JDK8 on
2.8 and 2.9? Based on the status of HADOOP-11090, it sounds like branch-2
already runs okay on JDK8. Our past experience moving from JDK6 to JDK7 was
also very smooth except for JUnit ordering.

As an additional data point, Cloudera has already validated CDH5 on JDK8 and
supports it as a runtime:

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_req_supported_versions.html?scroll=concept_pdd_kzf_vp_unique_1

Best,
Andrew

On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:

> I'd encourage everyone to post their wish list on the Roadmap wiki that
> *warrants* making incompatible changes forcing us to go 3.x.
>
> +1 to Jason's comments in general. We can keep rolling alphas that
> downstream can pick up, but I'd also like us to clarify the exit criterion
> for a GA release of 3.0 and its relation to the life of 2.x if we are going
> this route. This brings us back to the roadmap discussion, and a collective
> agreement about a logical step at a future point in time where we say we
> have enough incompatible features in 3.x that we can stop putting more of
> them and start stabilizing it.
>
> Irrespective of that, here is my proposal in the interim:
>  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before
>    for at least two releases in branch-2: say 2.8 and 2.9 before we
>    consider taking up the gauntlet on 3.0.
>  - Continue working on the classpath isolation effort and try making it
>    as compatible as is possible for users to opt in and migrate easily.
>
> Thanks,
> +Vinod
>
> On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID>
> wrote:
>
> > I'm OK with a 3.0.0 release as long as we are minimizing the pain of
> > maintaining yet another release line and conscious of the
> > incompatibilities going into that release line.
> > For the former, I would really rather not see a branch-3 cut so soon.
> > It's yet another line onto which to cherry-pick, and I don't see why we
> > need to add this overhead at such an early phase.  We should only create
> > branch-3 when there's an incompatible change that the community wants and
> > it should _not_ go into the next major release (i.e.: it's for Hadoop
> > 4.0).  We can develop 3.0 alphas and betas on trunk and release from
> > trunk in the interim.  IMHO we need to stop treating trunk as a place to
> > exile patches.
> >
> > For the latter, I think as a community we need to evaluate the benefits
> > of breaking compatibility against the costs of migrating.  Each time we
> > break compatibility we create a hurdle for people to jump when they move
> > to the new release, and we should make those hurdles worth their time.
> > For example, wire-compatibility has been mentioned as part of this.  Any
> > feature that breaks wire compatibility better be absolutely amazing, as
> > it creates a huge hurdle for people to jump.
> > To summarize:
> > +1 for a community-discussed roadmap of what we're breaking in Hadoop 3
> >    and why it's worth it for users
> > -1 for creating branch-3 now; we can release from trunk until the next
> >    incompatibility for Hadoop 4 arrives
> > +1 for baking classpath isolation as opt-in on 2.x and eventually
> >    default on in 3.0
> > Jason
> > From: Andrew Wang <an...@cloudera.com>
> > To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
> > Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>;
> > "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> > "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>
> > Sent: Wednesday, March 4, 2015 12:15 PM
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Let's not dismiss this quite so handily.
> >
> > Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> > could make classpath isolation opt-in via configuration, what we really
> > want longer term is to have it on by default (or just always on). Stack in
> > particular points out the practical difficulties in using an opt-in method
> > in 2.x from a downstream project perspective. It's not pretty.
> >
> > The plan that both Sean and Jason propose (which I support) is to have an
> > opt-in solution in 2.x, bake it there, then turn it on by default
> > (incompatible) in a new major release. I think this lines up well with my
> > proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> > to help with 2.x release management if that would help with testing this
> > feature.
> >
> > Even setting aside classpath isolation, a new major release is still
> > justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> > historically the voice of the user in our community, just highlighted it as
> > a major compatibility issue, and Tucu and I have also expressed our
> > very strong concerns about bumping this in a minor release. 2.7's bump is a
> > unique exception, but this is not something to be cited as precedent or
> > policy.
> >
> > Where does this resistance to a new major release stem from? As I've
> > described from the beginning, this will look basically like a 2.x release,
> > except for the inclusion of classpath isolation by default and target
> > version JDK8. I've expressed my desire to maintain API and wire
> > compatibility, and we can audit the set of incompatible changes in trunk to
> > ensure this. My proposal for doing alpha and beta releases leading up to GA
> > also gives downstreams a nice amount of time for testing and validation.
> >
> > Regards,
> > Andrew
> >
> >
> >
> > On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
> >
> >> Awesome, looks like we can just do this in a compatible manner - nothing
> >> else on the list seems like it warrants a (premature) major release.
> >>
> >> Thanks Vinod.
> >>
> >> Arun
> >>
> >> ________________________________________
> >> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
> >> Sent: Tuesday, March 03, 2015 2:30 PM
> >> To: common-dev@hadoop.apache.org
> >> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> >> yarn-dev@hadoop.apache.org
> >> Subject: Re: Looking to a Hadoop 3 release
> >>
> >> I started pitching in more on that JIRA.
> >>
> >> To add, I think we can and should strive for doing this in a compatible
> >> manner, whatever the approach. Marking and calling it incompatible before
> >> we see a proposal/patch seems premature to me. Commented the same on JIRA:
> >> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
> >>
> >> Thanks
> >> +Vinod
> >>
> >> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
> >>
> >> Regarding classpath isolation, based on what I hear from our customers,
> >> it's still a big problem (even after the MR classloader work). The latest
> >> Jackson version bump was quite painful for our downstream projects, and the
> >> HDFS client still leaks a lot of dependencies. Would welcome more
> >> discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
> >> chimed in.
> >>
> >>
> >
> >
>
>

Re: Looking to a Hadoop 3 release

Posted by Chris Douglas <cd...@apache.org>.

On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
<vi...@hortonworks.com> wrote:
> I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.

This is a useful exercise, but not a prerequisite to releasing 3.0.0
as an alpha off of trunk, right? Andrew summarized the operating
assumptions for anyone working on it: rolling upgrades still work,
wire compat is preserved, breaking changes may get rolled back when
branch-3 is in beta (so be very conservative, notify others loudly).
This applies to branches merged to trunk, also.

> +1 to Jason's comments in general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.

We'll have this discussion again. We don't need to reach consensus on
the roadmap, just that each artifact reflects the output of the
project.

> Irrespective of that, here is my proposal in the interim:
>  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
>  - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.

+1 for 2.x, but again I don't understand the sequencing. -C

> On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID> wrote:
>
>> I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and conscious of the incompatibilities going into that release line.
>> For the former, I would really rather not see a branch-3 cut so soon.  It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase.  We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk in the interim.  IMHO we need to stop treating trunk as a place to exile patches.
>>
>> For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating.  Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time.  For example, wire-compatibility has been mentioned as part of this.  Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump.
>> To summarize:
>> +1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
>> -1 for creating branch-3 now; we can release from trunk until the next incompatibility for Hadoop 4 arrives
>> +1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
>> Jason
>> From: Andrew Wang <an...@cloudera.com>
>> To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
>> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>
>> Sent: Wednesday, March 4, 2015 12:15 PM
>> Subject: Re: Looking to a Hadoop 3 release
>>
>> Let's not dismiss this quite so handily.
>>
>> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
>> could make classpath isolation opt-in via configuration, what we really
>> want longer term is to have it on by default (or just always on). Stack in
>> particular points out the practical difficulties in using an opt-in method
>> in 2.x from a downstream project perspective. It's not pretty.
>>
>> The plan that both Sean and Jason propose (which I support) is to have an
>> opt-in solution in 2.x, bake it there, then turn it on by default
>> (incompatible) in a new major release. I think this lines up well with my
>> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
>> to help with 2.x release management if that would help with testing this
>> feature.
>>
>> Even setting aside classpath isolation, a new major release is still
>> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
>> historically the voice of the user in our community, just highlighted it as
>> a major compatibility issue, and Tucu and I have also expressed our
>> very strong concerns about bumping this in a minor release. 2.7's bump is a
>> unique exception, but this is not something to be cited as precedent or
>> policy.
>>
>> Where does this resistance to a new major release stem from? As I've
>> described from the beginning, this will look basically like a 2.x release,
>> except for the inclusion of classpath isolation by default and target
>> version JDK8. I've expressed my desire to maintain API and wire
>> compatibility, and we can audit the set of incompatible changes in trunk to
>> ensure this. My proposal for doing alpha and beta releases leading up to GA
>> also gives downstreams a nice amount of time for testing and validation.
>>
>> Regards,
>> Andrew
>>
>>
>>
>> On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
>>
>>> Awesome, looks like we can just do this in a compatible manner - nothing
>>> else on the list seems like it warrants a (premature) major release.
>>>
>>> Thanks Vinod.
>>>
>>> Arun
>>>
>>> ________________________________________
>>> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
>>> Sent: Tuesday, March 03, 2015 2:30 PM
>>> To: common-dev@hadoop.apache.org
>>> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
>>> yarn-dev@hadoop.apache.org
>>> Subject: Re: Looking to a Hadoop 3 release
>>>
>>> I started pitching in more on that JIRA.
>>>
>>> To add, I think we can and should strive for doing this in a compatible
>>> manner, whatever the approach. Marking and calling it incompatible before
>>> we see a proposal/patch seems premature to me. Commented the same on JIRA:
>>> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
>>>
>>> Thanks
>>> +Vinod
>>>
>>> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>>>
>>> Regarding classpath isolation, based on what I hear from our customers,
>>> it's still a big problem (even after the MR classloader work). The latest
>>> Jackson version bump was quite painful for our downstream projects, and the
>>> HDFS client still leaks a lot of dependencies. Would welcome more
>>> discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
>>> chimed in.
>>>
>>>
>>
>>
>

Re: Looking to a Hadoop 3 release

Posted by Chris Douglas <cd...@apache.org>.

On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
<vi...@hortonworks.com> wrote:
> I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.

This is a useful exercise, but not a prerequisite to releasing 3.0.0
as an alpha off of trunk, right? Andrew summarized the operating
assumptions for anyone working on it: rolling upgrades still work,
wire compat is preserved, breaking changes may get rolled back when
branch-3 is in beta (so be very conservative, notify others loudly).
This applies to branches merged to trunk, also.

> +1 to Jason's comments on general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.

We'll have this discussion again. We don't need to reach consensus on
the roadmap, just that each artifact reflects the output of the
project.

> Irrespective of that, here is my proposal in the interim:
>  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for atleast two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
>  - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.

+1 for 2.x, but again I don't understand the sequencing. -C

> On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID> wrote:
>
>> I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and conscious of the incompatibilities going into that release line.
>> For the former, I would really rather not see a branch-3 cut so soon.  It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase.  We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk in the interim.  IMHO we need to stop treating trunk as a place to exile patches.
>>
>> For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating.  Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time.  For example, wire-compatibility has been mentioned as part of this.  Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump.
>> To summarize:+1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
>> -1 for creating branch-3 now, we can release from trunk until the next incompatibility for Hadoop 4 arrives
>> +1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
>> Jason
>>      From: Andrew Wang <an...@cloudera.com>
>> To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
>> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>
>> Sent: Wednesday, March 4, 2015 12:15 PM
>> Subject: Re: Looking to a Hadoop 3 release
>>
>> Let's not dismiss this quite so handily.
>>
>> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
>> could make classpath isolation opt-in via configuration, what we really
>> want longer term is to have it on by default (or just always on). Stack in
>> particular points out the practical difficulties in using an opt-in method
>> in 2.x from a downstream project perspective. It's not pretty.
>>
>> The plan that both Sean and Jason propose (which I support) is to have an
>> opt-in solution in 2.x, bake it there, then turn it on by default
>> (incompatible) in a new major release. I think this lines up well with my
>> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
>> to help with 2.x release management if that would help with testing this
>> feature.
>>
>> Even setting aside classpath isolation, a new major release is still
>> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
>> historically the voice of the user in our community, just highlighted it as
>> a major compatibility issue, and myself and Tucu have also expressed our
>> very strong concerns about bumping this in a minor release. 2.7's bump is a
>> unique exception, but this is not something to be cited as precedent or
>> policy.
>>
>> Where does this resistance to a new major release stem from? As I've
>> described from the beginning, this will look basically like a 2.x release,
>> except for the inclusion of classpath isolation by default and target
>> version JDK8. I've expressed my desire to maintain API and wire
>> compatibility, and we can audit the set of incompatible changes in trunk to
>> ensure this. My proposal for doing alpha and beta releases leading up to GA
>> also gives downstreams a nice amount of time for testing and validation.
>>
>> Regards,
>> Andrew
>>
>>
>>
>> On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
>>
>>> Awesome, looks like we can just do this in a compatible manner - nothing
>>> else on the list seems like it warrants a (premature) major release.
>>>
>>> Thanks Vinod.
>>>
>>> Arun
>>>
>>> ________________________________________
>>> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
>>> Sent: Tuesday, March 03, 2015 2:30 PM
>>> To: common-dev@hadoop.apache.org
>>> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
>>> yarn-dev@hadoop.apache.org
>>> Subject: Re: Looking to a Hadoop 3 release
>>>
>>> I started pitching in more on that JIRA.
>>>
>>> To add, I think we can and should strive for doing this in a compatible
>>> manner, whatever the approach. Marking and calling it incompatible before
>>> we see proposal/patch seems premature to me. Commented the same on JIRA:
>>> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
>>> .
>>>
>>> Thanks
>>> +Vinod
>>>
>>> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>>>
>>> Regarding classpath isolation, based on what I hear from our customers,
>>> it's still a big problem (even after the MR classloader work). The latest
>>> Jackson version bump was quite painful for our downstream projects, and the
>>> HDFS client still leaks a lot of dependencies. Would welcome more
>>> discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
>>> chimed in.
>>>
>>>
>>
>>
>

Re: Looking to a Hadoop 3 release

Posted by Chris Douglas <cd...@apache.org>.

On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
<vi...@hortonworks.com> wrote:
> I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.

This is a useful exercise, but not a prerequisite to releasing 3.0.0
as an alpha off of trunk, right? Andrew summarized the operating
assumptions for anyone working on it: rolling upgrades still work,
wire compat is preserved, breaking changes may get rolled back when
branch-3 is in beta (so be very conservative, notify others loudly).
This applies to branches merged to trunk, also.

> +1 to Jason's comments in general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.

We'll have this discussion again. We don't need to reach consensus on
the roadmap, just that each artifact reflects the output of the
project.

> Irrespective of that, here is my proposal in the interim:
>  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
>  - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.

+1 for 2.x, but again I don't understand the sequencing. -C

> On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID> wrote:
>
>> I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and conscious of the incompatibilities going into that release line.
>> For the former, I would really rather not see a branch-3 cut so soon.  It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase.  We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk in the interim.  IMHO we need to stop treating trunk as a place to exile patches.
>>
>> For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating.  Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time.  For example, wire-compatibility has been mentioned as part of this.  Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump.
>> To summarize:
>> +1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
>> -1 for creating branch-3 now, we can release from trunk until the next incompatibility for Hadoop 4 arrives
>> +1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
>> Jason
>>      From: Andrew Wang <an...@cloudera.com>
>> To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
>> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>
>> Sent: Wednesday, March 4, 2015 12:15 PM
>> Subject: Re: Looking to a Hadoop 3 release
>>
>> Let's not dismiss this quite so handily.
>>
>> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
>> could make classpath isolation opt-in via configuration, what we really
>> want longer term is to have it on by default (or just always on). Stack in
>> particular points out the practical difficulties in using an opt-in method
>> in 2.x from a downstream project perspective. It's not pretty.
>>
>> The plan that both Sean and Jason propose (which I support) is to have an
>> opt-in solution in 2.x, bake it there, then turn it on by default
>> (incompatible) in a new major release. I think this lines up well with my
>> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
>> to help with 2.x release management if that would help with testing this
>> feature.
>>
>> Even setting aside classpath isolation, a new major release is still
>> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
>> historically the voice of the user in our community, just highlighted it as
>> a major compatibility issue, and myself and Tucu have also expressed our
>> very strong concerns about bumping this in a minor release. 2.7's bump is a
>> unique exception, but this is not something to be cited as precedent or
>> policy.
>>
>> Where does this resistance to a new major release stem from? As I've
>> described from the beginning, this will look basically like a 2.x release,
>> except for the inclusion of classpath isolation by default and target
>> version JDK8. I've expressed my desire to maintain API and wire
>> compatibility, and we can audit the set of incompatible changes in trunk to
>> ensure this. My proposal for doing alpha and beta releases leading up to GA
>> also gives downstreams a nice amount of time for testing and validation.
>>
>> Regards,
>> Andrew
>>
>>
>>
>> On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
>>
>>> Awesome, looks like we can just do this in a compatible manner - nothing
>>> else on the list seems like it warrants a (premature) major release.
>>>
>>> Thanks Vinod.
>>>
>>> Arun
>>>
>>> ________________________________________
>>> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
>>> Sent: Tuesday, March 03, 2015 2:30 PM
>>> To: common-dev@hadoop.apache.org
>>> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
>>> yarn-dev@hadoop.apache.org
>>> Subject: Re: Looking to a Hadoop 3 release
>>>
>>> I started pitching in more on that JIRA.
>>>
>>> To add, I think we can and should strive for doing this in a compatible
>>> manner, whatever the approach. Marking and calling it incompatible before
>>> we see proposal/patch seems premature to me. Commented the same on JIRA:
>>> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
>>> .
>>>
>>> Thanks
>>> +Vinod
>>>
>>> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>>>
>>> Regarding classpath isolation, based on what I hear from our customers,
>>> it's still a big problem (even after the MR classloader work). The latest
>>> Jackson version bump was quite painful for our downstream projects, and the
>>> HDFS client still leaks a lot of dependencies. Would welcome more
>>> discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
>>> chimed in.
>>>
>>>
>>
>>
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hey Vinod,

I'm roughly okay with that plan. One question, though: why gate JDK8 on
2.8 and 2.9? Based on the status of HADOOP-11090, it sounds like branch-2
already runs okay on JDK8. Our past experience moving from JDK6 to JDK7 was
also very smooth except for JUnit ordering.

As an additional datapoint, Cloudera has already validated CDH5 on JDK8 and
supports it as a runtime:

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_req_supported_versions.html?scroll=concept_pdd_kzf_vp_unique_1

Best,
Andrew

On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> I'd encourage everyone to post their wish list on the Roadmap wiki that
> *warrants* making incompatible changes forcing us to go 3.x.
>
> +1 to Jason's comments in general. We can keep rolling alphas that
> downstream can pick up, but I'd also like us to clarify the exit criterion
> for a GA release of 3.0 and its relation to the life of 2.x if we are going
> this route. This brings us back to the roadmap discussion, and a collective
> agreement about a logical step at a future point in time where we say we
> have enough incompatible features in 3.x that we can stop putting more of
> them and start stabilizing it.
>
> Irrespective of that, here is my proposal in the interim:
>  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before
> for at least two releases in branch-2: say 2.8 and 2.9 before we consider
> taking up the gauntlet on 3.0.
>  - Continue working on the classpath isolation effort and try making it as
> compatible as is possible for users to opt in and migrate easily.
>
> Thanks,
> +Vinod
>
> On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID>
> wrote:
>
> > I'm OK with a 3.0.0 release as long as we are minimizing the pain of
> > maintaining yet another release line and conscious of the incompatibilities
> > going into that release line.
> > For the former, I would really rather not see a branch-3 cut so soon.
> > It's yet another line onto which to cherry-pick, and I don't see why we
> > need to add this overhead at such an early phase.  We should only create
> > branch-3 when there's an incompatible change that the community wants and
> > it should _not_ go into the next major release (i.e.: it's for Hadoop
> > 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk
> > in the interim.  IMHO we need to stop treating trunk as a place to exile
> > patches.
> >
> > For the latter, I think as a community we need to evaluate the benefits
> > of breaking compatibility against the costs of migrating.  Each time we
> > break compatibility we create a hurdle for people to jump when they move to
> > the new release, and we should make those hurdles worth their time.  For
> > example, wire-compatibility has been mentioned as part of this.  Any
> > feature that breaks wire compatibility better be absolutely amazing, as it
> > creates a huge hurdle for people to jump.
> > To summarize:
> > +1 for a community-discussed roadmap of what we're breaking in Hadoop 3
> > and why it's worth it for users
> > -1 for creating branch-3 now, we can release from trunk until the next
> > incompatibility for Hadoop 4 arrives
> > +1 for baking classpath isolation as opt-in on 2.x and eventually
> > default on in 3.0
> > Jason
> >      From: Andrew Wang <an...@cloudera.com>
> > To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
> > Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>
> > Sent: Wednesday, March 4, 2015 12:15 PM
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Let's not dismiss this quite so handily.
> >
> > Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> > could make classpath isolation opt-in via configuration, what we really
> > want longer term is to have it on by default (or just always on). Stack in
> > particular points out the practical difficulties in using an opt-in method
> > in 2.x from a downstream project perspective. It's not pretty.
> >
> > The plan that both Sean and Jason propose (which I support) is to have an
> > opt-in solution in 2.x, bake it there, then turn it on by default
> > (incompatible) in a new major release. I think this lines up well with my
> > proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> > to help with 2.x release management if that would help with testing this
> > feature.
> >
> > Even setting aside classpath isolation, a new major release is still
> > justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> > historically the voice of the user in our community, just highlighted it as
> > a major compatibility issue, and myself and Tucu have also expressed our
> > very strong concerns about bumping this in a minor release. 2.7's bump is a
> > unique exception, but this is not something to be cited as precedent or
> > policy.
> >
> > Where does this resistance to a new major release stem from? As I've
> > described from the beginning, this will look basically like a 2.x release,
> > except for the inclusion of classpath isolation by default and target
> > version JDK8. I've expressed my desire to maintain API and wire
> > compatibility, and we can audit the set of incompatible changes in trunk to
> > ensure this. My proposal for doing alpha and beta releases leading up to GA
> > also gives downstreams a nice amount of time for testing and validation.
> >
> > Regards,
> > Andrew
> >
> >
> >
> > On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
> >
> >> Awesome, looks like we can just do this in a compatible manner - nothing
> >> else on the list seems like it warrants a (premature) major release.
> >>
> >> Thanks Vinod.
> >>
> >> Arun
> >>
> >> ________________________________________
> >> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
> >> Sent: Tuesday, March 03, 2015 2:30 PM
> >> To: common-dev@hadoop.apache.org
> >> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> >> yarn-dev@hadoop.apache.org
> >> Subject: Re: Looking to a Hadoop 3 release
> >>
> >> I started pitching in more on that JIRA.
> >>
> >> To add, I think we can and should strive for doing this in a compatible
> >> manner, whatever the approach. Marking and calling it incompatible before
> >> we see proposal/patch seems premature to me. Commented the same on JIRA:
> >> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
> >> .
> >>
> >> Thanks
> >> +Vinod
> >>
> >> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
> >>
> >> Regarding classpath isolation, based on what I hear from our customers,
> >> it's still a big problem (even after the MR classloader work). The latest
> >> Jackson version bump was quite painful for our downstream projects, and the
> >> HDFS client still leaks a lot of dependencies. Would welcome more
> >> discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
> >> chimed in.
> >>
> >>
> >
> >
>
>

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.

+1 to Jason's comments in general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.

Irrespective of that, here is my proposal in the interim:
 - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
 - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.

Thanks,
+Vinod

On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID> wrote:

> I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and conscious of the incompatibilities going into that release line.
> For the former, I would really rather not see a branch-3 cut so soon.  It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase.  We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk in the interim.  IMHO we need to stop treating trunk as a place to exile patches.
> 
> For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating.  Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time.  For example, wire-compatibility has been mentioned as part of this.  Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump.
> To summarize:
> +1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
> -1 for creating branch-3 now, we can release from trunk until the next incompatibility for Hadoop 4 arrives
> +1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
> Jason
>      From: Andrew Wang <an...@cloudera.com>
> To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org> 
> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org> 
> Sent: Wednesday, March 4, 2015 12:15 PM
> Subject: Re: Looking to a Hadoop 3 release
> 
> Let's not dismiss this quite so handily.
> 
> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> could make classpath isolation opt-in via configuration, what we really
> want longer term is to have it on by default (or just always on). Stack in
> particular points out the practical difficulties in using an opt-in method
> in 2.x from a downstream project perspective. It's not pretty.
> 
> The plan that both Sean and Jason propose (which I support) is to have an
> opt-in solution in 2.x, bake it there, then turn it on by default
> (incompatible) in a new major release. I think this lines up well with my
> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> to help with 2.x release management if that would help with testing this
> feature.
> 
> Even setting aside classpath isolation, a new major release is still
> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> historically the voice of the user in our community, just highlighted it as
> a major compatibility issue, and myself and Tucu have also expressed our
> very strong concerns about bumping this in a minor release. 2.7's bump is a
> unique exception, but this is not something to be cited as precedent or
> policy.
> 
> Where does this resistance to a new major release stem from? As I've
> described from the beginning, this will look basically like a 2.x release,
> except for the inclusion of classpath isolation by default and target
> version JDK8. I've expressed my desire to maintain API and wire
> compatibility, and we can audit the set of incompatible changes in trunk to
> ensure this. My proposal for doing alpha and beta releases leading up to GA
> also gives downstreams a nice amount of time for testing and validation.
> 
> Regards,
> Andrew
> 
> 
> 
> On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
> 
>> Awesome, looks like we can just do this in a compatible manner - nothing
>> else on the list seems like it warrants a (premature) major release.
>> 
>> Thanks Vinod.
>> 
>> Arun
>> 
>> ________________________________________
>> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
>> Sent: Tuesday, March 03, 2015 2:30 PM
>> To: common-dev@hadoop.apache.org
>> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
>> yarn-dev@hadoop.apache.org
>> Subject: Re: Looking to a Hadoop 3 release
>> 
>> I started pitching in more on that JIRA.
>> 
>> To add, I think we can and should strive for doing this in a compatible
>> manner, whatever the approach. Marking and calling it incompatible before
>> we see proposal/patch seems premature to me. Commented the same on JIRA:
>> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
>> .
>> 
>> Thanks
>> +Vinod
>> 
>> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>> 
>> Regarding classpath isolation, based on what I hear from our customers,
>> it's still a big problem (even after the MR classloader work). The latest
>> Jackson version bump was quite painful for our downstream projects, and the
>> HDFS client still leaks a lot of dependencies. Would welcome more
>> discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
>> chimed in.
>> 
>> 
> 
>

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.

+1 to Jason's comments in general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.

Irrespective of that, here is my proposal in the interim:
 - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
 - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.

Thanks,
+Vinod

On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID> wrote:

> I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and conscious of the incompatibilities going into that release line.
> For the former, I would really rather not see a branch-3 cut so soon.  It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase.  We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk in the interim.  IMHO we need to stop treating trunk as a place to exile patches.
> 
> For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating.  Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time.  For example, wire-compatibility has been mentioned as part of this.  Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump.
> To summarize:
> +1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
> -1 for creating branch-3 now, we can release from trunk until the next incompatibility for Hadoop 4 arrives
> +1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
> Jason
>      From: Andrew Wang <an...@cloudera.com>
> To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org> 
> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org> 
> Sent: Wednesday, March 4, 2015 12:15 PM
> Subject: Re: Looking to a Hadoop 3 release
> 
> Let's not dismiss this quite so handily.
> 
> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> could make classpath isolation opt-in via configuration, what we really
> want longer term is to have it on by default (or just always on). Stack in
> particular points out the practical difficulties in using an opt-in method
> in 2.x from a downstream project perspective. It's not pretty.
> 
> The plan that both Sean and Jason propose (which I support) is to have an
> opt-in solution in 2.x, bake it there, then turn it on by default
> (incompatible) in a new major release. I think this lines up well with my
> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> to help with 2.x release management if that would help with testing this
> feature.
> 
> Even setting aside classpath isolation, a new major release is still
> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> historically the voice of the user in our community, just highlighted it as
> a major compatibility issue, and myself and Tucu have also expressed our
> very strong concerns about bumping this in a minor release. 2.7's bump is a
> unique exception, but this is not something to be cited as precedent or
> policy.
> 
> Where does this resistance to a new major release stem from? As I've
> described from the beginning, this will look basically like a 2.x release,
> except for the inclusion of classpath isolation by default and target
> version JDK8. I've expressed my desire to maintain API and wire
> compatibility, and we can audit the set of incompatible changes in trunk to
> ensure this. My proposal for doing alpha and beta releases leading up to GA
> also gives downstreams a nice amount of time for testing and validation.
> 
> Regards,
> Andrew
> 
> 
> 
> On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
> 
>> Awesome, looks like we can just do this in a compatible manner - nothing
>> else on the list seems like it warrants a (premature) major release.
>> 
>> Thanks Vinod.
>> 
>> Arun
>> 
>> ________________________________________
>> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
>> Sent: Tuesday, March 03, 2015 2:30 PM
>> To: common-dev@hadoop.apache.org
>> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
>> yarn-dev@hadoop.apache.org
>> Subject: Re: Looking to a Hadoop 3 release
>> 
>> I started pitching in more on that JIRA.
>> 
>> To add, I think we can and should strive for doing this in a compatible
>> manner, whatever the approach. Marking and calling it incompatible before
>> we see proposal/patch seems premature to me. Commented the same on JIRA:
>> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
>> .
>> 
>> Thanks
>> +Vinod
>> 
>> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>> 
>> Regarding classpath isolation, based on what I hear from our customers,
>> it's still a big problem (even after the MR classloader work). The latest
>> Jackson version bump was quite painful for our downstream projects, and the
>> HDFS client still leaks a lot of dependencies. Would welcome more
>> discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
>> chimed in.
>> 
>> 
> 
>

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

I'd encourage everyone to post to the Roadmap wiki their wish list of features that *warrant* incompatible changes and would force us to go to 3.x.

+1 to Jason's comments in general. We can keep rolling alphas that downstreams can pick up, but I'd also like us to clarify the exit criteria for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop adding more of them and start stabilizing it.

Irrespective of that, here is my proposal in the interim:
 - Run JDK7 + JDK8 first in a compatible manner, as I mentioned before, for at least two releases in branch-2 (say 2.8 and 2.9) before we consider taking up the gauntlet on 3.0.
 - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.

Thanks,
+Vinod

On Mar 5, 2015, at 1:44 PM, Jason Lowe <jl...@yahoo-inc.com.INVALID> wrote:

> I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and conscious of the incompatibilities going into that release line.
> For the former, I would really rather not see a branch-3 cut so soon.  It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase.  We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk in the interim.  IMHO we need to stop treating trunk as a place to exile patches.
> 
> For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating.  Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time.  For example, wire-compatibility has been mentioned as part of this.  Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump.
> To summarize:
> +1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
> -1 for creating branch-3 now, we can release from trunk until the next incompatibility for Hadoop 4 arrives
> +1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
> Jason
>      From: Andrew Wang <an...@cloudera.com>
> To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org> 
> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org> 
> Sent: Wednesday, March 4, 2015 12:15 PM
> Subject: Re: Looking to a Hadoop 3 release
> 
> Let's not dismiss this quite so handily.
> 
> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> could make classpath isolation opt-in via configuration, what we really
> want longer term is to have it on by default (or just always on). Stack in
> particular points out the practical difficulties in using an opt-in method
> in 2.x from a downstream project perspective. It's not pretty.
> 
> The plan that both Sean and Jason propose (which I support) is to have an
> opt-in solution in 2.x, bake it there, then turn it on by default
> (incompatible) in a new major release. I think this lines up well with my
> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> to help with 2.x release management if that would help with testing this
> feature.
> 
> Even setting aside classpath isolation, a new major release is still
> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> historically the voice of the user in our community, just highlighted it as
> a major compatibility issue, and myself and Tucu have also expressed our
> very strong concerns about bumping this in a minor release. 2.7's bump is a
> unique exception, but this is not something to be cited as precedent or
> policy.
> 
> Where does this resistance to a new major release stem from? As I've
> described from the beginning, this will look basically like a 2.x release,
> except for the inclusion of classpath isolation by default and target
> version JDK8. I've expressed my desire to maintain API and wire
> compatibility, and we can audit the set of incompatible changes in trunk to
> ensure this. My proposal for doing alpha and beta releases leading up to GA
> also gives downstreams a nice amount of time for testing and validation.
> 
> Regards,
> Andrew
> 
> 
> 
> On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:
> 
>> Awesome, looks like we can just do this in a compatible manner - nothing
>> else on the list seems like it warrants a (premature) major release.
>> 
>> Thanks Vinod.
>> 
>> Arun
>> 
>> ________________________________________
>> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
>> Sent: Tuesday, March 03, 2015 2:30 PM
>> To: common-dev@hadoop.apache.org
>> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
>> yarn-dev@hadoop.apache.org
>> Subject: Re: Looking to a Hadoop 3 release
>> 
>> I started pitching in more on that JIRA.
>> 
>> To add, I think we can and should strive for doing this in a compatible
>> manner, whatever the approach. Marking and calling it incompatible before
>> we see proposal/patch seems premature to me. Commented the same on JIRA:
>> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
>> .
>> 
>> Thanks
>> +Vinod
>> 
>> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>> 
>> Regarding classpath isolation, based on what I hear from our customers,
>> it's still a big problem (even after the MR classloader work). The latest
>> Jackson version bump was quite painful for our downstream projects, and the
>> HDFS client still leaks a lot of dependencies. Would welcome more
>> discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
>> chimed in.
>> 
>> 
> 
>

Re: Looking to a Hadoop 3 release

Posted by Jason Lowe <jl...@yahoo-inc.com.INVALID>.

I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and conscious of the incompatibilities going into that release line.
For the former, I would really rather not see a branch-3 cut so soon.  It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase.  We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk in the interim.  IMHO we need to stop treating trunk as a place to exile patches.

For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating.  Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time.  For example, wire-compatibility has been mentioned as part of this.  Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump.
To summarize:
+1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
-1 for creating branch-3 now, we can release from trunk until the next incompatibility for Hadoop 4 arrives
+1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
Jason
From: Andrew Wang <an...@cloudera.com>
To: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>
Sent: Wednesday, March 4, 2015 12:15 PM
Subject: Re: Looking to a Hadoop 3 release

Let's not dismiss this quite so handily.

Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
could make classpath isolation opt-in via configuration, what we really
want longer term is to have it on by default (or just always on). Stack in
particular points out the practical difficulties in using an opt-in method
in 2.x from a downstream project perspective. It's not pretty.

The plan that both Sean and Jason propose (which I support) is to have an
opt-in solution in 2.x, bake it there, then turn it on by default
(incompatible) in a new major release. I think this lines up well with my
proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
to help with 2.x release management if that would help with testing this
feature.

Even setting aside classpath isolation, a new major release is still
justified by JDK8. Somehow this is being ignored in the discussion. Allen,
historically the voice of the user in our community, just highlighted it as
a major compatibility issue, and myself and Tucu have also expressed our
very strong concerns about bumping this in a minor release. 2.7's bump is a
unique exception, but this is not something to be cited as precedent or
policy.

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Regards,
Andrew

On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:

> Awesome, looks like we can just do this in a compatible manner - nothing
> else on the list seems like it warrants a (premature) major release.
>
> Thanks Vinod.
>
> Arun
>
> ________________________________________
> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
> Sent: Tuesday, March 03, 2015 2:30 PM
> To: common-dev@hadoop.apache.org
> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> yarn-dev@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> I started pitching in more on that JIRA.
>
> To add, I think we can and should strive for doing this in a compatible
> manner, whatever the approach. Marking and calling it incompatible before
> we see proposal/patch seems premature to me. Commented the same on JIRA:
> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
> .
>
> Thanks
> +Vinod
>
> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>
> Regarding classpath isolation, based on what I hear from our customers,
> it's still a big problem (even after the MR classloader work). The latest
> Jackson version bump was quite painful for our downstream projects, and the
> HDFS client still leaks a lot of dependencies. Would welcome more
> discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
> chimed in.
>
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Since these dependency bumps are very disruptive to downstreams, I want to
predicate upgrading our deps on having classpath isolation on. I think
that's what Tucu was getting at.

Best,
Andrew

On Fri, Mar 6, 2015 at 8:01 AM, Allen Wittenauer <aw...@altiscale.com> wrote:

>
> Right, but that doesn't really answer the question….
>
> On Mar 5, 2015, at 10:23 PM, Alejandro Abdelnur <tu...@gmail.com> wrote:
>
> > If classloader isolation is in place, then dependency versions can freely
> > be upgraded, as they won't pollute the apps' space (things get trickier if
> > there is an ON/OFF switch).
> >
> > On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
> >
> >>
> >> Is there going to be a general upgrade of dependencies?  I'm thinking of
> >> jetty & jackson in particular.
> >>
> >> On Mar 5, 2015, at 5:24 PM, Andrew Wang <an...@cloudera.com> wrote:
> >>
> >>> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> >>> page. In addition to the two things I've been pushing, I also looked
> >>> through Allen's list (thanks Allen for making this) and picked out the
> >>> shell script rewrite and the removal of HFTP as big changes. This would
> >>> be the place to propose features for inclusion in 3.x; I'd particularly
> >>> appreciate help on the YARN/MR side.
> >>>
> >>> Based on what I'm hearing, let me modulate my proposal to the
> following:
> >>>
> >>> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> >>> changes don't look that scary, so I think this is fine. This does mean
> we
> >>> need to be more rigorous before merging branches to trunk. I think
> >>> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches
> >> would
> >>> be very helpful in this regard.
> >>> - We do not include anything to break wire compatibility unless (as
> Jason
> >>> says) it's an unbelievably awesome feature.
> >>> - No harm in rolling alphas from trunk, as it doesn't lock us to
> anything
> >>> compatibility wise. Downstreams like releases.
> >>>
> >>> I'll take Steve's advice about not locking GA to a given date, but I
> also
> >>> share his belief that we can alpha/beta/GA faster than it took for
> Hadoop
> >>> 2. Let's roll some intermediate releases, work on the roadmap items,
> and
> >>> see how we're feeling in a few months.
> >>>
> >>> Best,
> >>> Andrew
> >>>
> >>> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org>
> wrote:
> >>>
> >>>> I think it'll be useful to have a discussion about what else people
> >> would
> >>>> like to see in Hadoop 3.x - especially if the change is potentially
> >>>> incompatible. Also, what we expect the release schedule to be for
> major
> >>>> releases and what triggers them - JVM version, major features, the
> need
> >> for
> >>>> incompatible changes ? Assuming major versions will not be released
> >> every 6
> >>>> months/1 year (adoption time, fairly disruptive for downstream
> projects,
> >>>> and users) -  considering additional features/incompatible changes for
> >> 3.x
> >>>> would be useful.
> >>>>
> >>>> Some features that come to mind immediately would be
> >>>> 1) enhancements to the RPC mechanics - specifically support for
> AsynRPC
> >> /
> >>>> two way communication. There's a lot of places where we re-use
> >> heartbeats
> >>>> to send more information than what would be done if the PRC layer
> >> supported
> >>>> these features. Some of this can be done in a compatible manner to the
> >>>> existing RPC sub-system. Others like 2 way communication probably
> >> cannot.
> >>>> After this, having HDFS/YARN actually make use of these changes. The
> >> other
> >>>> consideration is adoption of an alternate system ike gRpc which would
> be
> >>>> incompatible.
> >>>> 2) Simplification of configs - potentially separating client side
> >> configs
> >>>> and those used by daemons. This is another source of perpetual
> confusion
> >>>> for users.
> >>>>
> >>>> Thanks
> >>>> - Sid
> >>>>
> >>>>
> >>>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <
> stevel@hortonworks.com>
> >>>> wrote:
> >>>>
> >>>>> Sorry, outlook dequoted Alejandros's comments.
> >>>>>
> >>>>> Let me try again with his comments in italic and proofreading of mine
> >>>>>
> >>>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com
> <mailto:
> >>>>> stevel@hortonworks.com>> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com<mailto:
> >>>>> tucu00@gmail.com><ma...@gmail.com>> wrote:
> >>>>>
> >>>>> IMO, if part of the community wants to take on the responsibility and
> >>>> work
> >>>>> that takes to do a new major release, we should not discourage them
> >> from
> >>>>> doing that.
> >>>>>
> >>>>> Having multiple major branches active is a standard practice.
> >>>>>
> >>>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take
> a
> >>>>> long time to get out, and during that time 0.21, 0.22, got released
> and
> >>>>> ignored; 0.23 picked up and used in production.
> >>>>>
> >>>>> The 2.04-alpha release was more of a troublespot as it got picked up
> >>>>> widely enough to be used in products, and changes were made between
> >> that
> >>>>> alpha & 2.2 itself which raised compatibility issues.
> >>>>>
> >>>>> For 3.x I'd propose
> >>>>>
> >>>>>
> >>>>> 1.  Have less longevity of 3.x alpha/beta artifacts
> >>>>> 2.  Make clear there are no guarantees of compatibility from
> >> alpha/beta
> >>>>> releases to shipping. Best effort, but not to the extent that it gets
> >> in
> >>>>> the way. More succinctly: we will care more about seamless migration
> >> from
> >>>>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >>>>> 3.  Anybody who ships code based on 3.x alpha/beta to recognise and
> >>>>> accept policy (2). Hadoop's "instability guarantee" for the 3.x
> >>>> alpha/beta
> >>>>> phase
> >>>>>
> >>>>> As well as backwards compatibility, we need to think about Forwards
> >>>>> compatibility, with the goal being:
> >>>>>
> >>>>> Any app written/shipped with the 3.x release binaries (JAR and
> native)
> >>>>> will work in and against a 3.y Hadoop cluster, for all x, y in
> Natural
> >>>>> where y>=x  and is-release(x) and is-release(y)
> >>>>>
> >>>>> That's important, as it means all server-side changes in 3.x which
> are
> >>>>> expected to to mandate client-side updates: protocols, HDFS erasure
> >>>>> decoding, security features, must be considered complete and stable
> >>>> before
> >>>>> we can say is-release(x). In an ideal world, we'll even get the
> >> semantics
> >>>>> right with tests to show this.
> >>>>>
> >>>>> Fixing classpath hell downstream is certainly one feature I am +1 on.
> >>>> But:
> >>>>> it's only one of the features, and given there's not any design doc
> on
> >>>> that
> >>>>> JIRA, way too immature to set a release schedule on. An alpha
> schedule
> >>>> with
> >>>>> no-guarantees and a regular alpha roll, could be viable, as new
> >> features
> >>>> go
> >>>>> in and can then be used to experimentally try this stuff in branches
> of
> >>>>> Hbase (well volunteered, Stack!), etc. Of course instability
> guarantees
> >>>>> will be transitive downstream.
> >>>>>
> >>>>>
> >>>>> This time around we are not replacing the guts as we did from Hadoop
> 1
> >> to
> >>>>> Hadoop 2, but superficial surgery to address issues were not
> considered
> >>>> (or
> >>>>> was too much to take on top of the guts transplant).
> >>>>>
> >>>>> For the split brain concern, we did a great of job maintaining
> Hadoop 1
> >>>> and
> >>>>> Hadoop 2 until Hadoop 1 faded away.
> >>>>>
> >>>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> >>>>> compatibility.
> >>>>>
> >>>>>
> >>>>> Based on that experience I would say that the coexistence of Hadoop 2
> >> and
> >>>>> Hadoop 3 will be much less demanding/traumatic.
> >>>>>
> >>>>> The re-layout of all the source trees was a major change there,
> >> assuming
> >>>>> there's no refactoring or switch of build tools then picking things
> >> back
> >>>>> will be tractable
> >>>>>
> >>>>>
> >>>>> Also, to facilitate the coexistence we should limit Java language
> >>>> features
> >>>>> to Java 7 (even if the runtime is Java 8), once Java 7 is not used
> >>>> anymore
> >>>>> we can remove this limitation.
> >>>>>
> >>>>> +1; setting javac.version will fix this
> >>>>>
> >>>>> What is nice about having java 8 as the base JVM is that it means you
> >> can
> >>>>> be confident that all Hadoop 3 servers will be JDK8+, so downstream
> >> apps
> >>>>> and libs can use all Java 8 features they want to.
> >>>>>
> >>>>> There's one policy change to consider there which is possibly, just
> >>>>> possibly, we could allow new modules in hadoop-tools to adopt Java 8
> >>>>> languages early, provided everyone recognised that "backport to
> >> branch-2"
> >>>>> isn't going to happen.
> >>>>>
> >>>>> -Steve
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Since these dependency bumps are very disruptive to downstreams, I want to
predicate upgrading our deps on having classpath isolation on. I think
that's what Tucu was getting at.

Best,
Andrew

On Fri, Mar 6, 2015 at 8:01 AM, Allen Wittenauer <aw...@altiscale.com> wrote:

>
> Right, but that doesn't really answer the question….
>
> On Mar 5, 2015, at 10:23 PM, Alejandro Abdelnur <tu...@gmail.com> wrote:
>
> > If classloader isolation is in place, then dependency versions can freely
> > be upgraded, as they won't pollute the apps' space (things get trickier if
> > there is an ON/OFF switch).
> >
> > On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
> >
> >>
> >> Is there going to be a general upgrade of dependencies?  I'm thinking of
> >> jetty & jackson in particular.
> >>
> >> On Mar 5, 2015, at 5:24 PM, Andrew Wang <an...@cloudera.com> wrote:
> >>
> >>> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> >>> page. In addition to the two things I've been pushing, I also looked
> >>> through Allen's list (thanks Allen for making this) and picked out the
> >>> shell script rewrite and the removal of HFTP as big changes. This would
> >>> be the place to propose features for inclusion in 3.x; I'd particularly
> >>> appreciate help on the YARN/MR side.
> >>>
> >>> Based on what I'm hearing, let me modulate my proposal to the following:
> >>>
> >>> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> >>> changes don't look that scary, so I think this is fine. This does mean we
> >>> need to be more rigorous before merging branches to trunk. I think
> >>> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches
> >>> would be very helpful in this regard.
> >>> - We do not include anything that breaks wire compatibility unless (as
> >>> Jason says) it's an unbelievably awesome feature.
> >>> - No harm in rolling alphas from trunk, as it doesn't lock us into
> >>> anything compatibility-wise. Downstreams like releases.
> >>>
> >>> I'll take Steve's advice about not locking GA to a given date, but I also
> >>> share his belief that we can get through alpha/beta/GA faster than we did
> >>> for Hadoop 2. Let's roll some intermediate releases, work on the roadmap
> >>> items, and see how we're feeling in a few months.
> >>>
> >>> Best,
> >>> Andrew
> >>>
> >>> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> >>>
> >>>> I think it'll be useful to have a discussion about what else people would
> >>>> like to see in Hadoop 3.x - especially if the change is potentially
> >>>> incompatible. Also worth discussing is what we expect the release
> >>>> schedule to be for major releases and what triggers them - JVM version,
> >>>> major features, the need for incompatible changes. Assuming major
> >>>> versions will not be released every 6 months/1 year (adoption time,
> >>>> fairly disruptive for downstream projects and users), considering
> >>>> additional features/incompatible changes for 3.x would be useful.
> >>>>
> >>>> Some features that come to mind immediately would be:
> >>>> 1) enhancements to the RPC mechanics - specifically support for
> >>>> AsyncRPC / two-way communication. There are a lot of places where we
> >>>> re-use heartbeats to send more information than they would carry if the
> >>>> RPC layer supported these features. Some of this can be done in a manner
> >>>> compatible with the existing RPC sub-system. Others, like two-way
> >>>> communication, probably cannot. After this, HDFS/YARN would need to
> >>>> actually make use of these changes. The other consideration is adoption
> >>>> of an alternate system like gRPC, which would be incompatible.
> >>>> 2) Simplification of configs - potentially separating client-side
> >>>> configs from those used by daemons. This is another source of perpetual
> >>>> confusion for users.
> >>>>
> >>>> Thanks
> >>>> - Sid
> >>>>
> >>>>
> >>>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <stevel@hortonworks.com> wrote:
> >>>>
> >>>>> Sorry, Outlook dequoted Alejandro's comments.
> >>>>>
> >>>>> Let me try again, with his comments in italics and my own proofread.
> >>>>>
> >>>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
> >>>>>
> >>>>> IMO, if part of the community wants to take on the responsibility and
> >>>>> work that it takes to do a new major release, we should not discourage
> >>>>> them from doing that.
> >>>>>
> >>>>> Having multiple major branches active is a standard practice.
> >>>>>
> >>>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> >>>>> long time to get out, and during that time 0.21 and 0.22 got released
> >>>>> and ignored; 0.23 was picked up and used in production.
> >>>>>
> >>>>> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
> >>>>> widely enough to be used in products, and changes were made between that
> >>>>> alpha and 2.2 itself which raised compatibility issues.
> >>>>>
> >>>>> For 3.x I'd propose:
> >>>>>
> >>>>> 1.  Give 3.x alpha/beta artifacts a shorter lifespan.
> >>>>> 2.  Make clear there are no guarantees of compatibility from alpha/beta
> >>>>> releases to shipping. Best effort, but not to the extent that it gets in
> >>>>> the way. More succinctly: we will care more about seamless migration from
> >>>>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >>>>> 3.  Anybody who ships code based on a 3.x alpha/beta must recognise and
> >>>>> accept policy (2), Hadoop's "instability guarantee" for the 3.x
> >>>>> alpha/beta phase.
> >>>>>
> >>>>> As well as backwards compatibility, we need to think about forwards
> >>>>> compatibility, with the goal being:
> >>>>>
> >>>>> Any app written/shipped with the 3.x release binaries (JAR and native)
> >>>>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
> >>>>> where y >= x and is-release(x) and is-release(y).
> >>>>>
> >>>>> That's important, as it means all server-side changes in 3.x which are
> >>>>> expected to mandate client-side updates (protocols, HDFS erasure
> >>>>> decoding, security features) must be considered complete and stable
> >>>>> before we can say is-release(x). In an ideal world, we'll even get the
> >>>>> semantics right, with tests to show this.
> >>>>>
> >>>>> Fixing classpath hell downstream is certainly one feature I am +1 on.
> >>>>> But it's only one of the features, and given there's no design doc on
> >>>>> that JIRA, it's far too immature to set a release schedule on. An alpha
> >>>>> schedule with no guarantees and a regular alpha roll could be viable,
> >>>>> as new features go in and can then be used to experimentally try this
> >>>>> stuff in branches of HBase (well volunteered, Stack!), etc. Of course,
> >>>>> instability guarantees will be transitive downstream.
> >>>>>
> >>>>>
> >>>>> This time around we are not replacing the guts as we did from Hadoop 1
> >>>>> to Hadoop 2, but doing superficial surgery to address issues that were
> >>>>> not considered (or were too much to take on top of the guts transplant).
> >>>>>
> >>>>> For the split-brain concern, we did a great job of maintaining Hadoop 1
> >>>>> and Hadoop 2 until Hadoop 1 faded away.
> >>>>>
> >>>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> >>>>> compatibility.
> >>>>>
> >>>>>
> >>>>> Based on that experience I would say that the coexistence of Hadoop 2
> >>>>> and Hadoop 3 will be much less demanding/traumatic.
> >>>>>
> >>>>> The re-layout of all the source trees was a major change there.
> >>>>> Assuming there's no refactoring or switch of build tools, picking
> >>>>> things back will be tractable.
> >>>>>
> >>>>>
> >>>>> Also, to facilitate the coexistence we should limit Java language
> >>>>> features to Java 7 (even if the runtime is Java 8); once Java 7 is no
> >>>>> longer used, we can remove this limitation.
> >>>>>
> >>>>> +1; setting javac.version will fix this
> >>>>>
> >>>>> What is nice about having Java 8 as the base JVM is that you can be
> >>>>> confident that all Hadoop 3 servers will be JDK8+, so downstream apps
> >>>>> and libs can use all the Java 8 features they want.
> >>>>>
> >>>>> There's one policy change to consider there: possibly, just possibly,
> >>>>> we could allow new modules in hadoop-tools to adopt Java 8 language
> >>>>> features early, provided everyone recognises that "backport to
> >>>>> branch-2" isn't going to happen.
> >>>>>
> >>>>> -Steve
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
>
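
Steve's forward-compatibility goal quoted above can be read as a simple predicate over versions. The Java sketch below is only an illustration under stated assumptions: versions are reduced to the 3.x minor number, alpha/beta builds are modelled by leaving them out of the set of released versions, and the class and method names (ForwardCompat, compatibilityPromised) are hypothetical, not part of Hadoop.

```java
import java.util.Set;

/**
 * Hypothetical sketch of the forward-compatibility rule from the thread.
 * Versions are reduced to the 3.x minor number; membership in "released"
 * stands in for the is-release() test. Not part of Hadoop.
 */
final class ForwardCompat {

    private ForwardCompat() {}

    /**
     * True when the rule promises that an app built against 3.appMinor
     * works against a 3.clusterMinor cluster: the cluster is the same or
     * newer, and both versions are proper releases (not alpha/beta).
     */
    static boolean compatibilityPromised(int appMinor, int clusterMinor,
                                         Set<Integer> released) {
        return clusterMinor >= appMinor
                && released.contains(appMinor)
                && released.contains(clusterMinor);
    }
}
```

For example, with 3.0 to 3.3 released, an app built against 3.1 is covered on a 3.3 cluster, but nothing is promised for a 3.3 app on a 3.1 cluster, or for any pairing that involves an unreleased or alpha version.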

Re: Looking to a Hadoop 3 release

Posted by Allen Wittenauer <aw...@altiscale.com>.

Right, but that doesn't really answer the question….

On Mar 5, 2015, at 10:23 PM, Alejandro Abdelnur <tu...@gmail.com> wrote:

> If classloader isolation is in place, then dependency versions can freely
> be upgraded, as they won't pollute the apps' space (things get trickier if
> there is an ON/OFF switch).
> 
> On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
> 
>> 
>> Is there going to be a general upgrade of dependencies?  I'm thinking of
>> jetty & jackson in particular.
>> 
>> On Mar 5, 2015, at 5:24 PM, Andrew Wang <an...@cloudera.com> wrote:
>> 
>>> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
>>> page. In addition to the two things I've been pushing, I also looked
>>> through Allen's list (thanks Allen for making this) and picked out the
>>> shell script rewrite and the removal of HFTP as big changes. This would
>>> be the place to propose features for inclusion in 3.x; I'd particularly
>>> appreciate help on the YARN/MR side.
>>> 
>>> Based on what I'm hearing, let me modulate my proposal to the following:
>>> 
>>> - We avoid cutting branch-3, and release off of trunk. The trunk-only
>>> changes don't look that scary, so I think this is fine. This does mean we
>>> need to be more rigorous before merging branches to trunk. I think
>>> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches
>>> would be very helpful in this regard.
>>> - We do not include anything that breaks wire compatibility unless (as
>>> Jason says) it's an unbelievably awesome feature.
>>> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
>>> compatibility-wise. Downstreams like releases.
>>> 
>>> I'll take Steve's advice about not locking GA to a given date, but I also
>>> share his belief that we can alpha/beta/GA faster than it took for Hadoop
>>> 2. Let's roll some intermediate releases, work on the roadmap items, and
>>> see how we're feeling in a few months.
>>> 
>>> Best,
>>> Andrew
>>> 
>>> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
>>> 
>>>> I think it'll be useful to have a discussion about what else people would
>>>> like to see in Hadoop 3.x, especially if the change is potentially
>>>> incompatible. Also, what we expect the release schedule to be for major
>>>> releases and what triggers them: JVM version, major features, the need
>>>> for incompatible changes? Assuming major versions will not be released
>>>> every 6 months/1 year (adoption time, fairly disruptive for downstream
>>>> projects and users), considering additional features/incompatible
>>>> changes for 3.x would be useful.
>>>> 
>>>> Some features that come to mind immediately would be:
>>>> 1) Enhancements to the RPC mechanics, specifically support for AsyncRPC /
>>>> two-way communication. There are a lot of places where we re-use
>>>> heartbeats to send more information than would be needed if the RPC
>>>> layer supported these features. Some of this can be done in a manner
>>>> compatible with the existing RPC sub-system. Others, like two-way
>>>> communication, probably cannot. After this, HDFS/YARN would need to
>>>> actually make use of these changes. The other consideration is adoption
>>>> of an alternate system like gRPC, which would be incompatible.
>>>> 2) Simplification of configs, potentially separating client-side configs
>>>> from those used by daemons. This is another source of perpetual
>>>> confusion for users.
>>>> 
>>>> Thanks
>>>> - Sid
>>>> 
>>>> 
>>>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com> wrote:
>>>> 
>>>>> Sorry, Outlook dequoted Alejandro's comments.
>>>>> 
>>>>> Let me try again with his comments in italic and proofreading of mine
>>>>> 
>>>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
>>>>> 
>>>>> IMO, if part of the community wants to take on the responsibility and
>>>>> work that it takes to do a new major release, we should not discourage
>>>>> them from doing that.
>>>>> 
>>>>> Having multiple major branches active is a standard practice.
>>>>> 
>>>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>>>>> long time to get out, and during that time 0.21 and 0.22 got released
>>>>> and ignored, while 0.23 was picked up and used in production.
>>>>> 
>>>>> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
>>>>> widely enough to be used in products, and changes were made between
>>>>> that alpha and 2.2 itself which raised compatibility issues.
>>>>> 
>>>>> For 3.x I'd propose:
>>>>>
>>>>> 1.  Have shorter-lived 3.x alpha/beta artifacts.
>>>>> 2.  Make clear there are no guarantees of compatibility from alpha/beta
>>>>> releases to shipping. Best effort, but not to the extent that it gets
>>>>> in the way. More succinctly: we will care more about seamless migration
>>>>> from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>>>> 3.  Anybody who ships code based on a 3.x alpha/beta must recognise and
>>>>> accept policy (2): Hadoop's "instability guarantee" for the 3.x
>>>>> alpha/beta phase.
>>>>> 
>>>>> As well as backwards compatibility, we need to think about forwards
>>>>> compatibility, with the goal being:
>>>>>
>>>>> Any app written/shipped with the 3.x release binaries (JAR and native)
>>>>> will work in and against a 3.y Hadoop cluster, for all natural numbers
>>>>> x, y where y >= x and is-release(x) and is-release(y).
>>>>> 
>>>>> That's important, as it means all server-side changes in 3.x which are
>>>>> expected to mandate client-side updates (protocols, HDFS erasure
>>>>> decoding, security features) must be considered complete and stable
>>>>> before we can say is-release(x). In an ideal world, we'll even get the
>>>>> semantics right, with tests to show this.
>>>>> 
>>>>> Fixing classpath hell downstream is certainly one feature I am +1 on.
>>>>> But it's only one of the features, and given there's no design doc on
>>>>> that JIRA, it's far too immature to set a release schedule on. An alpha
>>>>> schedule with no guarantees and a regular alpha roll could be viable,
>>>>> as new features go in and can then be used to experimentally try this
>>>>> stuff in branches of HBase (well volunteered, Stack!), etc. Of course,
>>>>> instability guarantees will be transitive downstream.
>>>>> 
>>>>> 
>>>>> This time around we are not replacing the guts as we did from Hadoop 1
>>>>> to Hadoop 2, but doing superficial surgery to address issues that were
>>>>> not considered (or were too much to take on top of the guts transplant).
>>>>> 
>>>>> For the split-brain concern, we did a great job of maintaining Hadoop 1
>>>>> and Hadoop 2 until Hadoop 1 faded away.
>>>>> 
>>>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
>>>>> compatibility.
>>>>> 
>>>>> 
>>>>> Based on that experience I would say that the coexistence of Hadoop 2
>>>>> and Hadoop 3 will be much less demanding/traumatic.
>>>>> 
>>>>> The re-layout of all the source trees was a major change there.
>>>>> Assuming there's no refactoring or switch of build tools, picking
>>>>> things back will be tractable.
>>>>> 
>>>>> 
>>>>> Also, to facilitate the coexistence we should limit Java language
>>>>> features to Java 7 (even if the runtime is Java 8); once Java 7 is no
>>>>> longer used, we can remove this limitation.
>>>>> 
>>>>> +1; setting javac.version will fix this
>>>>> 
>>>>> What is nice about having Java 8 as the base JVM is that you can be
>>>>> confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>>>>> and libs can use all the Java 8 features they want.
>>>>> 
>>>>> There's one policy change to consider there: possibly, just possibly,
>>>>> we could allow new modules in hadoop-tools to adopt Java 8 language
>>>>> features early, provided everyone recognises that "backport to
>>>>> branch-2" isn't going to happen.
>>>>> 
>>>>> -Steve
>>>>> 
>>>>> 
>>>> 
>> 
>>
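
Alejandro's point about classloader isolation rests on the usual child-first (parent-last) delegation technique: an application's own jars are searched before the framework's, so each side can carry its own Jetty or Jackson version without clashing. The following is a minimal, hypothetical Java sketch of that technique, not the design being worked on in HADOOP-11656; the class name and the simplified java./javax. system-class check are assumptions.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical sketch of a child-first ("parent-last") class loader.
 * Classes are looked up in the application's own jars first, so an app can
 * bundle its own dependency versions even if the framework ships different
 * ones. JDK classes are always delegated to the parent.
 */
class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] appJars, ClassLoader parent) {
        super(appJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    // 2. Child first: search the application's own jars.
                    c = findClass(name);
                } catch (ClassNotFoundException notInAppJars) {
                    // Not bundled by the app; fall through to the parent.
                }
            }
            if (c == null) {
                // 3. Normal parent delegation (throws if nobody has it).
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Simplified check for classes that must always come from the JDK. */
    private static boolean isSystemClass(String name) {
        return name.startsWith("java.") || name.startsWith("javax.");
    }
}
```

A production loader would also need a configurable list of packages that are always shared with the parent (for instance Hadoop's own public API classes); otherwise an app that bundles copies of those classes would end up with types that do not match the framework's.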

Re: Looking to a Hadoop 3 release

Posted by Allen Wittenauer <aw...@altiscale.com>.

Right, but that doesn't really answer the question….

On Mar 5, 2015, at 10:23 PM, Alejandro Abdelnur <tu...@gmail.com> wrote:

> If classloader isolation is in place, then dependency versions can freely
> be upgraded as won't pollute apps space (things get trickier if there is an
> ON/OFF switch).
> 
> On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
> 
>> 
>> Is there going to be a general upgrade of dependencies?  I'm thinking of
>> jetty & jackson in particular.
>> 
>> On Mar 5, 2015, at 5:24 PM, Andrew Wang <an...@cloudera.com> wrote:
>> 
>>> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
>>> page. In addition to the two things I've been pushing, I also looked
>>> through Allen's list (thanks Allen for making this) and picked out the
>>> shell script rewrite and the removal of HFTP as big changes. This would
>> be
>>> the place to propose features for inclusion in 3.x, I'd particularly
>>> appreciate help on the YARN/MR side.
>>> 
>>> Based on what I'm hearing, let me modulate my proposal to the following:
>>> 
>>> - We avoid cutting branch-3, and release off of trunk. The trunk-only
>>> changes don't look that scary, so I think this is fine. This does mean we
>>> need to be more rigorous before merging branches to trunk. I think
>>> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches
>> would
>>> be very helpful in this regard.
>>> - We do not include anything to break wire compatibility unless (as Jason
>>> says) it's an unbelievably awesome feature.
>>> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
>>> compatibility wise. Downstreams like releases.
>>> 
>>> I'll take Steve's advice about not locking GA to a given date, but I also
>>> share his belief that we can alpha/beta/GA faster than it took for Hadoop
>>> 2. Let's roll some intermediate releases, work on the roadmap items, and
>>> see how we're feeling in a few months.
>>> 
>>> Best,
>>> Andrew
>>> 
>>> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
>>> 
>>>> I think it'll be useful to have a discussion about what else people
>> would
>>>> like to see in Hadoop 3.x - especially if the change is potentially
>>>> incompatible. Also, what we expect the release schedule to be for major
>>>> releases and what triggers them - JVM version, major features, the need
>> for
>>>> incompatible changes ? Assuming major versions will not be released
>> every 6
>>>> months/1 year (adoption time, fairly disruptive for downstream projects,
>>>> and users) -  considering additional features/incompatible changes for
>> 3.x
>>>> would be useful.
>>>> 
>>>> Some features that come to mind immediately would be
>>>> 1) enhancements to the RPC mechanics - specifically support for AsynRPC
>> /
>>>> two way communication. There's a lot of places where we re-use
>> heartbeats
>>>> to send more information than what would be done if the PRC layer
>> supported
>>>> these features. Some of this can be done in a compatible manner to the
>>>> existing RPC sub-system. Others like 2 way communication probably
>> cannot.
>>>> After this, having HDFS/YARN actually make use of these changes. The
>> other
>>>> consideration is adoption of an alternate system ike gRpc which would be
>>>> incompatible.
>>>> 2) Simplification of configs - potentially separating client side
>> configs
>>>> and those used by daemons. This is another source of perpetual confusion
>>>> for users.
>>>> 
>>>> Thanks
>>>> - Sid
>>>> 
>>>> 
>>>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
>>>> wrote:
>>>> 
>>>>> Sorry, outlook dequoted Alejandros's comments.
>>>>> 
>>>>> Let me try again with his comments in italic and proofreading of mine
>>>>> 
>>>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com<mailto:
>>>>> stevel@hortonworks.com>> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com<mailto:
>>>>> tucu00@gmail.com><ma...@gmail.com>> wrote:
>>>>> 
>>>>> IMO, if part of the community wants to take on the responsibility and
>>>> work
>>>>> that takes to do a new major release, we should not discourage them
>> from
>>>>> doing that.
>>>>> 
>>>>> Having multiple major branches active is a standard practice.
>>>>> 
>>>>> Looking at 2.x, the major work (HDFS HA, YARN) meant that it took a
>>>>> long time to get out, and during that time 0.21 and 0.22 got released and
>>>>> ignored; 0.23 was picked up and used in production.
>>>>> 
>>>>> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
>>>>> widely enough to be used in products, and changes were made between that
>>>>> alpha and 2.2 itself which raised compatibility issues.
>>>>> 
>>>>> For 3.x I'd propose:
>>>>> 
>>>>> 1.  Give 3.x alpha/beta artifacts a shorter lifespan.
>>>>> 2.  Make clear there are no guarantees of compatibility from alpha/beta
>>>>> releases to shipping. Best effort, but not to the extent that it gets in
>>>>> the way. More succinctly: we will care more about seamless migration from
>>>>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>>>> 3.  Anybody who ships code based on 3.x alpha/beta must recognise and
>>>>> accept policy (2), Hadoop's "instability guarantee" for the 3.x alpha/beta
>>>>> phase.
>>>>> 
>>>>> As well as backwards compatibility, we need to think about forwards
>>>>> compatibility, with the goal being:
>>>>> 
>>>>> Any app written/shipped with the 3.x release binaries (JAR and native)
>>>>> will work in and against a 3.y Hadoop cluster, for all natural numbers
>>>>> x, y where y >= x and is-release(x) and is-release(y).
>>>>> 
>>>>> That's important, as it means all server-side changes in 3.x which are
>>>>> expected to mandate client-side updates (protocols, HDFS erasure
>>>>> decoding, security features) must be considered complete and stable
>>>>> before we can say is-release(x). In an ideal world, we'll even get the
>>>>> semantics right, with tests to show this.
>>>>> 
>>>>> Fixing classpath hell downstream is certainly one feature I am +1 on.
>>>>> But it's only one of the features, and given there's no design doc on
>>>>> that JIRA, it's far too immature to set a release schedule on. An alpha
>>>>> schedule with no guarantees and a regular alpha roll could be viable, as
>>>>> new features go in and can then be used to experimentally try this stuff
>>>>> in branches of HBase (well volunteered, Stack!), etc. Of course,
>>>>> instability guarantees will be transitive downstream.
>>>>> 
>>>>> 
>>>>> This time around we are not replacing the guts as we did from Hadoop 1
>>>>> to Hadoop 2, but doing superficial surgery to address issues that were
>>>>> not considered (or were too much to take on top of the guts transplant).
>>>>> 
>>>>> For the split-brain concern, we did a great job of maintaining Hadoop 1
>>>>> and Hadoop 2 until Hadoop 1 faded away.
>>>>> 
>>>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
>>>>> compatibility.
>>>>> 
>>>>> 
>>>>> Based on that experience, I would say that the coexistence of Hadoop 2
>>>>> and Hadoop 3 will be much less demanding/traumatic.
>>>>> 
>>>>> The re-layout of all the source trees was a major change there.
>>>>> Assuming there's no refactoring or switch of build tools, picking things
>>>>> back will be tractable.
>>>>> 
>>>>> 
>>>>> Also, to facilitate the coexistence, we should limit Java language
>>>>> features to Java 7 (even if the runtime is Java 8); once Java 7 is no
>>>>> longer used, we can remove this limitation.
>>>>> 
>>>>> +1; setting javac.version will fix this.
>>>>> 
>>>>> What is nice about having Java 8 as the base JVM is that you can be
>>>>> confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>>>>> and libs can use all the Java 8 features they want.
>>>>> 
>>>>> There's one policy change to consider there: possibly, just possibly,
>>>>> we could allow new modules in hadoop-tools to adopt Java 8 language
>>>>> features early, provided everyone recognises that "backport to branch-2"
>>>>> isn't going to happen.
>>>>> 
>>>>> -Steve
>>>>> 
>>>>> 
>>>> 
>> 
>>
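
Steve's forwards-compatibility goal, quoted above, is a small predicate over version numbers. Below is a minimal Java sketch that assumes each 3.x version reduces to its minor number plus a flag saying whether it is a full release (alpha/beta builds are not). The class and method names are hypothetical and are not part of Hadoop.

```java
/**
 * Hypothetical encoding of the forwards-compatibility goal: an app built
 * against release 3.x must work against a 3.y cluster whenever y >= x and
 * both 3.x and 3.y are full releases (alpha/beta builds carry no guarantee).
 */
public final class ForwardCompat {

    private ForwardCompat() {
        // static helper only
    }

    /**
     * Returns true when the goal requires an app built on 3.clientMinor to
     * work against a 3.clusterMinor cluster.
     */
    public static boolean mustInteroperate(int clientMinor, boolean clientIsRelease,
                                           int clusterMinor, boolean clusterIsRelease) {
        if (clientMinor < 0 || clusterMinor < 0) {
            throw new IllegalArgumentException("minor versions must be natural numbers");
        }
        // is-release(x) and is-release(y): alphas/betas fall outside the guarantee.
        boolean bothReleases = clientIsRelease && clusterIsRelease;
        // y >= x: the guarantee only covers clusters at least as new as the app.
        return bothReleases && clusterMinor >= clientMinor;
    }

    public static void main(String[] args) {
        System.out.println(mustInteroperate(0, true, 3, true));   // 3.0 app on 3.3 cluster
        System.out.println(mustInteroperate(3, true, 0, true));   // 3.3 app on 3.0 cluster
        System.out.println(mustInteroperate(0, false, 3, true));  // 3.0-alpha app on 3.3 cluster
    }
}
```

Running `main` prints `true`, `false`, `false`. Only the first pair falls under the goal. A `false` result means "not covered by the guarantee", not "certain to fail".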

Re: Looking to a Hadoop 3 release

Posted by Alejandro Abdelnur <tu...@gmail.com>.

If classloader isolation is in place, then dependency versions can be
upgraded freely, as they won't pollute the apps' space (things get trickier
if there is an ON/OFF switch).
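
Alejandro's argument depends on how an isolating class loader resolves classes. The sketch below illustrates generic child-first (parent-last) delegation in Java. It looks up classes bundled with the application in the loader's own URLs before it consults the framework classpath, so an app could carry its own version of a library such as Jackson or Jetty. This illustrates the technique only and is not Hadoop's actual isolation code. The class name is hypothetical, and so is the rule that only `java.*`/`javax.*` always go to the parent.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first class loader: application URLs are searched before
 * the parent (framework) classpath, except for JDK classes, which always go
 * to the parent so that core types stay shared.
 */
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    c = findClass(name); // look in the application's own URLs first
                } catch (ClassNotFoundException e) {
                    // Not bundled by the application; fall back to the parent below.
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // standard parent-first lookup
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Assumed rule for classes that must always come from the parent. */
    public static boolean isSystemClass(String name) {
        return name.startsWith("java.") || name.startsWith("javax.");
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = ChildFirstClassLoader.class.getClassLoader();
        try (ChildFirstClassLoader loader = new ChildFirstClassLoader(new URL[0], parent)) {
            // With no URLs of its own, JDK classes resolve through the parent.
            System.out.println(loader.loadClass("java.lang.String") == String.class);
        }
    }
}
```

Running `main` prints `true`. A real loader would likely make the always-delegated prefixes configurable. The hard-coded pair here is only for brevity.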

On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer <aw...@altiscale.com> wrote:

>
> Is there going to be a general upgrade of dependencies?  I'm thinking of
> jetty & jackson in particular.
>
> On Mar 5, 2015, at 5:24 PM, Andrew Wang <an...@cloudera.com> wrote:
>
> > I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> > page. In addition to the two things I've been pushing, I also looked
> > through Allen's list (thanks Allen for making this) and picked out the
> > shell script rewrite and the removal of HFTP as big changes. This would be
> > the place to propose features for inclusion in 3.x; I'd particularly
> > appreciate help on the YARN/MR side.
> >
> > Based on what I'm hearing, let me modulate my proposal to the following:
> >
> > - We avoid cutting branch-3, and release off of trunk. The trunk-only
> > changes don't look that scary, so I think this is fine. This does mean we
> > need to be more rigorous before merging branches to trunk. I think
> > Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
> > be very helpful in this regard.
> > - We do not include anything that breaks wire compatibility unless (as
> > Jason says) it's an unbelievably awesome feature.
> > - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> > compatibility-wise. Downstreams like releases.
> >
> > I'll take Steve's advice about not locking GA to a given date, but I also
> > share his belief that we can get through alpha/beta/GA faster than we did
> > for Hadoop 2. Let's roll some intermediate releases, work on the roadmap
> > items, and see how we're feeling in a few months.
> >
> > Best,
> > Andrew
> >
> > On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> >
> >> I think it'll be useful to have a discussion about what else people would
> >> like to see in Hadoop 3.x - especially if the change is potentially
> >> incompatible. Also, what we expect the release schedule to be for major
> >> releases and what triggers them - JVM version, major features, the need
> >> for incompatible changes? Assuming major versions will not be released
> >> every 6 months/1 year (adoption time, fairly disruptive for downstream
> >> projects and users), considering additional features/incompatible
> >> changes for 3.x would be useful.
> >>
> >> Some features that come to mind immediately would be:
> >> 1) Enhancements to the RPC mechanics - specifically support for async RPC /
> >> two-way communication. There are a lot of places where we re-use heartbeats
> >> to send more information than would be necessary if the RPC layer
> >> supported these features. Some of this can be done in a manner compatible
> >> with the existing RPC sub-system. Others, like two-way communication,
> >> probably cannot. After this, HDFS/YARN would need to actually make use of
> >> these changes. The other consideration is adoption of an alternate system
> >> like gRPC, which would be incompatible.
> >> 2) Simplification of configs - potentially separating client-side configs
> >> from those used by daemons. This is another source of perpetual confusion
> >> for users.
> >>
> >> Thanks
> >> - Sid
> >>
> >>
> >> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
> >> wrote:
> >>
> >>> Sorry, Outlook dequoted Alejandro's comments.
> >>>
> >>> Let me try again, with his comments in italics and mine proofread.
> >>>
> >>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
> >>>
> >>>
> >>>
> >>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
> >>>
> >>> IMO, if part of the community wants to take on the responsibility and
> >>> work it takes to do a new major release, we should not discourage them
> >>> from doing that.
> >>>
> >>> Having multiple major branches active is a standard practice.
> >>>
> >>> Looking at 2.x, the major work (HDFS HA, YARN) meant that it took a
> >>> long time to get out, and during that time 0.21 and 0.22 got released and
> >>> ignored; 0.23 was picked up and used in production.
> >>>
> >>> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
> >>> widely enough to be used in products, and changes were made between that
> >>> alpha and 2.2 itself which raised compatibility issues.
> >>>
> >>> For 3.x I'd propose:
> >>>
> >>>  1.  Give 3.x alpha/beta artifacts a shorter lifespan.
> >>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
> >>> releases to shipping. Best effort, but not to the extent that it gets in
> >>> the way. More succinctly: we will care more about seamless migration from
> >>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >>>  3.  Anybody who ships code based on 3.x alpha/beta must recognise and
> >>> accept policy (2), Hadoop's "instability guarantee" for the 3.x alpha/beta
> >>> phase.
> >>>
> >>> As well as backwards compatibility, we need to think about forwards
> >>> compatibility, with the goal being:
> >>>
> >>> Any app written/shipped with the 3.x release binaries (JAR and native)
> >>> will work in and against a 3.y Hadoop cluster, for all natural numbers
> >>> x, y where y >= x and is-release(x) and is-release(y).
> >>>
> >>> That's important, as it means all server-side changes in 3.x which are
> >>> expected to mandate client-side updates (protocols, HDFS erasure
> >>> decoding, security features) must be considered complete and stable
> >>> before we can say is-release(x). In an ideal world, we'll even get the
> >>> semantics right, with tests to show this.
> >>>
> >>> Fixing classpath hell downstream is certainly one feature I am +1 on.
> >>> But it's only one of the features, and given there's no design doc on
> >>> that JIRA, it's far too immature to set a release schedule on. An alpha
> >>> schedule with no guarantees and a regular alpha roll could be viable, as
> >>> new features go in and can then be used to experimentally try this stuff
> >>> in branches of HBase (well volunteered, Stack!), etc. Of course,
> >>> instability guarantees will be transitive downstream.
> >>>
> >>>
> >>> This time around we are not replacing the guts as we did from Hadoop 1
> >>> to Hadoop 2, but doing superficial surgery to address issues that were
> >>> not considered (or were too much to take on top of the guts transplant).
> >>>
> >>> For the split-brain concern, we did a great job of maintaining Hadoop 1
> >>> and Hadoop 2 until Hadoop 1 faded away.
> >>>
> >>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> >>> compatibility.
> >>>
> >>>
> >>> Based on that experience, I would say that the coexistence of Hadoop 2
> >>> and Hadoop 3 will be much less demanding/traumatic.
> >>>
> >>> The re-layout of all the source trees was a major change there.
> >>> Assuming there's no refactoring or switch of build tools, picking things
> >>> back will be tractable.
> >>>
> >>>
> >>> Also, to facilitate the coexistence, we should limit Java language
> >>> features to Java 7 (even if the runtime is Java 8); once Java 7 is no
> >>> longer used, we can remove this limitation.
> >>>
> >>> +1; setting javac.version will fix this.
> >>>
> >>> What is nice about having Java 8 as the base JVM is that you can be
> >>> confident that all Hadoop 3 servers will be JDK8+, so downstream apps
> >>> and libs can use all the Java 8 features they want.
> >>>
> >>> There's one policy change to consider there: possibly, just possibly,
> >>> we could allow new modules in hadoop-tools to adopt Java 8 language
> >>> features early, provided everyone recognises that "backport to branch-2"
> >>> isn't going to happen.
> >>>
> >>> -Steve
> >>>
> >>>
> >>
>
>

Re: Looking to a Hadoop 3 release

Posted by Alejandro Abdelnur <tu...@gmail.com>.

If classloader isolation is in place, then dependency versions can freely
be upgraded as won't pollute apps space (things get trickier if there is an
ON/OFF switch).

On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer <aw...@altiscale.com> wrote:

>
> Is there going to be a general upgrade of dependencies?  I'm thinking of
> jetty & jackson in particular.
>
> On Mar 5, 2015, at 5:24 PM, Andrew Wang <an...@cloudera.com> wrote:
>
> > I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> > page. In addition to the two things I've been pushing, I also looked
> > through Allen's list (thanks Allen for making this) and picked out the
> > shell script rewrite and the removal of HFTP as big changes. This would
> be
> > the place to propose features for inclusion in 3.x, I'd particularly
> > appreciate help on the YARN/MR side.
> >
> > Based on what I'm hearing, let me modulate my proposal to the following:
> >
> > - We avoid cutting branch-3, and release off of trunk. The trunk-only
> > changes don't look that scary, so I think this is fine. This does mean we
> > need to be more rigorous before merging branches to trunk. I think
> > Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches
> would
> > be very helpful in this regard.
> > - We do not include anything to break wire compatibility unless (as Jason
> > says) it's an unbelievably awesome feature.
> > - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> > compatibility wise. Downstreams like releases.
> >
> > I'll take Steve's advice about not locking GA to a given date, but I also
> > share his belief that we can alpha/beta/GA faster than it took for Hadoop
> > 2. Let's roll some intermediate releases, work on the roadmap items, and
> > see how we're feeling in a few months.
> >
> > Best,
> > Andrew
> >
> > On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> >
> >> I think it'll be useful to have a discussion about what else people
> would
> >> like to see in Hadoop 3.x - especially if the change is potentially
> >> incompatible. Also, what we expect the release schedule to be for major
> >> releases and what triggers them - JVM version, major features, the need
> for
> >> incompatible changes ? Assuming major versions will not be released
> every 6
> >> months/1 year (adoption time, fairly disruptive for downstream projects,
> >> and users) -  considering additional features/incompatible changes for
> 3.x
> >> would be useful.
> >>
> >> Some features that come to mind immediately would be
> >> 1) enhancements to the RPC mechanics - specifically support for AsynRPC
> /
> >> two way communication. There's a lot of places where we re-use
> heartbeats
> >> to send more information than what would be done if the PRC layer
> supported
> >> these features. Some of this can be done in a compatible manner to the
> >> existing RPC sub-system. Others like 2 way communication probably
> cannot.
> >> After this, having HDFS/YARN actually make use of these changes. The
> other
> >> consideration is adoption of an alternate system ike gRpc which would be
> >> incompatible.
> >> 2) Simplification of configs - potentially separating client side
> configs
> >> and those used by daemons. This is another source of perpetual confusion
> >> for users.
> >>
> >> Thanks
> >> - Sid
> >>
> >>
> >> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
> >> wrote:
> >>
> >>> Sorry, outlook dequoted Alejandros's comments.
> >>>
> >>> Let me try again with his comments in italic and proofreading of mine
> >>>
> >>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com<mailto:
> >>> stevel@hortonworks.com>> wrote:
> >>>
> >>>
> >>>
> >>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com<mailto:
> >>> tucu00@gmail.com><ma...@gmail.com>> wrote:
> >>>
> >>> IMO, if part of the community wants to take on the responsibility and
> >>> work that it takes to do a new major release, we should not discourage
> >>> them from doing that.
> >>>
> >>> Having multiple major branches active is a standard practice.
> >>>
> >>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> >>> long time to get out, and during that time 0.21 and 0.22 got released and
> >>> ignored; 0.23 was picked up and used in production.
> >>>
> >>> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
> >>> widely enough to be used in products, and changes were made between that
> >>> alpha and 2.2 itself which raised compatibility issues.
> >>>
> >>> For 3.x I'd propose
> >>>
> >>>
> >>>  1.  Keep 3.x alpha/beta artifacts short-lived.
> >>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
> >>> releases to shipping. Best effort, but not to the extent that it gets in
> >>> the way. More succinctly: we will care more about seamless migration from
> >>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >>>  3.  Anybody who ships code based on 3.x alpha/beta must recognise and
> >>> accept policy (2), Hadoop's "instability guarantee" for the 3.x alpha/beta
> >>> phase.
> >>>
> >>> As well as backwards compatibility, we need to think about forwards
> >>> compatibility, with the goal being:
> >>>
> >>> Any app written/shipped with the 3.x release binaries (JAR and native)
> >>> will work in and against a 3.y Hadoop cluster, for all natural numbers
> >>> x, y where y >= x and is-release(x) and is-release(y).
> >>>
> >>> That's important, as it means all server-side changes in 3.x which are
> >>> expected to mandate client-side updates (protocols, HDFS erasure
> >>> decoding, security features) must be considered complete and stable
> >>> before we can say is-release(x). In an ideal world, we'll even get the
> >>> semantics right, with tests to show this.
> >>>
> >>> Fixing classpath hell downstream is certainly one feature I am +1 on. But
> >>> it's only one of the features, and given there's not any design doc on
> >>> that JIRA, it's way too immature to set a release schedule on. An alpha
> >>> schedule with no guarantees and a regular alpha roll could be viable, as
> >>> new features go in and can then be used to experimentally try this stuff
> >>> in branches of HBase (well volunteered, Stack!), etc. Of course,
> >>> instability guarantees will be transitive downstream.
> >>>
> >>>
> >>> This time around we are not replacing the guts as we did from Hadoop 1 to
> >>> Hadoop 2, but doing superficial surgery to address issues that were not
> >>> considered (or were too much to take on top of the guts transplant).
> >>>
> >>> For the split-brain concern, we did a great job of maintaining Hadoop 1
> >>> and Hadoop 2 until Hadoop 1 faded away.
> >>>
> >>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> >>> compatibility.
> >>>
> >>>
> >>> Based on that experience I would say that the coexistence of Hadoop 2 and
> >>> Hadoop 3 will be much less demanding/traumatic.
> >>>
> >>> The re-layout of all the source trees was a major change there. Assuming
> >>> there's no refactoring or switch of build tools, picking things back
> >>> will be tractable.
> >>>
> >>>
> >>> Also, to facilitate the coexistence we should limit Java language
> >>> features to Java 7 (even if the runtime is Java 8); once Java 7 is not
> >>> used anymore, we can remove this limitation.
> >>>
> >>> +1; setting javac.version will fix this
> >>>
> >>> What is nice about having Java 8 as the base JVM is that it means you can
> >>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
> >>> and libs can use all the Java 8 features they want to.
> >>>
> >>> There's one policy change to consider there, which is that possibly, just
> >>> possibly, we could allow new modules in hadoop-tools to adopt Java 8
> >>> language features early, provided everyone recognised that "backport to
> >>> branch-2" isn't going to happen.
> >>>
> >>> -Steve
> >>>
> >>>
> >>
>
>

Re: Looking to a Hadoop 3 release

Posted by Allen Wittenauer <aw...@altiscale.com>.

Is there going to be a general upgrade of dependencies? I'm thinking of Jetty and Jackson in particular.

On Mar 5, 2015, at 5:24 PM, Andrew Wang <an...@cloudera.com> wrote:

> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> page. In addition to the two things I've been pushing, I also looked
> through Allen's list (thanks Allen for making this) and picked out the
> shell script rewrite and the removal of HFTP as big changes. This would be
> the place to propose features for inclusion in 3.x; I'd particularly
> appreciate help on the YARN/MR side.
> 
> Based on what I'm hearing, let me modulate my proposal to the following:
> 
> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> changes don't look that scary, so I think this is fine. This does mean we
> need to be more rigorous before merging branches to trunk. I think
> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
> be very helpful in this regard.
> - We do not include anything to break wire compatibility unless (as Jason
> says) it's an unbelievably awesome feature.
> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> compatibility-wise. Downstreams like releases.
> 
> I'll take Steve's advice about not locking GA to a given date, but I also
> share his belief that we can move through alpha/beta/GA faster than we did
> for Hadoop 2. Let's roll some intermediate releases, work on the roadmap
> items, and see how we're feeling in a few months.
> 
> Best,
> Andrew
> 
> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> 
>> I think it'll be useful to have a discussion about what else people would
>> like to see in Hadoop 3.x - especially if the change is potentially
>> incompatible. Also, what we expect the release schedule to be for major
>> releases and what triggers them - JVM version, major features, the need for
>> incompatible changes? Assuming major versions will not be released every 6
>> months/1 year (adoption time, fairly disruptive for downstream projects,
>> and users) - considering additional features/incompatible changes for 3.x
>> would be useful.
>> 
>> Some features that come to mind immediately would be:
>> 1) enhancements to the RPC mechanics - specifically support for async RPC /
>> two-way communication. There are a lot of places where we re-use heartbeats
>> to send more information than we would if the RPC layer supported
>> these features. Some of this can be done in a manner compatible with the
>> existing RPC sub-system. Others, like two-way communication, probably cannot.
>> After this, HDFS/YARN would need to actually make use of these changes. The
>> other consideration is adoption of an alternate system like gRPC, which
>> would be incompatible.
>> 2) Simplification of configs - potentially separating client-side configs
>> from those used by daemons. This is another source of perpetual confusion
>> for users.
>> 
>> Thanks
>> - Sid
>> 
>> 
>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
>> wrote:
>> 
>>> Sorry, Outlook dequoted Alejandro's comments.
>>> 
>>> Let me try again, with his comments in italics and proofreading of mine.
>>> 
>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
>>> 
>>> 
>>> 
>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
>>> 
>>> IMO, if part of the community wants to take on the responsibility and
>>> work that it takes to do a new major release, we should not discourage
>>> them from doing that.
>>> 
>>> Having multiple major branches active is a standard practice.
>>> 
>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>>> long time to get out, and during that time 0.21 and 0.22 got released and
>>> ignored; 0.23 was picked up and used in production.
>>> 
>>> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
>>> widely enough to be used in products, and changes were made between that
>>> alpha and 2.2 itself which raised compatibility issues.
>>> 
>>> For 3.x I'd propose
>>> 
>>> 
>>>  1.  Keep 3.x alpha/beta artifacts short-lived.
>>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
>>> releases to shipping. Best effort, but not to the extent that it gets in
>>> the way. More succinctly: we will care more about seamless migration from
>>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>>  3.  Anybody who ships code based on 3.x alpha/beta must recognise and
>>> accept policy (2), Hadoop's "instability guarantee" for the 3.x alpha/beta
>>> phase.
>>> 
>>> As well as backwards compatibility, we need to think about forwards
>>> compatibility, with the goal being:
>>> 
>>> Any app written/shipped with the 3.x release binaries (JAR and native)
>>> will work in and against a 3.y Hadoop cluster, for all natural numbers
>>> x, y where y >= x and is-release(x) and is-release(y).
>>> 
>>> That's important, as it means all server-side changes in 3.x which are
>>> expected to mandate client-side updates (protocols, HDFS erasure
>>> decoding, security features) must be considered complete and stable
>>> before we can say is-release(x). In an ideal world, we'll even get the
>>> semantics right, with tests to show this.
>>> 
>>> Fixing classpath hell downstream is certainly one feature I am +1 on. But
>>> it's only one of the features, and given there's not any design doc on
>>> that JIRA, it's way too immature to set a release schedule on. An alpha
>>> schedule with no guarantees and a regular alpha roll could be viable, as
>>> new features go in and can then be used to experimentally try this stuff
>>> in branches of HBase (well volunteered, Stack!), etc. Of course,
>>> instability guarantees will be transitive downstream.
>>> 
>>> 
>>> This time around we are not replacing the guts as we did from Hadoop 1 to
>>> Hadoop 2, but doing superficial surgery to address issues that were not
>>> considered (or were too much to take on top of the guts transplant).
>>> 
>>> For the split-brain concern, we did a great job of maintaining Hadoop 1
>>> and Hadoop 2 until Hadoop 1 faded away.
>>> 
>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
>>> compatibility.
>>> 
>>> 
>>> Based on that experience I would say that the coexistence of Hadoop 2 and
>>> Hadoop 3 will be much less demanding/traumatic.
>>> 
>>> The re-layout of all the source trees was a major change there. Assuming
>>> there's no refactoring or switch of build tools, picking things back
>>> will be tractable.
>>> 
>>> 
>>> Also, to facilitate the coexistence we should limit Java language
>>> features to Java 7 (even if the runtime is Java 8); once Java 7 is not
>>> used anymore, we can remove this limitation.
>>> 
>>> +1; setting javac.version will fix this
>>> 
>>> What is nice about having Java 8 as the base JVM is that it means you can
>>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>>> and libs can use all the Java 8 features they want to.
>>> 
>>> There's one policy change to consider there, which is that possibly, just
>>> possibly, we could allow new modules in hadoop-tools to adopt Java 8
>>> language features early, provided everyone recognised that "backport to
>>> branch-2" isn't going to happen.
>>> 
>>> -Steve
>>> 
>>> 
>>

Re: Looking to a Hadoop 3 release

Posted by Allen Wittenauer <aw...@altiscale.com>.

Is there going to be a general upgrade of dependencies?  I'm thinking of jetty & jackson in particular.

On Mar 5, 2015, at 5:24 PM, Andrew Wang <an...@cloudera.com> wrote:

> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> page. In addition to the two things I've been pushing, I also looked
> through Allen's list (thanks Allen for making this) and picked out the
> shell script rewrite and the removal of HFTP as big changes. This would be
> the place to propose features for inclusion in 3.x, I'd particularly
> appreciate help on the YARN/MR side.
> 
> Based on what I'm hearing, let me modulate my proposal to the following:
> 
> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> changes don't look that scary, so I think this is fine. This does mean we
> need to be more rigorous before merging branches to trunk. I think
> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
> be very helpful in this regard.
> - We do not include anything to break wire compatibility unless (as Jason
> says) it's an unbelievably awesome feature.
> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> compatibility wise. Downstreams like releases.
> 
> I'll take Steve's advice about not locking GA to a given date, but I also
> share his belief that we can alpha/beta/GA faster than it took for Hadoop
> 2. Let's roll some intermediate releases, work on the roadmap items, and
> see how we're feeling in a few months.
> 
> Best,
> Andrew
> 
> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> 
>> I think it'll be useful to have a discussion about what else people would
>> like to see in Hadoop 3.x - especially if the change is potentially
>> incompatible. Also, what we expect the release schedule to be for major
>> releases and what triggers them - JVM version, major features, the need for
>> incompatible changes ? Assuming major versions will not be released every 6
>> months/1 year (adoption time, fairly disruptive for downstream projects,
>> and users) -  considering additional features/incompatible changes for 3.x
>> would be useful.
>> 
>> Some features that come to mind immediately would be
>> 1) enhancements to the RPC mechanics - specifically support for AsynRPC /
>> two way communication. There's a lot of places where we re-use heartbeats
>> to send more information than what would be done if the PRC layer supported
>> these features. Some of this can be done in a compatible manner to the
>> existing RPC sub-system. Others like 2 way communication probably cannot.
>> After this, having HDFS/YARN actually make use of these changes. The other
>> consideration is adoption of an alternate system ike gRpc which would be
>> incompatible.
>> 2) Simplification of configs - potentially separating client side configs
>> and those used by daemons. This is another source of perpetual confusion
>> for users.
>> 
>> Thanks
>> - Sid
>> 
>> 
>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
>> wrote:
>> 
>>> Sorry, outlook dequoted Alejandros's comments.
>>> 
>>> Let me try again with his comments in italic and proofreading of mine
>>> 
>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com<mailto:
>>> stevel@hortonworks.com>> wrote:
>>> 
>>> 
>>> 
>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com<mailto:
>>> tucu00@gmail.com><ma...@gmail.com>> wrote:
>>> 
>>> IMO, if part of the community wants to take on the responsibility and
>> work
>>> that takes to do a new major release, we should not discourage them from
>>> doing that.
>>> 
>>> Having multiple major branches active is a standard practice.
>>> 
>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>>> long time to get out, and during that time 0.21, 0.22, got released and
>>> ignored; 0.23 picked up and used in production.
>>> 
>>> The 2.04-alpha release was more of a troublespot as it got picked up
>>> widely enough to be used in products, and changes were made between that
>>> alpha & 2.2 itself which raised compatibility issues.
>>> 
>>> For 3.x I'd propose
>>> 
>>> 
>>>  1.  Have less longevity of 3.x alpha/beta artifacts
>>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
>>> releases to shipping. Best effort, but not to the extent that it gets in
>>> the way. More succinctly: we will care more about seamless migration from
>>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>>  3.  Anybody who ships code based on 3.x alpha/beta to recognise and
>>> accept policy (2). Hadoop's "instability guarantee" for the 3.x
>> alpha/beta
>>> phase
>>> 
>>> As well as backwards compatibility, we need to think about Forwards
>>> compatibility, with the goal being:
>>> 
>>> Any app written/shipped with the 3.x release binaries (JAR and native)
>>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
>>> where y>=x  and is-release(x) and is-release(y)
>>> 
>>> That's important, as it means all server-side changes in 3.x which are
>>> expected to to mandate client-side updates: protocols, HDFS erasure
>>> decoding, security features, must be considered complete and stable
>> before
>>> we can say is-release(x). In an ideal world, we'll even get the semantics
>>> right with tests to show this.
>>> 
>>> Fixing classpath hell downstream is certainly one feature I am +1 on.
>> But:
>>> it's only one of the features, and given there's not any design doc on
>> that
>>> JIRA, way too immature to set a release schedule on. An alpha schedule
>> with
>>> no-guarantees and a regular alpha roll, could be viable, as new features
>> go
>>> in and can then be used to experimentally try this stuff in branches of
>>> Hbase (well volunteered, Stack!), etc. Of course instability guarantees
>>> will be transitive downstream.
>>> 
>>> 
>>> This time around we are not replacing the guts as we did from Hadoop 1 to
>>> Hadoop 2, but superficial surgery to address issues were not considered
>> (or
>>> was too much to take on top of the guts transplant).
>>> 
>>> For the split brain concern, we did a great of job maintaining Hadoop 1
>> and
>>> Hadoop 2 until Hadoop 1 faded away.
>>> 
>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
>>> compatibility.
>>> 
>>> 
>>> Based on that experience I would say that the coexistence of Hadoop 2 and
>>> Hadoop 3 will be much less demanding/traumatic.
>>> 
>>> The re-layout of all the source trees was a major change there, assuming
>>> there's no refactoring or switch of build tools then picking things back
>>> will be tractable
>>> 
>>> 
>>> Also, to facilitate the coexistence we should limit Java language
>> features
>>> to Java 7 (even if the runtime is Java 8), once Java 7 is not used
>> anymore
>>> we can remove this limitation.
>>> 
>>> +1; setting javac.version will fix this
>>> 
>>> What is nice about having java 8 as the base JVM is that it means you can
>>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>>> and libs can use all Java 8 features they want to.
>>> 
>>> There's one policy change to consider there which is possibly, just
>>> possibly, we could allow new modules in hadoop-tools to adopt Java 8
>>> languages early, provided everyone recognised that "backport to branch-2"
>>> isn't going to happen.
>>> 
>>> -Steve
>>> 
>>> 
>>

Re: Looking to a Hadoop 3 release

Posted by Allen Wittenauer <aw...@altiscale.com>.

Is there going to be a general upgrade of dependencies?  I'm thinking of jetty & jackson in particular.

On Mar 5, 2015, at 5:24 PM, Andrew Wang <an...@cloudera.com> wrote:

> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> page. In addition to the two things I've been pushing, I also looked
> through Allen's list (thanks Allen for making this) and picked out the
> shell script rewrite and the removal of HFTP as big changes. This would be
> the place to propose features for inclusion in 3.x, I'd particularly
> appreciate help on the YARN/MR side.
> 
> Based on what I'm hearing, let me modulate my proposal to the following:
> 
> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> changes don't look that scary, so I think this is fine. This does mean we
> need to be more rigorous before merging branches to trunk. I think
> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
> be very helpful in this regard.
> - We do not include anything to break wire compatibility unless (as Jason
> says) it's an unbelievably awesome feature.
> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> compatibility wise. Downstreams like releases.
> 
> I'll take Steve's advice about not locking GA to a given date, but I also
> share his belief that we can alpha/beta/GA faster than it took for Hadoop
> 2. Let's roll some intermediate releases, work on the roadmap items, and
> see how we're feeling in a few months.
> 
> Best,
> Andrew
> 
> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> 
>> I think it'll be useful to have a discussion about what else people would
>> like to see in Hadoop 3.x - especially if the change is potentially
>> incompatible. Also, what we expect the release schedule to be for major
>> releases and what triggers them - JVM version, major features, the need for
>> incompatible changes ? Assuming major versions will not be released every 6
>> months/1 year (adoption time, fairly disruptive for downstream projects,
>> and users) -  considering additional features/incompatible changes for 3.x
>> would be useful.
>> 
>> Some features that come to mind immediately would be
>> 1) enhancements to the RPC mechanics - specifically support for AsynRPC /
>> two way communication. There's a lot of places where we re-use heartbeats
>> to send more information than what would be done if the PRC layer supported
>> these features. Some of this can be done in a compatible manner to the
>> existing RPC sub-system. Others like 2 way communication probably cannot.
>> After this, having HDFS/YARN actually make use of these changes. The other
>> consideration is adoption of an alternate system ike gRpc which would be
>> incompatible.
>> 2) Simplification of configs - potentially separating client side configs
>> and those used by daemons. This is another source of perpetual confusion
>> for users.
>> 
>> Thanks
>> - Sid
>> 
>> 
>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
>> wrote:
>> 
>>> Sorry, outlook dequoted Alejandros's comments.
>>> 
>>> Let me try again with his comments in italic and proofreading of mine
>>> 
>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com<mailto:
>>> stevel@hortonworks.com>> wrote:
>>> 
>>> 
>>> 
>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com<mailto:
>>> tucu00@gmail.com><ma...@gmail.com>> wrote:
>>> 
>>> IMO, if part of the community wants to take on the responsibility and
>> work
>>> that takes to do a new major release, we should not discourage them from
>>> doing that.
>>> 
>>> Having multiple major branches active is a standard practice.
>>> 
>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>>> long time to get out, and during that time 0.21, 0.22, got released and
>>> ignored; 0.23 picked up and used in production.
>>> 
>>> The 2.04-alpha release was more of a troublespot as it got picked up
>>> widely enough to be used in products, and changes were made between that
>>> alpha & 2.2 itself which raised compatibility issues.
>>> 
>>> For 3.x I'd propose
>>> 
>>> 
>>>  1.  Have less longevity of 3.x alpha/beta artifacts
>>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
>>> releases to shipping. Best effort, but not to the extent that it gets in
>>> the way. More succinctly: we will care more about seamless migration from
>>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>>  3.  Anybody who ships code based on 3.x alpha/beta to recognise and
>>> accept policy (2). Hadoop's "instability guarantee" for the 3.x
>> alpha/beta
>>> phase
>>> 
>>> As well as backwards compatibility, we need to think about Forwards
>>> compatibility, with the goal being:
>>> 
>>> Any app written/shipped with the 3.x release binaries (JAR and native)
>>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
>>> where y>=x  and is-release(x) and is-release(y)
>>> 
>>> That's important, as it means all server-side changes in 3.x which are
>>> expected to to mandate client-side updates: protocols, HDFS erasure
>>> decoding, security features, must be considered complete and stable
>> before
>>> we can say is-release(x). In an ideal world, we'll even get the semantics
>>> right with tests to show this.
>>> 
>>> Fixing classpath hell downstream is certainly one feature I am +1 on.
>> But:
>>> it's only one of the features, and given there's not any design doc on
>> that
>>> JIRA, way too immature to set a release schedule on. An alpha schedule
>> with
>>> no-guarantees and a regular alpha roll, could be viable, as new features
>> go
>>> in and can then be used to experimentally try this stuff in branches of
>>> Hbase (well volunteered, Stack!), etc. Of course instability guarantees
>>> will be transitive downstream.
>>> 
>>> 
>>> This time around we are not replacing the guts as we did from Hadoop 1 to
>>> Hadoop 2, but superficial surgery to address issues were not considered
>> (or
>>> was too much to take on top of the guts transplant).
>>> 
>>> For the split brain concern, we did a great of job maintaining Hadoop 1
>> and
>>> Hadoop 2 until Hadoop 1 faded away.
>>> 
>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
>>> compatibility.
>>> 
>>> 
>>> Based on that experience I would say that the coexistence of Hadoop 2 and
>>> Hadoop 3 will be much less demanding/traumatic.
>>> 
>>> The re-layout of all the source trees was a major change there, assuming
>>> there's no refactoring or switch of build tools then picking things back
>>> will be tractable
>>> 
>>> 
>>> Also, to facilitate the coexistence we should limit Java language
>> features
>>> to Java 7 (even if the runtime is Java 8), once Java 7 is not used
>> anymore
>>> we can remove this limitation.
>>> 
>>> +1; setting javac.version will fix this
>>> 
>>> What is nice about having java 8 as the base JVM is that it means you can
>>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>>> and libs can use all Java 8 features they want to.
>>> 
>>> There's one policy change to consider there which is possibly, just
>>> possibly, we could allow new modules in hadoop-tools to adopt Java 8
>>> languages early, provided everyone recognised that "backport to branch-2"
>>> isn't going to happen.
>>> 
>>> -Steve
>>> 
>>> 
>>

Re: Looking to a Hadoop 3 release

Posted by Allen Wittenauer <aw...@altiscale.com>.

Is there going to be a general upgrade of dependencies?  I'm thinking of jetty & jackson in particular.

On Mar 5, 2015, at 5:24 PM, Andrew Wang <an...@cloudera.com> wrote:

> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> page. In addition to the two things I've been pushing, I also looked
> through Allen's list (thanks Allen for making this) and picked out the
> shell script rewrite and the removal of HFTP as big changes. This would be
> the place to propose features for inclusion in 3.x, I'd particularly
> appreciate help on the YARN/MR side.
> 
> Based on what I'm hearing, let me modulate my proposal to the following:
> 
> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> changes don't look that scary, so I think this is fine. This does mean we
> need to be more rigorous before merging branches to trunk. I think
> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
> be very helpful in this regard.
> - We do not include anything to break wire compatibility unless (as Jason
> says) it's an unbelievably awesome feature.
> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> compatibility wise. Downstreams like releases.
> 
> I'll take Steve's advice about not locking GA to a given date, but I also
> share his belief that we can alpha/beta/GA faster than it took for Hadoop
> 2. Let's roll some intermediate releases, work on the roadmap items, and
> see how we're feeling in a few months.
> 
> Best,
> Andrew
> 
> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> 
>> I think it'll be useful to have a discussion about what else people would
>> like to see in Hadoop 3.x - especially if the change is potentially
>> incompatible. Also, what we expect the release schedule to be for major
>> releases and what triggers them - JVM version, major features, the need for
>> incompatible changes ? Assuming major versions will not be released every 6
>> months/1 year (adoption time, fairly disruptive for downstream projects,
>> and users) -  considering additional features/incompatible changes for 3.x
>> would be useful.
>> 
>> Some features that come to mind immediately would be
>> 1) enhancements to the RPC mechanics - specifically support for AsynRPC /
>> two way communication. There's a lot of places where we re-use heartbeats
>> to send more information than what would be done if the PRC layer supported
>> these features. Some of this can be done in a compatible manner to the
>> existing RPC sub-system. Others like 2 way communication probably cannot.
>> After this, having HDFS/YARN actually make use of these changes. The other
>> consideration is adoption of an alternate system ike gRpc which would be
>> incompatible.
>> 2) Simplification of configs - potentially separating client side configs
>> and those used by daemons. This is another source of perpetual confusion
>> for users.
>> 
>> Thanks
>> - Sid
>> 
>> 
>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
>> wrote:
>> 
>>> Sorry, outlook dequoted Alejandros's comments.
>>> 
>>> Let me try again with his comments in italic and proofreading of mine
>>> 
>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com<mailto:
>>> stevel@hortonworks.com>> wrote:
>>> 
>>> 
>>> 
>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com<mailto:
>>> tucu00@gmail.com><ma...@gmail.com>> wrote:
>>> 
>>> IMO, if part of the community wants to take on the responsibility and
>> work
>>> that takes to do a new major release, we should not discourage them from
>>> doing that.
>>> 
>>> Having multiple major branches active is a standard practice.
>>> 
>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>>> long time to get out, and during that time 0.21, 0.22, got released and
>>> ignored; 0.23 picked up and used in production.
>>> 
>>> The 2.04-alpha release was more of a troublespot as it got picked up
>>> widely enough to be used in products, and changes were made between that
>>> alpha & 2.2 itself which raised compatibility issues.
>>> 
>>> For 3.x I'd propose
>>> 
>>> 
>>>  1.  Have less longevity of 3.x alpha/beta artifacts
>>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
>>> releases to shipping. Best effort, but not to the extent that it gets in
>>> the way. More succinctly: we will care more about seamless migration from
>>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>>  3.  Anybody who ships code based on 3.x alpha/beta to recognise and
>>> accept policy (2). Hadoop's "instability guarantee" for the 3.x
>> alpha/beta
>>> phase
>>> 
>>> As well as backwards compatibility, we need to think about Forwards
>>> compatibility, with the goal being:
>>> 
>>> Any app written/shipped with the 3.x release binaries (JAR and native)
>>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
>>> where y>=x  and is-release(x) and is-release(y)
>>> 
>>> That's important, as it means all server-side changes in 3.x which are
>>> expected to to mandate client-side updates: protocols, HDFS erasure
>>> decoding, security features, must be considered complete and stable
>> before
>>> we can say is-release(x). In an ideal world, we'll even get the semantics
>>> right with tests to show this.
>>> 
>>> Fixing classpath hell downstream is certainly one feature I am +1 on.
>>> But it's only one of the features, and given there's no design doc on
>>> that JIRA, it's far too immature to set a release schedule on. An alpha
>>> schedule with no guarantees and a regular alpha roll could be viable, as
>>> new features go in and can then be used to experimentally try this stuff
>>> in branches of HBase (well volunteered, Stack!), etc. Of course,
>>> instability guarantees will be transitive downstream.
>>> 
>>> 
>>> This time around we are not replacing the guts as we did from Hadoop 1 to
>>> Hadoop 2, but doing superficial surgery to address issues that were not
>>> considered (or were too much to take on top of the guts transplant).
>>> 
>>> For the split-brain concern, we did a great job of maintaining Hadoop 1
>>> and Hadoop 2 until Hadoop 1 faded away.
>>>
>>> And there was a significant argument about 2.0.4-alpha to 2.2
>>> protobuf/HDFS compatibility.
>>> 
>>> 
>>> Based on that experience I would say that the coexistence of Hadoop 2 and
>>> Hadoop 3 will be much less demanding/traumatic.
>>> 
>>> The re-layout of all the source trees was a major change there; assuming
>>> there's no refactoring or switch of build tools, picking things back
>>> will be tractable.
>>> 
>>> 
>>> Also, to facilitate the coexistence we should limit Java language
>>> features to Java 7 (even if the runtime is Java 8); once Java 7 is not
>>> used anymore, we can remove this limitation.
>>> 
>>> +1; setting javac.version will fix this
>>> 
>>> What is nice about having Java 8 as the base JVM is that it means you can
>>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>>> and libs can use all Java 8 features they want to.
>>> 
>>> There's one policy change to consider there: possibly, just possibly, we
>>> could allow new modules in hadoop-tools to adopt Java 8 language features
>>> early, provided everyone recognises that "backport to branch-2" isn't
>>> going to happen.
>>> 
>>> -Steve
>>> 
>>> 
>>
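Steve's forward-compatibility goal quoted above (an app built against 3.x release binaries works against any 3.y cluster whenever y >= x and both are releases) can be written as a small predicate. This is a minimal Python sketch, not anything from the Hadoop codebase: the version-string format, the choice to compare only the minor number, and the names is_release and must_be_forward_compatible are all assumptions made for illustration. A False result only means the stated goal makes no promise for that pair, not that the pair is known to be incompatible.

```python
import re

# Assumed version format: a 3.x "release" is a plain MAJOR.MINOR.PATCH string.
# Anything with a suffix such as "-alpha1" or "-beta1" fails is-release(),
# so the forward-compatibility goal does not cover it (policy item 2).
_RELEASE = re.compile(r"^3\.(\d+)\.(\d+)$")


def is_release(version: str) -> bool:
    """True if `version` looks like a 3.x GA release, e.g. "3.2.0"."""
    return _RELEASE.match(version) is not None


def must_be_forward_compatible(client: str, server: str) -> bool:
    """True when the stated goal obliges an app built against the `client`
    binaries (3.x) to work against a cluster running `server` (3.y).

    Encodes: y >= x and is-release(x) and is-release(y).
    Only the minor number is compared; patch levels are ignored here.
    """
    if not (is_release(client) and is_release(server)):
        return False  # alpha/beta builds carry no guarantee
    x = int(_RELEASE.match(client).group(1))
    y = int(_RELEASE.match(server).group(1))
    return y >= x


if __name__ == "__main__":
    print(must_be_forward_compatible("3.0.0", "3.3.0"))         # True
    print(must_be_forward_compatible("3.3.0", "3.0.0"))         # False
    print(must_be_forward_compatible("3.0.0-alpha1", "3.3.0"))  # False
```

Treating alpha/beta strings as outside the predicate mirrors item 2 of the same proposal: migration from 2.2+ to 3.x matters more than migration from a 3.0 alpha to a later production release.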

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
page. In addition to the two things I've been pushing, I also looked
through Allen's list (thanks Allen for making this) and picked out the
shell script rewrite and the removal of HFTP as big changes. This would be
the place to propose features for inclusion in 3.x; I'd particularly
appreciate help on the YARN/MR side.

Based on what I'm hearing, let me modulate my proposal to the following:

- We avoid cutting branch-3, and release off of trunk. The trunk-only
changes don't look that scary, so I think this is fine. This does mean we
need to be more rigorous before merging branches to trunk. I think
Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
be very helpful in this regard.
- We do not include anything to break wire compatibility unless (as Jason
says) it's an unbelievably awesome feature.
- No harm in rolling alphas from trunk, as it doesn't lock us to anything
compatibility-wise. Downstreams like releases.

I'll take Steve's advice about not locking GA to a given date, but I also
share his belief that we can get through alpha/beta/GA faster than we did
for Hadoop 2. Let's roll some intermediate releases, work on the roadmap
items, and see how we're feeling in a few months.

Best,
Andrew

On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:

> I think it'll be useful to have a discussion about what else people would
> like to see in Hadoop 3.x - especially if the change is potentially
> incompatible. Also, what we expect the release schedule to be for major
> releases and what triggers them: JVM version, major features, the need for
> incompatible changes? Assuming major versions will not be released every 6
> months/1 year (adoption time, fairly disruptive for downstream projects,
> and users), considering additional features/incompatible changes for 3.x
> would be useful.
>
> Some features that come to mind immediately would be
> 1) enhancements to the RPC mechanics - specifically support for async RPC /
> two-way communication. There are a lot of places where we re-use heartbeats
> to send more information than would be necessary if the RPC layer supported
> these features. Some of this can be done in a manner compatible with the
> existing RPC sub-system. Others, like two-way communication, probably cannot.
> After this, having HDFS/YARN actually make use of these changes. The other
> consideration is adoption of an alternate system like gRPC, which would be
> incompatible.
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
>
> Thanks
> - Sid
>
>
> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
> > Sorry, Outlook dequoted Alejandro's comments.
> >
> > Let me try again with his comments in italic and proofreading of mine
> >
> > On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
> >
> >
> >
> > On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
> >
> > IMO, if part of the community wants to take on the responsibility and
> > work it takes to do a new major release, we should not discourage them
> > from doing that.
> >
> > Having multiple major branches active is a standard practice.
> >
> > Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> > long time to get out, and during that time 0.21, 0.22, got released and
> > ignored; 0.23 picked up and used in production.
> >
> > The 2.0.4-alpha release was more of a trouble spot, as it got picked up
> > widely enough to be used in products, and changes were made between that
> > alpha and 2.2 itself which raised compatibility issues.
> >
> > For 3.x I'd propose
> >
> >
> >   1.  Have less longevity of 3.x alpha/beta artifacts
> >   2.  Make clear there are no guarantees of compatibility from alpha/beta
> > releases to shipping. Best effort, but not to the extent that it gets in
> > the way. More succinctly: we will care more about seamless migration from
> > 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >   3.  Anybody who ships code based on 3.x alpha/beta must recognise and
> > accept policy (2): Hadoop's "instability guarantee" for the 3.x alpha/beta
> > phase.
> >
> > As well as backwards compatibility, we need to think about forwards
> > compatibility, with the goal being:
> >
> > Any app written/shipped with the 3.x release binaries (JAR and native)
> > will work in and against a 3.y Hadoop cluster, for all x, y in Natural
> > where y>=x  and is-release(x) and is-release(y)
> >
> > That's important, as it means all server-side changes in 3.x which are
> > expected to mandate client-side updates (protocols, HDFS erasure
> > decoding, security features) must be considered complete and stable
> > before we can say is-release(x). In an ideal world, we'll even get the
> > semantics right, with tests to show this.
> >
> > Fixing classpath hell downstream is certainly one feature I am +1 on.
> > But it's only one of the features, and given there's no design doc on
> > that JIRA, it's far too immature to set a release schedule on. An alpha
> > schedule with no guarantees and a regular alpha roll could be viable, as
> > new features go in and can then be used to experimentally try this stuff
> > in branches of HBase (well volunteered, Stack!), etc. Of course,
> > instability guarantees will be transitive downstream.
> >
> >
> > This time around we are not replacing the guts as we did from Hadoop 1 to
> > Hadoop 2, but doing superficial surgery to address issues that were not
> > considered (or were too much to take on top of the guts transplant).
> >
> > For the split-brain concern, we did a great job of maintaining Hadoop 1
> > and Hadoop 2 until Hadoop 1 faded away.
> >
> > And there was a significant argument about 2.0.4-alpha to 2.2
> > protobuf/HDFS compatibility.
> >
> >
> > Based on that experience I would say that the coexistence of Hadoop 2 and
> > Hadoop 3 will be much less demanding/traumatic.
> >
> > The re-layout of all the source trees was a major change there; assuming
> > there's no refactoring or switch of build tools, picking things back
> > will be tractable.
> >
> >
> > Also, to facilitate the coexistence we should limit Java language
> > features to Java 7 (even if the runtime is Java 8); once Java 7 is not
> > used anymore, we can remove this limitation.
> >
> > +1; setting javac.version will fix this
> >
> > What is nice about having Java 8 as the base JVM is that it means you can
> > be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
> > and libs can use all Java 8 features they want to.
> >
> > There's one policy change to consider there: possibly, just possibly, we
> > could allow new modules in hadoop-tools to adopt Java 8 language features
> > early, provided everyone recognises that "backport to branch-2" isn't
> > going to happen.
> >
> > -Steve
> >
> >
>
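Steve's remark that setting javac.version keeps the language level at Java 7 while the runtime is Java 8 can be checked after a build by reading the class-file header: the 0xCAFEBABE magic is followed by a big-endian 16-bit minor version and a 16-bit major version, where major 51 means Java 7 and 52 means Java 8. The sketch below assumes you want to inspect the classes in a built jar; class_file_target and jar_targets are hypothetical helper names. As Raymie points out later in the thread, a Java 7 source/target level does not by itself stop code from calling JDK8-only library APIs, and this header check will not catch that either.

```python
import struct
import zipfile

# Class-file major versions for the JDKs discussed in this thread.
MAJOR_TO_JAVA = {50: "Java 6", 51: "Java 7", 52: "Java 8"}


def class_file_target(header: bytes) -> str:
    """Return the Java release a compiled class targets.

    A class file starts with the magic number 0xCAFEBABE, then an unsigned
    16-bit minor version and an unsigned 16-bit major version (big-endian).
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class-file header")
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    return MAJOR_TO_JAVA.get(major, f"unknown (major {major})")


def jar_targets(jar) -> dict:
    """Map every .class entry in a jar (path or file object) to its target.

    Hypothetical helper: it reads each entry fully, which is fine for a
    sketch but wasteful for very large jars.
    """
    with zipfile.ZipFile(jar) as zf:
        return {
            name: class_file_target(zf.read(name)[:8])
            for name in zf.namelist()
            if name.endswith(".class")
        }


if __name__ == "__main__":
    # Synthetic header: magic, minor 0, major 51 (0x33), as javac emits
    # for a 1.7 target.
    print(class_file_target(bytes.fromhex("cafebabe00000033")))  # Java 7
```

Any entry reported as "Java 8" in a module meant to stay backportable would indicate the Java 7 language-level limit was not applied to it.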

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Kai,

Sure, I'm open to it. It's a new major release, so we're allowed to make
these kinds of big changes. The idea behind the extended alpha cycle is
that downstreams can give us feedback. This way if we do anything too
radical, we can address it in the next alpha and have downstreams re-test.

Best,
Andrew

On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <ka...@intel.com> wrote:

> Thanks Andrew for driving this. I wonder if this is a good chance to get
> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) in. Note it's not
> an incompatible change, but it feels better to do it in a major release.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Andrew Wang [mailto:andrew.wang@cloudera.com]
> Sent: Friday, February 19, 2016 7:04 AM
> To: hdfs-dev@hadoop.apache.org; Kihwal Lee <ki...@yahoo-inc.com>
> Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org;
> yarn-dev@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> Hi Kihwal,
>
> I think there's still value in continuing the 2.x releases. 3.x comes with
> the incompatible bump to a JDK8 runtime, and 3.x won't be beta or GA for
> some number of months. In the meantime, it'd be good to keep putting out
> regular, stable 2.x releases.
>
> Best,
> Andrew
>
>
> On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
> wrote:
>
> > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > motivations, are we getting rid of branch-2.8?
> >
> > Kihwal
> >
> > From: Andrew Wang <an...@cloudera.com>
> > To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
> > Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>;
> > "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> > hdfs-dev <hd...@hadoop.apache.org>
> > Sent: Thursday, February 18, 2016 4:35 PM
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi all,
> >
> > Reviving this thread. I've seen renewed interest in a trunk release
> > since HDFS erasure coding has not yet made it to branch-2. Along with
> > JDK8, the shell script rewrite, and many other improvements, I think
> > it's time to revisit Hadoop 3.0 release plans.
> >
> > My overall plan is still the same as in my original email: a series of
> > regular alpha releases leading up to beta and GA. Alpha releases make
> > it easier for downstreams to integrate with our code, and making them
> > regular means features can be included when they are ready.
> >
> > I know there are some incompatible changes waiting in the wings (e.g.
> > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > HADOOP-9991 bumping dependency versions) that would be good to get in.
> > If you have changes like this, please set the target version to 3.0.0
> > and mark them "Incompatible". We can use this JIRA query to track:
> >
> >
> > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
> >
> > There's some release-related stuff that needs to be sorted out
> > (namely, the new CHANGES.txt and release note generation from Yetus),
> > but I'd tentatively like to roll the first alpha a month out, i.e., in
> > the third week of March.
> >
> > Best,
> > Andrew
> >
> > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com> wrote:
> >
> > > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > > source version to JDK8.
> > >
> > > Also, note that releasing from trunk is a way of achieving #3, it's
> > > not a way of abandoning it.
> > >
> > >
> > >
> > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang
> > > <an...@cloudera.com>
> > > wrote:
> > > > Hi Raymie,
> > > >
> > > > Konst proposed just releasing off of trunk rather than cutting a
> > > > branch-3, and there was general agreement there. So, consider #3
> > > > abandoned. 1 & 2 can be achieved at the same time; we just need to
> > > > avoid using JDK8 language features in trunk so things can be backported.
> > > >
> > > > Best,
> > > > Andrew
> > > >
> > > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com> wrote:
> > > >
> > > >> In this (and the related threads), I see the following three requirements:
> > > >>
> > > >> 1. "Bump the source JDK version to JDK8" (i.e., drop JDK7 support).
> > > >>
> > > >> 2. "We'll still be releasing 2.x releases for a while, with
> > > >> similar feature sets as 3.x."
> > > >>
> > > >> 3. Avoid the "risk of split-brain behavior" by "minimize
> > > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is
> > > >> already tedious. Adding a branch-3, branch-3.x would be obnoxious."
> > > >>
> > > >> These three cannot be achieved at the same time. Which do we abandon?
> > > >>
> > > >>
> > > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia
> > > >> <sa...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> > > >> >>
> > > >> >> 2) Simplification of configs - potentially separating client
> > > >> >> side configs and those used by daemons. This is another source
> > > >> >> of perpetual confusion for users.
> > > >> > +1 on this.
> > > >> >
> > > >> > sanjay
> > > >>
> > >
> >
> >
> >
>
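The tracking link in Andrew's February 2016 message is a JIRA search URL whose jql parameter is percent-encoded, which makes the actual filter hard to read. A minimal sketch using only Python's standard urllib.parse recovers the readable query and can rebuild the link from it; jql_from_url and url_from_jql are hypothetical helper names, and keeping parentheses unencoded is an assumption chosen only so the rebuilt link matches the original byte for byte.

```python
from urllib.parse import parse_qs, quote, urlsplit

# The tracking link from the message above, rejoined onto one logical line.
URL = (
    "https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS"
    "%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20"
    "%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20"
    "Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority"
)


def jql_from_url(url: str) -> str:
    """Extract and percent-decode the `jql` query parameter of a JIRA search URL."""
    params = parse_qs(urlsplit(url).query)
    return params["jql"][0]


def url_from_jql(jql: str, base: str = "https://issues.apache.org/jira/issues/") -> str:
    """Percent-encode a JQL string into a search URL.

    Parentheses are kept literal to match the style of the original link;
    everything else outside letters, digits and "_.-~" is encoded.
    """
    return base + "?jql=" + quote(jql, safe="()")


if __name__ == "__main__":
    print(jql_from_url(URL))
```

Decoded, the filter reads: project in (HADOOP, HDFS, YARN, MAPREDUCE) and "Target Version/s" = "3.0.0" and resolution="unresolved" and "Hadoop Flags"="Incompatible change" order by priority. Writing the JQL in readable form first and encoding it afterwards avoids the line-wrapping damage that long encoded URLs tend to suffer in mail clients.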

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Kai,

Sure, I'm open to it. It's a new major release, so we're allowed to make
these kinds of big changes. The idea behind the extended alpha cycle is
that downstreams can give us feedback. This way if we do anything too
radical, we can address it in the next alpha and have downstreams re-test.

Best,
Andrew

On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <ka...@intel.com> wrote:

> Thanks Andrew for driving this. Wonder if it's a good chance for
> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note it's
> not an incompatible change, but feel better to be done in the major release.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Andrew Wang [mailto:andrew.wang@cloudera.com]
> Sent: Friday, February 19, 2016 7:04 AM
> To: hdfs-dev@hadoop.apache.org; Kihwal Lee <ki...@yahoo-inc.com>
> Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org;
> yarn-dev@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> Hi Kihwal,
>
> I think there's still value in continuing the 2.x releases. 3.x comes with
> the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
> be beta or GA for some number of months. In the meanwhile, it'd be good to
> keep putting out regular, stable 2.x releases.
>
> Best,
> Andrew
>
>
> On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
> wrote:
>
> > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > motivations, are we getting rid of branch-2.8?
> >
> > Kihwal
> >
> >       From: Andrew Wang <an...@cloudera.com>
> >  To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
> > Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>; "
> > mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> > hdfs-dev <hd...@hadoop.apache.org>
> >  Sent: Thursday, February 18, 2016 4:35 PM
> >  Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi all,
> >
> > Reviving this thread. I've seen renewed interest in a trunk release
> > since HDFS erasure coding has not yet made it to branch-2. Along with
> > JDK8, the shell script rewrite, and many other improvements, I think
> > it's time to revisit Hadoop 3.0 release plans.
> >
> > My overall plan is still the same as in my original email: a series of
> > regular alpha releases leading up to beta and GA. Alpha releases make
> > it easier for downstreams to integrate with our code, and making them
> > regular means features can be included when they are ready.
> >
> > I know there are some incompatible changes waiting in the wings (i.e.
> > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > HADOOP-9991 bumping dependency versions) that would be good to get in.
> > If you have changes like this, please set the target version to 3.0.0
> > and mark them "Incompatible". We can use this JIRA query to track:
> >
> >
> > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%2
> > 0HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%
> > 3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hado
> > op%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
> >
> > There's some release-related stuff that needs to be sorted out
> > (namely, the new CHANGES.txt and release note generation from Yetus),
> > but I'd tentatively like to roll the first alpha a month out, so third
> > week of March.
> >
> > Best,
> > Andrew
> >
> > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com>
> wrote:
> >
> > > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > > source version to JDK8.
> > >
> > > Also, note that releasing from trunk is a way of achieving #3, it's
> > > not a way of abandoning it.
> > >
> > >
> > >
> > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang
> > > <an...@cloudera.com>
> > > wrote:
> > > > Hi Raymie,
> > > >
> > > > Konst proposed just releasing off of trunk rather than cutting a
> > > branch-2,
> > > > and there was general agreement there. So, consider #3 abandoned.
> > > > 1&2
> > can
> > > > be achieved at the same time, we just need to avoid using JDK8
> > > > language features in trunk so things can be backported.
> > > >
> > > > Best,
> > > > Andrew
> > > >
> > > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata
> > > > <rs...@altiscale.com>
> > > wrote:
> > > >
> > > >> In this (and the related threads), I see the following three
> > > requirements:
> > > >>
> > > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > > >>
> > > >> 2. "We'll still be releasing 2.x releases for a while, with
> > > >> similar feature sets as 3.x."
> > > >>
> > > >> 3. Avoid the "risk of split-brain behavior" by "minimize
> > > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is
> already tedious.
> > > >> Adding a branch-3, branch-3.x would be obnoxious."
> > > >>
> > > >> These three cannot be achieved at the same time.  Which do we
> abandon?
> > > >>
> > > >>
> > > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia
> > > >> <sa...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org>
> > wrote:
> > > >> >>
> > > >> >> 2) Simplification of configs - potentially separating client
> > > >> >> side
> > > >> configs
> > > >> >> and those used by daemons. This is another source of perpetual
> > > confusion
> > > >> >> for users.
> > > >> > + 1 on this.
> > > >> >
> > > >> > sanjay
> > > >>
> > >
> >
> >
> >
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Kai,

Sure, I'm open to it. It's a new major release, so we're allowed to make
these kinds of big changes. The idea behind the extended alpha cycle is
that downstreams can give us feedback. This way if we do anything too
radical, we can address it in the next alpha and have downstreams re-test.

Best,
Andrew

On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <ka...@intel.com> wrote:

> Thanks Andrew for driving this. Wonder if it's a good chance for
> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note it's
> not an incompatible change, but feel better to be done in the major release.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Andrew Wang [mailto:andrew.wang@cloudera.com]
> Sent: Friday, February 19, 2016 7:04 AM
> To: hdfs-dev@hadoop.apache.org; Kihwal Lee <ki...@yahoo-inc.com>
> Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org;
> yarn-dev@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> Hi Kihwal,
>
> I think there's still value in continuing the 2.x releases. 3.x comes with
> the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
> be beta or GA for some number of months. In the meanwhile, it'd be good to
> keep putting out regular, stable 2.x releases.
>
> Best,
> Andrew
>
>
> On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
> wrote:
>
> > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > motivations, are we getting rid of branch-2.8?
> >
> > Kihwal
> >
> >       From: Andrew Wang <an...@cloudera.com>
> >  To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
> > Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>; "
> > mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> > hdfs-dev <hd...@hadoop.apache.org>
> >  Sent: Thursday, February 18, 2016 4:35 PM
> >  Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi all,
> >
> > Reviving this thread. I've seen renewed interest in a trunk release
> > since HDFS erasure coding has not yet made it to branch-2. Along with
> > JDK8, the shell script rewrite, and many other improvements, I think
> > it's time to revisit Hadoop 3.0 release plans.
> >
> > My overall plan is still the same as in my original email: a series of
> > regular alpha releases leading up to beta and GA. Alpha releases make
> > it easier for downstreams to integrate with our code, and making them
> > regular means features can be included when they are ready.
> >
> > I know there are some incompatible changes waiting in the wings (i.e.
> > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > HADOOP-9991 bumping dependency versions) that would be good to get in.
> > If you have changes like this, please set the target version to 3.0.0
> > and mark them "Incompatible". We can use this JIRA query to track:
> >
> >
> > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%2
> > 0HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%
> > 3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hado
> > op%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
> >
> > There's some release-related stuff that needs to be sorted out
> > (namely, the new CHANGES.txt and release note generation from Yetus),
> > but I'd tentatively like to roll the first alpha a month out, so third
> > week of March.
> >
> > Best,
> > Andrew
> >
> > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com>
> wrote:
> >
> > > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > > source version to JDK8.
> > >
> > > Also, note that releasing from trunk is a way of achieving #3, it's
> > > not a way of abandoning it.
> > >
> > >
> > >
> > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang
> > > <an...@cloudera.com>
> > > wrote:
> > > > Hi Raymie,
> > > >
> > > > Konst proposed just releasing off of trunk rather than cutting a
> > > branch-2,
> > > > and there was general agreement there. So, consider #3 abandoned.
> > > > 1&2
> > can
> > > > be achieved at the same time, we just need to avoid using JDK8
> > > > language features in trunk so things can be backported.
> > > >
> > > > Best,
> > > > Andrew
> > > >
> > > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com> wrote:
> > > >
> > > >> In this (and the related threads), I see the following three
> > > >> requirements:
> > > >>
> > > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > > >>
> > > >> 2. "We'll still be releasing 2.x releases for a while, with
> > > >> similar feature sets as 3.x."
> > > >>
> > > >> 3. Avoid the "risk of split-brain behavior" by "minimize
> > > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is
> > > >> already tedious. Adding a branch-3, branch-3.x would be obnoxious."
> > > >>
> > > >> These three cannot be achieved at the same time. Which do we abandon?
> > > >>
> > > >>
> > > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com> wrote:
> > > >> >
> > > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> > > >> >>
> > > >> >> 2) Simplification of configs - potentially separating client
> > > >> >> side configs and those used by daemons. This is another source
> > > >> >> of perpetual confusion for users.
> > > >> > +1 on this.
> > > >> >
> > > >> > sanjay
> > > >>
> > >
> >
> >
> >
>
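
The JIRA tracking link quoted above is percent-encoded, which hides the query it runs. Decoding it gives a plain JQL filter. As a minimal sketch, assuming Python 3 and only the standard-library `urllib.parse` module (the names `JQL`, `BASE`, and `jira_search_url` are hypothetical, introduced here for illustration), the readable query and a helper that re-encodes it into the same link look like this:

```python
from urllib.parse import quote, unquote

# JQL recovered by percent-decoding the tracking link in the thread
# (a reconstruction for illustration, not an official Hadoop artifact).
JQL = (
    'project in (HADOOP, HDFS, YARN, MAPREDUCE) '
    'and "Target Version/s" = "3.0.0" '
    'and resolution="unresolved" '
    'and "Hadoop Flags"="Incompatible change" '
    'order by priority'
)

BASE = "https://issues.apache.org/jira/issues/?jql="


def jira_search_url(jql: str) -> str:
    """Percent-encode a JQL string into an issue-search link.

    Parentheses are left literal to match the original link; every other
    reserved character (space, comma, double quote, '=', '/') is encoded.
    """
    return BASE + quote(jql, safe="()")


if __name__ == "__main__":
    url = jira_search_url(JQL)
    print(url)
    # Decoding the query part recovers the readable JQL.
    print(unquote(url[len(BASE):]))
```

Keeping the readable JQL as the source of truth and generating the link from it makes the filter easier to review and to adjust (for example, a different target version).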

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Kai,

Sure, I'm open to it. It's a new major release, so we're allowed to make
these kinds of big changes. The idea behind the extended alpha cycle is
that downstreams can give us feedback. This way if we do anything too
radical, we can address it in the next alpha and have downstreams re-test.

Best,
Andrew

On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <ka...@intel.com> wrote:

> Thanks Andrew for driving this. I wonder if this is a good chance to get
> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) in. Note that it's
> not an incompatible change, but it feels better to do it in a major release.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Andrew Wang [mailto:andrew.wang@cloudera.com]
> Sent: Friday, February 19, 2016 7:04 AM
> To: hdfs-dev@hadoop.apache.org; Kihwal Lee <ki...@yahoo-inc.com>
> Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org;
> yarn-dev@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> Hi Kihwal,
>
> I think there's still value in continuing the 2.x releases. 3.x comes with
> the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
> be beta or GA for some number of months. In the meanwhile, it'd be good to
> keep putting out regular, stable 2.x releases.
>
> Best,
> Andrew
>
>
> On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
> wrote:
>
> > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > motivations, are we getting rid of branch-2.8?
> >
> > Kihwal
> >
> > From: Andrew Wang <an...@cloudera.com>
> > To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
> > Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>;
> > "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> > hdfs-dev <hd...@hadoop.apache.org>
> > Sent: Thursday, February 18, 2016 4:35 PM
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi all,
> >
> > Reviving this thread. I've seen renewed interest in a trunk release
> > since HDFS erasure coding has not yet made it to branch-2. Along with
> > JDK8, the shell script rewrite, and many other improvements, I think
> > it's time to revisit Hadoop 3.0 release plans.
> >
> > My overall plan is still the same as in my original email: a series of
> > regular alpha releases leading up to beta and GA. Alpha releases make
> > it easier for downstreams to integrate with our code, and making them
> > regular means features can be included when they are ready.
> >
> > I know there are some incompatible changes waiting in the wings (i.e.
> > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > HADOOP-9991 bumping dependency versions) that would be good to get in.
> > If you have changes like this, please set the target version to 3.0.0
> > and mark them "Incompatible". We can use this JIRA query to track:
> >
> >
> > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
> >
> > There's some release-related stuff that needs to be sorted out
> > (namely, the new CHANGES.txt and release note generation from Yetus),
> > but I'd tentatively like to roll the first alpha a month out, so third
> > week of March.
> >
> > Best,
> > Andrew
> >
> > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com> wrote:
> >
> > > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > > source version to JDK8.
> > >
> > > Also, note that releasing from trunk is a way of achieving #3, it's
> > > not a way of abandoning it.
> > >
> > >
> > >
> > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com> wrote:
> > > > Hi Raymie,
> > > >
> > > > Konst proposed just releasing off of trunk rather than cutting a
> > > > branch-2, and there was general agreement there. So, consider #3
> > > > abandoned. 1&2 can be achieved at the same time, we just need to
> > > > avoid using JDK8 language features in trunk so things can be
> > > > backported.
> > > >
> > > > Best,
> > > > Andrew
> > > >
> > > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com> wrote:
> > > >
> > > >> In this (and the related threads), I see the following three
> > > >> requirements:
> > > >>
> > > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > > >>
> > > >> 2. "We'll still be releasing 2.x releases for a while, with
> > > >> similar feature sets as 3.x."
> > > >>
> > > >> 3. Avoid the "risk of split-brain behavior" by "minimize
> > > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is
> > > >> already tedious. Adding a branch-3, branch-3.x would be obnoxious."
> > > >>
> > > >> These three cannot be achieved at the same time. Which do we abandon?
> > > >>
> > > >>
> > > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com> wrote:
> > > >> >
> > > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> > > >> >>
> > > >> >> 2) Simplification of configs - potentially separating client
> > > >> >> side configs and those used by daemons. This is another source
> > > >> >> of perpetual confusion for users.
> > > >> > +1 on this.
> > > >> >
> > > >> > sanjay
> > > >>
> > >
> >
> >
> >
>

RE: Looking to a Hadoop 3 release

Posted by "Zheng, Kai" <ka...@intel.com>.

Thanks Andrew for driving this. I wonder if this is a good chance to get HADOOP-12579 (Deprecate and remove WriteableRPCEngine) in. Note that it's not an incompatible change, but it feels better to do it in a major release.

Regards,
Kai

-----Original Message-----
From: Andrew Wang [mailto:andrew.wang@cloudera.com] 
Sent: Friday, February 19, 2016 7:04 AM
To: hdfs-dev@hadoop.apache.org; Kihwal Lee <ki...@yahoo-inc.com>
Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't be beta or GA for some number of months. In the meanwhile, it'd be good to keep putting out regular, stable 2.x releases.

Best,
Andrew


On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
wrote:

> Moving Hadoop 3 forward sounds fine. If EC is one of the main 
> motivations, are we getting rid of branch-2.8?
>
> Kihwal
>
> From: Andrew Wang <an...@cloudera.com>
> To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
> Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>;
> "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> hdfs-dev <hd...@hadoop.apache.org>
> Sent: Thursday, February 18, 2016 4:35 PM
> Subject: Re: Looking to a Hadoop 3 release
>
> Hi all,
>
> Reviving this thread. I've seen renewed interest in a trunk release 
> since HDFS erasure coding has not yet made it to branch-2. Along with 
> JDK8, the shell script rewrite, and many other improvements, I think 
> it's time to revisit Hadoop 3.0 release plans.
>
> My overall plan is still the same as in my original email: a series of 
> regular alpha releases leading up to beta and GA. Alpha releases make 
> it easier for downstreams to integrate with our code, and making them 
> regular means features can be included when they are ready.
>
> I know there are some incompatible changes waiting in the wings (i.e. 
> HDFS-6984 making FileStatus a PB rather than Writable, some of
> HADOOP-9991 bumping dependency versions) that would be good to get in. 
> If you have changes like this, please set the target version to 3.0.0 
> and mark them "Incompatible". We can use this JIRA query to track:
>
>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>
> There's some release-related stuff that needs to be sorted out 
> (namely, the new CHANGES.txt and release note generation from Yetus), 
> but I'd tentatively like to roll the first alpha a month out, so third 
> week of March.
>
> Best,
> Andrew
>
> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com> wrote:
>
> > Avoiding the use of JDK8 language features (and, presumably, APIs) 
> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK 
> > source version to JDK8.
> >
> > Also, note that releasing from trunk is a way of achieving #3, it's 
> > not a way of abandoning it.
> >
> >
> >
> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com> wrote:
> > > Hi Raymie,
> > >
> > > Konst proposed just releasing off of trunk rather than cutting a
> > > branch-2, and there was general agreement there. So, consider #3
> > > abandoned. 1&2 can be achieved at the same time, we just need to
> > > avoid using JDK8 language features in trunk so things can be
> > > backported.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com> wrote:
> > >
> > >> In this (and the related threads), I see the following three
> > >> requirements:
> > >>
> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > >>
> > >> 2. "We'll still be releasing 2.x releases for a while, with
> > >> similar feature sets as 3.x."
> > >>
> > >> 3. Avoid the "risk of split-brain behavior" by "minimize
> > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is
> > >> already tedious. Adding a branch-3, branch-3.x would be obnoxious."
> > >>
> > >> These three cannot be achieved at the same time. Which do we abandon?
> > >>
> > >>
> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com> wrote:
> > >> >
> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> > >> >>
> > >> >> 2) Simplification of configs - potentially separating client
> > >> >> side configs and those used by daemons. This is another source
> > >> >> of perpetual confusion for users.
> > >> > +1 on this.
> > >> >
> > >> > sanjay
> > >>
> >
>
>
>

Re: Looking to a Hadoop 3 release

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.

+1 for the 3.0 release plan and continuing 2.x releases.
I'm thinking we should consider stopping new 2.x minor releases after 
3.x reaches GA.

Thanks,
Akira

On 2/19/16 10:33, Gangumalla, Uma wrote:
> Yes, I think starting the 3.0 release with alphas is a good idea, since it
> gives us some time before reaching beta or GA.
>
> +1 for the plan.
>
> For compatibility purposes, and since 2.x is the current stable line, we
> should continue 2.x releases anyway.
>
> Thanks Andrew for starting the thread.
>
> Regards,
> Uma
>
> On 2/18/16, 3:04 PM, "Andrew Wang" <an...@cloudera.com> wrote:
>
>> Hi Kihwal,
>>
>> I think there's still value in continuing the 2.x releases. 3.x comes with
>> the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
>> be beta or GA for some number of months. In the meanwhile, it'd be good to
>> keep putting out regular, stable 2.x releases.
>>
>> Best,
>> Andrew
>>
>>
>> On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
>> wrote:
>>
>>> Moving Hadoop 3 forward sounds fine. If EC is one of the main
>>> motivations, are we getting rid of branch-2.8?
>>>
>>> Kihwal
>>>
>>> From: Andrew Wang <an...@cloudera.com>
>>> To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
>>> Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>;
>>> "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
>>> hdfs-dev <hd...@hadoop.apache.org>
>>> Sent: Thursday, February 18, 2016 4:35 PM
>>> Subject: Re: Looking to a Hadoop 3 release
>>>
>>> Hi all,
>>>
>>> Reviving this thread. I've seen renewed interest in a trunk release
>>> since HDFS erasure coding has not yet made it to branch-2. Along with
>>> JDK8, the shell script rewrite, and many other improvements, I think
>>> it's time to revisit Hadoop 3.0 release plans.
>>>
>>> My overall plan is still the same as in my original email: a series of
>>> regular alpha releases leading up to beta and GA. Alpha releases make
>>> it easier for downstreams to integrate with our code, and making them
>>> regular means features can be included when they are ready.
>>>
>>> I know there are some incompatible changes waiting in the wings
>>> (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of
>>> HADOOP-9991 bumping dependency versions) that would be good to get in.
>>> If you have changes like this, please set the target version to 3.0.0
>>> and mark them "Incompatible". We can use this JIRA query to track:
>>>
>>> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>>>
>>> There's some release-related stuff that needs to be sorted out
>>> (namely, the new CHANGES.txt and release note generation from Yetus),
>>> but I'd tentatively like to roll the first alpha a month out, so third
>>> week of March.
>>>
>>> Best,
>>> Andrew
>>>
>>> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com>
>>> wrote:
>>>
>>>> Avoiding the use of JDK8 language features (and, presumably, APIs)
>>>> means you've abandoned #1, i.e., you haven't (really) bumped the JDK
>>>> source version to JDK8.
>>>>
>>>> Also, note that releasing from trunk is a way of achieving #3, it's
>>>> not a way of abandoning it.
>>>>
>>>>
>>>>
>>>> On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com>
>>>> wrote:
>>>>> Hi Raymie,
>>>>>
>>>>> Konst proposed just releasing off of trunk rather than cutting a
>>>>> branch-2, and there was general agreement there. So, consider #3
>>>>> abandoned. 1&2 can be achieved at the same time, we just need to
>>>>> avoid using JDK8 language features in trunk so things can be
>>>>> backported.
>>>>>
>>>>> Best,
>>>>> Andrew
>>>>>
>>>>> On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com> wrote:
>>>>>
>>>>>> In this (and the related threads), I see the following three
>>>>>> requirements:
>>>>>>
>>>>>> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
>>>>>>
>>>>>> 2. "We'll still be releasing 2.x releases for a while, with similar
>>>>>> feature sets as 3.x."
>>>>>>
>>>>>> 3. Avoid the "risk of split-brain behavior" by "minimize
>>>>>> backporting headaches. Pulling trunk > branch-2 > branch-2.x is
>>>>>> already tedious. Adding a branch-3, branch-3.x would be obnoxious."
>>>>>>
>>>>>> These three cannot be achieved at the same time. Which do we abandon?
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
>>>>>>>>
>>>>>>>> 2) Simplification of configs - potentially separating client
>>>>>>>> side configs and those used by daemons. This is another source
>>>>>>>> of perpetual confusion for users.
>>>>>>> +1 on this.
>>>>>>>
>>>>>>> sanjay
>>>>>>
>>>>
>>>
>>>
>>>
>

Re: Looking to a Hadoop 3 release

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.

+1 for the 3.0 release plan and continuing 2.x releases.
I'm thinking we should consider stopping new 2.x minor releases after 
3.x reaches GA.

Thanks,
Akira

On 2/19/16 10:33, Gangumalla, Uma wrote:
> Yes, I think starting the 3.0 release with an alpha is a good idea, so that
> it gets some time before reaching beta or GA.
>
> +1 for the plan.
>
> For compatibility purposes, and since 2.x is the current stable line, we
> should continue 2.x releases anyway.
>
> Thanks Andrew for starting the thread.
>
> Regards,
> Uma
>
> On 2/18/16, 3:04 PM, "Andrew Wang" <an...@cloudera.com> wrote:
>
>> Hi Kihwal,
>>
>> I think there's still value in continuing the 2.x releases. 3.x comes with
>> the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
>> be beta or GA for some number of months. In the meanwhile, it'd be good to
>> keep putting out regular, stable 2.x releases.
>>
>> Best,
>> Andrew
>>
>>
>> On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
>> wrote:
>>
>>> Moving Hadoop 3 forward sounds fine. If EC is one of the main
>>> motivations, are we getting rid of branch-2.8?
>>>
>>> Kihwal
>>>
>>>   From: Andrew Wang <an...@cloudera.com>
>>>   To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
>>>   Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>;
>>>   "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
>>>   hdfs-dev <hd...@hadoop.apache.org>
>>>   Sent: Thursday, February 18, 2016 4:35 PM
>>>   Subject: Re: Looking to a Hadoop 3 release
>>>
>>> Hi all,
>>>
>>> Reviving this thread. I've seen renewed interest in a trunk release
>>> since HDFS erasure coding has not yet made it to branch-2. Along with
>>> JDK8, the shell script rewrite, and many other improvements, I think
>>> it's time to revisit Hadoop 3.0 release plans.
>>>
>>> My overall plan is still the same as in my original email: a series of
>>> regular alpha releases leading up to beta and GA. Alpha releases make it
>>> easier for downstreams to integrate with our code, and making them
>>> regular means features can be included when they are ready.
>>>
>>> I know there are some incompatible changes waiting in the wings
>>> (e.g. HDFS-6984 making FileStatus a PB rather than Writable, some of
>>> HADOOP-9991 bumping dependency versions) that would be good to get in.
>>> If you have changes like this, please set the target version to 3.0.0
>>> and mark them "Incompatible". We can use this JIRA query to track:
>>>
>>> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>>>
>>> There's some release-related stuff that needs to be sorted out (namely,
>>> the new CHANGES.txt and release note generation from Yetus), but I'd
>>> tentatively like to roll the first alpha a month out, so third week of
>>> March.
>>>
>>> Best,
>>> Andrew
>>>
>>> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com>
>>> wrote:
>>>
>>>> Avoiding the use of JDK8 language features (and, presumably, APIs)
>>>> means you've abandoned #1, i.e., you haven't (really) bumped the JDK
>>>> source version to JDK8.
>>>>
>>>> Also, note that releasing from trunk is a way of achieving #3, it's
>>>> not a way of abandoning it.
>>>>
>>>>
>>>>
>>>> On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com>
>>>> wrote:
>>>>> Hi Raymie,
>>>>>
>>>>> Konst proposed just releasing off of trunk rather than cutting a
>>>>> branch-2, and there was general agreement there. So, consider #3
>>>>> abandoned. 1&2 can be achieved at the same time, we just need to
>>>>> avoid using JDK8 language features in trunk so things can be
>>>>> backported.
>>>>>
>>>>> Best,
>>>>> Andrew
>>>>>
>>>>> On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com>
>>>> wrote:
>>>>>
>>>>>> In this (and the related threads), I see the following three
>>>>>> requirements:
>>>>>>
>>>>>> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
>>>>>>
>>>>>> 2. "We'll still be releasing 2.x releases for a while, with similar
>>>>>> feature sets as 3.x."
>>>>>>
>>>>>> 3. Avoid the "risk of split-brain behavior" by "minimize
>>>>>> backporting headaches. Pulling trunk > branch-2 > branch-2.x is
>>>>>> already tedious. Adding a branch-3, branch-3.x would be obnoxious."
>>>>>>
>>>>>> These three cannot be achieved at the same time.  Which do we
>>>>>> abandon?
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
>>>>>>>>
>>>>>>>> 2) Simplification of configs - potentially separating client
>>>>>>>> side configs and those used by daemons. This is another source
>>>>>>>> of perpetual confusion for users.
>>>>>>> +1 on this.
>>>>>>>
>>>>>>> sanjay
>>>>>>
>>>>
>>>
>>>
>>>
>

Re: Looking to a Hadoop 3 release

Posted by "Gangumalla, Uma" <um...@intel.com>.

Yes, I think starting the 3.0 release with an alpha is a good idea, so that
it gets some time before reaching beta or GA.

+1 for the plan.

For compatibility purposes, and since 2.x is the current stable line, we
should continue 2.x releases anyway.

Thanks Andrew for starting the thread.

Regards,
Uma

On 2/18/16, 3:04 PM, "Andrew Wang" <an...@cloudera.com> wrote:

>Hi Kihwal,
>
>I think there's still value in continuing the 2.x releases. 3.x comes with
>the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
>be beta or GA for some number of months. In the meanwhile, it'd be good to
>keep putting out regular, stable 2.x releases.
>
>Best,
>Andrew
>
>
>On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
>wrote:
>
>> Moving Hadoop 3 forward sounds fine. If EC is one of the main
>> motivations, are we getting rid of branch-2.8?
>>
>> Kihwal
>>
>>   From: Andrew Wang <an...@cloudera.com>
>>   To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
>>   Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>;
>>   "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
>>   hdfs-dev <hd...@hadoop.apache.org>
>>   Sent: Thursday, February 18, 2016 4:35 PM
>>   Subject: Re: Looking to a Hadoop 3 release
>>
>> Hi all,
>>
>> Reviving this thread. I've seen renewed interest in a trunk release
>> since HDFS erasure coding has not yet made it to branch-2. Along with
>> JDK8, the shell script rewrite, and many other improvements, I think
>> it's time to revisit Hadoop 3.0 release plans.
>>
>> My overall plan is still the same as in my original email: a series of
>> regular alpha releases leading up to beta and GA. Alpha releases make it
>> easier for downstreams to integrate with our code, and making them
>> regular means features can be included when they are ready.
>>
>> I know there are some incompatible changes waiting in the wings
>> (e.g. HDFS-6984 making FileStatus a PB rather than Writable, some of
>> HADOOP-9991 bumping dependency versions) that would be good to get in.
>> If you have changes like this, please set the target version to 3.0.0
>> and mark them "Incompatible". We can use this JIRA query to track:
>>
>> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>>
>> There's some release-related stuff that needs to be sorted out (namely,
>> the new CHANGES.txt and release note generation from Yetus), but I'd
>> tentatively like to roll the first alpha a month out, so third week of
>> March.
>>
>> Best,
>> Andrew
>>
>> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com>
>> wrote:
>>
>> > Avoiding the use of JDK8 language features (and, presumably, APIs)
>> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
>> > source version to JDK8.
>> >
>> > Also, note that releasing from trunk is a way of achieving #3, it's
>> > not a way of abandoning it.
>> >
>> >
>> >
>> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com>
>> > wrote:
>> > > Hi Raymie,
>> > >
>> > > Konst proposed just releasing off of trunk rather than cutting a
>> > > branch-2, and there was general agreement there. So, consider #3
>> > > abandoned. 1&2 can be achieved at the same time, we just need to
>> > > avoid using JDK8 language features in trunk so things can be
>> > > backported.
>> > >
>> > > Best,
>> > > Andrew
>> > >
>> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com>
>> > > wrote:
>> > >
>> > >> In this (and the related threads), I see the following three
>> > >> requirements:
>> > >>
>> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
>> > >>
>> > >> 2. "We'll still be releasing 2.x releases for a while, with similar
>> > >> feature sets as 3.x."
>> > >>
>> > >> 3. Avoid the "risk of split-brain behavior" by "minimize
>> > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is
>> > >> already tedious. Adding a branch-3, branch-3.x would be obnoxious."
>> > >>
>> > >> These three cannot be achieved at the same time.  Which do we
>> > >> abandon?
>> > >>
>> > >>
>> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com>
>> > >> wrote:
>> > >> >
>> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
>> > >> >>
>> > >> >> 2) Simplification of configs - potentially separating client
>> > >> >> side configs and those used by daemons. This is another source
>> > >> >> of perpetual confusion for users.
>> > >> > +1 on this.
>> > >> >
>> > >> > sanjay
>> > >>
>> >
>>
>>
>>

RE: Looking to a Hadoop 3 release

Posted by "Zheng, Kai" <ka...@intel.com>.

Thanks Andrew for driving this. I wonder whether this is a good chance to get HADOOP-12579 (Deprecate and remove WriteableRPCEngine) in. Note that it's not an incompatible change, but it would feel better to do it in a major release.

Regards,
Kai

-----Original Message-----
From: Andrew Wang [mailto:andrew.wang@cloudera.com] 
Sent: Friday, February 19, 2016 7:04 AM
To: hdfs-dev@hadoop.apache.org; Kihwal Lee <ki...@yahoo-inc.com>
Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't be beta or GA for some number of months. In the meanwhile, it'd be good to keep putting out regular, stable 2.x releases.

Best,
Andrew


On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
wrote:

> Moving Hadoop 3 forward sounds fine. If EC is one of the main 
> motivations, are we getting rid of branch-2.8?
>
> Kihwal
>
>   From: Andrew Wang <an...@cloudera.com>
>   To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
>   Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>;
>   "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
>   hdfs-dev <hd...@hadoop.apache.org>
>   Sent: Thursday, February 18, 2016 4:35 PM
>   Subject: Re: Looking to a Hadoop 3 release
>
> Hi all,
>
> Reviving this thread. I've seen renewed interest in a trunk release 
> since HDFS erasure coding has not yet made it to branch-2. Along with 
> JDK8, the shell script rewrite, and many other improvements, I think 
> it's time to revisit Hadoop 3.0 release plans.
>
> My overall plan is still the same as in my original email: a series of 
> regular alpha releases leading up to beta and GA. Alpha releases make 
> it easier for downstreams to integrate with our code, and making them 
> regular means features can be included when they are ready.
>
> I know there are some incompatible changes waiting in the wings (e.g.
> HDFS-6984 making FileStatus a PB rather than Writable, some of
> HADOOP-9991 bumping dependency versions) that would be good to get in.
> If you have changes like this, please set the target version to 3.0.0
> and mark them "Incompatible". We can use this JIRA query to track:
>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>
> There's some release-related stuff that needs to be sorted out 
> (namely, the new CHANGES.txt and release note generation from Yetus), 
> but I'd tentatively like to roll the first alpha a month out, so third 
> week of March.
>
> Best,
> Andrew
>
> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com> wrote:
>
> > Avoiding the use of JDK8 language features (and, presumably, APIs) 
> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK 
> > source version to JDK8.
> >
> > Also, note that releasing from trunk is a way of achieving #3, it's 
> > not a way of abandoning it.
> >
> >
> >
> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com>
> > wrote:
> > > Hi Raymie,
> > >
> > > Konst proposed just releasing off of trunk rather than cutting a
> > > branch-2, and there was general agreement there. So, consider #3
> > > abandoned. 1&2 can be achieved at the same time, we just need to
> > > avoid using JDK8 language features in trunk so things can be
> > > backported.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com>
> > > wrote:
> > >
> > >> In this (and the related threads), I see the following three
> > >> requirements:
> > >>
> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > >>
> > >> 2. "We'll still be releasing 2.x releases for a while, with
> > >> similar feature sets as 3.x."
> > >>
> > >> 3. Avoid the "risk of split-brain behavior" by "minimize
> > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is
> > >> already tedious. Adding a branch-3, branch-3.x would be obnoxious."
> > >>
> > >> These three cannot be achieved at the same time.  Which do we abandon?
> > >>
> > >>
> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> > >> >>
> > >> >> 2) Simplification of configs - potentially separating client
> > >> >> side configs and those used by daemons. This is another source
> > >> >> of perpetual confusion for users.
> > >> > +1 on this.
> > >> >
> > >> > sanjay
> > >>
> >
>
>
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with
the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
be beta or GA for some number of months. In the meanwhile, it'd be good to
keep putting out regular, stable 2.x releases.

Best,
Andrew


On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
wrote:

> Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations,
> are we getting rid of branch-2.8?
>
> Kihwal
>
>       From: Andrew Wang <an...@cloudera.com>
>  To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
> Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>; "
> mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> hdfs-dev <hd...@hadoop.apache.org>
>  Sent: Thursday, February 18, 2016 4:35 PM
>  Subject: Re: Looking to a Hadoop 3 release
>
> Hi all,
>
> Reviving this thread. I've seen renewed interest in a trunk release since
> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
> shell script rewrite, and many other improvements, I think it's time to
> revisit Hadoop 3.0 release plans.
>
> My overall plan is still the same as in my original email: a series of
> regular alpha releases leading up to beta and GA. Alpha releases make it
> easier for downstreams to integrate with our code, and making them regular
> means features can be included when they are ready.
>
> I know there are some incompatible changes waiting in the wings
> (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of
> HADOOP-9991 bumping dependency versions) that would be good to get in. If
> you have changes like this, please set the target version to 3.0.0 and mark
> them "Incompatible". We can use this JIRA query to track:
>
>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>
> There's some release-related stuff that needs to be sorted out (namely, the
> new CHANGES.txt and release note generation from Yetus), but I'd
> tentatively like to roll the first alpha a month out, so third week of
> March.
>
> Best,
> Andrew
>
> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com> wrote:
>
> > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > source version to JDK8.
> >
> > Also, note that releasing from trunk is a way of achieving #3, it's
> > not a way of abandoning it.
> >
> >
> >
> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com>
> > wrote:
> > > Hi Raymie,
> > >
> > > Konst proposed just releasing off of trunk rather than cutting a
> > branch-2,
> > > and there was general agreement there. So, consider #3 abandoned. 1&2
> can
> > > be achieved at the same time, we just need to avoid using JDK8 language
> > > features in trunk so things can be backported.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com>
> > wrote:
> > >
> > >> In this (and the related threads), I see the following three
> > requirements:
> > >>
> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > >>
> > >> 2. "We'll still be releasing 2.x releases for a while, with similar
> > >> feature sets as 3.x."
> > >>
> > >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> > >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> > >> Adding a branch-3, branch-3.x would be obnoxious."
> > >>
> > >> These three cannot be achieved at the same time.  Which do we abandon?
> > >>
> > >>
> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org>
> wrote:
> > >> >>
> > >> >> 2) Simplification of configs - potentially separating client side
> > >> configs
> > >> >> and those used by daemons. This is another source of perpetual
> > confusion
> > >> >> for users.
> > >> > + 1 on this.
> > >> >
> > >> > sanjay
> > >>
> >
>
>
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with
the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
be beta or GA for some number of months. In the meanwhile, it'd be good to
keep putting out regular, stable 2.x releases.

Best,
Andrew


On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
wrote:

> Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations,
> are we getting rid of branch-2.8?
>
> Kihwal
>
>       From: Andrew Wang <an...@cloudera.com>
>  To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
> Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>; "
> mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> hdfs-dev <hd...@hadoop.apache.org>
>  Sent: Thursday, February 18, 2016 4:35 PM
>  Subject: Re: Looking to a Hadoop 3 release
>
> Hi all,
>
> Reviving this thread. I've seen renewed interest in a trunk release since
> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
> shell script rewrite, and many other improvements, I think it's time to
> revisit Hadoop 3.0 release plans.
>
> My overall plan is still the same as in my original email: a series of
> regular alpha releases leading up to beta and GA. Alpha releases make it
> easier for downstreams to integrate with our code, and making them regular
> means features can be included when they are ready.
>
> I know there are some incompatible changes waiting in the wings
> (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of
> HADOOP-9991 bumping dependency versions) that would be good to get in. If
> you have changes like this, please set the target version to 3.0.0 and mark
> them "Incompatible". We can use this JIRA query to track:
>
>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>
> There's some release-related stuff that needs to be sorted out (namely, the
> new CHANGES.txt and release note generation from Yetus), but I'd
> tentatively like to roll the first alpha a month out, so third week of
> March.
>
> Best,
> Andrew
>
> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com> wrote:
>
> > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > source version to JDK8.
> >
> > Also, note that releasing from trunk is a way of achieving #3, it's
> > not a way of abandoning it.
> >
> >
> >
> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com>
> > wrote:
> > > Hi Raymie,
> > >
> > > Konst proposed just releasing off of trunk rather than cutting a
> > > branch-2, and there was general agreement there. So, consider #3
> > > abandoned. 1&2 can be achieved at the same time, we just need to avoid
> > > using JDK8 language features in trunk so things can be backported.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com>
> > > wrote:
> > >
> > >> In this (and the related threads), I see the following three
> > >> requirements:
> > >>
> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > >>
> > >> 2. "We'll still be releasing 2.x releases for a while, with similar
> > >> feature sets as 3.x."
> > >>
> > >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> > >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> > >> Adding a branch-3, branch-3.x would be obnoxious."
> > >>
> > >> These three cannot be achieved at the same time.  Which do we abandon?
> > >>
> > >>
> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org>
> > >> >> wrote:
> > >> >>
> > >> >> 2) Simplification of configs - potentially separating client side
> > >> >> configs and those used by daemons. This is another source of
> > >> >> perpetual confusion for users.
> > >> > +1 on this.
> > >> >
> > >> > sanjay
> > >>
> >
>
>
>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with
the incompatible bump to a JDK8 runtime, and 3.x won't be beta or GA for
some number of months. In the meantime, it'd be good to keep putting out
regular, stable 2.x releases.

Best,
Andrew


On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <ki...@yahoo-inc.com.invalid>
wrote:

> Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations,
> are we getting rid of branch-2.8?
>
> Kihwal
>
> From: Andrew Wang <an...@cloudera.com>
> To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
> Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>;
> "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>;
> hdfs-dev <hd...@hadoop.apache.org>
> Sent: Thursday, February 18, 2016 4:35 PM
> Subject: Re: Looking to a Hadoop 3 release
>
> Hi all,
>
> Reviving this thread. I've seen renewed interest in a trunk release since
> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
> shell script rewrite, and many other improvements, I think it's time to
> revisit Hadoop 3.0 release plans.
>
> My overall plan is still the same as in my original email: a series of
> regular alpha releases leading up to beta and GA. Alpha releases make it
> easier for downstreams to integrate with our code, and making them regular
> means features can be included when they are ready.
>
> I know there are some incompatible changes waiting in the wings
> (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of
> HADOOP-9991 bumping dependency versions) that would be good to get in. If
> you have changes like this, please set the target version to 3.0.0 and mark
> them "Incompatible". We can use this JIRA query to track:
>
>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>
> There's some release-related stuff that needs to be sorted out (namely, the
> new CHANGES.txt and release note generation from Yetus), but I'd
> tentatively like to roll the first alpha a month out, so third week of
> March.
>
> Best,
> Andrew
>
> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com> wrote:
>
> > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > source version to JDK8.
> >
> > Also, note that releasing from trunk is a way of achieving #3, it's
> > not a way of abandoning it.
> >
> >
> >
> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com>
> > wrote:
> > > Hi Raymie,
> > >
> > > Konst proposed just releasing off of trunk rather than cutting a
> > > branch-2, and there was general agreement there. So, consider #3
> > > abandoned. 1&2 can be achieved at the same time, we just need to avoid
> > > using JDK8 language features in trunk so things can be backported.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com>
> > > wrote:
> > >
> > >> In this (and the related threads), I see the following three
> > >> requirements:
> > >>
> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > >>
> > >> 2. "We'll still be releasing 2.x releases for a while, with similar
> > >> feature sets as 3.x."
> > >>
> > >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> > >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> > >> Adding a branch-3, branch-3.x would be obnoxious."
> > >>
> > >> These three cannot be achieved at the same time.  Which do we abandon?
> > >>
> > >>
> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org>
> > >> >> wrote:
> > >> >>
> > >> >> 2) Simplification of configs - potentially separating client side
> > >> >> configs and those used by daemons. This is another source of
> > >> >> perpetual confusion for users.
> > >> > +1 on this.
> > >> >
> > >> > sanjay
> > >>
> >
>
>
>

Re: Looking to a Hadoop 3 release

Posted by Kihwal Lee <ki...@yahoo-inc.com.INVALID>.

Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations, are we getting rid of branch-2.8? 

Kihwal

From: Andrew Wang <an...@cloudera.com>
To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
Cc: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>; hdfs-dev <hd...@hadoop.apache.org>
Sent: Thursday, February 18, 2016 4:35 PM
Subject: Re: Looking to a Hadoop 3 release

Hi all,

Reviving this thread. I've seen renewed interest in a trunk release since
HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
shell script rewrite, and many other improvements, I think it's time to
revisit Hadoop 3.0 release plans.

My overall plan is still the same as in my original email: a series of
regular alpha releases leading up to beta and GA. Alpha releases make it
easier for downstreams to integrate with our code, and making them regular
means features can be included when they are ready.

I know there are some incompatible changes waiting in the wings
(e.g. HDFS-6984 making FileStatus a PB rather than a Writable, and some of
HADOOP-9991 bumping dependency versions) that would be good to get in. If
you have changes like this, please set the target version to 3.0.0 and mark
them "Incompatible". We can use this JIRA query to track them:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
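The filter link above is hard to read because its JQL is percent-encoded. As a minimal sketch using only the Python standard library (the helper names `decode_jql` and `build_jql_url` are hypothetical, not part of any JIRA client), the readable query can be recovered from the link, edited, and re-encoded; the "3.1.0" version below is only an illustration of retargeting the same filter:

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Base of the JIRA issue-search link quoted in the message above.
SEARCH_BASE = "https://issues.apache.org/jira/issues/"


def decode_jql(url: str) -> str:
    """Return the human-readable JQL carried in the `jql` query parameter."""
    params = parse_qs(urlsplit(url).query)
    return params["jql"][0]


def build_jql_url(jql: str) -> str:
    """Percent-encode a JQL string into an issue-search URL (spaces become %20)."""
    return SEARCH_BASE + "?" + urlencode({"jql": jql}, quote_via=quote)


url = ("https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS"
       "%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0"
       "%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22"
       "Incompatible%20change%22%20order%20by%20priority")

jql = decode_jql(url)
print(jql)
# The same filter, retargeted at an illustrative later version.
print(build_jql_url(jql.replace('"3.0.0"', '"3.1.0"')))
```

The re-encoded link escapes a few more characters (for example the parentheses) than the original does; standard URL decoding on the server side should yield the same query either way.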

There's some release-related stuff that needs to be sorted out (namely, the
new CHANGES.txt and release note generation from Yetus), but I'd
tentatively like to roll the first alpha a month out, so third week of
March.

Best,
Andrew

On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com> wrote:

> Avoiding the use of JDK8 language features (and, presumably, APIs)
> means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> source version to JDK8.
>
> Also, note that releasing from trunk is a way of achieving #3, it's
> not a way of abandoning it.
>
>
>
> On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com>
> wrote:
> > Hi Raymie,
> >
> > Konst proposed just releasing off of trunk rather than cutting a
> > branch-2, and there was general agreement there. So, consider #3
> > abandoned. 1&2 can be achieved at the same time, we just need to avoid
> > using JDK8 language features in trunk so things can be backported.
> >
> > Best,
> > Andrew
> >
> > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com>
> > wrote:
> >
> >> In this (and the related threads), I see the following three
> >> requirements:
> >>
> >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> >>
> >> 2. "We'll still be releasing 2.x releases for a while, with similar
> >> feature sets as 3.x."
> >>
> >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> >> Adding a branch-3, branch-3.x would be obnoxious."
> >>
> >> These three cannot be achieved at the same time.  Which do we abandon?
> >>
> >>
> >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com>
> >> wrote:
> >> >
> >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> >> >>
> >> >> 2) Simplification of configs - potentially separating client side
> >> >> configs and those used by daemons. This is another source of
> >> >> perpetual confusion for users.
> >> > +1 on this.
> >> >
> >> > sanjay
> >>
>

Re: Looking to a Hadoop 3 release

Posted by Ravi Prakash <ra...@gmail.com>.

+1 for the plan to start cutting 3.x alpha releases. Thanks for the
initiative, Andrew!

On Fri, Feb 19, 2016 at 6:19 AM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> > On 19 Feb 2016, at 11:27, Dmitry Sivachenko <tr...@gmail.com> wrote:
> >
> >
> >> On 19 Feb 2016, at 01:35, Andrew Wang <an...@cloudera.com> wrote:
> >>
> >> Hi all,
> >>
> >> Reviving this thread. I've seen renewed interest in a trunk release
> >> since HDFS erasure coding has not yet made it to branch-2. Along with
> >> JDK8, the shell script rewrite, and many other improvements, I think
> >> it's time to revisit Hadoop 3.0 release plans.
> >>
> >
>
> It's time to start ... I suspect it'll take a while to stabilise. I look
> forward to the new shell scripts already
>
> One thing I do want there is for all the alpha releases to make clear that
> there are no compatibility policies here; protocols may change, and there is
> no requirement for the first 3.x release to be compatible with all the
> 3.0.x alphas. That's something we missed in the 2.0.x-alpha process, or at
> least didn't repeat often enough.
>
> >
> > Hello,
> >
> > any chance IPv6 support (HADOOP-11890) will be finished before 3.0
> > comes out?
> >
> > Thanks!
> >
> >
>
> Sounds like a good time for a status update on the FB work, and anything
> people can do to test it would be appreciated by all. That includes testing
> on IPv4 systems and, especially, IPv4/v6 systems with Kerberos turned on
> against both MIT and AD Kerberos servers. At the same time, IPv6 support
> ought to be something that could be added in.
>
>
> I don't have any opinions on timescale, but
>
> +1 to anything related to classpath isolation
> +1 to a careful bump of versions of dependencies.
> +1 to fixing the outstanding Java 8 migration issues, especially the big
> Jersey patch that's just been updated.
> +1 to switching to JIRA-created release notes
>
> Having done the Slider releases recently, it's clear to me that you can
> do a lot by automating the release process itself. All those steps in
> the release runbook can be turned into targets in a special Ant release.xml
> build file, calling Maven, gpg, etc.
>
> I think doing something like this for 3.0 will significantly benefit both
> the release phase here and future releases.
>
> This is the slider one:
> https://github.com/apache/incubator-slider/blob/develop/bin/release.xml
>
> It doesn't replace Maven; instead it choreographs Maven along with all the
> other steps: signing and checksumming artifacts, publishing them, voting.
>
> It includes:
>  - refusing to release if the git repo is modified
>  - making the various git branch/tag/push operations
>  - issuing the various mvn versions:update commands
>  - signing
>  - publishing via ASF SVN
>  - using GET calls to verify the artifacts made it
>  - generating the vote and vote result emails (it even counts the votes)
>
> I recommend including this as part of the release process. It does make
> a difference; we can now cut new releases with no human intervention other
> than editing a properties file and running different targets as the process
> goes through its release and vote phases.
>
> -Steve
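Steve's release.xml is an Ant build; the Python sketch below is not a translation of it, only an illustration, under stated assumptions, of two of the listed steps: refusing to release from a modified working tree (using `git status --porcelain`, which prints nothing when the tree is clean) and producing checksummed, signed artifacts. All helper names are hypothetical. The SHA-512 choice and the `<digest>  <file name>` line format (as written by `sha512sum`) are assumptions, not a documented Hadoop convention, and `sign` shells out to `gpg --armor --detach-sign`, which writes an `<artifact>.asc` signature next to the file:

```python
import hashlib
import subprocess
from pathlib import Path


def working_tree_is_clean(porcelain_output: str) -> bool:
    """`git status --porcelain` prints one line per modified or untracked path,
    so empty output means there is nothing uncommitted."""
    return porcelain_output.strip() == ""


def ensure_clean_repo(repo: Path) -> None:
    """Refuse to continue the release if the git working tree has local changes."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    if not working_tree_is_clean(out):
        raise RuntimeError(f"refusing to release: uncommitted changes in {repo}")


def checksum_line(hex_digest: str, file_name: str) -> str:
    """One checksum line in the style of `sha512sum`: digest, two spaces, file name."""
    return f"{hex_digest}  {file_name}\n"


def write_sha512(artifact: Path) -> Path:
    """Hash the artifact in 1 MiB chunks and write `<artifact>.sha512` beside it."""
    digest = hashlib.sha512()
    with artifact.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    out = artifact.with_name(artifact.name + ".sha512")
    out.write_text(checksum_line(digest.hexdigest(), artifact.name))
    return out


def sign(artifact: Path) -> Path:
    """Create an ASCII-armored detached signature `<artifact>.asc` with gpg's default key."""
    subprocess.run(["gpg", "--armor", "--detach-sign", str(artifact)], check=True)
    return artifact.with_name(artifact.name + ".asc")
```

A release driver built this way would call `ensure_clean_repo` before tagging, then `write_sha512` and `sign` on each tarball before publishing it.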

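Steve mentions that the Slider script even counts the votes. The Python sketch below is an independent illustration, not the Slider code. It assumes the replies on a [VOTE] thread have already been reduced to hypothetical `Vote(voter, value, binding)` records, lets a voter's latest reply override earlier ones, and applies a lazy-majority rule (at least three binding +1 votes and more binding +1 than binding -1 votes); that rule is an assumption here and should be checked against the project bylaws. The result-email wording is likewise made up:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical: the sender of a reply on the [VOTE] thread
    value: int     # +1, 0 or -1
    binding: bool  # True if the voter is a PMC member

    def __post_init__(self) -> None:
        if self.value not in (-1, 0, 1):
            raise ValueError(f"unsupported vote value {self.value!r} from {self.voter}")


def tally(votes: list[Vote]) -> dict:
    """Keep each voter's most recent vote, count by bindingness, apply lazy majority."""
    latest: dict[str, Vote] = {}
    for vote in votes:
        latest[vote.voter] = vote  # a later reply replaces an earlier one
    counts = {"binding": {1: 0, 0: 0, -1: 0}, "non-binding": {1: 0, 0: 0, -1: 0}}
    for vote in latest.values():
        counts["binding" if vote.binding else "non-binding"][vote.value] += 1
    plus, minus = counts["binding"][1], counts["binding"][-1]
    return {"counts": counts, "passed": plus >= 3 and plus > minus}


def result_email_body(result: dict) -> str:
    """Summarize a tally in one sentence for a hypothetical [RESULT] email."""
    b, nb = result["counts"]["binding"], result["counts"]["non-binding"]
    outcome = "passes" if result["passed"] else "fails"
    return (f"The vote {outcome} with {b[1]} binding +1s, {b[0]} binding 0s and "
            f"{b[-1]} binding -1s ({nb[1]} non-binding +1s, {nb[-1]} non-binding -1s).")
```

Keying on the voter handles people who change their vote later in the thread; deciding whether a given vote is binding (for example by checking the PMC roster) is left open in this sketch.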
Re: Looking to a Hadoop 3 release

Posted by Ravi Prakash <ra...@gmail.com>.

+1 for the plan to start cutting 3.x alpha releases. Thanks for the
initiative Andrew!

On Fri, Feb 19, 2016 at 6:19 AM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> > On 19 Feb 2016, at 11:27, Dmitry Sivachenko <tr...@gmail.com> wrote:
> >
> >
> >> On 19 Feb 2016, at 01:35, Andrew Wang <an...@cloudera.com> wrote:
> >>
> >> Hi all,
> >>
> >> Reviving this thread. I've seen renewed interest in a trunk release since
> >> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
> >> shell script rewrite, and many other improvements, I think it's time to
> >> revisit Hadoop 3.0 release plans.
> >>
> >
>
> It's time to start ... I suspect it'll take a while to stabilise. I look
> forward to the new shell scripts already
>
> One thing I do want there is for all the alpha releases to make clear that
> there are no compatibility policies here; protocols may change and there is
> no requirement of the first 3.x release to be compatible with all the 3.0.x
> alphas. That's something we missed out on the 2.0.x-alpha process, or at
> least not repeated often enough.
>
> >
> > Hello,
> >
> > any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes
> > out?
> >
> > Thanks!
> >
> >
>
> sounds like a good time for a status update on the FB work —and anything
> people can do to test it would be appreciated by all. That includes testing
> on ipv4 systems, and especially, IPv4/v6 systems with Kerberos turned on
> and both MIT and AD kerberos servers. At the same time, IPv6 support ought
> to be something that could be added in.
>
>
> I don't have any opinions on timescale, but
>
> +1 to anything related to classpath isolation
> +1 to a careful bump of versions of dependencies.
> +1 to fixing the outstanding Java 8 migration issues, especially the big
> Jersey patch that's just been updated.
> +1 to switching to JIRA-created release notes
>
> Having been doing the slider releases recently, it's clear to me that you
> can do a lot in automating the release process itself. All those steps in
> the release runbook can be turned into targets in a special ant release.xml
> build file, calling maven, gpg, etc.
>
> I think doing something like this for 3.0 will significantly benefit both
> the release phase here and future releases.
>
> This is the slider one:
> https://github.com/apache/incubator-slider/blob/develop/bin/release.xml
>
> It doesn't replace maven, instead it choreographs that along with all the
> other steps: signing and checksumming artifacts, publishing them, voting
>
> it includes
>  -refusing to release if the git repo is modified
>  -making the various git branch/tag/push operations
>  -issuing the various mvn versions:update commands
>  -signing
>  -publishing via asf SVN
>  -using GET calls to verify the artifacts made it
>  -generating the vote and vote result emails (it even counts the votes)
>
> I recommend this is included as part of the release process. It does make
> a difference; we can now cut new releases with no human intervention other
> than editing a properties file and running different targets as the process
> goes through its release and vote phases.
>
> -Steve

Re: Looking to a Hadoop 3 release

Posted by Ravi Prakash <ra...@gmail.com>.

+1 for the plan to start cutting 3.x alpha releases. Thanks for the
initiative Andrew!

On Fri, Feb 19, 2016 at 6:19 AM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> > On 19 Feb 2016, at 11:27, Dmitry Sivachenko <tr...@gmail.com> wrote:
> >
> >
> >> On 19 Feb 2016, at 01:35, Andrew Wang <an...@cloudera.com> wrote:
> >>
> >> Hi all,
> >>
> >> Reviving this thread. I've seen renewed interest in a trunk release
> since
> >> HDFS erasure coding has not yet made it to branch-2. Along with JDK8,
> the
> >> shell script rewrite, and many other improvements, I think it's time to
> >> revisit Hadoop 3.0 release plans.
> >>
> >
>
> It's time to start ... I suspect it'll take a while to stabilise. I look
> forward to the new shell scripts already
>
> One thing I do want there is for all the alpha releases to make clear that
> there are no compatibility policies here; protocols may change and there is
> no requirement of the first 3.x release to be compatible with all the 3.0.x
> alphas. That's something we missed out on the 2.0.x-alpha process, or at
> least not repeated often enough.
>
> >
> > Hello,
> >
> > any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes
> out?
> >
> > Thanks!
> >
> >
>
> sounds like a good time for a status update on the FB work —and anything
> people can do to test it would be appreciated by all. That includes testing
> on ipv4 systems, and especially, IPv4/v6 systems with Kerberos turned on
> and both MIT and AD kerberos servers. At the same time, IPv6 support ought
> to be something that could be added in.
>
>
> I don't have any opinions on timescale, but
>
> +1 to anything related to classpath isolation
> +1 to a careful bump of versions of dependencies.
> +1 to fixing the outstanding Java 8 migration issues, especially the big
> Jersey patch that's just been updated.
> +1 to switching to JIRA-created release notes
>
> Having been doing the slider releases recently, it's clear to me that you
> can do a lot in automating the release process itself. All those steps in
> the release runbook can be turned into targets in a special ant release.xml
> build file, calling maven, gpg, etc.
>
> I think doing something like this for 3.0 will significantly benefit both
> the release phase here but the future releases
>
> This is the slider one:
> https://github.com/apache/incubator-slider/blob/develop/bin/release.xml
>
> It doesn't replace maven, instead it choreographs that along with all the
> other steps: signing and checksumming artifacts, publishing them, voting
>
> it includes
>  -refusing to release if the git repo is modified
>  -making the various git branch/tag/push operations
>  -issuing the various mvn versions:update commands
>  -signing
>  -publishing via asf SVN
>  -using GET calls too verify the artifacts made it
>  -generating the vote and vote result emails (it even counts the votes)
>
> I recommend this is included as part of the release process. It does make
> a difference; we can now cut new releases with no human intervention other
> than editing a properties file and running different targets as the process
> goes through its release and vote phases.
>
> -Steve

Re: Looking to a Hadoop 3 release

Posted by Steve Loughran <st...@hortonworks.com>.

> On 19 Feb 2016, at 11:27, Dmitry Sivachenko <tr...@gmail.com> wrote:
> 
> 
>> On 19 Feb 2016, at 01:35, Andrew Wang <an...@cloudera.com> wrote:
>> 
>> Hi all,
>> 
>> Reviving this thread. I've seen renewed interest in a trunk release since
>> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
>> shell script rewrite, and many other improvements, I think it's time to
>> revisit Hadoop 3.0 release plans.
>> 
> 

It's time to start ... I suspect it'll take a while to stabilise. I look forward to the new shell scripts already

One thing I do want there is for all the alpha releases to make clear that there are no compatibility policies here; protocols may change and there is no requirement of the first 3.x release to be compatible with all the 3.0.x alphas. That's something we missed out on in the 2.0.x-alpha process, or at least didn't repeat often enough.

> 
> Hello,
> 
> any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes out?
> 
> Thanks!
> 
> 

sounds like a good time for a status update on the FB work, and anything people can do to test it would be appreciated by all. That includes testing on IPv4 systems and, especially, IPv4/v6 systems with Kerberos turned on, against both MIT and AD Kerberos servers. At the same time, IPv6 support ought to be something that could be added in.
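
For anyone volunteering to test on IPv4-only and mixed IPv4/IPv6 hosts, a useful first step is confirming which address families a test host actually resolves to. Below is a minimal Python sketch using only the standard library's socket.getaddrinfo; the helper name address_families is hypothetical and has nothing to do with the HADOOP-11890 patches themselves.

```python
import socket


def address_families(host, port=0):
    """Return the set of address families ("IPv4"/"IPv6") that `host` resolves to."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    families = set()
    # One entry per resolved address; restrict to TCP to avoid duplicate rows.
    for family, _type, _proto, _canon, _addr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if family in names:
            families.add(names[family])
    return families
```

On a dual-stack machine, address_families("localhost") would typically report both families, while an IPv4 literal such as "127.0.0.1" reports only IPv4; the exact result depends on the host's resolver configuration.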

I don't have any opinions on timescale, but

+1 to anything related to classpath isolation
+1 to a careful bump of versions of dependencies.
+1 to fixing the outstanding Java 8 migration issues, especially the big Jersey patch that's just been updated.
+1 to switching to JIRA-created release notes

Having been doing the slider releases recently, it's clear to me that you can do a lot in automating the release process itself. All those steps in the release runbook can be turned into targets in a special ant release.xml build file, calling maven, gpg, etc.

I think doing something like this for 3.0 will significantly benefit both the release phase here and future releases.

This is the slider one: https://github.com/apache/incubator-slider/blob/develop/bin/release.xml

It doesn't replace maven; instead it choreographs that along with all the other steps: signing and checksumming artifacts, publishing them, voting.
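
The signing and checksumming steps are easy to script. Here is a minimal Python sketch, assuming SHA-512 checksums written in sha512sum's two-space layout and an ASCII-armoured detached GPG signature; the function names are hypothetical and are not taken from Slider's release.xml. The gpg command line is only built here, not run.

```python
import hashlib
from pathlib import Path


def sha512_checksum(artifact: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 digest of a release artifact, read in chunks to bound memory."""
    digest = hashlib.sha512()
    with artifact.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_file(artifact: Path) -> Path:
    """Write '<digest>  <name>' (sha512sum text layout) to '<artifact>.sha512'."""
    out = artifact.with_name(artifact.name + ".sha512")
    out.write_text(f"{sha512_checksum(artifact)}  {artifact.name}\n")
    return out


def gpg_sign_command(artifact: Path) -> list:
    """argv for an ASCII-armoured detached signature '<artifact>.asc' (not executed)."""
    return ["gpg", "--armor", "--detach-sign",
            "--output", str(artifact) + ".asc", str(artifact)]
```

Keeping the command as a list means it can be handed to subprocess.run without shell quoting, and reviewed or logged before anything is signed.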

it includes
 -refusing to release if the git repo is modified
 -making the various git branch/tag/push operations
 -issuing the various mvn versions:update commands
 -signing
 -publishing via ASF SVN
 -using GET calls to verify the artifacts made it
 -generating the vote and vote result emails (it even counts the votes)
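
The first check in that list, refusing to release from a modified working tree, can be sketched in Python on top of `git status --porcelain`, which prints nothing for a clean tree. This illustrates the idea under that assumption; it is not how release.xml implements it, and the function names are hypothetical.

```python
import subprocess


def dirty_paths(porcelain: str) -> list:
    """Paths listed in `git status --porcelain` output ('XY path' per line)."""
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def check_clean_repo(repo: str = ".") -> None:
    """Raise if the working tree has uncommitted or untracked changes."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    dirty = dirty_paths(out)
    if dirty:
        raise RuntimeError(
            "refusing to release, repo is modified: " + ", ".join(dirty)
        )
```

Calling check_clean_repo() before any branch, tag, or version-bump step means a stray local edit stops the release early instead of ending up in a signed artifact.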

I recommend this is included as part of the release process. It does make a difference; we can now cut new releases with no human intervention other than editing a properties file and running different targets as the process goes through its release and vote phases.
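
Counting the votes is similarly mechanical. The sketch below applies the ASF release-vote rule (at least three binding +1 votes and more binding +1 than -1 votes). Treating a voter's last vote as the one that counts is an assumption of this sketch, and the Vote type, tally and result_line functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True if cast by a PMC member


def tally(votes):
    """Summarise a release vote; the outcome is decided by binding votes only."""
    # Assumption: if someone votes more than once, their last vote counts.
    latest = {}
    for vote in votes:
        latest[vote.voter] = vote
    binding = [v for v in latest.values() if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    zero = sum(1 for v in binding if v.value == 0)
    non_binding_plus = sum(
        1 for v in latest.values() if not v.binding and v.value > 0
    )
    return {
        "binding_plus": plus,
        "binding_zero": zero,
        "binding_minus": minus,
        "non_binding_plus": non_binding_plus,
        # ASF release rule: at least three binding +1s and more +1s than -1s.
        "passed": plus >= 3 and plus > minus,
    }


def result_line(summary):
    """One sentence suitable for a vote result email."""
    outcome = "passes" if summary["passed"] else "fails"
    return (f"The vote {outcome} with {summary['binding_plus']} binding +1, "
            f"{summary['binding_zero']} binding 0 and "
            f"{summary['binding_minus']} binding -1.")
```

Non-binding +1s are reported for the result email but deliberately do not affect the outcome.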

-Steve

Re: Looking to a Hadoop 3 release

Posted by Dmitry Sivachenko <tr...@gmail.com>.

> On 19 Feb 2016, at 01:35, Andrew Wang <an...@cloudera.com> wrote:
> 
> Hi all,
> 
> Reviving this thread. I've seen renewed interest in a trunk release since
> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
> shell script rewrite, and many other improvements, I think it's time to
> revisit Hadoop 3.0 release plans.
> 


Hello,

any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes out?

Thanks!

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi all,

Reviving this thread. I've seen renewed interest in a trunk release since
HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
shell script rewrite, and many other improvements, I think it's time to
revisit Hadoop 3.0 release plans.

My overall plan is still the same as in my original email: a series of
regular alpha releases leading up to beta and GA. Alpha releases make it
easier for downstreams to integrate with our code, and making them regular
means features can be included when they are ready.

I know there are some incompatible changes waiting in the wings
(e.g. HDFS-6984 making FileStatus a PB rather than Writable, some of
HADOOP-9991 bumping dependency versions) that would be good to get in. If
you have changes like this, please set the target version to 3.0.0 and mark
them "Incompatible". We can use this JIRA query to track:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
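
For readability, the query behind that link decodes to plain JQL. The small Python sketch below uses only urllib.parse to rebuild the same link from the JQL string, so the query can be edited (for example, to change the target version) and re-encoded; the helper and constant names are hypothetical.

```python
from urllib.parse import quote, unquote

JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="

# The query from the link, decoded.
INCOMPATIBLE_3_0_0 = (
    'project in (HADOOP, HDFS, YARN, MAPREDUCE) '
    'and "Target Version/s" = "3.0.0" '
    'and resolution="unresolved" '
    'and "Hadoop Flags"="Incompatible change" '
    'order by priority'
)


def jql_url(jql: str) -> str:
    """Percent-encode JQL into a search link; parentheses stay literal, as in the original link."""
    return JIRA_SEARCH + quote(jql, safe="()")


def jql_from_url(url: str) -> str:
    """Recover the JQL text from a search link of the form above."""
    return unquote(url.split("jql=", 1)[1])
```

Passing safe="()" overrides quote's default of leaving "/" unescaped, so "Target Version/s" is encoded as %2F exactly as in the original link.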

There's some release-related stuff that needs to be sorted out (namely, the
new CHANGES.txt and release note generation from Yetus), but I'd
tentatively like to roll the first alpha a month out, so third week of
March.

Best,
Andrew

On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rs...@altiscale.com> wrote:

> Avoiding the use of JDK8 language features (and, presumably, APIs)
> means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> source version to JDK8.
>
> Also, note that releasing from trunk is a way of achieving #3, it's
> not a way of abandoning it.
>
>
>
> On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com>
> wrote:
> > Hi Raymie,
> >
> > Konst proposed just releasing off of trunk rather than cutting a branch-2,
> > and there was general agreement there. So, consider #3 abandoned. 1&2 can
> > be achieved at the same time, we just need to avoid using JDK8 language
> > features in trunk so things can be backported.
> >
> > Best,
> > Andrew
> >
> > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com> wrote:
> >
> >> In this (and the related threads), I see the following three requirements:
> >>
> >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> >>
> >> 2. "We'll still be releasing 2.x releases for a while, with similar
> >> feature sets as 3.x."
> >>
> >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> >> Adding a branch-3, branch-3.x would be obnoxious."
> >>
> >> These three cannot be achieved at the same time.  Which do we abandon?
> >>
> >>
> >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com>
> >> wrote:
> >> >
> >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> >> >>
> >> >> 2) Simplification of configs - potentially separating client side configs
> >> >> and those used by daemons. This is another source of perpetual confusion
> >> >> for users.
> >> > + 1 on this.
> >> >
> >> > sanjay
> >>
>

Re: Looking to a Hadoop 3 release

Posted by Raymie Stata <rs...@altiscale.com>.

Avoiding the use of JDK8 language features (and, presumably, APIs)
means you've abandoned #1, i.e., you haven't (really) bumped the JDK
source version to JDK8.

Also, note that releasing from trunk is a way of achieving #3, it's
not a way of abandoning it.



On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <an...@cloudera.com> wrote:
> Hi Raymie,
>
> Konst proposed just releasing off of trunk rather than cutting a branch-2,
> and there was general agreement there. So, consider #3 abandoned. 1&2 can
> be achieved at the same time, we just need to avoid using JDK8 language
> features in trunk so things can be backported.
>
> Best,
> Andrew
>
> On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com> wrote:
>
>> In this (and the related threads), I see the following three requirements:
>>
>> 1. "Bump the source JDK version to JDK8" (i.e., drop JDK7 support).
>>
>> 2. "We'll still be releasing 2.x releases for a while, with similar
>> feature sets as 3.x."
>>
>> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
>> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
>> Adding a branch-3, branch-3.x would be obnoxious."
>>
>> These three cannot be achieved at the same time.  Which do we abandon?
>>
>>
>> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com> wrote:
>> >
>> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
>> >>
>> >> 2) Simplification of configs - potentially separating client side configs
>> >> and those used by daemons. This is another source of perpetual confusion
>> >> for users.
>> > +1 on this.
>> >
>> > sanjay
>>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Raymie,

Konst proposed just releasing off of trunk rather than cutting a branch-2,
and there was general agreement there. So, consider #3 abandoned. 1&2 can
be achieved at the same time, we just need to avoid using JDK8 language
features in trunk so things can be backported.

Best,
Andrew

On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rs...@altiscale.com> wrote:

> In this (and the related threads), I see the following three requirements:
>
> 1. "Bump the source JDK version to JDK8" (i.e., drop JDK7 support).
>
> 2. "We'll still be releasing 2.x releases for a while, with similar
> feature sets as 3.x."
>
> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> Adding a branch-3, branch-3.x would be obnoxious."
>
> These three cannot be achieved at the same time.  Which do we abandon?
>
>
> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com> wrote:
> >
> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> >>
> >> 2) Simplification of configs - potentially separating client side configs
> >> and those used by daemons. This is another source of perpetual confusion
> >> for users.
> > +1 on this.
> >
> > sanjay
>
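
To make "avoid using JDK8 language features in trunk" concrete, here is a small illustrative sketch; it is not from the thread, and the class and method names are hypothetical. It sorts with an anonymous `Comparator`, which compiles at the Java 7 language level (for example with `javac -source 1.7 -target 1.7`), while the Java 8 method-reference form appears only in a comment. In line with Raymie's "(and, presumably, APIs)" remark, note that a `-source 1.7` setting restricts only language syntax: compiled on JDK 8 without a JDK 7 boot classpath, code can still call JDK 8-only library APIs. `Integer.compare` and `Collections.sort`, used here, both exist in Java 7.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class: Java 7-compatible code that could be backported
// from trunk to branch-2 without rewriting.
class Java7StyleSort {

    // Returns a copy of the input sorted by string length, using an anonymous
    // Comparator instead of a lambda so it compiles with -source 1.7.
    static List<String> sortByLength(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        Collections.sort(sorted, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        // Java 8 equivalent; the method reference would be rejected by a
        // -source 1.7 build:
        //   sorted.sort(Comparator.comparingInt(String::length));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByLength(Arrays.asList("hdfs", "yarn", "mapreduce", "rpc")));
    }
}
```

Running `main` prints `[rpc, hdfs, yarn, mapreduce]`; because `Collections.sort` is stable, the two four-letter names keep their input order.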

Re: Looking to a Hadoop 3 release

Posted by Raymie Stata <rs...@altiscale.com>.

In this (and the related threads), I see the following three requirements:

1. "Bump the source JDK version to JDK8" (i.e., drop JDK7 support).

2. "We'll still be releasing 2.x releases for a while, with similar
feature sets as 3.x."

3. Avoid the "risk of split-brain behavior" by "minimize backporting
headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
Adding a branch-3, branch-3.x would be obnoxious."

These three cannot be achieved at the same time.  Which do we abandon?

On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sa...@gmail.com> wrote:
>
>> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
>>
>> 2) Simplification of configs - potentially separating client side configs
>> and those used by daemons. This is another source of perpetual confusion
>> for users.
> +1 on this.
>
> sanjay

Re: Looking to a Hadoop 3 release

Posted by sanjay Radia <sa...@gmail.com>.

> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> 
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
+1 on this.

sanjay

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Yes, these are the kind of enhancements that need to be proposed and discussed for inclusion!

Thanks,
+Vinod

On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:


> Some features that come to mind immediately would be
> 1) Enhancements to the RPC mechanics - specifically support for async RPC /
> two-way communication. There are a lot of places where we re-use heartbeats
> to send more information than would be necessary if the RPC layer supported
> these features. Some of this can be done in a manner compatible with the
> existing RPC sub-system. Others, like two-way communication, probably cannot.
> After this comes having HDFS/YARN actually make use of these changes. The
> other consideration is adoption of an alternate system like gRPC, which would
> be incompatible.
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
> 
> Thanks
> - Sid
> 
> 
> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
> 
>> Sorry, Outlook dequoted Alejandro's comments.
>>
>> Let me try again with his comments in italic and proofreading of mine.
>>
>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
>>
>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
>> 
>> IMO, if part of the community wants to take on the responsibility and work
>> it takes to do a new major release, we should not discourage them from
>> doing that.
>> 
>> Having multiple major branches active is a standard practice.
>> 
>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>> long time to get out, and during that time 0.21 and 0.22 got released and
>> ignored, while 0.23 got picked up and used in production.
>>
>> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
>> widely enough to be used in products, and changes were made between that
>> alpha and 2.2 itself which raised compatibility issues.
>> 
>> For 3.x I'd propose:
>>
>>  1.  Have shorter-lived 3.x alpha/beta artifacts.
>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
>>      releases to shipping. Best effort, but not to the extent that it gets
>>      in the way. More succinctly: we will care more about seamless migration
>>      from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>  3.  Anybody who ships code based on a 3.x alpha/beta must recognise and
>>      accept policy (2): Hadoop's "instability guarantee" for the 3.x
>>      alpha/beta phase.
>> 
>> As well as backwards compatibility, we need to think about forwards
>> compatibility, with the goal being:
>>
>> Any app written/shipped with the 3.x release binaries (JAR and native)
>> will work in and against a 3.y Hadoop cluster, for all natural numbers x, y
>> where y >= x and is-release(x) and is-release(y).
>> 
>> That's important, as it means all server-side changes in 3.x which are
>> expected to mandate client-side updates (protocols, HDFS erasure
>> decoding, security features) must be considered complete and stable before
>> we can say is-release(x). In an ideal world, we'll even get the semantics
>> right, with tests to show this.
>> 
>> Fixing classpath hell downstream is certainly one feature I am +1 on. But
>> it's only one of the features, and given there's no design doc on that
>> JIRA, it's way too immature to set a release schedule on. An alpha schedule
>> with no guarantees and a regular alpha roll could be viable, as new features
>> go in and can then be tried out experimentally in branches of HBase (well
>> volunteered, Stack!), etc. Of course, instability guarantees will be
>> transitive downstream.
>> 
>> 
>> This time around we are not replacing the guts as we did from Hadoop 1 to
>> Hadoop 2, but doing superficial surgery to address issues that were not
>> considered (or were too much to take on top of the guts transplant).
>>
>> For the split-brain concern, we did a great job maintaining Hadoop 1 and
>> Hadoop 2 until Hadoop 1 faded away.
>> 
>> And there was a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
>> compatibility.
>> 
>> 
>> Based on that experience I would say that the coexistence of Hadoop 2 and
>> Hadoop 3 will be much less demanding/traumatic.
>> 
>> The re-layout of all the source trees was a major change there. Assuming
>> there's no refactoring or switch of build tools, picking things back
>> will be tractable.
>> 
>> 
>> Also, to facilitate the coexistence we should limit Java language features
>> to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
>> we can remove this limitation.
>> 
>> +1; setting javac.version will fix this
>> 
>> What is nice about having Java 8 as the base JVM is that it means you can
>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>> and libs can use all the Java 8 features they want to.
>>
>> There's one policy change to consider there: possibly, just possibly, we
>> could allow new modules in hadoop-tools to adopt Java 8 language features
>> early, provided everyone recognised that "backport to branch-2" isn't going
>> to happen.
>> 
>> -Steve
>> 
>>
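
Steve's forward-compatibility goal quoted above can be restated as a single predicate. This is only a restatement for clarity, not part of the original thread: $\mathrm{works}(a, c)$ is a hypothetical shorthand for "application binary $a$ runs correctly in and against cluster $c$", and $\mathrm{isRelease}$ stands for Steve's is-release.

```latex
\forall x, y \in \mathbb{N}:\quad
\bigl(y \ge x \;\wedge\; \mathrm{isRelease}(x) \;\wedge\; \mathrm{isRelease}(y)\bigr)
\;\Longrightarrow\;
\mathrm{works}\bigl(\mathrm{app}_{3.x},\ \mathrm{cluster}_{3.y}\bigr)
```

Read this way, it shows why a server-side change that requires client updates must be stable before its release can count as is-release(x): once it does, every later 3.y has to keep accepting those clients.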

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
page. In addition to the two things I've been pushing, I also looked
through Allen's list (thanks Allen for making this) and picked out the
shell script rewrite and the removal of HFTP as big changes. This would be
the place to propose features for inclusion in 3.x; I'd particularly
appreciate help on the YARN/MR side.

Based on what I'm hearing, let me modulate my proposal to the following:

- We avoid cutting branch-3, and release off of trunk. The trunk-only
changes don't look that scary, so I think this is fine. This does mean we
need to be more rigorous before merging branches to trunk. I think
Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
be very helpful in this regard.
- We do not include anything to break wire compatibility unless (as Jason
says) it's an unbelievably awesome feature.
- No harm in rolling alphas from trunk, as it doesn't lock us to anything
compatibility-wise. Downstreams like releases.

I'll take Steve's advice about not locking GA to a given date, but I also
share his belief that we can alpha/beta/GA faster than it took for Hadoop
2. Let's roll some intermediate releases, work on the roadmap items, and
see how we're feeling in a few months.

Best,
Andrew

On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:

> I think it'll be useful to have a discussion about what else people would
> like to see in Hadoop 3.x - especially if the change is potentially
> incompatible. Also, what we expect the release schedule to be for major
> releases and what triggers them - JVM version, major features, the need for
> incompatible changes? Assuming major versions will not be released every 6
> months/1 year (adoption time, fairly disruptive for downstream projects,
> and users), considering additional features/incompatible changes for 3.x
> would be useful.
>
> Some features that come to mind immediately would be
> 1) Enhancements to the RPC mechanics - specifically support for async RPC /
> two-way communication. There are a lot of places where we re-use heartbeats
> to send more information than would be necessary if the RPC layer supported
> these features. Some of this can be done in a manner compatible with the
> existing RPC sub-system. Others, like two-way communication, probably cannot.
> After this comes having HDFS/YARN actually make use of these changes. The
> other consideration is adoption of an alternate system like gRPC, which would
> be incompatible.
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
>
> Thanks
> - Sid
>
>
> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
> > Sorry, Outlook dequoted Alejandro's comments.
> >
> > Let me try again with his comments in italic and proofreading of mine.
> >
> > On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
> >
> > On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
> >
> > IMO, if part of the community wants to take on the responsibility and work
> > it takes to do a new major release, we should not discourage them from
> > doing that.
> >
> > Having multiple major branches active is a standard practice.
> >
> > Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> > long time to get out, and during that time 0.21 and 0.22 got released and
> > ignored, while 0.23 got picked up and used in production.
> >
> > The 2.0.4-alpha release was more of a trouble spot, as it got picked up
> > widely enough to be used in products, and changes were made between that
> > alpha and 2.2 itself which raised compatibility issues.
> >
> > For 3.x I'd propose:
> >
> >   1.  Have shorter-lived 3.x alpha/beta artifacts.
> >   2.  Make clear there are no guarantees of compatibility from alpha/beta
> >       releases to shipping. Best effort, but not to the extent that it
> >       gets in the way. More succinctly: we will care more about seamless
> >       migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >   3.  Anybody who ships code based on a 3.x alpha/beta must recognise and
> >       accept policy (2): Hadoop's "instability guarantee" for the 3.x
> >       alpha/beta phase.
> >
> > As well as backwards compatibility, we need to think about forwards
> > compatibility, with the goal being:
> >
> > Any app written/shipped with the 3.x release binaries (JAR and native)
> > will work in and against a 3.y Hadoop cluster, for all natural numbers x, y
> > where y >= x and is-release(x) and is-release(y).
> >
> > That's important, as it means all server-side changes in 3.x which are
> > expected to mandate client-side updates (protocols, HDFS erasure
> > decoding, security features) must be considered complete and stable before
> > we can say is-release(x). In an ideal world, we'll even get the semantics
> > right, with tests to show this.
> >
> > Fixing classpath hell downstream is certainly one feature I am +1 on. But
> > it's only one of the features, and given there's no design doc on that
> > JIRA, it's way too immature to set a release schedule on. An alpha schedule
> > with no guarantees and a regular alpha roll could be viable, as new features
> > go in and can then be tried out experimentally in branches of HBase (well
> > volunteered, Stack!), etc. Of course, instability guarantees will be
> > transitive downstream.
> >
> >
> > This time around we are not replacing the guts as we did from Hadoop 1 to
> > Hadoop 2, but doing superficial surgery to address issues that were not
> > considered (or were too much to take on top of the guts transplant).
> >
> > For the split-brain concern, we did a great job maintaining Hadoop 1 and
> > Hadoop 2 until Hadoop 1 faded away.
> >
> > And there was a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> > compatibility.
> >
> >
> > Based on that experience I would say that the coexistence of Hadoop 2 and
> > Hadoop 3 will be much less demanding/traumatic.
> >
> > The re-layout of all the source trees was a major change there. Assuming
> > there's no refactoring or switch of build tools, picking things back
> > will be tractable.
> >
> >
> > Also, to facilitate the coexistence we should limit Java language features
> > to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
> > we can remove this limitation.
> >
> > +1; setting javac.version will fix this
> >
> > What is nice about having Java 8 as the base JVM is that it means you can
> > be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
> > and libs can use all the Java 8 features they want to.
> >
> > There's one policy change to consider there: possibly, just possibly, we
> > could allow new modules in hadoop-tools to adopt Java 8 language features
> > early, provided everyone recognised that "backport to branch-2" isn't going
> > to happen.
> >
> > -Steve
> >
> >
>

Re: Looking to a Hadoop 3 release

Posted by sanjay Radia <sa...@gmail.com>.

> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> 
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
+ 1 on this.

sanjay

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
page. In addition to the two things I've been pushing, I also looked
through Allen's list (thanks Allen for making this) and picked out the
shell script rewrite and the removal of HFTP as big changes. This would be
the place to propose features for inclusion in 3.x, I'd particularly
appreciate help on the YARN/MR side.

Based on what I'm hearing, let me modulate my proposal to the following:

- We avoid cutting branch-3, and release off of trunk. The trunk-only
changes don't look that scary, so I think this is fine. This does mean we
need to be more rigorous before merging branches to trunk. I think
Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
be very helpful in this regard.
- We do not include anything to break wire compatibility unless (as Jason
says) it's an unbelievably awesome feature.
- No harm in rolling alphas from trunk, as it doesn't lock us to anything
compatibility wise. Downstreams like releases.

I'll take Steve's advice about not locking GA to a given date, but I also
share his belief that we can alpha/beta/GA faster than it took for Hadoop
2. Let's roll some intermediate releases, work on the roadmap items, and
see how we're feeling in a few months.

Best,
Andrew

On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:

> I think it'll be useful to have a discussion about what else people would
> like to see in Hadoop 3.x - especially if the change is potentially
> incompatible. Also, what we expect the release schedule to be for major
> releases and what triggers them - JVM version, major features, the need for
> incompatible changes ? Assuming major versions will not be released every 6
> months/1 year (adoption time, fairly disruptive for downstream projects,
> and users) -  considering additional features/incompatible changes for 3.x
> would be useful.
>
> Some features that come to mind immediately would be
> 1) enhancements to the RPC mechanics - specifically support for AsynRPC /
> two way communication. There's a lot of places where we re-use heartbeats
> to send more information than what would be done if the PRC layer supported
> these features. Some of this can be done in a compatible manner to the
> existing RPC sub-system. Others like 2 way communication probably cannot.
> After this, having HDFS/YARN actually make use of these changes. The other
> consideration is adoption of an alternate system ike gRpc which would be
> incompatible.
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
>
> Thanks
> - Sid
>
>
> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
> > Sorry, outlook dequoted Alejandros's comments.
> >
> > Let me try again with his comments in italic and proofreading of mine
> >
> > On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com<mailto:
> > stevel@hortonworks.com>> wrote:
> >
> >
> >
> > On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com<mailto:
> > tucu00@gmail.com><ma...@gmail.com>> wrote:
> >
> > IMO, if part of the community wants to take on the responsibility and
> work
> > that takes to do a new major release, we should not discourage them from
> > doing that.
> >
> > Having multiple major branches active is a standard practice.
> >
> > Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> > long time to get out, and during that time 0.21, 0.22, got released and
> > ignored; 0.23 picked up and used in production.
> >
> > The 2.04-alpha release was more of a troublespot as it got picked up
> > widely enough to be used in products, and changes were made between that
> > alpha & 2.2 itself which raised compatibility issues.
> >
> > For 3.x I'd propose
> >
> >
> >   1.  Have less longevity of 3.x alpha/beta artifacts
> >   2.  Make clear there are no guarantees of compatibility from alpha/beta
> > releases to shipping. Best effort, but not to the extent that it gets in
> > the way. More succinctly: we will care more about seamless migration from
> > 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >   3.  Anybody who ships code based on 3.x alpha/beta to recognise and
> > accept policy (2). Hadoop's "instability guarantee" for the 3.x
> alpha/beta
> > phase
> >
> > As well as backwards compatibility, we need to think about Forwards
> > compatibility, with the goal being:
> >
> > Any app written/shipped with the 3.x release binaries (JAR and native)
> > will work in and against a 3.y Hadoop cluster, for all x, y in Natural
> > where y>=x  and is-release(x) and is-release(y)
> >
> > That's important, as it means all server-side changes in 3.x which are
> > expected to to mandate client-side updates: protocols, HDFS erasure
> > decoding, security features, must be considered complete and stable
> before
> > we can say is-release(x). In an ideal world, we'll even get the semantics
> > right with tests to show this.
> >
> > Fixing classpath hell downstream is certainly one feature I am +1 on.
> But:
> > it's only one of the features, and given there's not any design doc on
> that
> > JIRA, way too immature to set a release schedule on. An alpha schedule
> with
> > no-guarantees and a regular alpha roll, could be viable, as new features
> go
> > in and can then be used to experimentally try this stuff in branches of
> > Hbase (well volunteered, Stack!), etc. Of course instability guarantees
> > will be transitive downstream.
> >
> >
> > This time around we are not replacing the guts as we did from Hadoop 1 to
> > Hadoop 2, but superficial surgery to address issues were not considered
> (or
> > was too much to take on top of the guts transplant).
> >
> > For the split brain concern, we did a great of job maintaining Hadoop 1
> and
> > Hadoop 2 until Hadoop 1 faded away.
> >
> > And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> > compatibility.
> >
> >
> > Based on that experience I would say that the coexistence of Hadoop 2 and
> > Hadoop 3 will be much less demanding/traumatic.
> >
> > The re-layout of all the source trees was a major change there, assuming
> > there's no refactoring or switch of build tools then picking things back
> > will be tractable
> >
> >
> > Also, to facilitate the coexistence we should limit Java language
> features
> > to Java 7 (even if the runtime is Java 8), once Java 7 is not used
> anymore
> > we can remove this limitation.
> >
> > +1; setting javac.version will fix this
> >
> > What is nice about having java 8 as the base JVM is that it means you can
> > be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
> > and libs can use all Java 8 features they want to.
> >
> > There's one policy change to consider there which is possibly, just
> > possibly, we could allow new modules in hadoop-tools to adopt Java 8
> > languages early, provided everyone recognised that "backport to branch-2"
> > isn't going to happen.
> >
> > -Steve
> >
> >
>

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Yes, these are the kind of enhancements that need to be proposed and discussed for inclusion!

Thanks,
+Vinod

On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:


> Some features that come to mind immediately would be
> 1) enhancements to the RPC mechanics - specifically support for AsynRPC /
> two way communication. There's a lot of places where we re-use heartbeats
> to send more information than what would be done if the PRC layer supported
> these features. Some of this can be done in a compatible manner to the
> existing RPC sub-system. Others like 2 way communication probably cannot.
> After this, having HDFS/YARN actually make use of these changes. The other
> consideration is adoption of an alternate system ike gRpc which would be
> incompatible.
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
> 
> Thanks
> - Sid
> 
> 
> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
> 
>> Sorry, outlook dequoted Alejandros's comments.
>> 
>> Let me try again with his comments in italic and proofreading of mine
>> 
>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com<mailto:
>> stevel@hortonworks.com>> wrote:
>> 
>> 
>> 
>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com<mailto:
>> tucu00@gmail.com><ma...@gmail.com>> wrote:
>> 
>> IMO, if part of the community wants to take on the responsibility and work
>> that takes to do a new major release, we should not discourage them from
>> doing that.
>> 
>> Having multiple major branches active is a standard practice.
>> 
>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>> long time to get out, and during that time 0.21, 0.22, got released and
>> ignored; 0.23 picked up and used in production.
>> 
>> The 2.04-alpha release was more of a troublespot as it got picked up
>> widely enough to be used in products, and changes were made between that
>> alpha & 2.2 itself which raised compatibility issues.
>> 
>> For 3.x I'd propose
>> 
>> 
>>  1.  Have less longevity of 3.x alpha/beta artifacts
>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
>> releases to shipping. Best effort, but not to the extent that it gets in
>> the way. More succinctly: we will care more about seamless migration from
>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>  3.  Anybody who ships code based on 3.x alpha/beta to recognise and
>> accept policy (2). Hadoop's "instability guarantee" for the 3.x alpha/beta
>> phase
>> 
>> As well as backwards compatibility, we need to think about Forwards
>> compatibility, with the goal being:
>> 
>> Any app written/shipped with the 3.x release binaries (JAR and native)
>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
>> where y>=x  and is-release(x) and is-release(y)
>> 
>> That's important, as it means all server-side changes in 3.x which are
>> expected to to mandate client-side updates: protocols, HDFS erasure
>> decoding, security features, must be considered complete and stable before
>> we can say is-release(x). In an ideal world, we'll even get the semantics
>> right with tests to show this.
>> 
>> Fixing classpath hell downstream is certainly one feature I am +1 on. But:
>> it's only one of the features, and given there's not any design doc on that
>> JIRA, way too immature to set a release schedule on. An alpha schedule with
>> no-guarantees and a regular alpha roll, could be viable, as new features go
>> in and can then be used to experimentally try this stuff in branches of
>> Hbase (well volunteered, Stack!), etc. Of course instability guarantees
>> will be transitive downstream.
>> 
>> 
>> This time around we are not replacing the guts as we did from Hadoop 1 to
>> Hadoop 2; rather, we are doing superficial surgery to address issues that
>> were not considered then (or were too much to take on top of the guts
>> transplant).
>> 
>> For the split-brain concern, we did a great job of maintaining Hadoop 1
>> and Hadoop 2 until Hadoop 1 faded away.
>> 
>> There was also a significant argument about 2.0.4-alpha to 2.2
>> protobuf/HDFS compatibility.
>> 
>> 
>> Based on that experience I would say that the coexistence of Hadoop 2 and
>> Hadoop 3 will be much less demanding/traumatic.
>> 
>> The re-layout of all the source trees was a major change there. Assuming
>> there's no refactoring or switch of build tools, picking things back
>> (backporting) will be tractable.
>> 
>> 
>> Also, to facilitate the coexistence, we should limit Java language features
>> to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
>> we can remove this limitation.
>> 
>> +1; setting javac.version will fix this
>> 
>> What is nice about having Java 8 as the base JVM is that you can be
>> confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>> and libs can use all the Java 8 features they want.
>> 
>> There's one policy change to consider there: possibly, just possibly, we
>> could allow new modules in hadoop-tools to adopt Java 8 language features
>> early, provided everyone recognised that "backport to branch-2" isn't
>> going to happen.
>> 
>> -Steve
>> 
>>
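The forward-compatibility goal in Steve's message (an app built against a 3.x release works against a 3.y cluster whenever y >= x and both are releases) can be read as a predicate over version pairs. A minimal Java sketch follows, under stated assumptions: HadoopVersion and ForwardCompat are hypothetical classes, not Hadoop APIs; "is-release" is assumed to mean "no pre-release qualifier" (alpha, beta, SNAPSHOT), following point (2) above; and patch numbers are ignored because the goal only speaks of 3.x and 3.y.

```java
/**
 * Hypothetical model of a Hadoop 3 version string such as "3.2.1" or
 * "3.0.0-alpha1". Only the major and minor numbers matter for the rule.
 */
final class HadoopVersion {
    final int major;
    final int minor;
    final String qualifier; // "" for a final release, e.g. "alpha1" otherwise

    private HadoopVersion(int major, int minor, String qualifier) {
        this.major = major;
        this.minor = minor;
        this.qualifier = qualifier;
    }

    static HadoopVersion parse(String version) {
        String[] parts = version.split("-", 2); // "3.0.0-alpha1" -> ["3.0.0", "alpha1"]
        String[] numbers = parts[0].split("\\.");
        int major = Integer.parseInt(numbers[0]);
        int minor = numbers.length > 1 ? Integer.parseInt(numbers[1]) : 0;
        String qualifier = parts.length > 1 ? parts[1] : "";
        return new HadoopVersion(major, minor, qualifier);
    }

    /** is-release(x): assumed here to mean "no pre-release qualifier". */
    boolean isRelease() {
        return qualifier.isEmpty();
    }
}

final class ForwardCompat {
    /**
     * True when the proposed goal promises that an app built against
     * {@code client} keeps working against a cluster running {@code cluster}.
     */
    static boolean promised(HadoopVersion client, HadoopVersion cluster) {
        return client.major == 3 && cluster.major == 3
                && client.isRelease() && cluster.isRelease()
                && cluster.minor >= client.minor;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"3.0.0", "3.2.1"},        // newer release cluster: promised
            {"3.2.0", "3.1.0"},        // older cluster: not covered by the goal
            {"3.0.0-alpha1", "3.3.0"}, // alpha client: no guarantee
        };
        for (String[] p : pairs) {
            boolean ok = promised(HadoopVersion.parse(p[0]), HadoopVersion.parse(p[1]));
            System.out.println(p[0] + " app on " + p[1] + " cluster: " + ok);
        }
    }
}
```

Note what the predicate deliberately leaves out: an older 3.y cluster serving a newer 3.x app, and any alpha/beta build on either side, which is exactly the "instability guarantee" proposed for that phase.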

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Yes, these are the kinds of enhancements that need to be proposed and discussed for inclusion!

Thanks,
+Vinod

On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:


> Some features that come to mind immediately would be:
> 1) Enhancements to the RPC mechanics, specifically support for async RPC /
> two-way communication. There are a lot of places where we re-use heartbeats
> to send more information than they would need to carry if the RPC layer
> supported these features. Some of this can be done in a manner compatible
> with the existing RPC sub-system; other parts, like two-way communication,
> probably cannot. After that, HDFS/YARN would need to actually make use of
> these changes. The other consideration is adoption of an alternate system
> like gRPC, which would be incompatible.
> 2) Simplification of configs, potentially separating client-side configs
> from those used by daemons. This is another source of perpetual confusion
> for users.
> 
> Thanks
> - Sid
> 
> 
> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
> 
>> Sorry, Outlook dequoted Alejandro's comments.
>> 
>> Let me try again, with his comments in italics and mine proofread.
>> 
>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
>> 
>> 
>> 
>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
>> 
>> IMO, if part of the community wants to take on the responsibility and work
>> that it takes to do a new major release, we should not discourage them from
>> doing that.
>> 
>> Having multiple major branches active is a standard practice.
>> 
>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>> long time to get out, and during that time 0.21 and 0.22 got released and
>> ignored, while 0.23 was picked up and used in production.
>> 
>> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
>> widely enough to be used in products, and changes were made between that
>> alpha and 2.2 itself which raised compatibility issues.
>> 
>> For 3.x I'd propose:
>> 
>>  1.  Give 3.x alpha/beta artifacts a shorter lifespan.
>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
>> releases to shipping releases. Best effort, but not to the extent that it
>> gets in the way. More succinctly: we will care more about seamless
>> migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>  3.  Anybody who ships code based on a 3.x alpha/beta must recognise and
>> accept policy (2): Hadoop's "instability guarantee" for the 3.x alpha/beta
>> phase.
>> 
>> As well as backwards compatibility, we need to think about forwards
>> compatibility, with the goal being:
>> 
>> Any app written/shipped with the 3.x release binaries (JAR and native)
>> will work in and against a 3.y Hadoop cluster, for all natural numbers
>> x, y where y >= x and is-release(x) and is-release(y).
>> 
>> That's important, as it means all server-side changes in 3.x which are
>> expected to mandate client-side updates (protocols, HDFS erasure
>> decoding, security features) must be considered complete and stable before
>> we can say is-release(x). In an ideal world, we'll even get the semantics
>> right, with tests to show this.
>> 
>> Fixing classpath hell downstream is certainly one feature I am +1 on. But
>> it's only one of the features, and given there's no design doc on that
>> JIRA, it is far too immature to set a release schedule on. An alpha
>> schedule with no guarantees and a regular alpha roll could be viable, as
>> new features go in and can then be used to experimentally try this stuff
>> in branches of HBase (well volunteered, Stack!), etc. Of course,
>> instability guarantees will be transitive downstream.
>> 
>> 
>> This time around we are not replacing the guts as we did from Hadoop 1 to
>> Hadoop 2; rather, we are doing superficial surgery to address issues that
>> were not considered then (or were too much to take on top of the guts
>> transplant).
>> 
>> For the split-brain concern, we did a great job of maintaining Hadoop 1
>> and Hadoop 2 until Hadoop 1 faded away.
>> 
>> There was also a significant argument about 2.0.4-alpha to 2.2
>> protobuf/HDFS compatibility.
>> 
>> 
>> Based on that experience I would say that the coexistence of Hadoop 2 and
>> Hadoop 3 will be much less demanding/traumatic.
>> 
>> The re-layout of all the source trees was a major change there. Assuming
>> there's no refactoring or switch of build tools, picking things back
>> (backporting) will be tractable.
>> 
>> 
>> Also, to facilitate the coexistence, we should limit Java language features
>> to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
>> we can remove this limitation.
>> 
>> +1; setting javac.version will fix this
>> 
>> What is nice about having Java 8 as the base JVM is that you can be
>> confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>> and libs can use all the Java 8 features they want.
>> 
>> There's one policy change to consider there: possibly, just possibly, we
>> could allow new modules in hadoop-tools to adopt Java 8 language features
>> early, provided everyone recognised that "backport to branch-2" isn't
>> going to happen.
>> 
>> -Steve
>> 
>>
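To make the Java 7 language-level restriction concrete: with javac's -source option set to 1.7 (which the javac.version setting mentioned in the thread is assumed to control), the compiler rejects lambdas and method references even when the build itself runs on a Java 8 JDK. The hypothetical class below is a sketch that writes the same sort both ways; in a module that must stay backportable to branch-2, only the first method would be allowed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical demo: the same sort written at two Java language levels. */
final class LanguageLevelDemo {

    /** Java 7 language level: an anonymous Comparator compiles with -source 1.7. */
    static List<String> sortByLengthJava7(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    /**
     * Java 8 language level: the method reference String::length is a Java 8
     * language feature, so javac rejects this method when -source is 1.7.
     */
    static List<String> sortByLengthJava8(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ccc", "a", "bb");
        System.out.println(sortByLengthJava7(names)); // [a, bb, ccc]
        System.out.println(sortByLengthJava8(names)); // [a, bb, ccc]
    }
}
```

One caveat: -source 1.7 restricts language features, not library APIs. A call such as List.sort, added in Java 8, still compiles against a JDK 8 class library, so a build that must also run on a Java 7 JVM needs a separate API check (for example, compiling against a Java 7 bootclasspath, or a signature checker such as the animal-sniffer Maven plugin).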

Re: Looking to a Hadoop 3 release

Posted by sanjay Radia <sa...@gmail.com>.

> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> 
> 2) Simplification of configs, potentially separating client-side configs
> from those used by daemons. This is another source of perpetual confusion
> for users.
+1 on this.

sanjay
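Sid's proposal to separate client-side from daemon-side configuration is easy to state and harder to do. As a minimal sketch, the hypothetical ConfSplitter below buckets keys by prefix. The prefixes are real Hadoop key families, but the prefix-to-scope mapping is an illustrative assumption, not an agreed design.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Who reads a configuration key, under a deliberately naive prefix heuristic. */
enum ConfScope { CLIENT, DAEMON, SHARED }

final class ConfSplitter {
    // Hypothetical rules: real key families, but an assumed mapping to scopes.
    private static final String[] CLIENT_PREFIXES = {"dfs.client."};
    private static final String[] DAEMON_PREFIXES = {
        "dfs.namenode.", "dfs.datanode.", "yarn.nodemanager."
    };

    /** Classify one key; anything not matched is treated as shared. */
    static ConfScope classify(String key) {
        for (String p : CLIENT_PREFIXES) {
            if (key.startsWith(p)) return ConfScope.CLIENT;
        }
        for (String p : DAEMON_PREFIXES) {
            if (key.startsWith(p)) return ConfScope.DAEMON;
        }
        return ConfScope.SHARED;
    }

    /** Split keys into one sorted set per scope (every scope present, possibly empty). */
    static Map<ConfScope, Set<String>> split(List<String> keys) {
        Map<ConfScope, Set<String>> out = new EnumMap<>(ConfScope.class);
        for (ConfScope s : ConfScope.values()) out.put(s, new TreeSet<String>());
        for (String k : keys) out.get(classify(k)).add(k);
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "fs.defaultFS",
            "dfs.client.read.shortcircuit",
            "dfs.namenode.name.dir",
            "dfs.datanode.data.dir",
            "yarn.nodemanager.resource.memory-mb");
        System.out.println(split(keys));
    }
}
```

The heuristic also shows why the split is non-trivial: HDFS HA clients read keys of the form dfs.namenode.rpc-address.<nameservice>.<namenode> to locate the NameNodes, so "dfs.namenode.* is daemon-only" would misclassify them. A real separation would likely need explicit per-key scope information rather than prefix matching.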

Re: Looking to a Hadoop 3 release

Posted by Siddharth Seth <ss...@apache.org>.

I think it'll be useful to have a discussion about what else people would
like to see in Hadoop 3.x, especially if the change is potentially
incompatible. Also, what do we expect the release schedule to be for major
releases, and what triggers them: JVM version, major features, the need for
incompatible changes? Assuming major versions will not be released every 6
months to 1 year (given adoption time, and how disruptive they are for
downstream projects and users), considering additional features/incompatible
changes for 3.x would be useful.

Some features that come to mind immediately would be:
1) Enhancements to the RPC mechanics, specifically support for async RPC /
two-way communication. There are a lot of places where we re-use heartbeats
to send more information than they would need to carry if the RPC layer
supported these features. Some of this can be done in a manner compatible
with the existing RPC sub-system; other parts, like two-way communication,
probably cannot. After that, HDFS/YARN would need to actually make use of
these changes. The other consideration is adoption of an alternate system
like gRPC, which would be incompatible.
2) Simplification of configs, potentially separating client-side configs
from those used by daemons. This is another source of perpetual confusion
for users.
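Sid's first item concerns information riding on heartbeats. For example, HDFS DataNodes receive NameNode commands (such as block replication or deletion) in heartbeat responses, every 3 seconds by default (dfs.heartbeat.interval). The toy model below is a hypothetical class with simulated time, not real Hadoop RPC; it sketches the cost of that pattern: a command the server decides on mid-interval waits for the next heartbeat, whereas a two-way or async channel could push it immediately.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical toy model: the server can only reach a worker by answering the
 * worker's periodic heartbeat, so commands queue until the next heartbeat.
 * Time is simulated in whole seconds; heartbeats happen at 0, T, 2T, ...
 */
final class HeartbeatPiggyback {
    private final long intervalSec;
    private final Queue<String> pending = new ArrayDeque<>();

    HeartbeatPiggyback(long intervalSec) {
        if (intervalSec <= 0) throw new IllegalArgumentException("interval must be positive");
        this.intervalSec = intervalSec;
    }

    /** Server side: there is no way to call the worker directly, so just queue. */
    void enqueue(String command) {
        pending.add(command);
    }

    /** Worker heartbeat: the response carries every command queued so far. */
    List<String> heartbeat() {
        List<String> delivered = new ArrayList<>(pending);
        pending.clear();
        return delivered;
    }

    /**
     * When a command issued at issuedAtSec (assumed >= 0) reaches the worker:
     * the first heartbeat tick at or after that time (integer ceiling).
     * With a push-capable two-way channel this would simply be issuedAtSec.
     */
    long deliveryTimeSec(long issuedAtSec) {
        return ((issuedAtSec + intervalSec - 1) / intervalSec) * intervalSec;
    }

    public static void main(String[] args) {
        HeartbeatPiggyback hb = new HeartbeatPiggyback(3); // 3 s, as in the HDFS default
        hb.enqueue("replicate block");
        hb.enqueue("delete block");
        System.out.println("delivered on heartbeat: " + hb.heartbeat());
        System.out.println("issued at t=1 s, delivered at t=" + hb.deliveryTimeSec(1) + " s");
    }
}
```

In this model the added latency is bounded by one heartbeat interval, and shrinking the interval to reduce it raises the heartbeat load on the server; that trade-off is one plausible motivation for push-style RPC.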

Thanks
- Sid


On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
wrote:

> Sorry, Outlook dequoted Alejandro's comments.
>
> Let me try again, with his comments in italics and mine proofread.
>
> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
>
>
>
> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
>
> IMO, if part of the community wants to take on the responsibility and work
> that it takes to do a new major release, we should not discourage them from
> doing that.
>
> Having multiple major branches active is a standard practice.
>
> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> long time to get out, and during that time 0.21 and 0.22 got released and
> ignored, while 0.23 was picked up and used in production.
>
> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
> widely enough to be used in products, and changes were made between that
> alpha and 2.2 itself which raised compatibility issues.
>
> For 3.x I'd propose:
>
>   1.  Give 3.x alpha/beta artifacts a shorter lifespan.
>   2.  Make clear there are no guarantees of compatibility from alpha/beta
> releases to shipping releases. Best effort, but not to the extent that it
> gets in the way. More succinctly: we will care more about seamless
> migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>   3.  Anybody who ships code based on a 3.x alpha/beta must recognise and
> accept policy (2): Hadoop's "instability guarantee" for the 3.x alpha/beta
> phase.
>
> As well as backwards compatibility, we need to think about forwards
> compatibility, with the goal being:
>
> Any app written/shipped with the 3.x release binaries (JAR and native)
> will work in and against a 3.y Hadoop cluster, for all natural numbers
> x, y where y >= x and is-release(x) and is-release(y).
>
> That's important, as it means all server-side changes in 3.x which are
> expected to mandate client-side updates (protocols, HDFS erasure
> decoding, security features) must be considered complete and stable before
> we can say is-release(x). In an ideal world, we'll even get the semantics
> right, with tests to show this.
>
> Fixing classpath hell downstream is certainly one feature I am +1 on. But
> it's only one of the features, and given there's no design doc on that
> JIRA, it is far too immature to set a release schedule on. An alpha
> schedule with no guarantees and a regular alpha roll could be viable, as
> new features go in and can then be used to experimentally try this stuff
> in branches of HBase (well volunteered, Stack!), etc. Of course,
> instability guarantees will be transitive downstream.
>
>
> This time around we are not replacing the guts as we did from Hadoop 1 to
> Hadoop 2; rather, we are doing superficial surgery to address issues that
> were not considered then (or were too much to take on top of the guts
> transplant).
>
> For the split-brain concern, we did a great job of maintaining Hadoop 1
> and Hadoop 2 until Hadoop 1 faded away.
>
> There was also a significant argument about 2.0.4-alpha to 2.2
> protobuf/HDFS compatibility.
>
>
> Based on that experience I would say that the coexistence of Hadoop 2 and
> Hadoop 3 will be much less demanding/traumatic.
>
> The re-layout of all the source trees was a major change there. Assuming
> there's no refactoring or switch of build tools, picking things back
> (backporting) will be tractable.
>
>
> Also, to facilitate the coexistence, we should limit Java language features
> to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
> we can remove this limitation.
>
> +1; setting javac.version will fix this
>
> What is nice about having Java 8 as the base JVM is that you can be
> confident that all Hadoop 3 servers will be JDK8+, so downstream apps
> and libs can use all the Java 8 features they want.
>
> There's one policy change to consider there: possibly, just possibly, we
> could allow new modules in hadoop-tools to adopt Java 8 language features
> early, provided everyone recognised that "backport to branch-2" isn't
> going to happen.
>
> -Steve
>
>

Re: Looking to a Hadoop 3 release

Posted by Siddharth Seth <ss...@apache.org>.

I think it'll be useful to have a discussion about what else people would
like to see in Hadoop 3.x - especially if the change is potentially
incompatible. Also, what we expect the release schedule to be for major
releases and what triggers them - JVM version, major features, the need for
incompatible changes ? Assuming major versions will not be released every 6
months/1 year (adoption time, fairly disruptive for downstream projects,
and users) -  considering additional features/incompatible changes for 3.x
would be useful.

Some features that come to mind immediately would be
1) enhancements to the RPC mechanics - specifically support for AsynRPC /
two way communication. There's a lot of places where we re-use heartbeats
to send more information than what would be done if the PRC layer supported
these features. Some of this can be done in a compatible manner to the
existing RPC sub-system. Others like 2 way communication probably cannot.
After this, having HDFS/YARN actually make use of these changes. The other
consideration is adoption of an alternate system ike gRpc which would be
incompatible.
2) Simplification of configs - potentially separating client side configs
and those used by daemons. This is another source of perpetual confusion
for users.

Thanks
- Sid


On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <st...@hortonworks.com>
wrote:

> Sorry, outlook dequoted Alejandros's comments.
>
> Let me try again with his comments in italic and proofreading of mine
>
> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com<mailto:
> stevel@hortonworks.com>> wrote:
>
>
>
> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com<mailto:
> tucu00@gmail.com><ma...@gmail.com>> wrote:
>
> IMO, if part of the community wants to take on the responsibility and work
> that takes to do a new major release, we should not discourage them from
> doing that.
>
> Having multiple major branches active is a standard practice.
>
> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> long time to get out, and during that time 0.21, 0.22, got released and
> ignored; 0.23 picked up and used in production.
>
> The 2.04-alpha release was more of a troublespot as it got picked up
> widely enough to be used in products, and changes were made between that
> alpha & 2.2 itself which raised compatibility issues.
>
> For 3.x I'd propose
>
>
>   1.  Give 3.x alpha/beta artifacts a shorter lifespan.
>   2.  Make clear there are no guarantees of compatibility from alpha/beta
> releases to shipping. Best effort, but not to the extent that it gets in
> the way. More succinctly: we will care more about seamless migration from
> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>   3.  Anybody who ships code based on 3.x alpha/beta must recognise and
> accept policy (2): Hadoop's "instability guarantee" for the 3.x alpha/beta
> phase.
>
> As well as backwards compatibility, we need to think about forwards
> compatibility, with the goal being:
>
> Any app written/shipped with the 3.x release binaries (JAR and native)
> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
> where y>=x  and is-release(x) and is-release(y)
>
> That's important, as it means all server-side changes in 3.x which are
> expected to mandate client-side updates (protocols, HDFS erasure
> decoding, security features) must be considered complete and stable before
> we can say is-release(x). In an ideal world, we'll even get the semantics
> right, with tests to show this.
>
> Fixing classpath hell downstream is certainly one feature I am +1 on. But
> it's only one of the features, and given there's no design doc on that
> JIRA yet, it's far too immature to set a release schedule on. An alpha
> schedule with no guarantees and a regular alpha roll could be viable, as
> new features go in and can then be used to experimentally try this stuff in
> branches of HBase (well volunteered, Stack!), etc. Of course, instability
> guarantees will be transitive downstream.
>
>
> This time around we are not replacing the guts as we did from Hadoop 1 to
> Hadoop 2, but doing superficial surgery to address issues that were not
> considered (or were too much to take on top of the guts transplant).
>
> For the split-brain concern, we did a great job of maintaining Hadoop 1 and
> Hadoop 2 until Hadoop 1 faded away.
>
> And there was a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> compatibility.
>
>
> Based on that experience I would say that the coexistence of Hadoop 2 and
> Hadoop 3 will be much less demanding/traumatic.
>
> The re-layout of all the source trees was a major change there. Assuming
> there's no refactoring or switch of build tools this time, cherry-picking
> changes back will be tractable.
>
>
> Also, to facilitate the coexistence we should limit Java language features
> to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
> we can remove this limitation.
>
> +1; setting javac.version will fix this
>
> What is nice about having Java 8 as the base JVM is that it means you can
> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
> and libs can use all the Java 8 features they want to.
>
> There's one policy change to consider there: possibly, just
> possibly, we could allow new modules in hadoop-tools to adopt Java 8
> language features early, provided everyone recognised that "backport to
> branch-2" isn't going to happen.
>
> -Steve
>
>

Re: Looking to a Hadoop 3 release

Posted by Steve Loughran <st...@hortonworks.com>.

Sorry, Outlook dequoted Alejandro's comments.

Let me try again, with his comments in italics and mine proofread.

On 05/03/2015 13:59, "Steve Loughran" <st...@hortonworks.com> wrote:

On 05/03/2015 13:05, "Alejandro Abdelnur" <tu...@gmail.com> wrote:

IMO, if part of the community wants to take on the responsibility and work
that it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long time to get out, and during that time 0.21 and 0.22 got released and ignored, while 0.23 was picked up and used in production.

The 2.0.4-alpha release was more of a trouble spot, as it got picked up widely enough to be used in products, and changes were made between that alpha and 2.2 itself which raised compatibility issues.

For 3.x I'd propose

1. Give 3.x alpha/beta artifacts a shorter lifespan.
2. Make clear there are no guarantees of compatibility from alpha/beta releases to shipping. Best effort, but not to the extent that it gets in the way. More succinctly: we will care more about seamless migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
3. Anybody who ships code based on 3.x alpha/beta must recognise and accept policy (2): Hadoop's "instability guarantee" for the 3.x alpha/beta phase.

As well as backwards compatibility, we need to think about forwards compatibility, with the goal being:

Any app written/shipped with the 3.x release binaries (JAR and native) will work in and against a 3.y Hadoop cluster, for all x, y in Natural where y>=x and is-release(x) and is-release(y)

That's important, as it means all server-side changes in 3.x which are expected to mandate client-side updates (protocols, HDFS erasure decoding, security features) must be considered complete and stable before we can say is-release(x). In an ideal world, we'll even get the semantics right, with tests to show this.
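The forwards-compatibility rule can be read as a test matrix. A minimal sketch in Java, assuming 3.0, 3.1 and 3.2 are the only GA releases (alphas and betas fail is-release and are left out); the class and method names are hypothetical and not part of Hadoop:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: list every (client, cluster) pair covered by the
 * proposed rule "an app built against 3.x works against a 3.y cluster
 * for y >= x, when both x and y are releases".
 */
public class ForwardCompatMatrix {

    /** True when the rule requires a 3.clientMinor client to work against a 3.clusterMinor cluster. */
    static boolean mustInteroperate(int clientMinor, int clusterMinor, List<Integer> releasedMinors) {
        return clusterMinor >= clientMinor
                && releasedMinors.contains(clientMinor)
                && releasedMinors.contains(clusterMinor);
    }

    /** Every pair a forwards-compatibility test suite would need to exercise. */
    static List<String> pairsToTest(List<Integer> releasedMinors) {
        List<String> pairs = new ArrayList<>();
        for (int x : releasedMinors) {
            for (int y : releasedMinors) {
                if (mustInteroperate(x, y, releasedMinors)) {
                    pairs.add("client 3." + x + " -> cluster 3." + y);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Assumption: 3.0, 3.1 and 3.2 are GA releases; alpha/beta builds are left out.
        List<Integer> released = Arrays.asList(0, 1, 2);
        for (String pair : pairsToTest(released)) {
            System.out.println(pair);
        }
    }
}
```

Running main prints six pairs, from "client 3.0 -> cluster 3.0" to "client 3.2 -> cluster 3.2". A 3.2 client against a 3.1 cluster is deliberately absent: the rule only promises that newer clusters serve older clients.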

Fixing classpath hell downstream is certainly one feature I am +1 on. But it's only one of the features, and given there's no design doc on that JIRA yet, it's far too immature to set a release schedule on. An alpha schedule with no guarantees and a regular alpha roll could be viable, as new features go in and can then be used to experimentally try this stuff in branches of HBase (well volunteered, Stack!), etc. Of course, instability guarantees will be transitive downstream.

This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not
considered (or were too much to take on top of the guts transplant).

For the split-brain concern, we did a great job of maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

And there was a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.

Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there. Assuming there's no refactoring or switch of build tools this time, cherry-picking changes back will be tractable.

Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
we can remove this limitation.

+1; setting javac.version will fix this
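A hypothetical illustration of the Java 7 restriction: the helper below uses only Java 7 syntax (an anonymous Comparator instead of a lambda), so it should compile with the source level pinned to 1.7 and cherry-pick to branch-2 unchanged. That javac.version feeds the compiler's source/target level in Hadoop's Maven build is an assumption here. Note that -source 1.7 only restricts language features; compiling against a JDK 8 class library can still let Java 8 APIs such as java.util.stream slip through unless a 1.7 bootclasspath or an API checker is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper restricted to Java 7 syntax so it can be cherry-picked to branch-2 unchanged. */
public class Java7StyleSort {

    /** Returns a copy of names sorted by length; ties keep their input order. */
    static List<String> sortByLength(List<String> names) {
        List<String> copy = new ArrayList<>(names); // the diamond operator is Java 7, so it is allowed
        // With Java 8 language features this would be a one-liner:
        //   copy.sort(Comparator.comparingInt(String::length));
        // An anonymous class keeps it compilable at source level 1.7.
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortByLength(Arrays.asList("hdfs", "yarn", "mapreduce", "common")));
    }
}
```

This prints [hdfs, yarn, common, mapreduce]; Collections.sort is stable, so the two four-letter names keep their input order.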

What is nice about having Java 8 as the base JVM is that it means you can be confident that all Hadoop 3 servers will be JDK8+, so downstream apps and libs can use all the Java 8 features they want to.
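Conversely, a downstream library that does rely on Java 8 features might want to fail fast on an older JVM. A sketch under that assumption; the class name is hypothetical, and the parser handles both the legacy "1.8" form of java.specification.version and the "9"/"11" form used from Java 9 on:

```java
/** Hypothetical fail-fast guard for a downstream library that targets Hadoop 3 (JDK8+) only. */
public class RequireJava8 {

    /** Maps java.specification.version to a major version: "1.7" -> 7, "1.8" -> 8, "11" -> 11. */
    static int majorVersion(String spec) {
        String s = spec.startsWith("1.") ? spec.substring(2) : spec;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot >= 0 ? s.substring(0, dot) : s);
    }

    /** Throws if the given specification version is older than Java 8. */
    static void check(String spec) {
        if (majorVersion(spec) < 8) {
            throw new IllegalStateException("Java 8 or later required, found " + spec);
        }
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        check(spec);
        System.out.println("Running on Java " + majorVersion(spec));
    }
}
```

Checking once at startup gives a clear error message instead of a later UnsupportedClassVersionError or NoSuchMethodError deep inside a job.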

There's one policy change to consider there: possibly, just possibly, we could allow new modules in hadoop-tools to adopt Java 8 language features early, provided everyone recognised that "backport to branch-2" isn't going to happen.

-Steve

Re: Looking to a Hadoop 3 release

Posted by Steve Loughran <st...@hortonworks.com>.

On 05/03/2015 13:05, "Alejandro Abdelnur" <tu...@gmail.com> wrote:

IMO, if part of the community wants to take on the responsibility and work
that it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long time to get out, and during that time 0.21 and 0.22 got released and ignored, while 0.23 was picked up and used in production.

The 2.0.4-alpha release was more of a trouble spot, as it got picked up widely enough to be used in products, and changes were made between that alpha and 2.2 itself which raised compatibility issues.

For 3.x I'd propose

As well as backwards compatibility, we need to think about forwards compatibility, with the goal being:

Any app written/shipped with the 3.x release binaries (JAR and native) will work against a 3.y Hadoop release, for all x, y in Natural where y>=x and is-release(x) and is-release(y)

Fixing classpath hell downstream is certainly one feature on this roadmap I am +1 on. But it's only one of the features, and given there's no design doc on that JIRA yet, it's far too immature to set a release schedule on. An alpha schedule with no guarantees and a regular alpha roll could be viable, as new features go in and can then be used to experimentally try this stuff in branches of HBase (well volunteered, Stack!), etc. Of course, instability guarantees will be transitive.

For the split-brain concern, we did a great job of maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

And there was a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.

Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there. Assuming there's no refactoring or switch of build tools this time, cherry-picking changes back will be tractable.

Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
we can remove this limitation.

+1; setting javac.version will fix this

What is nice about having Java 8 as the base JVM is that it means you can be confident that all Hadoop 3 servers will be JDK8+, so downstream apps and libs can use all the Java 8 features they want to.

-Steve

Re: Looking to a Hadoop 3 release

Posted by Alejandro Abdelnur <tu...@gmail.com>.

IMO, if part of the community wants to take on the responsibility and work
that it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not
considered (or were too much to take on top of the guts transplant).

For the split-brain concern, we did a great job of maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
we can remove this limitation.

Thanks.


On Thu, Mar 5, 2015 at 11:40 AM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> The 'resistance' is not so much about a new major release, more so about
> the content and the roadmap of the release. Other than the two specific
> features raised (the need for breaking compat for them is something that I
> am debating), I haven't seen a roadmap of branch-3 about any more features
> that this community needs to discuss. If all the difference between
> branch-2 and branch-3 is going to be JDK + a couple of incompat changes, it
> is a big problem in two dimensions: (1) it's a burden keeping the branches
> in sync and avoiding the split-brain we experienced with 1.x and 2.x or,
> worse, branch-0.23 and branch-2, and (2) it's very hard to ask people to
> not break more things in branch-3.
>
> We seem to have agreed upon a course of action for JDK7. And now we are
> taking a different direction for JDK8. Going by this new proposal, come
> 2016, we will have to deal with JDK9 and 3 mainline incompatible Hadoop
> releases.
>
> Regarding individual improvements like classpath isolation and the shell
> script stuff, Jason Lowe captured it perfectly on HADOOP-11656: it should be
> possible for every major feature that we develop to be an opt-in, unless the
> change is so great that users can balance out the incompatibilities against the
> new stuff they are getting. Even with a groundbreaking change like
> YARN, we spent a bit of time to ensure compatibility (MAPREDUCE-5108), and that
> has paid off many times over. Breaking compatibility shouldn't
> come across as too cheap a thing.
>
> Thanks,
> +Vinod
>
> On Mar 4, 2015, at 10:15 AM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>
> Where does this resistance to a new major release stem from? As I've
> described from the beginning, this will look basically like a 2.x release,
> except for the inclusion of classpath isolation by default and target
> version JDK8. I've expressed my desire to maintain API and wire
> compatibility, and we can audit the set of incompatible changes in trunk to
> ensure this. My proposal for doing alpha and beta releases leading up to GA
> also gives downstreams a nice amount of time for testing and validation.
>
>

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

The 'resistance' is not so much about a new major release, more so about the content and the roadmap of the release. Other than the two specific features raised (the need for breaking compat for them is something that I am debating), I haven't seen a roadmap of branch-3 about any more features that this community needs to discuss. If all the difference between branch-2 and branch-3 is going to be JDK + a couple of incompat changes, it is a big problem in two dimensions: (1) it's a burden keeping the branches in sync and avoiding the split-brain we experienced with 1.x and 2.x or, worse, branch-0.23 and branch-2, and (2) it's very hard to ask people to not break more things in branch-3.

We seem to have agreed upon a course of action for JDK7, and now we are taking a different direction for JDK8. Going by this new proposal, come 2016, we will have to deal with JDK9 and 3 mainline incompatible Hadoop releases.

Regarding individual improvements like classpath isolation and the shell script stuff, Jason Lowe captured it perfectly on HADOOP-11656: it should be possible for every major feature that we develop to be an opt-in, unless the change is so great that users can balance out the incompatibilities against the new stuff they are getting. Even with a groundbreaking change like YARN, we spent a bit of time to ensure compatibility (MAPREDUCE-5108), and that has paid off many times over. Breaking compatibility shouldn't come across as too cheap a thing.

Thanks,
+Vinod

On Mar 4, 2015, at 10:15 AM, Andrew Wang <an...@cloudera.com> wrote:

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Let's not dismiss this quite so handily.

Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
could make classpath isolation opt-in via configuration, what we really
want longer term is to have it on by default (or just always on). Stack in
particular points out the practical difficulties in using an opt-in method
in 2.x from a downstream project perspective. It's not pretty.

The plan that both Sean and Jason propose (which I support) is to have an
opt-in solution in 2.x, bake it there, then turn it on by default
(incompatible) in a new major release. I think this lines up well with my
proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
to help with 2.x release management if that would help with testing this
feature.

Even setting aside classpath isolation, a new major release is still
justified by JDK8. Somehow this is being ignored in the discussion. Allen,
historically the voice of the user in our community, just highlighted it as
a major compatibility issue, and Tucu and I have also expressed our
very strong concerns about bumping this in a minor release. 2.7's bump is a
unique exception, but this is not something to be cited as precedent or
policy.

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Regards,
Andrew

On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy <ac...@hortonworks.com> wrote:

> Awesome, looks like we can just do this in a compatible manner - nothing
> else on the list seems like it warrants a (premature) major release.
>
> Thanks Vinod.
>
> Arun
>
> ________________________________________
> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
> Sent: Tuesday, March 03, 2015 2:30 PM
> To: common-dev@hadoop.apache.org
> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> yarn-dev@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> I started pitching in more on that JIRA.
>
> To add, I think we can and should strive for doing this in a compatible
> manner, whatever the approach. Marking and calling it incompatible before
> we see a proposal/patch seems premature to me. Commented the same on JIRA:
> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
> .
>
> Thanks
> +Vinod
>
> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>
> Regarding classpath isolation, based on what I hear from our customers,
> it's still a big problem (even after the MR classloader work). The latest
> Jackson version bump was quite painful for our downstream projects, and the
> HDFS client still leaks a lot of dependencies. I would welcome more
> discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
> chimed in.
>
>

Re: Looking to a Hadoop 3 release

Posted by Arun Murthy <ac...@hortonworks.com>.

Awesome, looks like we can just do this in a compatible manner - nothing else on the list seems like it warrants a (premature) major release.

Thanks Vinod.

Arun

________________________________________
From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
Sent: Tuesday, March 03, 2015 2:30 PM
To: common-dev@hadoop.apache.org
Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

I started pitching in more on that JIRA.

To add, I think we can and should strive for doing this in a compatible manner, whatever the approach. Marking and calling it incompatible before we see a proposal/patch seems premature to me. Commented the same on JIRA: https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875.

Thanks
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang <an...@cloudera.com> wrote:

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. I would welcome more
discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
chimed in.

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

I started pitching in more on that JIRA.

To add, I think we can and should strive for doing this in a compatible manner, whatever the approach. Marking and calling it incompatible before we see a proposal/patch seems premature to me. Commented the same on JIRA: https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875.

Thanks
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang <an...@cloudera.com> wrote:

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. I would welcome more
discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
chimed in.

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Moving to JDK8 involves a lot of things:
 (1) Get Hadoop apps to be able to run on JDK8 and choose JDK8 language features. This is already possible with the decoupling of apps from the platform.
 (2) Get the platform to run on JDK8. This can be done so that we can run Hadoop on both JDK8 and JDK7 without any compatibility issues. This in itself is a huge move, what with potential GC behavior changes, native library compatibility, etc.
 (3) Get the platform to use JDK8 language features. As much as I love the new stuff in JDK8, I'm willing to postpone usage of the language features in the platform until JDK8 is in full force.

So, how about we do (1) + (2) for now, get JDK8 going, and then come around to make the decision of dropping support for JDK7? This is no different from what we did for the adoption of JDK7. For a bit of time (2/3 releases?), we were able to run on both JDK6 and JDK7, and we are phasing out JDK6 only now that most of the community has stopped using it.

Thanks,
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang <an...@cloudera.com> wrote:
>> Given that we already agreed to put in JDK7 in 2.7, and that the
>> classpath is a fairly minor irritant given some existing solutions (e.g. a
>> new default classloader), how do you quantify the benefit for users?
>> 
> I looked at our thread on this topic from last time, and we (meaning at
> least myself and Tucu) agreed to a one-time exception to the JDK7 bump in
> 2.x for practical reasons. We waited for so long that we had some assurance
> JDK6 was on the outs. Multiple distros also already had bumped their min
> version to JDK7. This is not true this time around. Bumping the JDK version
> is hugely impactful on the end user, and my email on the earlier thread
> still reflects my thoughts on JDK compatibility:
> 
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E
> 
>> .....

> Right now, the incompatible changes would be JDK8, classpath isolation, and
> whatever is already in trunk. I can audit these existing trunk changes when
> branch-3 is cut.

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

I started pitching in more on that JIRA.

To add, I think we can and should strive for doing this in a compatible manner, whatever the approach. Marking and calling it incompatible before we see proposal/patch seems premature to me. Commented the same on JIRA: https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875.

Thanks
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang <an...@cloudera.com> wrote:

> Regarding classpath isolation, based on what I hear from our customers,
> it's still a big problem (even after the MR classloader work). The latest
> Jackson version bump was quite painful for our downstream projects, and the
> HDFS client still leaks a lot of dependencies. I'd welcome more
> discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
> chimed in.

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Thanks as always for the feedback everyone. Some inline comments to Arun's
email, as his were the most extensive:

> Given that we already agreed to put in JDK7 in 2.7, and that the
> classpath is a fairly minor irritant given some existing solutions (e.g. a
> new default classloader), how do you quantify the benefit for users?

I looked at our thread on this topic from last time, and we (meaning at
least myself and Tucu) agreed to a one-time exception to the JDK7 bump in
2.x for practical reasons. We waited for so long that we had some assurance
JDK6 was on the outs. Multiple distros also already had bumped their min
version to JDK7. This is not true this time around. Bumping the JDK version
is hugely impactful on the end user, and my email on the earlier thread
still reflects my thoughts on JDK compatibility:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. I'd welcome more
discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
chimed in.
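
For readers unfamiliar with the technique: classpath isolation is usually built on a child-first ("parent-last") class loader, which is roughly the idea behind the MR classloader work mentioned here. What follows is a minimal sketch under stated assumptions, not the HADOOP-11656 design. The class name ChildFirstClassLoader and its parentFirstPrefixes parameter are hypothetical. The loader looks for a class in the application's own jars first and falls back to the parent (Hadoop's) loader only when the application does not bundle it. java.* and javax.* classes, plus any configured prefixes, always come from the parent.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a child-first class loader; not Hadoop's actual implementation.
public class ChildFirstClassLoader extends URLClassLoader {
    // Package prefixes that must always be shared with the parent (assumed configuration).
    private final String[] parentFirstPrefixes;

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent, String... parentFirstPrefixes) {
        super(urls, parent);
        this.parentFirstPrefixes = parentFirstPrefixes.clone();
    }

    private boolean isParentFirst(String name) {
        // JDK classes must never be shadowed by application jars.
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return true;
        }
        for (String prefix : parentFirstPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    // Child first: search this loader's own URLs (the application's jars).
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not bundled by the application; fall back to the parent below.
                }
            }
            if (c == null) {
                // Standard parent-first delegation for shared and system classes.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no application jars, every lookup falls back to the parent loader.
        try (ChildFirstClassLoader loader = new ChildFirstClassLoader(
                new URL[0], ChildFirstClassLoader.class.getClassLoader(), "org.apache.hadoop.")) {
            System.out.println(loader.loadClass("java.lang.String") == String.class);
        }
    }
}
```

Keeping Hadoop's own API packages parent-first matters, here via the assumed prefix "org.apache.hadoop.". If two loaders each load the same type, the JVM treats the results as different classes, and the boundary between application and framework code would fail with a ClassCastException.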

Having the freedom to upgrade our dependencies at will would also be a big
win for us as developers.

> We could just do JDK8 in hadoop-2.10 or some such, you are definitely
> welcome to run the RM role for that release.
>
> Furthermore, I'm really concerned that this will be used as an
> opportunity to further break compat in more egregious ways.
>
> Also, are you foreseeing more compat breaks? OTOH, if we all agree that
> we should absolutely prevent compat breakages such as the client-server
> wire protocol, I feel the point of a major release is kinda lost.

Right now, the incompatible changes would be JDK8, classpath isolation, and
whatever is already in trunk. I can audit these existing trunk changes when
branch-3 is cut.

I would like to keep this list as short as possible, to preserve wire
compat and rolling upgrade. As far as major releases go, this is not one to
be scared of. However, since it's incompatible, it still needs that major
version bump.

Best,
Andrew

P.S. Vinod, the shell script rewrite is incompatible. Allen intentionally
excluded it from branch-2 for this reason.

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Agreed. The difference between a 3.0 GA release and a parallel 2.x release line is just JDK8 + a different classpath (potentially isolated). That doesn't sound like a big enough delta to warrant the license to break compat.

Thanks,
+Vinod

On Mar 2, 2015, at 6:30 PM, Arun Murthy <ac...@hortonworks.com> wrote:

> Andrew,
> 
> Thanks for bringing up this discussion.
> 
> I'm a little puzzled, as I feel like we are rehashing the same discussion from last year - where we agreed on a different course of action w.r.t. the switch to JDK7.
> 
> IAC, breaking compatibility for hadoop-3 is a pretty big cost - particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount. 
> 
> Now, breaking compatibility is perfectly fine over time where there is sufficient benefit e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1). 
> 
> However, I'm struggling to quantify the benefit of hadoop-3 for users relative to the cost of the breakage.
> 
> Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?
> 
> We could just do JDK8 in hadoop-2.10 or some such, you are definitely welcome to run the RM role for that release.
> 
> Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways. 
> 
> Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.
> 
> Overall, my biggest concern is the compatibility story vis-a-vis the benefit. 
> 
> Thoughts?
> 
> thanks,
> Arun
> 
> ________________________________________
> From: Andrew Wang <an...@cloudera.com>
> Sent: Monday, March 02, 2015 3:19 PM
> To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Looking to a Hadoop 3 release
> 
> Hi devs,
> 
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
> 
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
> 
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
> 
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
> 
> I'd like to propose that we start rolling out a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
> 
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
> 
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
> 
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
> 
> Best,
> Andrew

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

 Thanks as always for the feedback everyone. Some inline comments to Arun's
email, as his were the most extensive:


>  Given that we already agreed to put in JDK7 in 2.7, and that the
> classpath is a fairly minor irritant given some existing solutions (e.g. a
> new default classloader), how do you quantify the benefit for users?
>
> I looked at our thread on this topic from last time, and we (meaning at
least myself and Tucu) agreed to a one-time exception to the JDK7 bump in
2.x for practical reasons. We waited for so long that we had some assurance
JDK6 was on the outs. Multiple distros also already had bumped their min
version to JDK7. This is not true this time around. Bumping the JDK version
is hugely impactful on the end user, and my email on the earlier thread
still reflects my thoughts on JDK compatibility:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. Would welcome more
discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
chimed in.

Having the freedom to upgrade our dependencies at will would also be a big
win for us as developers.

 We could just do JDK8 in hadoop-2.10 or some such, you are definitely
> welcome to run the RM role for that release.
>
>  Furthermore, I'm really concerned that this will be used as an
> opportunity to further break compat in more egregious ways.
>
>  Also, are you foreseeing more compat breaks? OTOH, if we all agree that
> we should absolutely prevent compat breakages such as the client-server
> wire protocol, I feel the point of a major release is kinda lost.
>
>
Right now, the incompatible changes would be JDK8, classpath isolation, and
whatever is already in trunk. I can audit these existing trunk changes when
branch-3 is cut.

I would like to keep this list as short as possible, to preserve wire
compat and rolling upgrade. As far as major releases go, this is not one to
be scared of. However, since it's incompatible, it still needs that major
version bump.

Best,
Andrew

P.S. Vinod, the shell script rewrite is incompatible. Allen intentionally
excluded it from branch-2 for this reason.

Re: Looking to a Hadoop 3 release

Posted by Steve Loughran <st...@hortonworks.com>.

I'm +1 for a migrate to Java 8 as soon as possible.

That's branch-2 & trunk, as having them on the same language level makes cherry-picking stuff off trunk possible. That's particularly the case for Java 8, as it is the first major change to the language since Java 5.

w.r.t. shipping trunk as 3.x, it's going to take longer than planned. Hopefully not as long as the 2.x release process, but you never know. That means I expect some more Hadoop 2 releases this year. We need to make the jump there too: get 2.7 out the door and include in it a roadmap for when the Java 8+-only switch happens across the codebase.


-Steve


PS: for anyone who wants a pure Java 8 build today, set -Djavac.version=1.8 on the command line of a Maven build. Last time I tried there were some (minor) bits of YARN that wouldn't compile...
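
You can check whether such a build really targets Java 8 by reading the class-file major version of a compiled class. A value of 51 means Java 7 and 52 means Java 8, the same number javap -v prints as "major version". The sketch below is a small standalone helper. ClassFileTarget is a hypothetical name and is not part of Hadoop or Maven.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports which Java release a compiled .class file targets.
public class ClassFileTarget {
    /** Reads the class-file header and returns its major version (51 = Java 7, 52 = Java 8). */
    public static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) { // every class file starts with this magic number
            throw new IOException("not a class file");
        }
        data.readUnsignedShort(); // minor version, not needed here
        return data.readUnsignedShort();
    }

    /** Maps a major version to a release name; valid for Java 5 (49) and later. */
    public static String javaRelease(int major) {
        return "Java " + (major - 44);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path path = Paths.get(arg);
            try (InputStream in = Files.newInputStream(path)) {
                int major = majorVersion(in);
                System.out.println(path + ": major version " + major + " (" + javaRelease(major) + ")");
            }
        }
    }
}
```

Usage, with a placeholder path: java ClassFileTarget path/to/Some.class. Any class compiled from the Java 8 build should report major version 52.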




On 2 March 2015 at 18:31:00, Arun Murthy (acm@hortonworks.com) wrote:

Andrew,

Thanks for bringing up this discussion.

I'm a little puzzled, as I feel like we are rehashing the same discussion from last year - where we agreed on a different course of action w.r.t. the switch to JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost - particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount.

Now, breaking compatibility is perfectly fine over time where there is sufficient benefit e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1).

However, I'm struggling to quantify the benefit of hadoop-3 for users relative to the cost of the breakage.

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such, you are definitely welcome to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways.

Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit.

Thoughts?

thanks,
Arun

________________________________________
From: Andrew Wang <an...@cloudera.com>
Sent: Monday, March 02, 2015 3:19 PM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling out a monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite) so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.

Best,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Arun Murthy <ac...@hortonworks.com>.

Andrew,

Thanks for bringing up this discussion.

I'm a little puzzled, as I feel like we are rehashing the same discussion from last year - where we agreed on a different course of action w.r.t. the switch to JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost - particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount.

Now, breaking compatibility is perfectly fine over time where there is sufficient benefit, e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1).

However, I'm struggling to quantify the benefit of hadoop-3 for users relative to the cost of the breakage.

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such, you are definitely welcome to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways.

Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit.

Thoughts?

thanks,
Arun

________________________________________
From: Andrew Wang <an...@cloudera.com>
Sent: Monday, March 02, 2015 3:19 PM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling out a monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite) so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.

Best,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Chris Douglas <cd...@apache.org>.

On Mon, Mar 2, 2015 at 11:04 PM, Konstantin Shvachko
<sh...@gmail.com> wrote:
> 2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
> manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2 and
> other versions. If that is somehow beneficial for commercial vendors, which I
> don't see how, for the community it has proven to be very disruptive. It would
> be really good to avoid it this time.

Agreed; let's try to minimize backporting headaches. Pulling trunk >
branch-2 > branch-2.x is already tedious. Adding a branch-3,
branch-3.x would be obnoxious.

> 3. Could we release Hadoop 3 directly from trunk? With a proper feature
> freeze in advance. Current trunk is in the best working condition I've seen
> in years - much better than when hadoop-2 was coming to life. It could
> make a good alpha.

+1 This sounds like a good approach. Marked as alpha, we can break
compatibility in minor versions. Stabilizing a beta can correspond
with cutting branch-3, since that will be winding down branch-2. This
shouldn't disrupt existing plans for branch-2.

However, this requires that committers not accumulate too much
compatibility debt in trunk. Undoing all that in branch-3 imposes a
burdensome tax. Scanning through Allen's diff: that doesn't appear to
be the case so far, but it recommends against developing features "in
place" on trunk. Just be considerate of users and developers who will
need to move from (and maintain) branch-2.

> I believe we can start planning 3.0 from trunk right after 2.7 is out.

If we're publishing a snapshot, we don't need too much planning. -C

> On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com>
> wrote:
>
>> Hi devs,
>>
>> It's been a year and a half since 2.x went GA, and I think we're about due
>> for a 3.x release.
>> Notably, there are two incompatible changes I'd like to call out, that will
>> have a tremendous positive impact for our users.
>>
>> First, classpath isolation being done at HADOOP-11656, which has been a
>> long-standing request from many downstreams and Hadoop users.
>>
>> Second, bumping the source and target JDK version to JDK8 (related to
>> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
>> months from now). In the past, we've had issues with our dependencies
>> discontinuing support for old JDKs, so this will future-proof us.
>>
>> Between the two, we'll also have quite an opportunity to clean up and
>> upgrade our dependencies, another common user and developer request.
>>
>> I'd like to propose that we start rolling out a monthly-ish series of
>> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
>> other cat herding responsibilities. There are already quite a few changes
>> slated for 3.0 besides the above (for instance the shell script rewrite) so
>> there's already value in a 3.0 alpha, and the more time we give downstreams
>> to integrate, the better.
>>
>> This opens up discussion about inclusion of other changes, but I'm hoping
>> to freeze incompatible changes after maybe two alphas, do a beta (with no
>> further incompat changes allowed), and then finally a 3.x GA. For those
>> keeping track, that means a 3.x GA in about four months.
>>
>> I would also like to stress though that this is not intended to be a big
>> bang release. For instance, it would be great if we could maintain wire
>> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
>> branch-2 and branch-3 similar also makes backports easier, since we're
>> likely maintaining 2.x for a while yet.
>>
>> Please let me know any comments / concerns related to the above. If people
>> are friendly to the idea, I'd like to cut a branch-3 and start working on
>> the first alpha.
>>
>> Best,
>> Andrew
>>

Re: Looking to a Hadoop 3 release

Posted by Andrew Wang <an...@cloudera.com>.

Hi Konst, thanks for taking a look. I think I essentially agree with your
points.

On Mon, Mar 2, 2015 at 11:04 PM, Konstantin Shvachko <sh...@gmail.com>
wrote:

> Andrew,
>
> Hadoop 3 seems in general like a good idea to me.
> 1. I did not understand whether you propose to release 3.0 instead of 2.7
> or in addition to it.
>    I think 2.7 is needed at least as a stabilization step for the 2.x line.

I agree with this: 2.7 is needed, and I think Vinod/Arun are working on it
now.

I expect branch-2 to be maintained for a while yet, separate from a
branch-3.

> 2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
> manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2
> and other versions. If that is somehow beneficial for commercial vendors
> (which I don't see), for the community it has proven to be very
> disruptive. It would be really good to avoid it this time.

My motivations here are purely what I've stated above. I remember the pain
of the branch-1 days as well, and this would be a far, far smaller
difference. JDK8 min version and classpath isolation are compelling, yet
incompatible, which is why I'm proposing Hadoop 3. Besides those two
features, it should be approximately the same "size" as our 2.x releases.

> 3. Could we release Hadoop 3 directly from trunk, with a proper feature
> freeze in advance? Current trunk is in the best working condition I've seen
> in years, much better than when hadoop-2 was coming to life. It could
> make a good alpha.
> I believe we can start planning 3.0 from trunk right after 2.7 is out.
>

I agree with this, and would be okay with this if our audit of trunk
reveals no incompatible changes we're uncomfortable releasing.

I'll note though that committing to multiple branches is way easier now
with git and cherry-pick, so that overhead is reduced. Rolling out an alpha
now is strictly a good thing for our downstreams, even if it means we need
to do extra commits.

Thanks,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Konstantin Shvachko <sh...@gmail.com>.

Andrew,

Hadoop 3 seems in general like a good idea to me.
1. I did not understand whether you propose to release 3.0 instead of 2.7 or
in addition to it.
   I think 2.7 is needed at least as a stabilization step for the 2.x line.

2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2 and
other versions. If that is somehow beneficial for commercial vendors (which I
don't see), for the community it has proven to be very disruptive. It would
be really good to avoid it this time.

3. Could we release Hadoop 3 directly from trunk, with a proper feature
freeze in advance? Current trunk is in the best working condition I've seen
in years, much better than when hadoop-2 was coming to life. It could
make a good alpha.
I believe we can start planning 3.0 from trunk right after 2.7 is out.

Thanks,
--Konst

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

RE: Looking to a Hadoop 3 release

Posted by "Liu, Yi A" <yi...@intel.com>.

+1

Regards,
Yi Liu

-----Original Message-----
From: Andrew Wang [mailto:andrew.wang@cloudera.com] 
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha.

Best,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.

Thanks, Andrew, for bringing this up.
+1, this mostly looks fine, but I don't think now is the time to cut branch-3.

 > classpath isolation

IMHO, classpath isolation is a good thing to do.
We should pay down the technical debt ASAP. I'm willing to help.

I think we can cut branch-3 and release a 3.0 alpha
after HADOOP-11656 is fixed. That is, I'd like to mark
this issue as a blocker for 3.0.
I suspect that even if we cut branch-3 now, trunk and
branch-3 would stay the same for a while, which seems pointless.

 > JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2.
+1 for moving to JDK8 ASAP.

 > maintaining 2.x

From the user side, there is currently little benefit to upgrading to 3.x.
The more important question is how long 2.x will be maintained.
Therefore we should decide when to stop backporting
new features to 2.x, and when to stop maintaining 2.x.
I'd like to maintain 2.x as long as possible, at least
one year after the 3.x GA release.

* Other issue

What's the current status of HDFS symlinks?
If HADOOP-10019 requires some incompatible changes,
I'd like to include them in 3.x.

Regards,
Akira

On 3/2/15 15:19, Andrew Wang wrote:
> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

RE: Looking to a Hadoop 3 release

Posted by "Zheng, Kai" <ka...@intel.com>.

JDK8 support is already under consideration, and it looks like many issues have been reported and resolved already.

https://issues.apache.org/jira/browse/HADOOP-11090

-----Original Message-----
From: Andrew Wang [mailto:andrew.wang@cloudera.com] 
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha.

Best,
Andrew

Re: Looking to a Hadoop 3 release

Posted by sanjay Radia <sa...@gmail.com>.

Andrew,
  Thanks for bringing up the issue of moving to Java8. Java8 is important.
However, I am not seeing a strong motivation for changing the major number.
We can go to Java8 in the 2.x series.
The classpath issue for HADOOP-11656 is too minor to force a major number change (no pun intended).

Let's separate the issue of Java8 from Hadoop 3.0.

sanjay


> On Mar 2, 2015, at 3:19 PM, Andrew Wang <an...@cloudera.com> wrote:
> 
> Hi devs,
> 
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
> 
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
> 
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
> 
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
> 
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
> 
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
> 
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
> 
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
> 
> Best,
> Andrew

Re: Looking to a Hadoop 3 release

Posted by Karthik Kambatla <ka...@cloudera.com>.

+1

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.


> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>

Guava etc. have been such a pain in the past. Can't wait to have a release
where we don't have to worry about which versions of dependencies users want
to use.


>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>

Are you saying we can use lambdas without re-writing all of Hadoop in
Scala?


>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities.


Will be glad to help.


-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.


Is moving to JDK8 fundamentally different from the move to JDK7? We are moving to JDK7 via release 2.7 that I am helping with now.


> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.


Aren't the shell script rewrite changes supposed to be compatible?

Thanks,
+Vinod

Re: Looking to a Hadoop 3 release

Posted by "Aaron T. Myers" <at...@apache.org>.

+1, this sounds like a good plan to me.

Thanks a lot for volunteering to take this on, Andrew.

Best,
Aaron

Re: Looking to a Hadoop 3 release

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.

+1

It sounds like a good idea, especially regarding JDK.

Regards
JB

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Looking to a Hadoop 3 release

Posted by Arun Murthy <ac...@hortonworks.com>.

Andrew,

Thanks for bringing up this discussion.

I'm a little puzzled, since I feel like we are rehashing the same discussion from last year, where we agreed on a different course of action w.r.t. the switch to JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost, particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount.

Now, breaking compatibility is perfectly fine over time when there is sufficient benefit, e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1).

However, I'm struggling to quantify the benefit of hadoop-3 for users relative to the cost of the breakage.

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such; you are definitely welcome to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways.

Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit.

Thoughts?

thanks,
Arun

RE: Looking to a Hadoop 3 release

Posted by "Zheng, Kai" <ka...@intel.com>.

JDK8 support is under consideration; it looks like many issues have already been reported and resolved.

https://issues.apache.org/jira/browse/HADOOP-11090

Re: Looking to a Hadoop 3 release

Posted by Konstantin Shvachko <sh...@gmail.com>.

Andrew,

Hadoop 3 seems in general like a good idea to me.
1. I did not understand whether you propose to release 3.0 instead of 2.7 or
in addition to it.
   I think 2.7 is needed at least as a stabilization step for the 2.x line.

2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2 and
other versions. If that is somehow beneficial for commercial vendors (which I
don't see), for the community it has proven to be very disruptive. It would
be really good to avoid it this time.

3. Could we release Hadoop 3 directly from trunk, with a proper feature
freeze in advance? Current trunk is in the best working condition I've seen
in years, much better than when hadoop-2 was coming to life. It could
make a good alpha.
I believe we can start planning 3.0 from trunk right after 2.7 is out.

Thanks,
--Konst

Re: Looking to a Hadoop 3 release

Posted by Chen He <ai...@gmail.com>.

+1 non-binding

It would be nice to have a Hadoop 3.x release. It would be my honor to help.

Regards!

Chen

RE: Looking to a Hadoop 3 release

Posted by "Zheng, Kai" <ka...@intel.com>.

Sorry about that; I thought I was sending it to my colleagues.

By the way, for JDK8 support, we (Intel) would like to investigate further and help. Thanks.

Regards,
Kai

Re: Looking to a Hadoop 3 release

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.

Thanks Andrew for bringing this up.
+1, this mostly looks fine, but I don't think now is the time to cut branch-3.

 > classpath isolation

IMHO, classpath isolation is a good thing to do.
We should pay down the technical debt ASAP. I'm willing to help.

I'm thinking we can cut branch-3 and release 3.0 alpha
after HADOOP-11656 is fixed. That is, I'd like to mark
this issue as a blocker for 3.0.
I suspect that even if we cut branch-3 now, trunk and
branch-3 would be identical for a while, which seems pointless.

 > JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2.
+1 for moving to JDK8 ASAP.

 > maintaining 2.x

On the user side, there is currently little merit in upgrading to 3.x.
The more important question is how long 2.x will be maintained.
Therefore we should consider when to stop backporting
new features to 2.x, and when to stop maintaining 2.x.
I'd like to maintain 2.x as long as possible, at least
one year after 3.x GA release.

* Other issue

What's the current status of HDFS symlinks?
If HADOOP-10019 requires some incompatible changes,
I'd like to include them in 3.x.

Regards,
Akira

On 3/2/15 15:19, Andrew Wang wrote:
> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

Posted by Yongjun Zhang <yz...@cloudera.com>.

Thanks all.

There is an open issue, HDFS-6962 (ACLs inheritance conflicts with
umaskmode), whose incompatibility appears to make it unsuitable
for 2.x; it is targeted at 3.0. Please see:

https://issues.apache.org/jira/browse/HDFS-6962?focusedCommentId=14335418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14335418

Best,

--Yongjun


On Wed, Mar 4, 2015 at 8:13 PM, Allen Wittenauer <aw...@altiscale.com> wrote:

>
> One of the questions that keeps popping up is “what exactly is in trunk?”
>
> As some may recall, I had done some experiments creating the change log
> based upon JIRA.  While the interest level appeared to be approaching zero,
> I kept playing with it a bit and eventually also started playing with the
> release notes script (for various reasons I won’t bore you with.)
>
> In any case, I’ve started posting the results of these runs on one of my
> github repos if anyone was wanting a quick reference as to JIRA’s opinion
> on the matter:
>
> https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0
>
>
>

Re: Looking to a Hadoop 3 release

Posted by Allen Wittenauer <aw...@altiscale.com>.

One of the questions that keeps popping up is “what exactly is in trunk?”

As some may recall, I had done some experiments creating the change log based upon JIRA.  While the interest level appeared to be approaching zero, I kept playing with it a bit and eventually also started playing with the release notes script (for various reasons I won’t bore you with.)

In any case, I’ve started posting the results of these runs on one of my github repos in case anyone wants a quick reference to JIRA’s opinion on the matter:

https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0

Re: Looking to a Hadoop 3 release

Posted by Karthik Kambatla <ka...@cloudera.com>.

On Wed, Mar 4, 2015 at 10:46 AM, Stack <st...@duboce.net> wrote:

> In general +1 on 3.0.0. Its time. If we start now, it might make it out by
> 2016. If we start now, downstreamers can start aligning themselves to land
> versions that suit at about the same time.
>
> While two big items have been called out as possible incompatible changes,
> and there is ongoing discussion as to whether they are or not*, is there
> any chance of getting a longer list of big differences between the
> branches? In particular I'd be interested in improvements that are 'off' by
> default that would be better defaulted 'on'.
>
> Thanks,
> St.Ack
>
> * Let me note that 'compatible' around these parts is a trampled concept
> seemingly open to interpretation with a definition that is other than
> prevails elsewhere in software. See Allen's list above, and in our
> downstream project, the recent HBASE-13149 "HBase server MR tools are
> broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
> 2.x if only so we can leave behind all current notions of 'compatibility'
> and just start over (as per Allen).
>

Unfortunately, our compatibility policies
<http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html>
are rather loose and allow for changes that break downstream projects. Fixing
the classpath issues would let us tighten our policies and bring our
"compatibility story" more in line with general expectations.




>
>
> On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com>
> wrote:
>
> > Hi devs,
> >
> > It's been a year and a half since 2.x went GA, and I think we're about due
> > for a 3.x release.
> > Notably, there are two incompatible changes I'd like to call out, that will
> > have a tremendous positive impact for our users.
> >
> > First, classpath isolation being done at HADOOP-11656, which has been a
> > long-standing request from many downstreams and Hadoop users.
> >
> > Second, bumping the source and target JDK version to JDK8 (related to
> > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> > months from now). In the past, we've had issues with our dependencies
> > discontinuing support for old JDKs, so this will future-proof us.
> >
> > Between the two, we'll also have quite an opportunity to clean up and
> > upgrade our dependencies, another common user and developer request.
> >
> > I'd like to propose that we start rolling a series of monthly-ish series of
> > 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> > other cat herding responsibilities. There are already quite a few changes
> > slated for 3.0 besides the above (for instance the shell script rewrite) so
> > there's already value in a 3.0 alpha, and the more time we give downstreams
> > to integrate, the better.
> >
> > This opens up discussion about inclusion of other changes, but I'm hoping
> > to freeze incompatible changes after maybe two alphas, do a beta (with no
> > further incompat changes allowed), and then finally a 3.x GA. For those
> > keeping track, that means a 3.x GA in about four months.
> >
> > I would also like to stress though that this is not intended to be a big
> > bang release. For instance, it would be great if we could maintain wire
> > compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> > branch-2 and branch-3 similar also makes backports easier, since we're
> > likely maintaining 2.x for a while yet.
> >
> > Please let me know any comments / concerns related to the above. If people
> > are friendly to the idea, I'd like to cut a branch-3 and start working on
> > the first alpha.
> >
> > Best,
> > Andrew
> >
>



-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

RE: Looking to a Hadoop 3 release

Posted by "Zheng, Kai" <ka...@intel.com>.

May I add a few comments? These are just my thoughts. Thanks.

>> If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time.
This would help not only downstreamers align with the long-term release, but perhaps also contributors like me align our future efforts.

In addition to JDK8 support and classpath isolation, could we consider more candidates?
For example, what do you think of HADOOP-9797, "Pluggable and compatible UGI change"?
https://issues.apache.org/jira/browse/HADOOP-9797

The benefits: 
1) Allow multiple login sessions/contexts and authentication methods to be used in the same Java application/process without conflicts, providing good isolation by getting rid of globals and statics.
2) Allow new authentication methods to be plugged into UGI in a modular, manageable and maintainable manner.

In addition, we would like to push out the first release of Apache Kerby, working toward a strong, dedicated and clean Kerberos library in Java for both the client and KDC sides. By leveraging that library, we could update Hadoop-MiniKDC and perform more security tests.
https://issues.apache.org/jira/browse/DIRKRB-102

Hope this makes sense. Thanks.

Regards,
Kai

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Thursday, March 05, 2015 2:47 AM
To: common-dev@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

In general +1 on 3.0.0. Its time. If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time.

While two big items have been called out as possible incompatible changes, and there is ongoing discussion as to whether they are or not*, is there any chance of getting a longer list of big differences between the branches? In particular I'd be interested in improvements that are 'off' by default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept seemingly open to interpretation with a definition that is other than prevails elsewhere in software. See Allen's list above, and in our downstream project, the recent HBASE-13149 "HBase server MR tools are broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with 2.x if only so we can leave behind all current notions of 'compatibility' and just start over (as per Allen).

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about 
> due for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that 
> will have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been 
> a long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to 
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two 
> months from now). In the past, we've had issues with our dependencies 
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and 
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish
> series of 3.0 alpha releases ASAP, with myself volunteering to take on
> the RM and other cat herding responsibilities. There are already quite
> a few changes slated for 3.0 besides the above (for instance the shell
> script rewrite) so there's already value in a 3.0 alpha, and the more
> time we give downstreams to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm 
> hoping to freeze incompatible changes after maybe two alphas, do a 
> beta (with no further incompat changes allowed), and then finally a 
> 3.x GA. For those keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a
> big bang release. For instance, it would be great if we could maintain
> wire compatibility between 2.x and 3.x, so rolling upgrades work.
> Keeping branch-2 and branch-3 similar also makes backports easier,
> since we're likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If 
> people are friendly to the idea, I'd like to cut a branch-3 and start 
> working on the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

Posted by Karthik Kambatla <ka...@cloudera.com>.

On Wed, Mar 4, 2015 at 10:46 AM, Stack <st...@duboce.net> wrote:

> In general +1 on 3.0.0. It's time. If we start now, it might make it out by
> 2016. If we start now, downstreamers can start aligning themselves to land
> versions that suit at about the same time.
>
> While two big items have been called out as possible incompatible changes,
> and there is ongoing discussion as to whether they are or not*, is there
> any chance of getting a longer list of big differences between the
> branches? In particular I'd be interested in improvements that are 'off' by
> default that would be better defaulted 'on'.
>
> Thanks,
> St.Ack
>
> * Let me note that 'compatible' around these parts is a trampled concept,
> seemingly open to interpretation, with a definition other than the one that
> prevails elsewhere in software. See Allen's list above, and in our
> downstream project, the recent HBASE-13149 "HBase server MR tools are
> broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
> 2.x if only so we can leave behind all current notions of 'compatibility'
> and just start over (as per Allen).
>

Unfortunately, our compatibility policies
<http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html>
are rather loose and allow for changes that break downstream projects. Fixing
the classpath issues would let us tighten our policies and bring our
"compatibility story" more in line with general expectations.

-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

Re: Looking to a Hadoop 3 release

Posted by Stack <st...@duboce.net>.

In general +1 on 3.0.0. It's time. If we start now, it might make it out by
2016. If we start now, downstreamers can start aligning themselves to land
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes,
and there is ongoing discussion as to whether they are or not*, is there
any chance of getting a longer list of big differences between the
branches? In particular I'd be interested in improvements that are 'off' by
default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept,
seemingly open to interpretation, with a definition other than the one that
prevails elsewhere in software. See Allen's list above, and in our
downstream project, the recent HBASE-13149 "HBase server MR tools are
broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
2.x if only so we can leave behind all current notions of 'compatibility'
and just start over (as per Allen).

Re: Looking to a Hadoop 3 release

Posted by sanjay Radia <sa...@gmail.com>.

Andrew,
  Thanks for bringing up the issue of moving to Java8. Java8 is important.
However, I am not seeing a strong motivation for changing the major number.
We can go to Java8 in the 2.x series.
The classpath issue for HADOOP-11656 is too minor to force a major number change (no pun intended).

Let's separate the issue of Java8 from Hadoop 3.0.

sanjay


Re: Looking to a Hadoop 3 release

Posted by "Aaron T. Myers" <at...@apache.org>.

+1, this sounds like a good plan to me.

Thanks a lot for volunteering to take this on, Andrew.

Best,
Aaron

Re: Looking to a Hadoop 3 release

Posted by Robert Kanter <rk...@cloudera.com>.

+1, happy to help too.

On Mon, Mar 2, 2015 at 3:57 PM, Yongjun Zhang <yz...@cloudera.com> wrote:

> Thanks Andrew for the proposal.
>
> +1, and I will be happy to help.
>
> --Yongjun

Re: Looking to a Hadoop 3 release

Posted by Robert Kanter <rk...@cloudera.com>.

+1  Happy to help too

On Mon, Mar 2, 2015 at 3:57 PM, Yongjun Zhang <yz...@cloudera.com> wrote:

> Thanks Andrew for the proposal.
>
> +1, and I will be happy to help.
>
> --Yongjun
>
>
>
>
> On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com>
> wrote:
>
> > Hi devs,
> >
> > It's been a year and a half since 2.x went GA, and I think we're about
> due
> > for a 3.x release.
> > Notably, there are two incompatible changes I'd like to call out, that
> will
> > have a tremendous positive impact for our users.
> >
> > First, classpath isolation being done at HADOOP-11656, which has been a
> > long-standing request from many downstreams and Hadoop users.
> >
> > Second, bumping the source and target JDK version to JDK8 (related to
> > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> > months from now). In the past, we've had issues with our dependencies
> > discontinuing support for old JDKs, so this will future-proof us.
> >
> > Between the two, we'll also have quite an opportunity to clean up and
> > upgrade our dependencies, another common user and developer request.
> >
> > I'd like to propose that we start rolling a series of monthly-ish series
> of
> > 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> > other cat herding responsibilities. There are already quite a few changes
> > slated for 3.0 besides the above (for instance the shell script rewrite)
> so
> > there's already value in a 3.0 alpha, and the more time we give
> downstreams
> > to integrate, the better.
> >
> > This opens up discussion about inclusion of other changes, but I'm hoping
> > to freeze incompatible changes after maybe two alphas, do a beta (with no
> > further incompat changes allowed), and then finally a 3.x GA. For those
> > keeping track, that means a 3.x GA in about four months.
> >
> > I would also like to stress though that this is not intended to be a big
> > bang release. For instance, it would be great if we could maintain wire
> > compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> > branch-2 and branch-3 similar also makes backports easier, since we're
> > likely maintaining 2.x for a while yet.
> >
> > Please let me know any comments / concerns related to the above. If
> people
> > are friendly to the idea, I'd like to cut a branch-3 and start working on
> > the first alpha.
> >
> > Best,
> > Andrew
> >
>

Re: Looking to a Hadoop 3 release

Posted by Robert Kanter <rk...@cloudera.com>.

+1  Happy to help too

On Mon, Mar 2, 2015 at 3:57 PM, Yongjun Zhang <yz...@cloudera.com> wrote:

> Thanks Andrew for the proposal.
>
> +1, and I will be happy to help.
>
> --Yongjun
>
>
>
>
> On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com>
> wrote:
>
> > Hi devs,
> >
> > It's been a year and a half since 2.x went GA, and I think we're about
> due
> > for a 3.x release.
> > Notably, there are two incompatible changes I'd like to call out, that
> will
> > have a tremendous positive impact for our users.
> >
> > First, classpath isolation being done at HADOOP-11656, which has been a
> > long-standing request from many downstreams and Hadoop users.
> >
> > Second, bumping the source and target JDK version to JDK8 (related to
> > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> > months from now). In the past, we've had issues with our dependencies
> > discontinuing support for old JDKs, so this will future-proof us.
> >
> > Between the two, we'll also have quite an opportunity to clean up and
> > upgrade our dependencies, another common user and developer request.
> >
> > I'd like to propose that we start rolling a series of monthly-ish series
> of
> > 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> > other cat herding responsibilities. There are already quite a few changes
> > slated for 3.0 besides the above (for instance the shell script rewrite)
> so
> > there's already value in a 3.0 alpha, and the more time we give
> downstreams
> > to integrate, the better.
> >
> > This opens up discussion about inclusion of other changes, but I'm hoping
> > to freeze incompatible changes after maybe two alphas, do a beta (with no
> > further incompat changes allowed), and then finally a 3.x GA. For those
> > keeping track, that means a 3.x GA in about four months.
> >
> > I would also like to stress though that this is not intended to be a big
> > bang release. For instance, it would be great if we could maintain wire
> > compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> > branch-2 and branch-3 similar also makes backports easier, since we're
> > likely maintaining 2.x for a while yet.
> >
> > Please let me know any comments / concerns related to the above. If
> people
> > are friendly to the idea, I'd like to cut a branch-3 and start working on
> > the first alpha.
> >
> > Best,
> > Andrew
> >
>

Re: Looking to a Hadoop 3 release

Posted by Robert Kanter <rk...@cloudera.com>.

+1  Happy to help too

On Mon, Mar 2, 2015 at 3:57 PM, Yongjun Zhang <yz...@cloudera.com> wrote:

> Thanks Andrew for the proposal.
>
> +1, and I will be happy to help.
>
> --Yongjun
>
>
>
>
> On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com>
> wrote:
>
> > Hi devs,
> >
> > It's been a year and a half since 2.x went GA, and I think we're about
> due
> > for a 3.x release.
> > Notably, there are two incompatible changes I'd like to call out, that
> will
> > have a tremendous positive impact for our users.
> >
> > First, classpath isolation being done at HADOOP-11656, which has been a
> > long-standing request from many downstreams and Hadoop users.
> >
> > Second, bumping the source and target JDK version to JDK8 (related to
> > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> > months from now). In the past, we've had issues with our dependencies
> > discontinuing support for old JDKs, so this will future-proof us.
> >
> > Between the two, we'll also have quite an opportunity to clean up and
> > upgrade our dependencies, another common user and developer request.
> >
> > I'd like to propose that we start rolling a series of monthly-ish series of
> > 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> > other cat herding responsibilities. There are already quite a few changes
> > slated for 3.0 besides the above (for instance the shell script rewrite) so
> > there's already value in a 3.0 alpha, and the more time we give downstreams
> > to integrate, the better.
> >
> > This opens up discussion about inclusion of other changes, but I'm hoping
> > to freeze incompatible changes after maybe two alphas, do a beta (with no
> > further incompat changes allowed), and then finally a 3.x GA. For those
> > keeping track, that means a 3.x GA in about four months.
> >
> > I would also like to stress though that this is not intended to be a big
> > bang release. For instance, it would be great if we could maintain wire
> > compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> > branch-2 and branch-3 similar also makes backports easier, since we're
> > likely maintaining 2.x for a while yet.
> >
> > Please let me know any comments / concerns related to the above. If people
> > are friendly to the idea, I'd like to cut a branch-3 and start working on
> > the first alpha.
> >
> > Best,
> > Andrew
> >
>

Re: Looking to a Hadoop 3 release

Posted by Yongjun Zhang <yz...@cloudera.com>.

Thanks Andrew for the proposal.

+1, and I will be happy to help.

--Yongjun




On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com> wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

Posted by Arun Murthy <ac...@hortonworks.com>.

Andrew,

Thanks for bringing up this discussion.

I'm a little puzzled, as I feel like we are rehashing the same discussion from last year, where we agreed on a different course of action w.r.t. the switch to JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost, particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount.

Now, breaking compatibility is perfectly fine over time when there is sufficient benefit, e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1).

However, I'm struggling to quantify the benefit of hadoop-3 for users relative to the cost of the breakage.

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such; you are definitely welcome to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways.

Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as changes to the client-server wire protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit.

Thoughts?

thanks,
Arun

________________________________________
From: Andrew Wang <an...@cloudera.com>
Sent: Monday, March 02, 2015 3:19 PM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a series of monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite) so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.

Best,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Karthik Kambatla <ka...@cloudera.com>.

+1

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com> wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.


> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>

Guava etc. have been such a pain in the past. Can't wait to have a release
where we don't have to worry about which dependency versions users want to
use.


>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>

Are you saying we can use lambdas without re-writing all of Hadoop in
Scala?


>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities.


Will be glad to help.


> There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.


> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>



-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

RE: Looking to a Hadoop 3 release

Posted by "Liu, Yi A" <yi...@intel.com>.

+1

Regards,
Yi Liu

-----Original Message-----
From: Andrew Wang [mailto:andrew.wang@cloudera.com] 
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha.

Best,
Andrew

Re: Looking to a Hadoop 3 release

Posted by sanjay Radia <sa...@gmail.com>.

Andrew,
  Thanks for bringing up the issue of moving to Java8. Java8 is important.
However, I am not seeing a strong motivation for changing the major number.
We can go to Java8 in the 2.x series.
The classpath issue for HADOOP-11656 is too minor to force a major number change (no pun intended).

Let's separate the issue of Java8 from Hadoop 3.0.

sanjay


> On Mar 2, 2015, at 3:19 PM, Andrew Wang <an...@cloudera.com> wrote:
> 
> Hi devs,
> 
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
> 
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
> 
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
> 
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
> 
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
> 
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
> 
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
> 
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
> 
> Best,
> Andrew

Re: Looking to a Hadoop 3 release

Posted by Stack <st...@duboce.net>.

In general +1 on 3.0.0. It's time. If we start now, it might make it out by
2016. If we start now, downstreamers can start aligning themselves to land
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes,
and there is ongoing discussion as to whether they are or not*, is there
any chance of getting a longer list of big differences between the
branches? In particular I'd be interested in improvements that are 'off' by
default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept,
seemingly open to interpretation, with a definition other than the one that
prevails elsewhere in software. See Allen's list above, and in our
downstream project, the recent HBASE-13149 "HBase server MR tools are
broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
2.x if only so we can leave behind all current notions of 'compatibility'
and just start over (as per Allen).

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com> wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

Posted by Lei Xu <le...@cloudera.com>.

+1.  Would love to help.



On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com> wrote:
> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew



-- 
Lei (Eddy) Xu
Software Engineer, Cloudera

RE: Looking to a Hadoop 3 release

Posted by "Zheng, Kai" <ka...@intel.com>.

Sorry about that; I thought I was sending this only to my colleagues.

By the way, for JDK8 support, we (Intel) would like to investigate further and help. Thanks.

Regards,
Kai

-----Original Message-----
From: Zheng, Kai 
Sent: Tuesday, March 03, 2015 8:49 AM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: RE: Looking to a Hadoop 3 release

JDK8 support is under consideration; it looks like many issues have already been reported and resolved.

https://issues.apache.org/jira/browse/HADOOP-11090


-----Original Message-----
From: Andrew Wang [mailto:andrew.wang@cloudera.com] 
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha.

Best,
Andrew

Re: Looking to a Hadoop 3 release

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.

Thanks Andrew for bringing this up.
+1, it mostly looks fine, but I don't think now is the time to cut branch-3.

 > classpath isolation

IMHO, classpath isolation is a good thing to do.
We should pay down the technical debt ASAP. I'm willing to help.

I think we can cut branch-3 and release a 3.0 alpha
after HADOOP-11656 is fixed. That is, I'd like to mark
this issue as a blocker for 3.0.
If we cut branch-3 now, I suspect trunk and branch-3
would be identical for a while, which seems pointless.

 > JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2.
+1 for moving to JDK8 ASAP.

 > maintaining 2.x

On the user side, there is currently little benefit to upgrading to 3.x.
The more important question is how long 2.x will be maintained.
Therefore we should consider when to stop backporting
new features to 2.x, and when to stop maintaining 2.x.
I'd like to maintain 2.x as long as possible, at least
one year after 3.x GA release.

* Other issue

What's the current status of HDFS symlinks?
If HADOOP-10019 requires some incompatible changes,
I'd like to include them in 3.x.

Regards,
Akira

On 3/2/15 15:19, Andrew Wang wrote:
> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

RE: Looking to a Hadoop 3 release

Posted by "Zheng, Kai" <ka...@intel.com>.

Sorry, my bad. I thought I was sending this to my colleagues.

By the way, for the JDK8 support, we (Intel) would like to investigate further and help, thanks.

Regards,
Kai

-----Original Message-----
From: Zheng, Kai 
Sent: Tuesday, March 03, 2015 8:49 AM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: RE: Looking to a Hadoop 3 release

JDK8 support is under consideration; it looks like many issues have already been reported and resolved.

https://issues.apache.org/jira/browse/HADOOP-11090

Re: Looking to a Hadoop 3 release

Posted by Stack <st...@duboce.net>.

In general +1 on 3.0.0. It's time. If we start now, it might make it out by
2016. If we start now, downstreamers can start aligning themselves to land
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes,
and there is ongoing discussion as to whether they are or not*, is there
any chance of getting a longer list of big differences between the
branches? In particular I'd be interested in improvements that are 'off' by
default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept
seemingly open to interpretation, with a definition other than the one that
prevails elsewhere in software. See Allen's list above, and in our
downstream project, the recent HBASE-13149 "HBase server MR tools are
broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
2.x if only so we can leave behind all current notions of 'compatibility'
and just start over (as per Allen).

RE: Looking to a Hadoop 3 release

Posted by "Zheng, Kai" <ka...@intel.com>.

JDK8 support is under consideration; it looks like many issues have already been reported and resolved.

https://issues.apache.org/jira/browse/HADOOP-11090

Re: Looking to a Hadoop 3 release

Posted by "Aaron T. Myers" <at...@apache.org>.

+1, this sounds like a good plan to me.

Thanks a lot for volunteering to take this on, Andrew.

Best,
Aaron

Re: Looking to a Hadoop 3 release

Posted by Karthik Kambatla <ka...@cloudera.com>.

+1

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.


> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>

Guava etc. have been such a pain in the past. Can't wait to have a release
we don't have to worry about what version of dependencies users want to
use.


>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>

Are you saying we can use lambdas without re-writing all of Hadoop in
Scala?


>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities.


Will be glad to help.


> There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.


> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>



-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

RE: Looking to a Hadoop 3 release

Posted by "Liu, Yi A" <yi...@intel.com>.

+1

Regards,
Yi Liu

Re: Looking to a Hadoop 3 release

Posted by Arun Murthy <ac...@hortonworks.com>.

Andrew,

Thanks for bringing up this discussion.

I'm a little puzzled, for I feel like we are rehashing the same discussion from last year, where we agreed on a different course of action w.r.t. the switch to JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost, particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount.

Now, breaking compatibility is perfectly fine over time where there is sufficient benefit, e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1).

However, I'm struggling to quantify the benefit of hadoop-3 for users relative to the cost of the breakage.

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such; you are definitely welcome to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways.

Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit.

Thoughts?

thanks,
Arun

Re: Looking to a Hadoop 3 release

Posted by Yongjun Zhang <yz...@cloudera.com>.

Thanks Andrew for the proposal.

+1, and I will be happy to help.

--Yongjun

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.


Is moving to JDK8 fundamentally different from the move to JDK7? We are moving to JDK7 via release 2.7 that I am helping with now.


> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.


Aren't the shell script rewrite changes supposed to be compatible?

Thanks,
+Vinod

Re: Looking to a Hadoop 3 release

Posted by Konstantin Shvachko <sh...@gmail.com>.

Andrew,

Hadoop 3 seems in general like a good idea to me.
1. I did not understand: are you proposing to release 3.0 instead of 2.7, or
in addition to it?
   I think 2.7 is needed at least as a stabilization step for the 2.x line.

2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
split-brain behavior again, as we had with hadoop-1, hadoop-2 and other
versions. Even if that is somehow beneficial for commercial vendors, though I
don't see how, for the community it has proven to be very disruptive. It would
be really good to avoid it this time.

3. Could we release Hadoop 3 directly from trunk, with a proper feature
freeze in advance? Current trunk is in the best working condition I've seen
in years, much better than when hadoop-2 was coming to life. It could
make a good alpha.
I believe we can start planning 3.0 from trunk right after 2.7 is out.

Thanks,
--Konst

Re: Looking to a Hadoop 3 release

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.


Is moving to JDK8 fundamentally different from the move to JDK7? We are moving to JDK7 via the 2.7 release, which I am helping with now.


> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.


Aren't the shell script rewrite changes supposed to be compatible?

Thanks,
+Vinod