Posted to hdfs-dev@hadoop.apache.org by Arun C Murthy <ac...@hortonworks.com> on 2014/06/21 07:02:43 UTC

Re: Plans of moving towards JDK7 in trunk

On Jun 20, 2014, at 9:51 PM, Steve Loughran <st...@hortonworks.com> wrote:

> On 20 June 2014 21:35, Steve Loughran <st...@hortonworks.com> wrote:
> 
>> 
>> This actually argues in favour of
>> 
>> -renaming branch-2 to branch-3 after a release
>> -making trunk hadoop-4
>> 
>> -getting Hadoop 3 released off the new branch-3 in 2014, effectively
>> being an iteration of branch-2 with updated Java, moves of (off?) Guava,
>> off Jetty, lib changes, but no other significant "big bang" features
>> 
>> 
>> Hadoop 4.x then becomes the 2015 release, which can add more stuff. In
>> particular, anything that goes into Hadoop 4 with no intent to support it
>> in Hadoop 2 & 3 can use Java 8 language features sooner rather
>> than later.
>> 
>> 
>> 
> I should add that I'm willing to be the person who gets the Java-7-based
> Hadoop 3.x out the door later this year

+1 that makes sense to me. Thanks for volunteering Steve - I'm glad to share the pain… ;-)

Arun
-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Plans of moving towards JDK7 in trunk

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
On Fri, Jun 20, 2014 at 10:02 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> > Hadoop 3.x out the door later this year
>
> +1 that makes sense to me. Thanks for volunteering Steve - I'm glad to
> share the pain… ;-)


Hey Arun, you may have missed that Andrew volunteered to do this as
well (the thread is long, so it's easy to miss).

Cheers

-- 
Alejandro

Re: Plans of moving towards JDK7 in trunk

Posted by Sandy Ryza <sa...@cloudera.com>.
Andrew, correct me if I'm misunderstanding, but the incompatible change
that would require a major version bump is dropping support for JDK6.


On Mon, Jun 23, 2014 at 1:53 PM, sanjay Radia <sa...@hortonworks.com> wrote:

>
> On Jun 21, 2014, at 8:01 AM, Andrew Wang <an...@cloudera.com> wrote:
>
> > This is why I'd like to keep my original proposal on the table: keep going
> > with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
> > by April next year. It doesn't need to be a big bang release either. I'd be
> > delighted if we could do a rolling upgrade from one to the other. I just didn't
> > want to rule out the inclusion of some very compelling feature outright.
> > Trust me though, I'd be the first person to ask about compatibility if such
> > a feature does come up.
>
>
> Given your above statement on compatibility (such as rolling upgrades),
> it should be fine for the JDK8-based Hadoop release to not be 3.0 and
> instead merely be 2.x? Or do you have any incompatible changes to Hadoop
> protocol or APIs in mind during the same time period?
>
> sanjay

Re: Plans of moving towards JDK7 in trunk

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
Hey all,

This one started as an innocuous thread about enabling JDK7 on trunk, and now it seems (I still haven't finished reading the entire thing, and I started a while ago) it has become a full-blown proposal on 2.x, 3.x and 4.x releases. Some of us haven't been tracking this (at least I and a few others who said so offline), assuming it was only about letting Jenkins run JDK7, but it has the potential to impact all future work.

I propose we fork this thread into a new one that states the topic clearly so others can follow too.

Thanks,
+Vinod

On Jun 23, 2014, at 1:53 PM, sanjay Radia <sa...@hortonworks.com> wrote:

> 
> On Jun 21, 2014, at 8:01 AM, Andrew Wang <an...@cloudera.com> wrote:
> 
>> This is why I'd like to keep my original proposal on the table: keep going
>> with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
>> by April next year. It doesn't need to be a big bang release either. I'd be
>> delighted if we could do a rolling upgrade from one to the other. I just didn't
>> want to rule out the inclusion of some very compelling feature outright.
>> Trust me though, I'd be the first person to ask about compatibility if such
>> a feature does come up.
> 
> 
> Given your above statement on compatibility (such as rolling upgrades), it should be fine for the JDK8-based Hadoop release to not be 3.0 and instead merely be 2.x? Or do you have any incompatible changes to Hadoop protocol or APIs in mind during the same time period?
> 
> sanjay

Re: Plans of moving towards JDK7 in trunk

Posted by sanjay Radia <sa...@hortonworks.com>.
On Jun 21, 2014, at 8:01 AM, Andrew Wang <an...@cloudera.com> wrote:

> This is why I'd like to keep my original proposal on the table: keep going
> with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
> by April next year. It doesn't need to be a big bang release either. I'd be
> delighted if we could do a rolling upgrade from one to the other. I just didn't
> want to rule out the inclusion of some very compelling feature outright.
> Trust me though, I'd be the first person to ask about compatibility if such
> a feature does come up.


Given your above statement on compatibility (such as rolling upgrades), it should be fine for the JDK8-based Hadoop release to not be 3.0 and instead merely be 2.x? Or do you have any incompatible changes to Hadoop protocol or APIs in mind during the same time period?

sanjay

Re: Plans of moving towards JDK7 in trunk

Posted by Arun C Murthy <ac...@hortonworks.com>.
After further consideration, here is an alternative.

On Jun 21, 2014, at 11:14 AM, "Arun C. Murthy" <ac...@hortonworks.com> wrote:
> 
> JDK6 EOL was Feb 2013 and, a year later, we still have customers using it - which means we can't drop it yet.
> 
> http://www.oracle.com/technetwork/java/eol-135779.html
> 
> Given that, it seems highly unlikely everyone will suddenly jump to JDK8 by April of next year... I suspect this means we'd have to support JDK7 at least till late 2015. I think that is really key regardless of version numbers.
> 
> Furthermore, if we, as a community, maintain discipline in terms of wire-compat, rolling-upgrades etc. we are better off making a major release every year - as you put, no more 'Big Bang' releases.


Looking at the big picture, I believe the users of Apache Hadoop would be better served if we prioritized operational aspects such as rolling upgrades, wire compatibility, binary compatibility etc. for a couple of years.

Since not everyone has moved to hadoop-2 yet, talk of more incompatibility between hadoop-2/hadoop-3 or between hadoop-3/hadoop-4 within the next 12 months would certainly be a big issue for users - especially w.r.t rolling upgrades, wire-compat etc.

So, I think we should prioritize these operational aspects for users above everything else. Sure, JDK versions, features etc. are important, but lower in priority.

I'd also like to reiterate my concern about *dropping* support for JDK7 - we need to support it till the end of 2015 at the very least. I'm happy to ship a JDK8-only version of Hadoop in 2015; it just needs to support rolling upgrades from the JDK7 Hadoop till the end of 2015.

With that in mind... I actually like Andrew's suggestion below:

>  On Jun 21, 2014, at 8:01 AM, Andrew Wang <an...@cloudera.com> wrote:
> 
>  I'd be more okay with an intermediate release with no incompatible changes
>  whatsoever besides bumping the JDK requirement to JDK7.

Taking that thought to its logical conclusion, we can decouple the dual concerns of JDK versions and major releases by bumping up our software dependencies (JDK, Guice etc.) at well-defined and well-articulated releases.

The reason to do so would be to ensure we *do not* sneak in operational incompatibilities in the guise of bumping JDK versions.

So, we could do something like:
# hadoop-2.30+ is JDK7, but provides rolling upgrades and wire-compat with hadoop-2.2+; say in Oct 2014
# hadoop-2.50+ is JDK8, but provides rolling upgrades and wire-compat with hadoop-2.2+; say in June 2015 (or even earlier).

This scheme certainly has some disadvantages; however, it has the significant advantage of making it *very* clear to end-users and administrators that we take operational aspects seriously.

Also, this is something we have already done, i.e. we updated some of our software deps in hadoop-2.4 v/s hadoop-2.2, though clearly nothing as dramatic as the JDK. Here are some examples:
https://issues.apache.org/jira/browse/HADOOP-9991
https://issues.apache.org/jira/browse/HADOOP-10102
https://issues.apache.org/jira/browse/HADOOP-10103
https://issues.apache.org/jira/browse/HADOOP-10104
https://issues.apache.org/jira/browse/HADOOP-10503

In summary, the key goals we should keep in mind are:
# Operational aspects such as rolling upgrades, wire-compat etc. for the next couple of years.
# Support JDK7 till end of 2015 at least, even if we decide to support JDK8 sometime in 2015. Just ensure wire-compat, rolling-upgrades etc.

Thoughts?

thanks,
Arun
-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Plans of moving towards JDK7 in trunk

Posted by Arun C Murthy <ac...@hortonworks.com>.
After further consideration, here is an alternate.

On Jun 21, 2014, at 11:14 AM, "Arun C. Murthy" <ac...@hortonworks.com> wrote:
> 
> JDK6 eol was Feb 2013 and, a year later, we are still have customers using it - which means we can't drop it yet.
> 
> http://www.oracle.com/technetwork/java/eol-135779.html
> 
> Given that, it seems highly unlikely everyone will suddenly jump to JDK8 by April of next year... I suspect this means we'd have to support JDK7 at least till late 2015. I think, that, is really key regardless of version numbers.
> 
> Furthermore, if we, as a community, maintain discipline in terms of wire-compat, rolling-upgrades etc. we are better off making a major release every year - as you put, no more 'Big Bang' releases.


Looking at the big picture, I believe the users of Apache Hadoop would be better served by us if we prioritized operational aspects such as rolling upgrades, wire-compatibility, binary etc. for a couple of years.

Since not everyone has moved to hadoop-2 yet, talk of more incompatibility between hadoop-2/hadoop-3 or between hadoop-3/hadoop-4 within the next 12 months would certainly be a big issue for users - especially w.r.t rolling upgrades, wire-compat etc.

So, I think we should prioritize these operational aspects for users above everything else. Sure, jdk versions, features etc. are important, but lower in priority.

I'd also like to reiterate my concern on *dropping* support for a JDK7 - we need to support it till end of 2015 at the very least; happy to ship a version of Hadoop which is JDK8 only in 2015 - it just needs to support rolling-upgrades from the JDK7 Hadoop till end of 2015.

With that in mind... I actually like Andrew's suggestion below:

>  On Jun 21, 2014, at 8:01 AM, Andrew Wang <an...@cloudera.com> wrote:
> 
>  I'd be more okay with an intermediate release with no incompatible changes
>  whatsoever besides bumping the JDK requirement to JDK7.

Taking that thought to it's logical conclusion, we can de-couple the dual concerns of JDK versions and major releases but bumping up our software dependencies (JDK, guice etc.) at well-defined and well-articulated releases.

The reason to so would be to ensure we *do not* sneak in operational incompatibilities in the guise of bumping JDK versions.

So, we could do something like:
# hadoop-2.30+ is JDK7, but provides rolling upgrades and wire-compat with hadoop-2.2+; say in Oct 2014
# hadoop-2.50+ is JDK8, but provides rolling upgrades and wire-compat with hadoop-2.2+; say in June 2015 (or even earlier).

This scheme certainly has some dis-advantages, however it has the significant advantage of making it *very* clear to end-users and administrators that we take operational aspects seriously.

Also, this is something we already have done i.e. we updated some of our software deps in hadoop-2.4 v/s hadoop-2.2 - clearly not something as dramatic as JDK. Here are some examples:
https://issues.apache.org/jira/browse/HADOOP-9991
https://issues.apache.org/jira/browse/HADOOP-10102
https://issues.apache.org/jira/browse/HADOOP-10103
https://issues.apache.org/jira/browse/HADOOP-10104
https://issues.apache.org/jira/browse/HADOOP-10503

In summary, the key goals we should keep in mind are:
# Operational aspects such as rolling upgrades, wire-compat etc. for the next couple of years.
# Support JDK7 till end of 2015 at least, even if we decide to support JDK8 sometime in 2015. Just ensure wire-compat, rolling-upgrades etc.

Thoughts?

thanks,
Arun
-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Plans of moving towards JDK7 in trunk

Posted by Arun C Murthy <ac...@hortonworks.com>.
After further consideration, here is an alternate.

On Jun 21, 2014, at 11:14 AM, "Arun C. Murthy" <ac...@hortonworks.com> wrote:
> 
> JDK6 eol was Feb 2013 and, a year later, we are still have customers using it - which means we can't drop it yet.
> 
> http://www.oracle.com/technetwork/java/eol-135779.html
> 
> Given that, it seems highly unlikely everyone will suddenly jump to JDK8 by April of next year... I suspect this means we'd have to support JDK7 at least till late 2015. I think, that, is really key regardless of version numbers.
> 
> Furthermore, if we, as a community, maintain discipline in terms of wire-compat, rolling-upgrades etc. we are better off making a major release every year - as you put, no more 'Big Bang' releases.


Looking at the big picture, I believe the users of Apache Hadoop would be better served by us if we prioritized operational aspects such as rolling upgrades, wire-compatibility, binary etc. for a couple of years.

Since not everyone has moved to hadoop-2 yet, talk of more incompatibility between hadoop-2/hadoop-3 or between hadoop-3/hadoop-4 within the next 12 months would certainly be a big issue for users - especially w.r.t rolling upgrades, wire-compat etc.

So, I think we should prioritize these operational aspects for users above everything else. Sure, jdk versions, features etc. are important, but lower in priority.

I'd also like to reiterate my concern on *dropping* support for a JDK7 - we need to support it till end of 2015 at the very least; happy to ship a version of Hadoop which is JDK8 only in 2015 - it just needs to support rolling-upgrades from the JDK7 Hadoop till end of 2015.

With that in mind... I actually like Andrew's suggestion below:

>  On Jun 21, 2014, at 8:01 AM, Andrew Wang <an...@cloudera.com> wrote:
> 
>  I'd be more okay with an intermediate release with no incompatible changes
>  whatsoever besides bumping the JDK requirement to JDK7.

Taking that thought to it's logical conclusion, we can de-couple the dual concerns of JDK versions and major releases but bumping up our software dependencies (JDK, guice etc.) at well-defined and well-articulated releases.

The reason to so would be to ensure we *do not* sneak in operational incompatibilities in the guise of bumping JDK versions.

So, we could do something like:
# hadoop-2.30+ is JDK7, but provides rolling upgrades and wire-compat with hadoop-2.2+; say in Oct 2014
# hadoop-2.50+ is JDK8, but provides rolling upgrades and wire-compat with hadoop-2.2+; say in June 2015 (or even earlier).

This scheme certainly has some dis-advantages, however it has the significant advantage of making it *very* clear to end-users and administrators that we take operational aspects seriously.

Also, this is something we already have done i.e. we updated some of our software deps in hadoop-2.4 v/s hadoop-2.2 - clearly not something as dramatic as JDK. Here are some examples:
https://issues.apache.org/jira/browse/HADOOP-9991
https://issues.apache.org/jira/browse/HADOOP-10102
https://issues.apache.org/jira/browse/HADOOP-10103
https://issues.apache.org/jira/browse/HADOOP-10104
https://issues.apache.org/jira/browse/HADOOP-10503

In summary, the key goals we should keep in mind are:
# Prioritize operational aspects such as rolling upgrades, wire-compat etc. for the next couple of years.
# Support JDK7 until at least the end of 2015, even if we decide to support JDK8 sometime in 2015. Just ensure wire-compat, rolling upgrades etc.

Thoughts?

thanks,
Arun

Re: Plans of moving towards JDK7 in trunk

Posted by Arun C Murthy <ac...@hortonworks.com>.
After further consideration, here is an alternate.

On Jun 21, 2014, at 11:14 AM, "Arun C. Murthy" <ac...@hortonworks.com> wrote:
> 
> JDK6 EOL was Feb 2013 and, a year later, we still have customers using it - which means we can't drop it yet.
> 
> http://www.oracle.com/technetwork/java/eol-135779.html
> 
> Given that, it seems highly unlikely everyone will suddenly jump to JDK8 by April of next year... I suspect this means we'd have to support JDK7 at least until late 2015. I think that is really key, regardless of version numbers.
> 
> Furthermore, if we, as a community, maintain discipline in terms of wire-compat, rolling upgrades etc. we are better off making a major release every year - as you put it, no more 'Big Bang' releases.


Looking at the big picture, I believe the users of Apache Hadoop would be better served if we prioritized operational aspects such as rolling upgrades, wire compatibility, binary compatibility etc. for a couple of years.

Since not everyone has moved to hadoop-2 yet, talk of more incompatibility between hadoop-2/hadoop-3 or between hadoop-3/hadoop-4 within the next 12 months would certainly be a big issue for users - especially w.r.t. rolling upgrades, wire-compat etc.

So, I think we should prioritize these operational aspects for users above everything else. Sure, JDK versions, features etc. are important, but lower in priority.

I'd also like to reiterate my concern about *dropping* support for JDK7 - we need to support it until the end of 2015 at the very least. I'm happy to ship a version of Hadoop which is JDK8-only in 2015 - it just needs to support rolling upgrades from the JDK7 Hadoop until the end of 2015.

With that in mind... I actually like Andrew's suggestion below:

>  On Jun 21, 2014, at 8:01 AM, Andrew Wang <an...@cloudera.com> wrote:
> 
>  I'd be more okay with an intermediate release with no incompatible changes
>  whatsoever besides bumping the JDK requirement to JDK7.

Taking that thought to its logical conclusion, we can decouple the dual concerns of JDK versions and major releases by bumping up our software dependencies (JDK, Guice etc.) at well-defined and well-articulated releases.

The reason to do so would be to ensure we *do not* sneak in operational incompatibilities in the guise of bumping JDK versions.

So, we could do something like:
# hadoop-2.30+ is JDK7, but provides rolling upgrades and wire-compat with hadoop-2.2+; say in Oct 2014
# hadoop-2.50+ is JDK8, but provides rolling upgrades and wire-compat with hadoop-2.2+; say in June 2015 (or even earlier).

This scheme certainly has some disadvantages; however, it has the significant advantage of making it *very* clear to end-users and administrators that we take operational aspects seriously.

Also, this is something we have already done, i.e. we updated some of our software dependencies in hadoop-2.4 vs. hadoop-2.2 - clearly nothing as dramatic as the JDK. Here are some examples:
https://issues.apache.org/jira/browse/HADOOP-9991
https://issues.apache.org/jira/browse/HADOOP-10102
https://issues.apache.org/jira/browse/HADOOP-10103
https://issues.apache.org/jira/browse/HADOOP-10104
https://issues.apache.org/jira/browse/HADOOP-10503

In summary, the key goals we should keep in mind are:
# Prioritize operational aspects such as rolling upgrades, wire-compat etc. for the next couple of years.
# Support JDK7 until at least the end of 2015, even if we decide to support JDK8 sometime in 2015. Just ensure wire-compat, rolling upgrades etc.

Thoughts?

thanks,
Arun

Re: Plans of moving towards JDK7 in trunk

Posted by "Arun C. Murthy" <ac...@hortonworks.com>.
Andrew,


> On Jun 21, 2014, at 8:01 AM, Andrew Wang <an...@cloudera.com> wrote:
> 
> Hi Steve, let me confirm that I understand your proposal correctly:
> 
> - Release an intermediate Hadoop 3 a few months out, based on JDK7 and with
> bumped library versions
> - Release a Hadoop 4 mid next year, based on JDK8
> 
> I question the utility of an intermediate Hadoop 3 like this. Assuming that
> it gets out in September (i.e. roughly when a 2.6 would land), we're
> looking at a valid lifespan of about 7 months before JDK7 is EOL i

JDK6 EOL was Feb 2013 and, a year later, we still have customers using it - which means we can't drop it yet.

http://www.oracle.com/technetwork/java/eol-135779.html

Given that, it seems highly unlikely everyone will suddenly jump to JDK8 by April of next year... I suspect this means we'd have to support JDK7 at least until late 2015. I think that is really key, regardless of version numbers.

Furthermore, if we, as a community, maintain discipline in terms of wire-compat, rolling upgrades etc. we are better off making a major release every year - as you put it, no more 'Big Bang' releases.

We have to, as a development community, get over the 'trauma' of major releases ourselves - I do realize the irony here - but it's necessary to help our users feel confident in upgrading at a reasonable rate.

So, something like this could work:
# hadoop-2 / jdk6 - Oct 2013
# hadoop-3 / jdk7 - Oct 2014
# hadoop-4 / jdk8 - Oct 2015

Having said that, it would also be prudent to co-release hadoop-2/hadoop-3 & hadoop-3/hadoop-4 with the requisite JDK versions. Maybe even a hadoop-4 beta by the middle of 2015. As such, it's a good idea to allow trunk to move to JDK7 now - it's good practice, as we will have to do the same for JDK8.

It helps a lot that we have now decoupled user dependencies from the system with YARN. For example, we could run hadoop-2 MR on hadoop-3 YARN, even if there is some work remaining... see MAPREDUCE-4551. Future reliance on technologies like Docker will help further.
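Moving trunk to JDK7 lets trunk code use JDK7 language features, at the cost of the branch-2 backport work Andrew describes in the quoted message. The hypothetical helper below is an illustration, not code from Hadoop: it is written once with try-with-resources and the diamond operator (JDK7) and once in the JDK6 form that a branch-2 backport would need.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: the same line-reading helper in trunk (JDK7) style and
 * in branch-2 (JDK6) style. Names are hypothetical.
 */
class BackportExample {

    // JDK7: diamond operator, and the reader is closed automatically.
    static List<String> readLinesJdk7(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // JDK6: explicit type arguments and a manual finally block.
    static List<String> readLinesJdk6(String text) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "a\nb\nc";
        System.out.println(readLinesJdk7(text).equals(readLinesJdk6(text))); // true
    }
}
```

The two forms are not exactly equivalent: if both the read loop and close() throw, try-with-resources keeps the close() failure as a suppressed exception, whereas in the JDK6 form the exception from close() replaces the original one. That kind of subtle difference is part of why backports get harder once the branches diverge.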

Thoughts?

Arun

> If this release also breaks compatibility by changing library versions,
> then it looks less and less appealing from a user perspective. I suspect it
> would end up seeing low adoption as everyone waits (at most) 7 months for
> the JDK8-based release to emerge.
> 
> I'd be more okay with an intermediate release with no incompatible changes
> whatsoever besides bumping the JDK requirement to JDK7. However, it'd still
> be a weak release considering that branch-2 already runs fine on JDK7, and
> it looks somewhat bad publicly as we burn another major release number less
> than a year since 2.x going GA.
> 
> This is why I'd like to keep my original proposal on the table: keep going
> with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
> by April next year. It doesn't need to be a big bang release either. I'd be
> delighted if we could rolling upgrade from one to the other. I just didn't
> want to rule out the inclusion of some very compelling feature outright.
> Trust me though, I'd be the first person to ask about compatibility if such
> a feature does come up.
> 
> I'll also posit that people will shy away from using JDK8 features while
> branch-2 remains in active use. There's definitely some new shiny there,
> but nothing compelling enough to me personally when weighed against the
> pain of harder branch-2 backports.
> 
> Let's try to keep this thread focused on the planning side of things
> though, deferring JDK-feature-related discussion to a different thread.
> We'd need to draw up a code-style doc on the wiki, but it sounds like
> something Steve and/or I could draft initially.
> 
> Thanks,
> Andrew
> 
> 
>> On Fri, Jun 20, 2014 at 10:02 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
>> 
>> 
>> On Jun 20, 2014, at 9:51 PM, Steve Loughran <st...@hortonworks.com>
>> wrote:
>> 
>>>> On 20 June 2014 21:35, Steve Loughran <st...@hortonworks.com> wrote:
>>>> 
>>>> 
>>>> This actually argues in favour of
>>>> 
>>>> -renaming branch-2 branch-3 after a release
>>>> -making trunk hadoop-4
>>>> 
>>>> -getting hadoop 3 released off the new branch-3 out in 2014, effectively
>>>> being an iteration of branch-2 with updated java , moves of (off?)
>> guava,
>>>> off jetty, lib changes, but no other significant "big bang" features
>>>> 
>>>> 
>>>> Hadoop 4.x then becomes the 2015 release, which can add more stuff. In
>>>> particular, anything that goes into Hadoop 4 for which there's no
>> intent to
>>>> support in hadoop 2 & 3, can use the java 8 language features sooner
>> rather
>>>> than later.
>>> I should add that I'm willing to be the person who gets the Java-7 based
>>> Hadoop  3.x out the door later this year
>> 
>> +1 that makes sense to me. Thanks for volunteering Steve - I'm glad to
>> share the pain… ;-)
>> 
>> Arun
>> 


Re: Plans of moving towards JDK7 in trunk

Posted by Steve Loughran <st...@hortonworks.com>.
On 21 June 2014 08:01, Andrew Wang <an...@cloudera.com> wrote:

> Hi Steve, let me confirm that I understand your proposal correctly:
>
> - Release an intermediate Hadoop 3 a few months out, based on JDK7 and with
> bumped library versions
> - Release a Hadoop 4 mid next year, based on JDK8
>
> I question the utility of an intermediate Hadoop 3 like this. Assuming that
> it gets out in September (i.e. roughly when a 2.6 would land), we're
> looking at a valid lifespan of about 7 months before JDK7 is EOL in April.
> If this release also breaks compatibility by changing library versions,
> then it looks less and less appealing from a user perspective. I suspect it
> would end up seeing low adoption as everyone waits (at most) 7 months for
> the JDK8-based release to emerge.
>


I'm saying that we'd replace hadoop 2.6 with a 3.x release that, along
with the 2.6 changes, ups the Java version and the JARs and dependencies
that we are frozen on in Hadoop 2.x.

This issue of dependencies may not be so visible in Hadoop's own codebase,
but when you write any downstream project, the majority of the XML
<clauses> in your POM file are about excluding stuff Hadoop pulls in. I've
been quietly trying to address this at HADOOP-9991, but we've reached the
limit of what can get in.

I'd be happy enough with the original "Stata Plan": a release of Hadoop 2.x
that says "java 7 + new libs", but given we've committed to not doing that,
releasing a Hadoop 3 stating that lets us get a Hadoop with a modern set of
underpinnings out in 2014.


>
> I'd be more okay with an intermediate release with no incompatible changes
> whatsoever besides bumping the JDK requirement to JDK7. However, it'd still
> be a weak release considering that branch-2 already runs fine on JDK7, and
> it looks somewhat bad publicly as we burn another major release number less
> than a year since 2.x going GA.
>


It'll be more than a year from 2.x to 3.

And to be realistic, the move to Java 8+ across the entire Hadoop stack
will probably take a year too.


>
> This is why I'd like to keep my original proposal on the table: keep going
> with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
> by April next year. It doesn't need to be a big bang release either. I'd be
> delighted if we could rolling upgrade from one to the other. I just didn't
> want to rule out the inclusion of some very compelling feature outright.
> Trust me though, I'd be the first person to ask about compatibility if such
> a feature does come up.
>
> I'll also posit that people will shy away from using JDK8 features while
> branch-2 remains in active use. There's definitely some new shiny there,
> but nothing compelling enough to me personally when weighed against the
> pain of harder branch-2 backports.
>


branch-2 would be frozen and we'd tell everyone "move to java 7+";
everything downstream gets updated binaries and a chance to move forwards.

There's another issue, which is one Alejandro highlighted:

---------- Forwarded message ----------
From: Alejandro Abdelnur <tu...@cloudera.com>
Date: 10 April 2014 10:30
Subject: Re: Plans of moving towards JDK7 in trunk
To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>


A bit of a different angle.

As the bottom of the stack, Hadoop has to be conservative in adopting
things, but it should not preclude consumers of Hadoop (downstream projects
and Hadoop application developers) from having additional requirements such
as a higher JDK API than JDK6.

Hadoop 2.x should stick to using the JDK6 API
Hadoop 2.x should be tested with multiple runtimes: JDK6, JDK7 and
eventually JDK8
Downstream projects and Hadoop application developers are free to require
any JDK6+ version for development and runtime.

Hadoop 3.x should allow using JDK7 API, bumping the minimum runtime
requirement to JDK7 and be tested with JDK7 and JDK8 runtimes.

---------- Forwarded message ----------

The minimum version of Java that Hadoop mandates is going to be the minimum
version of Java that the entire stack has to adopt, and the minimum version
of Java that has to be run in the datacentre.
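Because downstream code runs in the same JVMs and on the same clusters, Hadoop's floor effectively becomes everyone's floor. A minimal sketch of the kind of startup guard that would enforce such a floor, assuming a JDK7 minimum as in Alejandro's Hadoop 3.x proposal; it reads the standard `java.specification.version` system property, and the class name is hypothetical rather than anything Hadoop ships.

```java
/**
 * Hypothetical startup guard enforcing a minimum Java runtime.
 * Not Hadoop code; the JDK7 floor is an assumption for illustration.
 */
class MinimumJdkCheck {

    /**
     * Converts a "java.specification.version" value to a major number,
     * handling both the old "1.x" scheme and the newer plain scheme:
     * "1.6" -> 6, "1.7" -> 7, "1.8" -> 8, "9" -> 9, "11" -> 11.
     */
    static int majorVersion(String spec) {
        String s = spec.startsWith("1.") ? spec.substring(2) : spec;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot >= 0 ? s.substring(0, dot) : s);
    }

    static boolean meetsMinimum(String spec, int requiredMajor) {
        return majorVersion(spec) >= requiredMajor;
    }

    public static void main(String[] args) {
        int required = 7; // assumed floor, per the "Hadoop 3.x requires JDK7" proposal
        String spec = System.getProperty("java.specification.version");
        if (!meetsMinimum(spec, required)) {
            System.err.println("Java " + required + "+ required, found " + spec);
            System.exit(1);
        }
        System.out.println("Runtime " + spec + " satisfies JDK" + required + "+");
    }
}
```

Any application loaded into that JVM inherits the same check, which is exactly the stack-wide inertia described below.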

I wonder how easy it will be for us all to go to the big Hadoop
sites and say "java 8+ only", as well as to all those Hadoop projects that
want to run on Java 7 and say "upgrade time". I think we'll hit a lot of
inertia - and, to be fair, it's due to Hadoop core's long-standing support
for Java 6. If Hadoop 2.x had always been Java 7+ it would be simpler, but
we all know the trauma of getting hadoop 2.2 out the door and our lack of
enthusiasm for any major dependency updates apart from the protobuf one.


Re: Plans of moving towards JDK7 in trunk

Posted by Steve Loughran <st...@hortonworks.com>.
On 21 June 2014 08:01, Andrew Wang <an...@cloudera.com> wrote:

> Hi Steve, let me confirm that I understand your proposal correctly:
>
> - Release an intermediate Hadoop 3 a few months out, based on JDK7 and with
> bumped library versions
> - Release a Hadoop 4 mid next year, based on JDK8
>
> I question the utility of an intermediate Hadoop 3 like this. Assuming that
> it gets out in September (i.e. roughly when a 2.6 would land), we're
> looking at a valid lifespan of about 7 months before JDK7 is EOL in April.
> If this release also breaks compatibility by changing library versions,
> then it looks less and less appealing from a user perspective. I suspect it
> would end up seeing low adoption as everyone waits (at most) 7 months for
> the JDK8-based release to emerge.
>


I'm saying that we'd replace hadooop 2.6 with a 3.x release that, along
with the 2.6 changes, ups the java version and the JARs and dependencies
which we are frozen with in Hadoop 2.x

this issue of dependencies may not be so visible in hadoop's own codebase,
but when you write any downstream project, the majority of the xml
<clauses> in your POM file is about excluding stuff Hadoop pulls in. I've
been quietly trying to address this at HADOOP-9991, but we've reached the
limit of what can get in.

I'd be happy enough with the original "Stata Plan": a release of Hadoop 2.x
that says "java 7 + new libs", but given we've committed to not doing that,
releasing a Hadoop 3 stating that lets us get a hadoop with a modern set of
underpinnings out in 2014


>
> I'd be more okay with an intermediate release with no incompatible changes
> whatsoever besides bumping the JDK requirement to JDK7. However, it'd still
> be a weak release considering that branch-2 already runs fine on JDK7, and
> it looks somewhat bad publicly as we burn another major release number less
> than a year since 2.x going GA.
>


it'll be > 1 year for 2.x to 3,

And to be realistic, the move to java 8+ across the entire hadoop stack
will probably take 1y too.


>
> This is why I'd like to keep my original proposal on the table: keep going
> with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
> by April next year. It doesn't need to be a big bang release either. I'd be
> delighted if we could rolling upgrade from one to the other. I just didn't
> want to rule out the inclusion of some very compelling feature outright.
> Trust me though, I'd be the first person to ask about compatibility if such
> a feature does come up.
>
> I'll also posit that people will shy away from using JDK8 features while
> branch-2 remains in active use. There's definitely some new shiny there,
> but nothing compelling enough to me personally when weighed against the
> pain of harder branch-2 backports.
>


branch 2 would be frozen and tell everyone "move to java 7+", everything
downstream gets updated binaries and a chance to move forwards.

There's another issue, which is one Alejandro highlit:

---------- Forwarded message ----------
From: Alejandro Abdelnur <tu...@cloudera.com>
Date: 10 April 2014 10:30
Subject: Re: Plans of moving towards JDK7 in trunk
To: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>


A bit of a different angle.

As the bottom of the stack Hadoop has to be conservative in adopting
things, but it should not preclude consumers of Hadoop (downstream projects
and Hadoop application developers) to have additional requirements such as
a higher JDK API than JDK6.

Hadoop 2.x should stick to using JDK6  API
Hadoop 2.x should be tested with multiple runtimes: JDK6, JDK7 and
eventually JDK8
Downstream projects and Hadoop application developers are free to require
any JDK6+ version for development and runtime.

Hadoop 3.x should allow using JDK7 API, bumping the minimum runtime
requirement to JDK7 and be tested with JDK7 and JDK8 runtimes.

---------- Forwarded message ----------

The minimum version of Java that Hadoop mandates is going to be the minimum
version of Java that the entire stack has to adopt, and the minimum version
of Java that has to be run in the datacentre.

I wonder how easy it will be for us all to go to the big hadoop
sites and say "java 8+ only", as well as to all those Hadoop projects that
want to run on java 7 and say "upgrade time". I think we'll hit a lot of
inertia, and, to be fair, it's due to Hadoop core's long-standing support
for Java 6. If Hadoop 2.x had always been java7+ it would be simpler, but
we all know the trauma of getting hadoop 2.2 out the door and our lack of
enthusiasm for any major dependency updates apart from the protobuf one.


Re: Plans of moving towards JDK7 in trunk

Posted by sanjay Radia <sa...@hortonworks.com>.
On Jun 21, 2014, at 8:01 AM, Andrew Wang <an...@cloudera.com> wrote:

> This is why I'd like to keep my original proposal on the table: keep going
> with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
> by April next year. It doesn't need to be a big bang release either. I'd be
> delighted if we could do a rolling upgrade from one to the other. I just didn't
> want to rule out the inclusion of some very compelling feature outright.
> Trust me though, I'd be the first person to ask about compatibility if such
> a feature does come up.


Given your statement above on compatibility (such as rolling upgrades), shouldn't it be fine for the JDK8-based Hadoop release to be merely a 2.x rather than 3.0? Or do you have any incompatible changes to Hadoop protocol or APIs in mind during the same time period?

sanjay

Re: Plans of moving towards JDK7 in trunk

Posted by "Arun C. Murthy" <ac...@hortonworks.com>.
Andrew,


> On Jun 21, 2014, at 8:01 AM, Andrew Wang <an...@cloudera.com> wrote:
> 
> Hi Steve, let me confirm that I understand your proposal correctly:
> 
> - Release an intermediate Hadoop 3 a few months out, based on JDK7 and with
> bumped library versions
> - Release a Hadoop 4 mid next year, based on JDK8
> 
> I question the utility of an intermediate Hadoop 3 like this. Assuming that
> it gets out in September (i.e. roughly when a 2.6 would land), we're
> looking at a valid lifespan of about 7 months before JDK7 is EOL in April.

JDK6 EOL was Feb 2013 and, a year later, we still have customers using it, which means we can't drop it yet.

http://www.oracle.com/technetwork/java/eol-135779.html

Given that, it seems highly unlikely everyone will suddenly jump to JDK8 by April of next year. I suspect this means we'd have to support JDK7 at least until late 2015. I think that is really key, regardless of version numbers.

Furthermore, if we, as a community, maintain discipline in terms of wire-compat, rolling-upgrades etc., we are better off making a major release every year; as you put it, no more 'Big Bang' releases.

We, as a development community, have to get over the 'trauma' of major releases ourselves (I do realize the irony here), but it's necessary to help our users feel confident in upgrading at a reasonable rate.

So, something like this could work:
# hadoop-2 / jdk6 - Oct 2013
# hadoop-3 / jdk7 - Oct 2014
# hadoop-4 / jdk8 - Oct 2015

Having said that, it would also be prudent to co-release hadoop-2/hadoop-3 and hadoop-3/hadoop-4 with the requisite jdk versions, maybe even a hadoop-4 beta by the middle of 2015. As such, it's a good idea to allow trunk to move to jdk7 now; it's good practice, as we will have to do the same for jdk8.

It helps a lot that we have now decoupled user dependencies from the system with YARN. For example, we could run hadoop-2 MR on hadoop-3 YARN, even if there is some work remaining (see MAPREDUCE-4551). Future reliance on technologies like Docker will help further.

Thoughts?

Arun

> If this release also breaks compatibility by changing library versions,
> then it looks less and less appealing from a user perspective. I suspect it
> would end up seeing low adoption as everyone waits (at most) 7 months for
> the JDK8-based release to emerge.
> 
> I'd be more okay with an intermediate release with no incompatible changes
> whatsoever besides bumping the JDK requirement to JDK7. However, it'd still
> be a weak release considering that branch-2 already runs fine on JDK7, and
> it looks somewhat bad publicly as we burn another major release number less
> than a year since 2.x going GA.
> 
> This is why I'd like to keep my original proposal on the table: keep going
> with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
> by April next year. It doesn't need to be a big bang release either. I'd be
> delighted if we could do a rolling upgrade from one to the other. I just didn't
> want to rule out the inclusion of some very compelling feature outright.
> Trust me though, I'd be the first person to ask about compatibility if such
> a feature does come up.
> 
> I'll also posit that people will shy away from using JDK8 features while
> branch-2 remains in active use. There's definitely some new shiny there,
> but nothing compelling enough to me personally when weighed against the
> pain of harder branch-2 backports.
> 
> Let's try to keep this thread focused on the planning side of things
> though, deferring JDK-feature-related discussion to a different thread.
> We'd need to draw up a code-style doc on the wiki, but it sounds like
> something Steve and/or I could draft initially.
> 
> Thanks,
> Andrew
> 
> 
>> On Fri, Jun 20, 2014 at 10:02 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
>> 
>> 
>> On Jun 20, 2014, at 9:51 PM, Steve Loughran <st...@hortonworks.com>
>> wrote:
>> 
>>>> On 20 June 2014 21:35, Steve Loughran <st...@hortonworks.com> wrote:
>>>> 
>>>> 
>>>> This actually argues in favour of
>>>> 
>>>> -renaming branch-2 branch-3 after a release
>>>> -making trunk hadoop-4
>>>> 
>>>> -getting hadoop 3 released off the new branch-3 out in 2014, effectively
>>>> being an iteration of branch-2 with updated java, moves of (off?) guava,
>>>> off jetty, lib changes, but no other significant "big bang" features
>>>> 
>>>> 
>>>> Hadoop 4.x then becomes the 2015 release, which can add more stuff. In
>>>> particular, anything that goes into Hadoop 4 for which there's no intent to
>>>> support in hadoop 2 & 3, can use the java 8 language features sooner rather
>>>> than later.
>>> I should add that I'm willing to be the person who gets the Java-7 based
>>> Hadoop  3.x out the door later this year
>> 
>> +1 that makes sense to me. Thanks for volunteering Steve - I'm glad to
>> share the pain… ;-)
>> 
>> Arun
>> 


Re: Plans of moving towards JDK7 in trunk

Posted by Andrew Wang <an...@cloudera.com>.
Hi Steve, let me confirm that I understand your proposal correctly:

- Release an intermediate Hadoop 3 a few months out, based on JDK7 and with
bumped library versions
- Release a Hadoop 4 mid next year, based on JDK8

I question the utility of an intermediate Hadoop 3 like this. Assuming that
it gets out in September (i.e. roughly when a 2.6 would land), we're
looking at a valid lifespan of about 7 months before JDK7 is EOL in April.
If this release also breaks compatibility by changing library versions,
then it looks less and less appealing from a user perspective. I suspect it
would end up seeing low adoption as everyone waits (at most) 7 months for
the JDK8-based release to emerge.

I'd be more okay with an intermediate release with no incompatible changes
whatsoever besides bumping the JDK requirement to JDK7. However, it'd still
be a weak release considering that branch-2 already runs fine on JDK7, and
it looks somewhat bad publicly as we burn another major release number less
than a year since 2.x going GA.

This is why I'd like to keep my original proposal on the table: keep going
with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
by April next year. It doesn't need to be a big bang release either. I'd be
delighted if we could do a rolling upgrade from one to the other. I just didn't
want to rule out the inclusion of some very compelling feature outright.
Trust me though, I'd be the first person to ask about compatibility if such
a feature does come up.

I'll also posit that people will shy away from using JDK8 features while
branch-2 remains in active use. There's definitely some new shiny there,
but nothing compelling enough to me personally when weighed against the
pain of harder branch-2 backports.

Let's try to keep this thread focused on the planning side of things
though, deferring JDK-feature-related discussion to a different thread.
We'd need to draw up a code-style doc on the wiki, but it sounds like
something Steve and/or I could draft initially.

Thanks,
Andrew


On Fri, Jun 20, 2014 at 10:02 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

>
> On Jun 20, 2014, at 9:51 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
> > On 20 June 2014 21:35, Steve Loughran <st...@hortonworks.com> wrote:
> >
> >>
> >> This actually argues in favour of
> >>
> >> -renaming branch-2 branch-3 after a release
> >> -making trunk hadoop-4
> >>
> >> -getting hadoop 3 released off the new branch-3 out in 2014, effectively
> >> being an iteration of branch-2 with updated java, moves of (off?) guava,
> >> off jetty, lib changes, but no other significant "big bang" features
> >>
> >>
> >> Hadoop 4.x then becomes the 2015 release, which can add more stuff. In
> >> particular, anything that goes into Hadoop 4 for which there's no intent to
> >> support in hadoop 2 & 3, can use the java 8 language features sooner rather
> >> than later.
> >>
> >>
> >>
> > I should add that I'm willing to be the person who gets the Java-7 based
> > Hadoop  3.x out the door later this year
>
> +1 that makes sense to me. Thanks for volunteering Steve - I'm glad to
> share the pain… ;-)
>
> Arun
>

Re: Plans of moving towards JDK7 in trunk

Posted by Andrew Wang <an...@cloudera.com>.
Hi Steve, let me confirm that I understand your proposal correctly:

- Release an intermediate Hadoop 3 a few months out, based on JDK7 and with
bumped library versions
- Release a Hadoop 4 mid next year, based on JDK8

I question the utility of an intermediate Hadoop 3 like this. Assuming that
it gets out in September (i.e. roughly when a 2.6 would land), we're
looking at a valid lifespan of about 7 months before JDK7 is EOL in April.
If this release also breaks compatibility by changing library versions,
then it looks less and less appealing from a user perspective. I suspect it
would end up seeing low adoption as everyone waits (at most) 7 months for
the JDK8-based release to emerge.

I'd be more okay with an intermediate release with no incompatible changes
whatsoever besides bumping the JDK requirement to JDK7. However, it'd still
be a weak release considering that branch-2 already runs fine on JDK7, and
it looks somewhat bad publicly as we burn another major release number less
than a year after 2.x went GA.

This is why I'd like to keep my original proposal on the table: keep going
with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
by April next year. It doesn't need to be a big bang release either. I'd be
delighted if we could do a rolling upgrade from one to the other. I just didn't
want to rule out the inclusion of some very compelling feature outright.
Trust me though, I'd be the first person to ask about compatibility if such
a feature does come up.

I'll also posit that people will shy away from using JDK8 features while
branch-2 remains in active use. There's definitely some new shiny there,
but nothing compelling enough to me personally when weighed against the
pain of harder branch-2 backports.

Let's try to keep this thread focused on the planning side of things
though, deferring JDK-feature-related discussion to a different thread.
We'd need to draw up a code-style doc on the wiki, but it sounds like
something Steve and/or I could draft initially.

Thanks,
Andrew


On Fri, Jun 20, 2014 at 10:02 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

>
> On Jun 20, 2014, at 9:51 PM, Steve Loughran <st...@hortonworks.com> wrote:
>
> > On 20 June 2014 21:35, Steve Loughran <st...@hortonworks.com> wrote:
> >
> >>
> >> This actually argues in favour of
> >>
> >> -renaming branch-2 branch-3 after a release
> >> -making trunk hadoop-4
> >>
> >> -getting hadoop 3 released off the new branch-3 out in 2014, effectively
> >> being an iteration of branch-2 with updated java , moves of (off?) guava,
> >> off jetty, lib changes, but no other significant "big bang" features
> >>
> >>
> >> Hadoop 4.x then becomes the 2015 release, which can add more stuff. In
> >> particular, anything that goes into Hadoop 4 for which there's no intent to
> >> support in hadoop 2 & 3, can use the java 8 language features sooner rather
> >> than later.
> >>
> >>
> >>
> > I should add that I'm willing to be the person who gets the Java-7 based
> > Hadoop  3.x out the door later this year
>
> +1 that makes sense to me. Thanks for volunteering Steve - I'm glad to
> share the pain… ;-)
>
> Arun
>

Re: Plans of moving towards JDK7 in trunk

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
On Fri, Jun 20, 2014 at 10:02 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> > Hadoop  3.x out the door later this year
>
> +1 that makes sense to me. Thanks for volunteering Steve - I'm glad to
> share the pain… ;-)


Hey Arun, you may have missed that Andrew volunteered to do this as
well (the thread is long, so it's easy to miss).

Cheers

-- 
Alejandro
