Posted to hdfs-dev@hadoop.apache.org by Steve Loughran <st...@hortonworks.com> on 2015/03/09 22:15:08 UTC

Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone wanting to use this in production until some time deep into 2016.

Issue: JDK 8 vs 7

It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's ops pull against this: people are still in the moving-off Java 6 phase due to that "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it.

You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*, the main thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min Java version field in resource requests that will let apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could fail-fast if a Java version wasn't available.
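
A minimal sketch of that "min Java version" idea, to make it concrete. Everything here is hypothetical: JvmSelector, its methods and the /usr/lib/jvm paths are illustrative names, not an existing YARN API. It assumes each node has a table of installed JVMs keyed by major version, picks the lowest one that satisfies the request, and fails fast if there isn't one.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/**
 * Hypothetical sketch only: none of these names exist in YARN.
 * Picks a JAVA_HOME from a node's installed JVMs, given the minimum
 * Java major version an application asked for.
 */
public class JvmSelector {

    /** Returns the JAVA_HOME of the lowest installed version >= minVersion, if any. */
    public static Optional<String> select(TreeMap<Integer, String> installed, int minVersion) {
        Map.Entry<Integer, String> entry = installed.ceilingEntry(minVersion);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }

    /** Fail fast: throw before launch if no suitable JVM is installed. */
    public static String selectOrFail(TreeMap<Integer, String> installed, int minVersion) {
        return select(installed, minVersion).orElseThrow(() ->
            new IllegalStateException("No JVM >= Java " + minVersion + " on this node"));
    }

    public static void main(String[] args) {
        // Assumed node configuration; the paths are made up for illustration.
        TreeMap<Integer, String> installed = new TreeMap<>();
        installed.put(7, "/usr/lib/jvm/java-7");
        installed.put(8, "/usr/lib/jvm/java-8");

        // An app asking for Java 8 gets the Java 8 home.
        System.out.println(selectOrFail(installed, 8));

        // An app asking for Java 9 is rejected up front rather than failing at runtime.
        try {
            selectOrFail(installed, 9);
        } catch (IllegalStateException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

Choosing the lowest JVM that satisfies the request is just one possible policy; a real implementation might prefer the newest, or let the node decide.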

What we can't do in hadoop core today is set javac.version=1.8 & use java 8 code. Downstream code can do that (Hive, etc); they just need to accept that they don't get to play on JDK7 clusters if they embrace lambda-expressions.

So...we need to stay on java 7 for some time due to ops pull; downstream apps get to choose what they want. We can/could enhance YARN to make JVM choice more declarative.

Issue: Incompatible changes

Without knowing what is proposed for "an incompatible classpath change", I can't say whether this is something that could be made optional. If it isn't, then it is a python-3 class option, "rewrite your code" event, which is going to be particularly traumatic to things like Hive that already do complex CP games. I'm currently against any mandatory change here, though would love to see an optional one. And if optional, it ceases to become an incompatible change...

Issue: Getting trunk out the door

The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this: no recompilation necessary

Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.

It seems to me that I could go

git checkout trunk
mvn versions:set -DnewVersion=2.8.0-SNAPSHOT

We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.

A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team

This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing out those benefits to people sooner rather than later, and puts off the "Hello, we've just broken your code" event for another 12+ months.

Comments?

-Steve

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Arun Murthy <ac...@hortonworks.com>.

Steve,

________________________________________
From: Steve Loughran <st...@hortonworks.com>
Sent: Monday, March 09, 2015 2:15 PM
To: mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Issue: Getting trunk out the door

The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this:  no recompilation necessary

Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.

It seems to me that I could go

git checkout trunk
mvn versions:set -DnewVersion=2.8.0-SNAPSHOT

We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.

A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team

This seems like a great idea, something I hadn't considered before since most patches were flowing into branch-2 anyway - makes a lot of sense.

We could just drop branch-2 while we are at it too. It's just a pain to maintain an extra branch. Also, we should formalize that major features should always come via feature branches - allows for some oversight on compatibility etc. as a whole (not piecemeal) when the feature branch is merged.

In particular, let's also make sure we ship the script changes in a compatible manner. Happy to help.

Given that Vinod has stepped up for 2.7, would you like to drive 2.8? 

Practically, this is reality already, but something to formalize: having RMs per dot release (Karthik for 2.5, Vinod for 2.7,  Steve for 2.8 etc.).

thanks,
Arun

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by "Colin P. McCabe" <cm...@apache.org>.

Er, that should read "as Allen commented"  C.

On Tue, Mar 10, 2015 at 11:55 AM, Colin P. McCabe <cm...@apache.org> wrote:
> Hi Arun,
>
> Not all changes which are incompatible can be "fixed"-- sometimes an
> incompatibility is a necessary part of a change.  For example, taking
> a really old library dependency with known security issues off the
> CLASSPATH will create incompatibilities, but it's also necessary.  A
> minimum JDK version bump also falls in that category.  There are also
> cases where we need to drop support for really obsolete and baroque
> features from the past.  For example, it would be nice if we could
> finally get rid of the code to read pre-transactional edit logs.  It's
> a substantial amount of code.  We could argue that we should just
> support legacy stuff forever, but code quality will suffer.
>
> These changes need to be made sooner or later, and a major version
> bump is an ideal place to make them.  I think that making these
> changes in a 2.x release is hostile to operators, as Alan commented.
> That's what we're trying to avoid by discussing Hadoop 3.x.
>
> Colin
>
> On Mon, Mar 9, 2015 at 3:54 PM, Arun Murthy <ac...@hortonworks.com> wrote:
>> Colin,
>>
>>  Do you have a list of incompatible changes other than the shell-script rewrite? If we do have others we'd have to fix them anyway for the current plan on hadoop-3.x right? So, I don't see the difference?
>>
>> Arun
>>
>> ________________________________________
>> From: Colin P. McCabe <cm...@apache.org>
>> Sent: Monday, March 09, 2015 3:05 PM
>> To: hdfs-dev@hadoop.apache.org
>> Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>> Subject: Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
>>
>> Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
>> to plan a new Hadoop release against a version of Java that is almost
>> obsolete and (soon) no longer receiving security updates.  I think
>> people will be willing to roll out a new version of Java for Hadoop
>> 3.x.
>>
>> Similarly, the whole point of bumping the major version number is the
>> ability to make incompatible changes.  There are already a bunch of
>> incompatible changes in the trunk branch.  Are you proposing to revert
>> those?  Or push them into newly created feature branches?  This
>> doesn't seem like a good idea to me.
>>
>> I would be in favor of backporting targeted incompatible changes from
>> trunk to branch-2.  For example, we could consider pulling in Allen's
>> shell script rewrite.  But pulling in all of trunk seems like a bad
>> idea at this point, if we want a 2.x release.
>>
>> best,
>> Colin
>>
>> On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com> wrote:
>>>
>>> If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone wanting to use this in production until some time deep into 2016.
>>>
>>> Issue: JDK 8 vs 7
>>>
>>> It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's ops pull against this: people are still in the moving-off Java 6 phase due to that "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it.
>>>
>>> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*, the main thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min Java version field in resource requests that will let apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could fail-fast if a Java version wasn't available.
>>>
>>> What we can't do in hadoop core today is set javac.version=1.8 & use java 8 code. Downstream code can do that (Hive, etc); they just need to accept that they don't get to play on JDK7 clusters if they embrace lambda-expressions.
>>>
>>> So...we need to stay on java 7 for some time due to ops pull; downstream apps get to choose what they want. We can/could enhance YARN to make JVM choice more declarative.
>>>
>>> Issue: Incompatible changes
>>>
>>> Without knowing what is proposed for "an incompatible classpath change", I can't say whether this is something that could be made optional. If it isn't, then it is a python-3 class option, "rewrite your code" event, which is going to be particularly traumatic to things like Hive that already do complex CP games. I'm currently against any mandatory change here, though would love to see an optional one. And if optional, it ceases to become an incompatible change...
>>>
>>> Issue: Getting trunk out the door
>>>
>>> The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this:  no recompilation necessary
>>>
>>> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>>>
>>> It seems to me that I could go
>>>
>>> git checkout trunk
>>> mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>>>
>>> We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.
>>>
>>> A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team
>>>
>>> This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing out those benefits to people sooner rather than later, and puts off the "Hello, we've just broken your code" event for another 12+ months.
>>>
>>> Comments?
>>>
>>> -Steve
>>>
>>>
>>>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by "Colin P. McCabe" <cm...@apache.org>.

Hi Arun,

Not all changes which are incompatible can be "fixed"-- sometimes an
incompatibility is a necessary part of a change.  For example, taking
a really old library dependency with known security issues off the
CLASSPATH will create incompatibilities, but it's also necessary.  A
minimum JDK version bump also falls in that category.  There are also
cases where we need to drop support for really obsolete and baroque
features from the past.  For example, it would be nice if we could
finally get rid of the code to read pre-transactional edit logs.  It's
a substantial amount of code.  We could argue that we should just
support legacy stuff forever, but code quality will suffer.

These changes need to be made sooner or later, and a major version
bump is an ideal place to make them.  I think that making these
changes in a 2.x release is hostile to operators, as Alan commented.
That's what we're trying to avoid by discussing Hadoop 3.x.

Colin

On Mon, Mar 9, 2015 at 3:54 PM, Arun Murthy <ac...@hortonworks.com> wrote:
> Colin,
>
>  Do you have a list of incompatible changes other than the shell-script rewrite? If we do have others we'd have to fix them anyway for the current plan on hadoop-3.x right? So, I don't see the difference?
>
> Arun
>
> ________________________________________
> From: Colin P. McCabe <cm...@apache.org>
> Sent: Monday, March 09, 2015 3:05 PM
> To: hdfs-dev@hadoop.apache.org
> Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
>
> Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
> to plan a new Hadoop release against a version of Java that is almost
> obsolete and (soon) no longer receiving security updates.  I think
> people will be willing to roll out a new version of Java for Hadoop
> 3.x.
>
> Similarly, the whole point of bumping the major version number is the
> ability to make incompatible changes.  There are already a bunch of
> incompatible changes in the trunk branch.  Are you proposing to revert
> those?  Or push them into newly created feature branches?  This
> doesn't seem like a good idea to me.
>
> I would be in favor of backporting targeted incompatible changes from
> trunk to branch-2.  For example, we could consider pulling in Allen's
> shell script rewrite.  But pulling in all of trunk seems like a bad
> idea at this point, if we want a 2.x release.
>
> best,
> Colin
>
> On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com> wrote:
>>
>> If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone wanting to use this in production until some time deep into 2016.
>>
>> Issue: JDK 8 vs 7
>>
>> It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's ops pull against this: people are still in the moving-off Java 6 phase due to that "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it.
>>
>> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*, the main thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min Java version field in resource requests that will let apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could fail-fast if a Java version wasn't available.
>>
>> What we can't do in hadoop core today is set javac.version=1.8 & use java 8 code. Downstream code can do that (Hive, etc); they just need to accept that they don't get to play on JDK7 clusters if they embrace lambda expressions.
>>
>> So...we need to stay on java 7 for some time due to ops pull; downstream apps get to choose what they want. We can/could enhance YARN to make JVM choice more declarative.
>>
>> Issue: Incompatible changes
>>
>> Without knowing what is proposed for "an incompatible classpath change", I can't say whether this is something that could be made optional. If it isn't, then it is a python-3 class option, "rewrite your code" event, which is going to be particularly traumatic to things like Hive that already do complex CP games. I'm currently against any mandatory change here, though would love to see an optional one. And if optional, it ceases to become an incompatible change...
>>
>> Issue: Getting trunk out the door
>>
>> The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this:  no recompilation necessary
>>
>> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>>
>> It seems to me that I could go
>>
>> git checkout trunk
>>         mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>>
>> We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.
>>
>> A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team
>>
>> This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing out those benefits to people sooner rather than later, and puts off the "Hello, we've just broken your code" event for another 12+ months.
>>
>> Comments?
>>
>> -Steve
>>
>>
>>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by "Colin P. McCabe" <cm...@apache.org>.

Hi Arun,

Not all changes which are incompatible can be "fixed"-- sometimes an
incompatibility is a necessary part of a change.  For example, taking
a really old library dependency with known security issues off the
CLASSPATH will create incompatibilities, but it's also necessary.  A
minimum JDK version bump also falls in that category.  There are also
cases where we need to drop support for really obsolete and baroque
features from the past.  For example, it would be nice if we could
finally get rid of the code to read pre-transactional edit logs.  It's
a substantial amount of code.  We could argue that we should just
support legacy stuff forever, but code quality will suffer.

These changes need to be made sooner or later, and a major version
bump is an ideal place to make them.  I think that making these
changes in a 2.x release is hostile to operators, as Allen commented.
That's what we're trying to avoid by discussing Hadoop 3.x.

Colin

On Mon, Mar 9, 2015 at 3:54 PM, Arun Murthy <ac...@hortonworks.com> wrote:
> Colin,
>
>  Do you have a list of incompatible changes other than the shell-script rewrite? If we do have others we'd have to fix them anyway for the current plan on hadoop-3.x right? So, I don't see the difference?
>
> Arun
>
> ________________________________________
> From: Colin P. McCabe <cm...@apache.org>
> Sent: Monday, March 09, 2015 3:05 PM
> To: hdfs-dev@hadoop.apache.org
> Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
>
> Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
> to plan a new Hadoop release against a version of Java that is almost
> obsolete and (soon) no longer receiving security updates.  I think
> people will be willing to roll out a new version of Java for Hadoop
> 3.x.
>
> Similarly, the whole point of bumping the major version number is the
> ability to make incompatible changes.  There are already a bunch of
> incompatible changes in the trunk branch.  Are you proposing to revert
> those?  Or push them into newly created feature branches?  This
> doesn't seem like a good idea to me.
>
> I would be in favor of backporting targeted incompatible changes from
> trunk to branch-2.  For example, we could consider pulling in Allen's
> shell script rewrite.  But pulling in all of trunk seems like a bad
> idea at this point, if we want a 2.x release.
>
> best,
> Colin
>
> On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com> wrote:
>>
>> If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone wanting to use this in production until some time deep into 2016.
>>
>> Issue: JDK 8 vs 7
>>
>> It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's ops pull against this: people are still in the moving-off Java 6 phase due to that "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it.
>>
>> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*, the main thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min Java version field in resource requests that will let apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could fail-fast if a Java version wasn't available.
>>
>> What we can't do in hadoop core today is set javac.version=1.8 & use java 8 code. Downstream code can do that (Hive, etc); they just need to accept that they don't get to play on JDK7 clusters if they embrace lambda expressions.
>>
>> So...we need to stay on java 7 for some time due to ops pull; downstream apps get to choose what they want. We can/could enhance YARN to make JVM choice more declarative.
>>
>> Issue: Incompatible changes
>>
>> Without knowing what is proposed for "an incompatible classpath change", I can't say whether this is something that could be made optional. If it isn't, then it is a python-3 class option, "rewrite your code" event, which is going to be particularly traumatic to things like Hive that already do complex CP games. I'm currently against any mandatory change here, though would love to see an optional one. And if optional, it ceases to become an incompatible change...
>>
>> Issue: Getting trunk out the door
>>
>> The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this:  no recompilation necessary
>>
>> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>>
>> It seems to me that I could go
>>
>> git checkout trunk
>>         mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>>
>> We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.
>>
>> A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team
>>
>> This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing out those benefits to people sooner rather than later, and puts off the "Hello, we've just broken your code" event for another 12+ months.
>>
>> Comments?
>>
>> -Steve
>>
>>
>>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Arun Murthy <ac...@hortonworks.com>.

Colin,

 Do you have a list of incompatible changes other than the shell-script rewrite? If we do have others we'd have to fix them anyway for the current plan on hadoop-3.x right? So, I don't see the difference?

Arun

________________________________________
From: Colin P. McCabe <cm...@apache.org>
Sent: Monday, March 09, 2015 3:05 PM
To: hdfs-dev@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
to plan a new Hadoop release against a version of Java that is almost
obsolete and (soon) no longer receiving security updates.  I think
people will be willing to roll out a new version of Java for Hadoop
3.x.

Similarly, the whole point of bumping the major version number is the
ability to make incompatible changes.  There are already a bunch of
incompatible changes in the trunk branch.  Are you proposing to revert
those?  Or push them into newly created feature branches?  This
doesn't seem like a good idea to me.

I would be in favor of backporting targeted incompatible changes from
trunk to branch-2.  For example, we could consider pulling in Allen's
shell script rewrite.  But pulling in all of trunk seems like a bad
idea at this point, if we want a 2.x release.

best,
Colin

On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com> wrote:
>
> If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone wanting to use this in production until some time deep into 2016.
>
> Issue: JDK 8 vs 7
>
> It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's ops pull against this: people are still in the moving-off Java 6 phase due to that "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it.
>
> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*, the main thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min Java version field in resource requests that will let apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could fail-fast if a Java version wasn't available.
>
> What we can't do in hadoop core today is set javac.version=1.8 & use java 8 code. Downstream code can do that (Hive, etc); they just need to accept that they don't get to play on JDK7 clusters if they embrace lambda expressions.
>
> So...we need to stay on java 7 for some time due to ops pull; downstream apps get to choose what they want. We can/could enhance YARN to make JVM choice more declarative.
>
> Issue: Incompatible changes
>
> Without knowing what is proposed for "an incompatible classpath change", I can't say whether this is something that could be made optional. If it isn't, then it is a python-3 class option, "rewrite your code" event, which is going to be particularly traumatic to things like Hive that already do complex CP games. I'm currently against any mandatory change here, though would love to see an optional one. And if optional, it ceases to become an incompatible change...
>
> Issue: Getting trunk out the door
>
> The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this:  no recompilation necessary
>
> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>
> It seems to me that I could go
>
> git checkout trunk
>         mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>
> We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.
>
> A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team
>
> This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing out those benefits to people sooner rather than later, and puts off the "Hello, we've just broken your code" event for another 12+ months.
>
> Comments?
>
> -Steve
>
>
>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Allen Wittenauer <aw...@altiscale.com>.


	Between this and the other thread, I’m seeing:

	* companies that were forced to make internal forks because their patches were ignored are now considered the deciders for whether we move forward
	* 5 years since the last branch off of trunk is considered ‘soon’
	* More good reasons to kill hadoop 2.7 and release hadoop 3.0 as the JDK7 release
	* We are now OPENLY hostile to operations teams
	* No one seems to really care that we’re about to create an absolute nightmare for anyone that uses maven repos, as they’ll need to keep track of which jars have been compiled with which JVM with zero hints from our build artifacts



On Mar 9, 2015, at 4:18 PM, Steve Loughran <st...@hortonworks.com> wrote:

> 
> 
> On 09/03/2015 15:56, "Andrew Wang" <an...@cloudera.com> wrote:
> 
>> I find this proposal very surprising. We've intentionally deferred
>> incompatible changes to trunk, because they are incompatible and do not
>> belong in a minor release. Now we are supposed to blur our eyes and release
>> these changes anyway? I don't see this ending well.
> 
> I'm staring at CHANGES.TXT & thinking 'how can we ship something off trunk
> that has as many of these as we can get out (especially those shell script
> bits) in a way that doesn't break everything'. Because there's a lot of
> improvements and bug fixes there which aren't going to be in anyone's hands
> for a long time otherwise, not just due to any proposed 3.x release
> schedule, but because of the java 8 requirements as well as classloader
> stuff.
> 
> 
> 
>> 
>> One higher-level goal we should be working towards is tightening our
>> compatibility guarantees, not loosening them. This is why I've been
>> highlighting classpath isolation as a 3.0 feature, since this is one of the
>> biggest issues faced by our users and downstreams. I think a 3.0 with an
>> improved compatibility story will make operators and downstreams much
>> happier than releasing trunk as 2.8.
>> 
>> Best,
>> Andrew
> 
> 
> I still want to see what's being proposed here. Having classpath isolation
> will make the JAR upgrade story in 3.x a lot cleaner, but we can't go to
> every app that imports hadoop-hdfs-client and say "your code just broke",
> not if they want their apps to continue to run on Hadoop 2 and/or Java 7.
> Which, given that Java 7 is still something cluster ops teams are coming
> to terms with, is going to be a while
> 
> 
>> 
>
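
Allen's last bullet, that the build artifacts give no hint of which JVM level they were compiled for, can at least be checked after the fact. Every .class file starts with the magic number 0xCAFEBABE followed by a minor and a major version; major 51 corresponds to Java 7 and 52 to Java 8. The following is a minimal sketch in Python, not something from the thread itself: the helper names class_major_version and jar_major_versions are hypothetical, and it reports the class-file target level, which is what decides whether a JDK7 cluster can load the class.

```python
import struct
import zipfile

# Class-file major versions for the JDKs discussed in this thread.
MAJOR_TO_JAVA = {50: "Java 6", 51: "Java 7", 52: "Java 8"}


def class_major_version(data):
    """Return the major version stored in the first 8 bytes of a .class file."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    # Layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file (bad magic number)")
    return major


def jar_major_versions(jar_path):
    """Return the set of class-file major versions found inside a JAR."""
    versions = set()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                versions.add(class_major_version(jar.read(name)[:8]))
    return versions
```

Run against a downloaded artifact, jar_major_versions returns the set of levels present in it; any 52 in that set means at least one class will not load on a Java 7 JVM.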

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Allen Wittenauer <aw...@altiscale.com>.


	Between this and the other thread, I’m seeing:

	* companies that were forced to make internal forks because their patches were ignored are now considered the deciders for whether we move forward
	* 5 years since the last branch off of trunk is considered ‘soon’
	* More good reasons to kill hadoop 2.7 and release hadoop 3.0 as the JDK7 release
	* We are now OPENLY hostile to operations teams
	* No one seems to really care that we’re about to create an absolute nightmare for anyone that uses maven repos, as they’ll need to keep track of which jars have been compiled with which JVM with zero hints from our build artifacts



On Mar 9, 2015, at 4:18 PM, Steve Loughran <st...@hortonworks.com> wrote:

> 
> 
> On 09/03/2015 15:56, "Andrew Wang" <an...@cloudera.com> wrote:
> 
>> I find this proposal very surprising. We've intentionally deferred
>> incompatible changes to trunk, because they are incompatible and do not
>> belong in a minor release. Now we are supposed to blur our eyes and
>> release
>> these changes anyway? I don't see this ending well.
> 
> I'm staring at CHANGES.TXT & thinking 'how can we ship something off trunk
> that has as many of these as we can get out (especially those shell script
> bits) in a way that doesn't break everything'. Because there's a lot of
> improvements and bug fixes there which aren't going to be in anyone's hands
> for a long time otherwise, not just due to any proposed 3.x release
> schedule, but because of the java 8 requirements as well as classloader
> stuff.
> 
> 
> 
>> 
>> One higher-level goal we should be working towards is tightening our
>> compatibility guarantees, not loosening them. This is why I've been
>> highlighting classpath isolation as a 3.0 feature, since this is one of
>> the
>> biggest issues faced by our users and downstreams. I think a 3.0 with an
>> improved compatibility story will make operators and downstreams much
>> happier than releasing trunk as 2.8.
>> 
>> Best,
>> Andrew
> 
> 
> I still want to see what's being proposed here. Having classpath isolation
> will make the JAR upgrade story in 3.x a lot cleaner, but we can't go to
> every app that imports hadoop-hdfs-client and say "your code just broke",
> not if they want their apps to continue to run on Hadoop 2 and/or Java 7.
> Which, given that Java 7 is still something cluster ops teams are coming
> to terms with, is going to be a while.
> 
> 
>> 
>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Steve Loughran <st...@hortonworks.com>.

On 09/03/2015 15:56, "Andrew Wang" <an...@cloudera.com> wrote:

>I find this proposal very surprising. We've intentionally deferred
>incompatible changes to trunk, because they are incompatible and do not
>belong in a minor release. Now we are supposed to blur our eyes and
>release
>these changes anyway? I don't see this ending well.

I'm staring at CHANGES.TXT & thinking 'how can we ship something off trunk
that has as many of these as we can get out (especially those shell script
bits) in a way that doesn't break everything'. Because there's a lot of
improvements and bug fixes there which aren't going to be in anyone's hands
for a long time otherwise, not just due to any proposed 3.x release
schedule, but because of the java 8 requirements as well as classloader
stuff.

>
>One higher-level goal we should be working towards is tightening our
>compatibility guarantees, not loosening them. This is why I've been
>highlighting classpath isolation as a 3.0 feature, since this is one of
>the
>biggest issues faced by our users and downstreams. I think a 3.0 with an
>improved compatibility story will make operators and downstreams much
>happier than releasing trunk as 2.8.
>
>Best,
>Andrew

I still want to see what's being proposed here. Having classpath isolation
will make the JAR upgrade story in 3.x a lot cleaner, but we can't go to
every app that imports hadoop-hdfs-client and say "your code just broke",
not if they want their apps to continue to run on Hadoop 2 and/or Java 7.
Which, given that Java 7 is still something cluster ops teams are coming
to terms with, is going to be a while.

>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Andrew Wang <an...@cloudera.com>.

I find this proposal very surprising. We've intentionally deferred
incompatible changes to trunk, because they are incompatible and do not
belong in a minor release. Now we are supposed to blur our eyes and release
these changes anyway? I don't see this ending well.

One higher-level goal we should be working towards is tightening our
compatibility guarantees, not loosening them. This is why I've been
highlighting classpath isolation as a 3.0 feature, since this is one of the
biggest issues faced by our users and downstreams. I think a 3.0 with an
improved compatibility story will make operators and downstreams much
happier than releasing trunk as 2.8.

Best,
Andrew

On Mon, Mar 9, 2015 at 3:05 PM, Colin P. McCabe <cm...@apache.org> wrote:

> Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
> to plan a new Hadoop release against a version of Java that is almost
> obsolete and (soon) no longer receiving security updates.  I think
> people will be willing to roll out a new version of Java for Hadoop
> 3.x.
>
> Similarly, the whole point of bumping the major version number is the
> ability to make incompatible changes.  There are already a bunch of
> incompatible changes in the trunk branch.  Are you proposing to revert
> those?  Or push them into newly created feature branches?  This
> doesn't seem like a good idea to me.
>
> I would be in favor of backporting targeted incompatible changes from
> trunk to branch-2.  For example, we could consider pulling in Allen's
> shell script rewrite.  But pulling in all of trunk seems like a bad
> idea at this point, if we want a 2.x release.
>
> best,
> Colin
>
> On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
> >
> > If 3.x is going to be Java 8 & not backwards compatible, I don't expect
> anyone to want to use this in production until some time deep into 2016.
> >
> > Issue: JDK 8 vs 7
> >
> > It will require Hadoop clusters to move up to Java 8. While there's dev
> pull for this, there's ops pull against this: people are still in the
> moving-off Java 6 phase due to that "it's working, don't update it"
> philosophy. Java 8 is compelling to us coders, but that doesn't mean ops
> want it.
> >
> > You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*,
> the main thing is setting up JAVA_HOME. That's something we could make
> easier somehow (maybe some min Java version field in resource requests that
> will let apps say java 8, java 9, ...). YARN could not only set up JVM
> paths, it could fail-fast if a Java version wasn't available.
> >
> > What we can't do in hadoop core today is set javac.version=1.8 & use
> java 8 code. Downstream code can do that (Hive, etc); they just need to
> accept that they don't get to play on JDK7 clusters if they embrace
> lambda expressions.
> >
> > So...we need to stay on java 7 for some time due to ops pull; downstream
> apps get to choose what they want. We can/could enhance YARN to make JVM
> choice more declarative.
> >
> > Issue: Incompatible changes
> >
> > Without knowing what is proposed for "an incompatible classpath change",
> I can't say whether this is something that could be made optional. If it
> isn't, then it is a python-3 class option, "rewrite your code" event, which
> is going to be particularly traumatic to things like Hive that already do
> complex CP games. I'm currently against any mandatory change here, though
> would love to see an optional one. And if optional, it ceases to become an
> incompatible change...
> >
> > Issue: Getting trunk out the door
> >
> > The main diff from branch-2 and trunk is currently the bash script
> changes. These don't break client apps. May or may not break bigtop & other
> downstream hadoop stacks, but developers don't need to worry about this:
> no recompilation necessary
> >
> > Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
> >
> > It seems to me that I could go
> >
> >         git checkout trunk
> >         mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
> >
> > We'd then have a version of Hadoop-trunk we could ship later this year,
> compatible at the JDK and API level with the existing java code & JDK7+
> clusters.
> >
> > A classpath fix that is optional/compatible can then go out on the 2.x
> line, saving the 3.x tag for something that really breaks things, forces
> all downstream apps to set up new hadoop profiles, have separate modules &
> generally hate the hadoop dev team
> >
> > This lets us tick off the "recent trunk release" and "fixed shell
> scripts" items, pushing out those benefits to people sooner rather than
> later, and puts off the "Hello, we've just broken your code" event for
> another 12+ months.
> >
> > Comments?
> >
> > -Steve
> >
> >
> >
>
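
The fail-fast idea quoted above (an app declares a minimum Java version and is rejected early if the JVM is older) can be sketched outside YARN as a plain version check. This is only an illustration: the thread floats such a resource-request field as an idea, not as something that exists, and MinJavaVersion, parseMajor and requireAtLeast are hypothetical names.

```java
// Hypothetical sketch of a "minimum Java version" fail-fast check.
public class MinJavaVersion {

    /** Maps a java.version string to its major release: "1.7.0_75" -> 7, "1.8.0_40" -> 8, "9" -> 9. */
    public static int parseMajor(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // Releases up to Java 8 report themselves as 1.x.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Throws immediately if the given JVM version is older than the required major release. */
    public static void requireAtLeast(int requiredMajor, String actualVersion) {
        int actualMajor = parseMajor(actualVersion);
        if (actualMajor < requiredMajor) {
            throw new IllegalStateException(
                "Java " + requiredMajor + " required, but JVM reports " + actualVersion);
        }
    }

    public static void main(String[] args) {
        // Assumed convention for this sketch: the required major version is the first argument.
        int required = args.length > 0 ? Integer.parseInt(args[0]) : 7;
        requireAtLeast(required, System.getProperty("java.version"));
        System.out.println("JVM satisfies minimum Java " + required);
    }
}
```

An application could run a check like this at the very start of its container launch, so a Java 8 app landing on a JDK7-only node fails with a clear message instead of an UnsupportedClassVersionError deep in a stack trace.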

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Arun Murthy <ac...@hortonworks.com>.

Colin,

 Do you have a list of incompatible changes other than the shell-script rewrite? If we do have others we'd have to fix them anyway for the current plan on hadoop-3.x right? So, I don't see the difference?

Arun

________________________________________
From: Colin P. McCabe <cm...@apache.org>
Sent: Monday, March 09, 2015 3:05 PM
To: hdfs-dev@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
to plan a new Hadoop release against a version of Java that is almost
obsolete and (soon) no longer receiving security updates.  I think
people will be willing to roll out a new version of Java for Hadoop
3.x.

Similarly, the whole point of bumping the major version number is the
ability to make incompatible changes.  There are already a bunch of
incompatible changes in the trunk branch.  Are you proposing to revert
those?  Or push them into newly created feature branches?  This
doesn't seem like a good idea to me.

I would be in favor of backporting targeted incompatible changes from
trunk to branch-2.  For example, we could consider pulling in Allen's
shell script rewrite.  But pulling in all of trunk seems like a bad
idea at this point, if we want a 2.x release.

best,
Colin

On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com> wrote:
>
> If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone to want to use this in production until some time deep into 2016.
>
> Issue: JDK 8 vs 7
>
> It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's ops pull against this: people are still in the moving-off Java 6 phase due to that "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it.
>
> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*, the main thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min Java version field in resource requests that will let apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could fail-fast if a Java version wasn't available.
>
> What we can't do in hadoop core today is set javac.version=1.8 & use java 8 code. Downstream code can do that (Hive, etc); they just need to accept that they don't get to play on JDK7 clusters if they embrace lambda expressions.
>
> So...we need to stay on java 7 for some time due to ops pull; downstream apps get to choose what they want. We can/could enhance YARN to make JVM choice more declarative.
>
> Issue: Incompatible changes
>
> Without knowing what is proposed for "an incompatible classpath change", I can't say whether this is something that could be made optional. If it isn't, then it is a python-3 class option, "rewrite your code" event, which is going to be particularly traumatic to things like Hive that already do complex CP games. I'm currently against any mandatory change here, though would love to see an optional one. And if optional, it ceases to become an incompatible change...
>
> Issue: Getting trunk out the door
>
> The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this:  no recompilation necessary
>
> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>
> It seems to me that I could go
>
>         git checkout trunk
>         mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>
> We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.
>
> A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team
>
> This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing out those benefits to people sooner rather than later, and puts off the "Hello, we've just broken your code" event for another 12+ months.
>
> Comments?
>
> -Steve
>
>
>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Arun Murthy <ac...@hortonworks.com>.

Colin,

 Do you have a list of incompatible changes other than the shell-script rewrite? If we do have others, we'd have to fix them anyway for the current plan on hadoop-3.x, right? So I don't see the difference.

Arun

________________________________________
From: Colin P. McCabe <cm...@apache.org>
Sent: Monday, March 09, 2015 3:05 PM
To: hdfs-dev@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
to plan a new Hadoop release against a version of Java that is almost
obsolete and (soon) no longer receiving security updates.  I think
people will be willing to roll out a new version of Java for Hadoop
3.x.

Similarly, the whole point of bumping the major version number is the
ability to make incompatible changes.  There are already a bunch of
incompatible changes in the trunk branch.  Are you proposing to revert
those?  Or push them into newly created feature branches?  This
doesn't seem like a good idea to me.

I would be in favor of backporting targeted incompatible changes from
trunk to branch-2.  For example, we could consider pulling in Allen's
shell script rewrite.  But pulling in all of trunk seems like a bad
idea at this point, if we want a 2.x release.

best,
Colin

On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com> wrote:
>
> If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone wanting to use this in production until some time deep into 2016.
>
> Issue: JDK 8 vs 7
>
> It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's ops pull against this: people are still in the moving-off Java 6 phase due to that "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it.
>
> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*, the main thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min Java version field in resource requests that will let apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could fail-fast if a Java version wasn't available.
>
> What we can't do in hadoop core today is set javac.version=1.8 & use java 8 code. Downstream code can do that (Hive, etc); they just need to accept that they don't get to play on JDK7 clusters if they embrace lambda expressions.
>
> So...we need to stay on java 7 for some time due to ops pull; downstream apps get to choose what they want. We can/could enhance YARN to make JVM choice more declarative.
>
> Issue: Incompatible changes
>
> Without knowing what is proposed for "an incompatible classpath change", I can't say whether this is something that could be made optional. If it isn't, then it is a python-3 class option, "rewrite your code" event, which is going to be particularly traumatic to things like Hive that already do complex CP games. I'm currently against any mandatory change here, though would love to see an optional one. And if optional, it ceases to become an incompatible change...
>
> Issue: Getting trunk out the door
>
> The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this:  no recompilation necessary
>
> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>
> It seems to me that I could go
>
> git checkout trunk
>         mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>
> We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.
>
> A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team
>
> This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing out those benefits to people sooner rather than later, and puts off the "Hello, we've just broken your code" event for another 12+ months.
>
> Comments?
>
> -Steve
>
>
>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Andrew Wang <an...@cloudera.com>.

I find this proposal very surprising. We've intentionally deferred
incompatible changes to trunk, because they are incompatible and do not
belong in a minor release. Now we are supposed to blur our eyes and release
these changes anyway? I don't see this ending well.

One higher-level goal we should be working towards is tightening our
compatibility guarantees, not loosening them. This is why I've been
highlighting classpath isolation as a 3.0 feature, since this is one of the
biggest issues faced by our users and downstreams. I think a 3.0 with an
improved compatibility story will make operators and downstreams much
happier than releasing trunk as 2.8.

Best,
Andrew

On Mon, Mar 9, 2015 at 3:05 PM, Colin P. McCabe <cm...@apache.org> wrote:

> Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
> to plan a new Hadoop release against a version of Java that is almost
> obsolete and (soon) no longer receiving security updates.  I think
> people will be willing to roll out a new version of Java for Hadoop
> 3.x.
>
> Similarly, the whole point of bumping the major version number is the
> ability to make incompatible changes.  There are already a bunch of
> incompatible changes in the trunk branch.  Are you proposing to revert
> those?  Or push them into newly created feature branches?  This
> doesn't seem like a good idea to me.
>
> I would be in favor of backporting targeted incompatible changes from
> trunk to branch-2.  For example, we could consider pulling in Allen's
> shell script rewrite.  But pulling in all of trunk seems like a bad
> idea at this point, if we want a 2.x release.
>
> best,
> Colin
>
> On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
> >
> > If 3.x is going to be Java 8 & not backwards compatible, I don't expect
> anyone wanting to use this in production until some time deep into 2016.
> >
> > Issue: JDK 8 vs 7
> >
> > It will require Hadoop clusters to move up to Java 8. While there's dev
> pull for this, there's ops pull against this: people are still in the
> moving-off Java 6 phase due to that "it's working, don't update it"
> philosophy. Java 8 is compelling to us coders, but that doesn't mean ops
> want it.
> >
> > You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*,
> the main thing is setting up JAVA_HOME. That's something we could make
> easier somehow (maybe some min Java version field in resource requests that
> will let apps say java 8, java 9, ...). YARN could not only set up JVM
> paths, it could fail-fast if a Java version wasn't available.
> >
> > What we can't do in hadoop core today is set javac.version=1.8 & use
> java 8 code. Downstream code can do that (Hive, etc); they just need to
> accept that they don't get to play on JDK7 clusters if they embrace
> lambda expressions.
> >
> > So...we need to stay on java 7 for some time due to ops pull; downstream
> apps get to choose what they want. We can/could enhance YARN to make JVM
> choice more declarative.
> >
> > Issue: Incompatible changes
> >
> > Without knowing what is proposed for "an incompatible classpath change",
> I can't say whether this is something that could be made optional. If it
> isn't, then it is a python-3 class option, "rewrite your code" event, which
> is going to be particularly traumatic to things like Hive that already do
> complex CP games. I'm currently against any mandatory change here, though
> would love to see an optional one. And if optional, it ceases to become an
> incompatible change...
> >
> > Issue: Getting trunk out the door
> >
> > The main diff from branch-2 and trunk is currently the bash script
> changes. These don't break client apps. May or may not break bigtop & other
> downstream hadoop stacks, but developers don't need to worry about this:
> no recompilation necessary
> >
> > Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
> >
> > It seems to me that I could go
> >
> > git checkout trunk
> >         mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
> >
> > We'd then have a version of Hadoop-trunk we could ship later this year,
> compatible at the JDK and API level with the existing java code & JDK7+
> clusters.
> >
> > A classpath fix that is optional/compatible can then go out on the 2.x
> line, saving the 3.x tag for something that really breaks things, forces
> all downstream apps to set up new hadoop profiles, have separate modules &
> generally hate the hadoop dev team
> >
> > This lets us tick off the "recent trunk release" and "fixed shell
> scripts" items, pushing out those benefits to people sooner rather than
> later, and puts off the "Hello, we've just broken your code" event for
> another 12+ months.
> >
> > Comments?
> >
> > -Steve
> >
> >
> >
>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by "Colin P. McCabe" <cm...@apache.org>.

Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
to plan a new Hadoop release against a version of Java that is almost
obsolete and (soon) no longer receiving security updates.  I think
people will be willing to roll out a new version of Java for Hadoop
3.x.

Similarly, the whole point of bumping the major version number is the
ability to make incompatible changes.  There are already a bunch of
incompatible changes in the trunk branch.  Are you proposing to revert
those?  Or push them into newly created feature branches?  This
doesn't seem like a good idea to me.

I would be in favor of backporting targeted incompatible changes from
trunk to branch-2.  For example, we could consider pulling in Allen's
shell script rewrite.  But pulling in all of trunk seems like a bad
idea at this point, if we want a 2.x release.

best,
Colin

On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com> wrote:
>
> If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone wanting to use this in production until some time deep into 2016.
>
> Issue: JDK 8 vs 7
>
> It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's ops pull against this: people are still in the moving-off Java 6 phase due to that "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it.
>
> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*, the main thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min Java version field in resource requests that will let apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could fail-fast if a Java version wasn't available.
>
> What we can't do in hadoop core today is set javac.version=1.8 & use java 8 code. Downstream code can do that (Hive, etc); they just need to accept that they don't get to play on JDK7 clusters if they embrace lambda expressions.
>
> So...we need to stay on java 7 for some time due to ops pull; downstream apps get to choose what they want. We can/could enhance YARN to make JVM choice more declarative.
>
> Issue: Incompatible changes
>
> Without knowing what is proposed for "an incompatible classpath change", I can't say whether this is something that could be made optional. If it isn't, then it is a python-3 class option, "rewrite your code" event, which is going to be particularly traumatic to things like Hive that already do complex CP games. I'm currently against any mandatory change here, though would love to see an optional one. And if optional, it ceases to become an incompatible change...
>
> Issue: Getting trunk out the door
>
> The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this:  no recompilation necessary
>
> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>
> It seems to me that I could go
>
> git checkout trunk
>         mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>
> We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.
>
> A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team
>
> This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing out those benefits to people sooner rather than later, and puts off the "Hello, we've just broken your code" event for another 12+ months.
>
> Comments?
>
> -Steve
>
>
>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Steve Loughran <st...@hortonworks.com>.

On 10/03/2015 13:39, "Allen Wittenauer" <aw...@altiscale.com> wrote:

>	"Currently, there is NO policy on when Hadoop's dependencies can change."

I seem to recall that was one of mine

>
>	But it is heavily implied that this is a bad thing to do:

Everybody hates us. More specifically, bumping up Guava is more traumatic
than Java 6 -> 7, as we're reasonably confident that JDK7 is a proper
superset of JDK6, whereas we know that doesn't hold for Guava 11 -> 15.

Better compatibility policy: "we regret having Guava on our classpath"

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Allen Wittenauer <aw...@altiscale.com>.

On Mar 10, 2015, at 12:40 PM, Karthik Kambatla <ka...@cloudera.com> wrote:

> 
> Are we okay with breaking other forms of compatibility for Hadoop-3, like
> behavior, dependencies, JDK, classpath, environment? I think so. Are we
> okay with breaking these forms of compatibility in future Hadoop-2.x?
> Likely not. Does our compatibility policy allow these changes in 2.x?
> Mostly yes, but that is because we don't have policies for a lot of these
> things that affect end-users.

	I’d disagree with that last statement.  The compatibility guarantees in Compatibility.md cover all of these examples.

Changing the JDK:
	* Build Artifacts
	* Hardware/Software Requirements
	* Hadoop ABI

API compatibility:
	* Java API
	* Build artifacts
	* Hadoop ABI

Wire compatibility violations:
	* Wire compatibility
	* Hadoop ABI

Environment:
	* Depends upon what is meant by that, but it’s pretty much all of the above, plus CLI, env var, etc.
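
The mapping listed above can be written down as a small lookup table. This is only an illustrative sketch: the class `CompatSections`, the method `sectionsFor`, and the short keys (`jdk`, `api`, `wire`) are hypothetical names, not anything in Hadoop; the section names are copied from the lists in this message. "Environment" is left out because the message describes it only loosely.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: records which Compatibility.md
// sections this message lists for each kind of change.
final class CompatSections {
    private static final Map<String, List<String>> SECTIONS = new LinkedHashMap<>();

    static {
        SECTIONS.put("jdk", Arrays.asList(
                "Build artifacts", "Hardware/Software Requirements", "Hadoop ABI"));
        SECTIONS.put("api", Arrays.asList(
                "Java API", "Build artifacts", "Hadoop ABI"));
        SECTIONS.put("wire", Arrays.asList(
                "Wire compatibility", "Hadoop ABI"));
    }

    // Returns the sections listed for a kind of change, or an empty list
    // when the kind is not one of the keys above.
    static List<String> sectionsFor(String kind) {
        List<String> sections = SECTIONS.get(kind);
        return sections == null ? Collections.<String>emptyList() : sections;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<String>> entry : SECTIONS.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Every kind listed here touches "Hadoop ABI", which is one way to read the claim that the guarantees already cover these examples.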


	All of these are very clear that this stuff should change only in a major version, so as not to disrupt our users.  The only thing we can change is dependencies, covered under class path:

	"Currently, there is NO policy on when Hadoop's dependencies can change."

	But it is heavily implied that this is a bad thing to do:

"Adding new dependencies or updating the version of existing dependencies may interfere with those in applications' class paths."

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Karthik Kambatla <ka...@cloudera.com>.

On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> If 3.x is going to be Java 8 & not backwards compatible, I don't expect
> anyone wanting to use this in production until some time deep into 2016.
>
> Issue: JDK 8 vs 7
>
> It will require Hadoop clusters to move up to Java 8. While there's dev
> pull for this, there's ops pull against this: people are still in the
> moving-off Java 6 phase due to that "it's working, don't update it"
> philosophy. Java 8 is compelling to us coders, but that doesn't mean ops
> want it.
>
> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*,
> the main thing is setting up JAVA_HOME. That's something we could make
> easier somehow (maybe some min Java version field in resource requests that
> will let apps say java 8, java 9, ...). YARN could not only set up JVM
> paths, it could fail-fast if a Java version wasn't available.
>
> What we can't do in hadoop core today is set javac.version=1.8 & use java
> 8 code. Downstream code can do that (Hive, etc); they just need to accept
> that they don't get to play on JDK7 clusters if they embrace lambda expressions.
>
> So...we need to stay on java 7 for some time due to ops pull; downstream
> apps get to choose what they want. We can/could enhance YARN to make JVM
> choice more declarative.
>
> Issue: Incompatible changes
>
> Without knowing what is proposed for "an incompatible classpath change", I
> can't say whether this is something that could be made optional. If it
> isn't, then it is a python-3 class option, "rewrite your code" event, which
> is going to be particularly traumatic to things like Hive that already do
> complex CP games. I'm currently against any mandatory change here, though
> would love to see an optional one. And if optional, it ceases to become an
> incompatible change...
>

We should probably start qualifying the word incompatible more often.

Are we okay with an API incompatible Hadoop-3? No.

Are we okay with a wire-incompatible Hadoop-3? Likely not.

Are we okay with breaking other forms of compatibility for Hadoop-3, like
behavior, dependencies, JDK, classpath, environment? I think so. Are we
okay with breaking these forms of compatibility in future Hadoop-2.x?
Likely not. Does our compatibility policy allow these changes in 2.x?
Mostly yes, but that is because we don't have policies for a lot of these
things that affect end-users. The reason we don't have a policy, IMO, is a
combination of (1) we haven't spent enough time thinking about them, (2)
without things like classpath isolation, we end up tying developers' hands
if we don't let them change the dependencies. I propose we update our
compat guidelines to be stricter, and do whatever is required to get there.
Is it okay to change our compat guidelines incompatibly? Maybe it
warrants a Hadoop-3? I don't know yet.

Also, some other policies, like bumping the min JDK requirement, are allowed
in minor releases. Users might be okay with certain JDK bumps (6 to 7, since
no one seems to be using 6 anymore), but users most definitely care about
some other bumps (7 to 8). If we want to remove this subjective evaluation,
I am open to requiring a major version for JDK upgrades (not support, but
language features), even if it means we have to wait until 3.0 for a JDK
upgrade.
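
One way to take some of the guesswork out of a JDK bump is for an application to state the minimum Java version it needs and fail fast when the running JVM is older, along the lines of the "min Java version" idea quoted earlier in this message. The following is a minimal sketch under that assumption only; `JavaVersionCheck`, `major`, and `requireAtLeast` are hypothetical names, and nothing here is an existing YARN or Hadoop API. It reads the standard `java.specification.version` system property.

```java
// Hypothetical helper, not a YARN or Hadoop API.
final class JavaVersionCheck {

    // Turns "1.7", "1.8", "9" or "11" into a single major number.
    // Older JVMs report "1.x"; newer ones report the major number directly.
    static int major(String version) {
        String[] parts = version.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // Throws if the actual version is older than the required one.
    static void requireAtLeast(String required, String actual) {
        if (major(actual) < major(required)) {
            throw new IllegalStateException(
                    "Java " + required + " or later required, found " + actual);
        }
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.specification.version");
        requireAtLeast("1.7", running);
        System.out.println("ok: running on Java " + major(running));
    }
}
```

A check like this only reports the mismatch early; it does not decide which release line is allowed to raise the minimum.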

>
> Issue: Getting trunk out the door
>
> The main diff from branch-2 and trunk is currently the bash script
> changes. These don't break client apps. May or may not break bigtop & other
> downstream hadoop stacks, but developers don't need to worry about this:
> no recompilation necessary
>
> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>
> It seems to me that I could go
>
> git checkout trunk
>         mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>
> We'd then have a version of Hadoop-trunk we could ship later this year,
> compatible at the JDK and API level with the existing java code & JDK7+
> clusters.
>
> A classpath fix that is optional/compatible can then go out on the 2.x
> line, saving the 3.x tag for something that really breaks things, forces
> all downstream apps to set up new hadoop profiles, have separate modules &
> generally hate the hadoop dev team
>
> This lets us tick off the "recent trunk release" and "fixed shell scripts"
> items, pushing out those benefits to people sooner rather than later, and
> puts off the "Hello, we've just broken your code" event for another 12+
> months.
>
> Comments?
>
> -Steve
>
>
>
>

-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Arun Murthy <ac...@hortonworks.com>.

Steve,

________________________________________
From: Steve Loughran <st...@hortonworks.com>
Sent: Monday, March 09, 2015 2:15 PM
To: mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Issue: Getting trunk out the door

The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this:  no recompilation necessary

Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.

It seems to me that I could go

    git checkout trunk
    mvn versions:set -DnewVersion=2.8.0-SNAPSHOT

We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.

A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team

This seems like a great idea, something I hadn't considered before since most patches were flowing into branch-2 anyway - makes a lot of sense.

We could just drop branch-2 while we are at it too. It's just a pain to maintain an extra branch. Also, we should formalize that major features should always come via feature branches - allows for some oversight on compatibility etc. as a whole (not piecemeal) when the feature branch is merged.

In particular, let's also make sure we ship the script changes in a compatible manner. Happy to help.

Given that Vinod has stepped up for 2.7, would you like to drive 2.8? 

Practically, this is reality already, but something to formalize: having RMs per dot release (Karthik for 2.5, Vinod for 2.7, Steve for 2.8, etc.).

thanks,
Arun

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Karthik Kambatla <ka...@cloudera.com>.

On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> If 3.x is going to be Java 8 & not backwards compatible, I don't expect
> anyone wanting to use this in production until some time deep into 2016.
>
> Issue: JDK 8 vs 7
>
> It will require Hadoop clusters to move up to Java 8. While there's dev
> pull for this, there's ops pull against this: people are still in the
> moving-off Java 6 phase due to that "it's working, don't update it"
> philosophy. Java 8 is compelling to us coders, but that doesn't mean ops
> want it.
>
> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*,
> the main thing is setting up JAVA_HOME. That's something we could make
> easier somehow (maybe some min Java version field in resource requests that
> will let apps say java 8, java 9, ...). YARN could not only set up JVM
> paths, it could fail-fast if a Java version wasn't available.
>
> What we can't do in hadoop core today is set javac.version=1.8 & use java
> 8 code. Downstream code can do that (Hive, etc); they just need to accept
> that they don't get to play on JDK7 clusters if they embrace lambda expressions.
>
> So...we need to stay on java 7 for some time due to ops pull; downstream
> apps get to choose what they want. We can/could enhance YARN to make JVM
> choice more declarative.
>
> Issue: Incompatible changes
>
> Without knowing what is proposed for "an incompatible classpath change", I
> can't say whether this is something that could be made optional. If it
> isn't, then it is a python-3 class option, "rewrite your code" event, which
> is going to be particularly traumatic to things like Hive that already do
> complex CP games. I'm currently against any mandatory change here, though
> would love to see an optional one. And if optional, it ceases to become an
> incompatible change...
>

We should probably start qualifying the word incompatible more often.

Are we okay with an API-incompatible Hadoop-3? No.

Are we okay with a wire-incompatible Hadoop-3? Likely not.

Are we okay with breaking other forms of compatibility for Hadoop-3, like
behavior, dependencies, JDK, classpath, environment? I think so. Are we
okay with breaking these forms of compatibility in future Hadoop-2.x?
Likely not. Does our compatibility policy allow these changes in 2.x?
Mostly yes, but that is because we don't have policies for a lot of these
things that affect end-users. The reason we don't have a policy, IMO, is a
combination of (1) we haven't spent enough time thinking about them, and (2)
without things like classpath isolation, we end up tying developers' hands
if we don't let them change the dependencies. I propose we update our
compat guidelines to be stricter, and do whatever is required to get there.
Is it okay to change our compat guidelines incompatibly? Maybe it
warrants a Hadoop-3? I don't know yet.

Also, some other policy changes, like bumping the min JDK requirement, are
allowed in minor releases. Users might be okay with certain JDK bumps (6 to 7,
since no one seems to be using 6 anymore), but users most definitely care about
some other bumps (7 to 8). If we want to remove this subjective evaluation,
I am open to requiring a major version for JDK upgrades (not support, but
language features) even if it means we have to wait until 3.0 for a JDK
upgrade.
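
To make the "support versus language features" distinction concrete: a
launcher could compare the JVM version it finds against a declared minimum
and fail fast on a mismatch. The following is only a minimal sketch of that
check; `java_major` and `meets_min_java` are hypothetical helper names, not
anything in Hadoop or YARN, and how the version string is obtained is left
out.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of Hadoop or YARN.

# Print the major Java version for a version string such as
# "1.7.0_75" (legacy "1.N" scheme) or "9.0.1" (newer scheme).
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: drop the leading "1."
  esac
  # Keep only the leading run of digits.
  echo "${v%%[!0-9]*}"
}

# Succeed if version string $1 is at least major version $2.
meets_min_java() {
  local have
  have="$(java_major "$1")"
  [ -n "$have" ] && [ "$have" -ge "$2" ]
}
```

A caller might write `meets_min_java "$found_version" 8 || exit 1`, where
`found_version` is whatever the launcher has detected. A cluster could run on
a newer JVM while the code base still compiles at the older language level,
which is the "support" half of the distinction.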

>
> Issue: Getting trunk out the door
>
> The main diff from branch-2 and trunk is currently the bash script
> changes. These don't break client apps. May or may not break bigtop & other
> downstream hadoop stacks, but developers don't need to worry about this:
> no recompilation necessary
>
> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>
> It seems to me that I could go
>
>     git checkout trunk
>     mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>
> We'd then have a version of Hadoop-trunk we could ship later this year,
> compatible at the JDK and API level with the existing java code & JDK7+
> clusters.
>
> A classpath fix that is optional/compatible can then go out on the 2.x
> line, saving the 3.x tag for something that really breaks things, forces
> all downstream apps to set up new hadoop profiles, have separate modules &
> generally hate the hadoop dev team
>
> This lets us tick off the "recent trunk release" and "fixed shell scripts"
> items, pushing out those benefits to people sooner rather than later, and
> puts off the "Hello, we've just broken your code" event for another 12+
> months.
>
> Comments?
>
> -Steve
>
>
>
>

-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by Sagar Thacker <sa...@gmail.com>.

Hello all, please remove me from the message thread. I have stopped working
on Hadoop. Thank you.
On Mar 10, 2015 2:45 AM, "Steve Loughran" <st...@hortonworks.com> wrote:

>
> If 3.x is going to be Java 8 & not backwards compatible, I don't expect
> anyone wanting to use this in production until some time deep into 2016.
>
> Issue: JDK 8 vs 7
>
> It will require Hadoop clusters to move up to Java 8. While there's dev
> pull for this, there's ops pull against this: people are still in the
> moving-off Java 6 phase due to that "it's working, don't update it"
> philosophy. Java 8 is compelling to us coders, but that doesn't mean ops
> want it.
>
> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*,
> the main thing is setting up JAVA_HOME. That's something we could make
> easier somehow (maybe some min Java version field in resource requests that
> will let apps say java 8, java 9, ...). YARN could not only set up JVM
> paths, it could fail-fast if a Java version wasn't available.
>
> What we can't do in hadoop core today is set javac.version=1.8 & use java
> 8 code. Downstream code can do that (Hive, etc); they just need to accept
> that they don't get to play on JDK7 clusters if they embrace lambda expressions.
>
> So...we need to stay on java 7 for some time due to ops pull; downstream
> apps get to choose what they want. We can/could enhance YARN to make JVM
> choice more declarative.
>
> Issue: Incompatible changes
>
> Without knowing what is proposed for "an incompatible classpath change", I
> can't say whether this is something that could be made optional. If it
> isn't, then it is a python-3 class option, "rewrite your code" event, which
> is going to be particularly traumatic to things like Hive that already do
> complex CP games. I'm currently against any mandatory change here, though
> would love to see an optional one. And if optional, it ceases to become an
> incompatible change...
>
> Issue: Getting trunk out the door
>
> The main diff from branch-2 and trunk is currently the bash script
> changes. These don't break client apps. May or may not break bigtop & other
> downstream hadoop stacks, but developers don't need to worry about this:
> no recompilation necessary
>
> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>
> It seems to me that I could go
>
>     git checkout trunk
>     mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>
> We'd then have a version of Hadoop-trunk we could ship later this year,
> compatible at the JDK and API level with the existing java code & JDK7+
> clusters.
>
> A classpath fix that is optional/compatible can then go out on the 2.x
> line, saving the 3.x tag for something that really breaks things, forces
> all downstream apps to set up new hadoop profiles, have separate modules &
> generally hate the hadoop dev team
>
> This lets us tick off the "recent trunk release" and "fixed shell scripts"
> items, pushing out those benefits to people sooner rather than later, and
> puts off the "Hello, we've just broken your code" event for another 12+
> months.
>
> Comments?
>
> -Steve
>
>
>
>

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Posted by "Colin P. McCabe" <cm...@apache.org>.

Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
to plan a new Hadoop release against a version of Java that is almost
obsolete and (soon) no longer receiving security updates.  I think
people will be willing to roll out a new version of Java for Hadoop
3.x.

Similarly, the whole point of bumping the major version number is the
ability to make incompatible changes.  There are already a bunch of
incompatible changes in the trunk branch.  Are you proposing to revert
those?  Or push them into newly created feature branches?  This
doesn't seem like a good idea to me.

I would be in favor of backporting targeted incompatible changes from
trunk to branch-2.  For example, we could consider pulling in Allen's
shell script rewrite.  But pulling in all of trunk seems like a bad
idea at this point, if we want a 2.x release.
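
A targeted backport of this kind can be sketched with `git cherry-pick`. The
example below builds a throwaway repository; the branch names mirror the ones
in this thread, while the function name `backport_demo`, the file names, and
the commits are invented purely for illustration and say nothing about the
real Hadoop history.

```shell
#!/usr/bin/env bash
# Toy repository only: files and commits are invented for illustration.

backport_demo() {
  local repo
  repo="$(mktemp -d)"
  (
    set -e
    cd "$repo"
    git init -q .
    git config user.name "Example Dev"
    git config user.email "dev@example.invalid"

    # Common history shared by both branches.
    echo "base" > core.txt
    git add core.txt
    git commit -q -m "common base"
    git branch -M trunk
    git branch branch-2

    # Trunk-only commit that we do want on the 2.x line.
    echo "new scripts" > scripts.txt
    git add scripts.txt
    git commit -q -m "shell script rewrite"
    wanted="$(git rev-parse HEAD)"

    # Trunk-only commit that should stay out of 2.x.
    echo "breaking" > incompatible.txt
    git add incompatible.txt
    git commit -q -m "incompatible change"

    # Backport just the one commit; -x records the source commit id.
    git checkout -q branch-2
    git cherry-pick -x "$wanted" >/dev/null
    ls
  )
  rm -rf "$repo"
}
```

Calling `backport_demo` lists the files on branch-2 after the pick: the
selected change arrives, and the other trunk-only commit stays behind.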

best,
Colin

On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <st...@hortonworks.com> wrote:
>
> If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone wanting to use this in production until some time deep into 2016.
>
> Issue: JDK 8 vs 7
>
> It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's ops pull against this: people are still in the moving-off Java 6 phase due to that "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it.
>
> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*, the main thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min Java version field in resource requests that will let apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could fail-fast if a Java version wasn't available.
>
> What we can't do in hadoop core today is set javac.version=1.8 & use java 8 code. Downstream code can do that (Hive, etc); they just need to accept that they don't get to play on JDK7 clusters if they embrace lambda expressions.
>
> So...we need to stay on java 7 for some time due to ops pull; downstream apps get to choose what they want. We can/could enhance YARN to make JVM choice more declarative.
>
> Issue: Incompatible changes
>
> Without knowing what is proposed for "an incompatible classpath change", I can't say whether this is something that could be made optional. If it isn't, then it is a python-3 class option, "rewrite your code" event, which is going to be particularly traumatic to things like Hive that already do complex CP games. I'm currently against any mandatory change here, though would love to see an optional one. And if optional, it ceases to become an incompatible change...
>
> Issue: Getting trunk out the door
>
> The main diff from branch-2 and trunk is currently the bash script changes. These don't break client apps. May or may not break bigtop & other downstream hadoop stacks, but developers don't need to worry about this:  no recompilation necessary
>
> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>
> It seems to me that I could go
>
>     git checkout trunk
>     mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>
> We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code & JDK7+ clusters.
>
> A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles, have separate modules & generally hate the hadoop dev team
>
> This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing out those benefits to people sooner rather than later, and puts off the "Hello, we've just broken your code" event for another 12+ months.
>
> Comments?
>
> -Steve
>
>
>
