Posted to dev@tez.apache.org by Hitesh Shah <hi...@apache.org> on 2015/02/26 20:03:58 UTC

[DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Hi folks, 

Chris raised a good point earlier about publishing jars for use against different versions of hadoop. For the most part, I think we have done well to ensure that the user-facing modules are version agnostic, but the same does not hold for other modules which at times are needed by other applications for testing.

There aren’t really too many different options we can try.  The simplest option I can think of is just to build tez against different versions of hadoop with the tez.version set to something along the lines of “tez.version-hadoop.version”. This would imply having tez-api-0.6.0-hadoop2.4 or tez-api-0.6.0-hadoop2.6. From a usability point of view, depending on the option we pick, users will need to switch their dependencies to point to an appropriate version based on what version of hadoop they are using. Apps such as hive and pig will need to manage picking a particular version of tez based on which hadoop profile they are building against.
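
As a rough illustration of the naming scheme described above, the Java sketch below derives a version string of the form tez.version-hadoop.version. The class TezVersionNames and its methods are hypothetical, and the sketch assumes the suffix uses the dotted major.minor form (for example hadoop2.4); none of this is part of Tez or its build.

```java
// Hypothetical helper, for illustration only: not part of Tez.
final class TezVersionNames {

    // Builds "<tezVersion>-hadoop<major>.<minor>" from a full hadoop version.
    static String versionFor(String tezVersion, String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                "expected at least major.minor, got: " + hadoopVersion);
        }
        return tezVersion + "-hadoop" + parts[0] + "." + parts[1];
    }

    // Builds the jar file name a downstream build would then have to pick.
    static String jarName(String module, String tezVersion, String hadoopVersion) {
        return module + "-" + versionFor(tezVersion, hadoopVersion) + ".jar";
    }

    public static void main(String[] args) {
        // A project building against a hadoop 2.4.x profile would select:
        System.out.println(jarName("tez-api", "0.6.0", "2.4.1"));
        // A project building against a hadoop 2.6.x profile would select:
        System.out.println(jarName("tez-api", "0.6.0", "2.6.0"));
    }
}
```

The point of the sketch is the downstream cost: every consumer has to carry the hadoop version into its own dependency selection, which is the switching burden mentioned above.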

Any other suggestions for publishing version dependent jars?

For binary releases, should we release only the minimal tarball, or both the minimal and full tarballs? The full tarball is the recommended deployment model as it is more robust with respect to compatibility on a changing cluster. It should work in most scenarios as long as the hadoop client libraries that Tez depends on are compatible with the servers running on the cluster.

General questions for the community/past release managers: 
   - Should we retain the simple version ( i.e. plain x.y.z only ) when building against the default version of hadoop as determined by Tez? This “default.version” will have a tendency to evolve over time :) . These simple version jars would be in addition to the version specific jars. 
   - What versions of hadoop should we compile against? 2.2, 2.4 and 2.6, or 2.2, 2.3, 2.4, 2.5 and 2.6? Please note that I am ignoring the last ( maintenance ) version component, so we should pick the latest version in each line, i.e. 2.2.1 over 2.2.0 if 2.2.1 exists. 
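
The "latest version in each line" rule from the second question can be sketched in Java as follows. HadoopLines is a hypothetical helper, and the version list in main is illustrative input only, not a statement about which hadoop releases actually exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
final class HadoopLines {

    // For each major.minor line, keeps the version with the highest third component.
    static List<String> latestPerLine(List<String> versions) {
        // TreeMap orders the lines as plain strings, which is enough for
        // single-digit components such as 2.2, 2.4 and 2.6.
        Map<String, String> best = new TreeMap<>();
        for (String version : versions) {
            String[] parts = version.split("\\.");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected x.y.z, got: " + version);
            }
            String line = parts[0] + "." + parts[1];
            String current = best.get(line);
            if (current == null || lastComponent(version) > lastComponent(current)) {
                best.put(line, version);
            }
        }
        return new ArrayList<>(best.values());
    }

    private static int lastComponent(String version) {
        return Integer.parseInt(version.split("\\.")[2]);
    }

    public static void main(String[] args) {
        // Illustrative candidates; 2.2.1 is included only to mirror the example above.
        List<String> candidates =
            Arrays.asList("2.2.0", "2.2.1", "2.4.1", "2.4.0", "2.6.0");
        System.out.println(latestPerLine(candidates));
    }
}
```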
   
Any other comments? 

thanks
— Hitesh



Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Posted by Andre Kelpe <ak...@concurrentinc.com>.
Not everybody is ready to move to a new Hadoop version every so often. As Chris already mentioned, it is a good idea to keep artifact names stable and detect features at runtime. We are doing that in Cascading as well: we compile it against one version of Hadoop, but do everything we can to keep it compatible with older and newer releases (currently 9 releases): https://github.com/Cascading/cascading.compatibility. This is more work for us as an upstream, but makes the life of our users a lot easier.  Note that we do not publish a release per version; we ensure that the one release is binary compatible.
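
A minimal sketch of what detecting features at runtime can look like in Java: probe for an optional class with Class.forName and degrade with a message instead of failing. FeatureProbe and the class name org.example.acls.OptionalAclManager are made up for illustration; this is not the code Cascading or Tez actually uses.

```java
// Hypothetical probe, for illustration only.
final class FeatureProbe {

    // Returns true if the named class can be loaded, without initializing it.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, FeatureProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing
            // (for example, compiled against a newer hadoop than the runtime).
            return false;
        }
    }

    // Produces the message a client could log before continuing without the feature.
    static String describe(String feature, String className) {
        return isClassAvailable(className)
            ? feature + ": enabled"
            : feature + ": disabled (" + className + " not usable on this classpath)";
    }

    public static void main(String[] args) {
        System.out.println(describe("core strings", "java.lang.String"));
        // Assumed, non-existent class name standing in for an optional manager.
        System.out.println(describe("timeline ACLs", "org.example.acls.OptionalAclManager"));
    }
}
```

Catching LinkageError as well as ClassNotFoundException matters for the case discussed in this thread, where a class is present in the jar but depends on hadoop classes that the running cluster version does not provide.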

I believe Tez should provide a binary release that is tested and compatible with multiple versions of hadoop, instead of “compile your own”. While I understand that the ASF only demands source releases, I believe having binary releases, which are compatible with multiple versions of hadoop, will help with adoption, since it removes friction downstream.

- André



> On 08 Mar 2015, at 22:54, Bikas Saha <bi...@hortonworks.com> wrote:
> 
> As an aside, Flink could consider moving to a more current version. There have been many key improvements in Timeline Server, preemption, node labels, resource monitoring etc. that users may want to take advantage of.
> 
> If Tez publishes Hadoop version specific binaries to maven then Flink and others may be able to consume them directly during development.
> 
> Bikas
> 

--
André Kelpe
andre@concurrentinc.com
http://concurrentinc.com





RE: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Posted by Bikas Saha <bi...@hortonworks.com>.
As an aside, Flink could consider moving to a more current version. There have been many key improvements in Timeline Server, preemption, node labels, resource monitoring etc. that users may want to take advantage of.

If Tez publishes Hadoop version specific binaries to maven then Flink and others may be able to consume them directly during development.

Bikas

-----Original Message-----
From: Robert Metzger [mailto:rmetzger@apache.org] 
Sent: Sunday, March 08, 2015 6:40 AM
To: dev@tez.apache.org
Subject: Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Hi Hitesh,

I've talked about this with Kostas, let me check on some of our assumptions.

You can compile Flink against a hadoop1 and a hadoop2 profile. We would include flink-on-tez only in our (default) hadoop2 profile.
For that profile, we use Hadoop 2.2.0.

You can see on maven central that we publish two versions of each flink module for each release, a 0.8.1-hadoop1 and a 0.8.1 version.
This way users of both Hadoop APIs can use our system.

Adding Tez as a dependency to Flink (hadoop2) would cause a dependency conflict on the Hadoop version. Our parent pom enforces Hadoop 2.2.0 for all dependencies, so we force Tez to use Hadoop 2.2.0 as well.
In my understanding the compilation fails in that case.

If there were a Tez version compatible with Hadoop 2.2.0 in mvn central, we could add the "flink-on-tez" module to maven central.

If that's not possible, users who want to use Flink-on-Tez have to compile Flink against Hadoop 2.6.0 themselves. It's only one maven command, but less convenient than something on mvn central.



Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Posted by Robert Metzger <rm...@apache.org>.
Hi Hitesh,

I've talked about this with Kostas, let me check on some of our assumptions.

You can compile Flink against a hadoop1 and a hadoop2 profile. We would
include flink-on-tez only in our (default) hadoop2 profile.
For that profile, we use Hadoop 2.2.0.

You can see on maven central that we publish two versions of each flink
module for each release, a 0.8.1-hadoop1 and a 0.8.1 version.
This way users of both Hadoop APIs can use our system.

Adding Tez as a dependency to Flink (hadoop2) would cause a dependency
conflict on the Hadoop version. Our parent pom enforces Hadoop 2.2.0 for
all dependencies, so we force Tez to use Hadoop 2.2.0 as well.
In my understanding the compilation fails in that case.

If there were a Tez version compatible with Hadoop 2.2.0 in mvn
central, we could add the "flink-on-tez" module to maven central.

If that's not possible, users who want to use Flink-on-Tez have to compile
Flink against Hadoop 2.6.0 themselves. It's only one maven command, but less
convenient than something on mvn central.


On Fri, Mar 6, 2015 at 8:03 PM, Hitesh Shah <hi...@apache.org> wrote:

> Thanks for the feedback, Kostas,
>
> One clarification though - are you saying Tez should publish jars to maven
> central built against different versions of Hadoop? If yes, is this mainly
> due to the hadoop dependencies that Tez pulls in or due to any
> incompatibilities that you have noticed?
>
> thanks
> — Hitesh
>
>
> On Mar 6, 2015, at 9:03 AM, Kostas Tzoumas <kt...@apache.org> wrote:
>
> > Publishing jars for different Hadoop dependencies, and in particular for
> > Hadoop 2.2 would also be beneficial for Flink on Tez as we offer maven
> > archetypes for users to create Flink applications.  Currently, we need to
> > ask users that want to run Flink apps with Tez as backend to compile the
> > Flink code themselves due to a Hadoop version mismatch.
> >
> >
> >
> > On Thu, Mar 5, 2015 at 1:46 AM, Hitesh Shah <hi...@apache.org> wrote:
> >
> >> From an ASF perspective, verifiable releases are only source releases.
> >> The binaries are just convenience artifacts that can also be made
> >> available with a given release. Hence in terms of supporting multiple
> >> hadoop versions, we do want to allow various users/distros to compile
> >> Tez against their particular version of hadoop.
> >>
> >> From a run-time point of view, if Tez compiled against hadoop-2.6 is
> >> run on a 2.4 cluster, it should work normally as long as acls are
> >> disabled ( via tez config tez.am.acls.enabled ). That said, there are
> >> probably some improvements that could be made to handle the case where
> >> acls are enabled on a 2.4 cluster in a cleaner manner.
> >>
> >> thanks
> >> — Hitesh
> >>
> >> On Mar 4, 2015, at 9:21 AM, Chris K Wensel <ch...@wensel.net> wrote:
> >>
> >>> compile what against hadoop 2.4? Tez? Hopefully no one except Tez devs
> >>> ever compile Tez (once the apache committers offer up pre-built
> >>> binaries, I only ever do for this reason).
> >>>
> >>> if compiling application code against Tez and Hadoop 2.4, the jar won't
> >>> come into play unless running tests (so i believe).
> >>>
> >>> I would then enhance option two to gracefully fail if -acls (the
> >>> Manager) is not applicable (on hadoop 2.4) but mistakenly included in
> >>> the 2.4 classpath (testing app code against hadoop 2.4)
> >>>
> >>> of course then this is really option 1 now with two jars.
> >>>
> >>> ckw
> >>>
> >>>> On Mar 2, 2015, at 3:05 PM, Hitesh Shah <hi...@apache.org> wrote:
> >>>>
> >>>> Thanks for the suggestions, Chris. Filed TEZ-2168 for this.
> >>>>
> >>>> At this point, I am inclined to follow option 2 mainly to retain the
> >>>> ability for users to compile against hadoop 2.4. I am not sure if
> >>>> there is a simple and performant way ( without using reflection for
> >>>> all 2.6 specific calls ) to retain compile compatibility with option 1.
> >>>>
> >>>> Any other comments from other folks on this issue in general or on the
> >>>> 2 options that Chris suggested?
> >>>>
> >>>> thanks
> >>>> — Hitesh
> >>>>
> >>>>
> >>>> On Feb 26, 2015, at 1:18 PM, Chris K Wensel <ch...@wensel.net> wrote:
> >>>>
> >>>>> The immediate issue is having two mutually exclusive artifacts:
> >>>>> tez-yarn-timeline-history and tez-yarn-timeline-history
> >>>>>
> >>>>> outside of ATSHistoryACLPolicyManager, the code is identical. just
> >>>>> the dependencies are changed.
> >>>>>
> >>>>> TezClient attempts to load this Manager, under the assumption if it
> >>>>> exists, it is running on hadoop 2.6. (running on 2.4 is fatal)
> >>>>>
> >>>>> My recommendation would be never to change artifact names (or
> >>>>> conditionally choose them) inside of major releases, but accreting
> >>>>> new, optional, ones as versions progress is fine.
> >>>>>
> >>>>> thus I would either:
> >>>>>
> >>>>> create a single artifact tez-yarn-timeline-history compiled with a
> >>>>> default dep of hadoop 2.6, that includes the Manager. update the
> >>>>> TezClient code to gracefully fail if the Manager is not applicable
> >>>>> (the runtime env is Hadoop 2.4).
> >>>>>
> >>>>> or
> >>>>>
> >>>>> offer tez-yarn-timeline-history-with-acls as an optional artifact for
> >>>>> Hadoop 2.6 deployments, with the single Manager class in it, which in
> >>>>> turn requires the tez-yarn-timeline-history artifact -- which is
> >>>>> sufficient for a 2.4 runtime. if the user provides the additional
> >>>>> -with-acls artifact, they are knowingly going to have problems on
> >>>>> Hadoop 2.4.
> >>>>>
> >>>>> I prefer the first as it keeps my build file simple. graceful
> >>>>> degradation of services per environment (with appropriate logging) is
> >>>>> a well accepted practice.
> >>>>>
> >>>>> and you can now test Tez across multiple versions of Hadoop/Yarn at
> >>>>> runtime (outside of compile time).
> >>>>>
> >>>>> we do this with Cascading, just simple build file modifications to
> >>>>> verify binary compatibility (vendors fork this repo to verify their
> >>>>> distributions, and have been known to find critical bugs):
> >>>>>
> >>>>> https://github.com/Cascading/cascading.compatibility
> >>>>>
> >>>>> ckw
> >>>>>
> >>>>>
> >>>>> —
> >>>>> Chris K Wensel
> >>>>> chris@wensel.net
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>> —
> >>> Chris K Wensel
> >>> chris@wensel.net
> >>>
> >>>
> >>>
> >>>
> >>
> >>
>
>

Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Posted by Hitesh Shah <hi...@apache.org>.
Thanks for the feedback, Kostas, 

One clarification though: are you saying Tez should publish jars to Maven Central built against different versions of Hadoop? If yes, is this mainly due to the Hadoop dependencies that Tez pulls in, or due to incompatibilities that you have noticed? 

thanks
— Hitesh


On Mar 6, 2015, at 9:03 AM, Kostas Tzoumas <kt...@apache.org> wrote:

> Publishing jars for different Hadoop dependencies, and in particular for
> Hadoop 2.2 would also be beneficial for Flink on Tez as we offer maven
> archetypes for users to create Flink applications.  Currently, we need to
> ask users that want to run Flink apps with Tez as backend to compile the
> Flink code themselves due to a Hadoop version mismatch.


Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Posted by Kostas Tzoumas <kt...@apache.org>.
Publishing jars for different Hadoop dependencies, and in particular for
Hadoop 2.2, would also be beneficial for Flink on Tez, as we offer Maven
archetypes for users to create Flink applications.  Currently, we need to
ask users who want to run Flink apps with Tez as the backend to compile the
Flink code themselves due to a Hadoop version mismatch.



On Thu, Mar 5, 2015 at 1:46 AM, Hitesh Shah <hi...@apache.org> wrote:

> From an ASF perspective, verifiable releases are only source releases. The
> binaries are just convenience artifacts that can also made available with a
> given release. Hence in terms of supporting multiple hadoop versions, we do
> want to allow various users/distros to compile Tez against their particular
> version of hadoop.
>
> From a run-time point of view , if Tez compiled against hadoop-2.6 is run
> on a 2.4 cluster, it should work normally as long as acls are disabled (
> via tez config tez.am.acls.enabled ). That said, there are probably some
> improvements that could be done to handle the case where acls are enabled
> on a 2.4 cluster in a more cleaner manner.
>
> thanks
> — Hitesh

Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Posted by Hitesh Shah <hi...@apache.org>.
From an ASF perspective, verifiable releases are only source releases. The binaries are just convenience artifacts that can also be made available with a given release. Hence, in terms of supporting multiple Hadoop versions, we do want to allow various users/distros to compile Tez against their particular version of Hadoop. 

From a run-time point of view, if Tez compiled against hadoop-2.6 is run on a 2.4 cluster, it should work normally as long as ACLs are disabled (via the Tez config tez.am.acls.enabled). That said, there are probably some improvements that could be made to handle the case where ACLs are enabled on a 2.4 cluster in a cleaner manner.
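
As a rough illustration of the workaround, the sketch below forces tez.am.acls.enabled to false when the cluster reports a Hadoop version older than 2.6. The class AclCompat, its methods, and the plain Map standing in for a real Tez/Hadoop configuration object are all hypothetical. The 2.6 threshold is an assumption drawn from this thread, not a documented cut-off.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: switch Tez ACLs off when the cluster's Hadoop version is older than 2.6. */
final class AclCompat {
    // Config key named in this thread; everything else in this class is hypothetical.
    static final String ACLS_ENABLED_KEY = "tez.am.acls.enabled";

    private AclCompat() {}

    /** Returns true when a dotted version such as "2.6.0" is at least major.minor. */
    static boolean isAtLeast(String version, int major, int minor) {
        // Split on '.' and '-' so that suffixes such as "-SNAPSHOT" are ignored.
        String[] parts = version.split("[.-]");
        int actualMajor = Integer.parseInt(parts[0]);
        int actualMinor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return actualMajor > major || (actualMajor == major && actualMinor >= minor);
    }

    /** Copies conf and forces ACLs off for clusters older than Hadoop 2.6 (assumed threshold). */
    static Map<String, String> adjustForCluster(Map<String, String> conf, String clusterHadoopVersion) {
        Map<String, String> adjusted = new HashMap<>(conf);
        if (!isAtLeast(clusterHadoopVersion, 2, 6)) {
            adjusted.put(ACLS_ENABLED_KEY, "false");
        }
        return adjusted;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ACLS_ENABLED_KEY, "true");
        System.out.println(adjustForCluster(conf, "2.4.1").get(ACLS_ENABLED_KEY)); // prints "false"
        System.out.println(adjustForCluster(conf, "2.6.0").get(ACLS_ENABLED_KEY)); // prints "true"
    }
}
```

The input map is copied rather than mutated, so the caller's original settings stay intact if the same configuration is reused against a newer cluster.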

thanks
— Hitesh

On Mar 4, 2015, at 9:21 AM, Chris K Wensel <ch...@wensel.net> wrote:

> compile what against hadoop 2.4? Tez? Hopefully no one except Tez devs ever compile Tez (once the apache committers offer up pre-built binaries, I only ever do for this reason).
> 
> if compiling application code against Tez and Hadoop 2.4, the jar won't come into play unless running tests (so i believe).
> 
> I would then enhance option two to gracefully fail if -acls (the Manager) is not applicable (on hadoop 2.4) but mistakenly included in the 2.4 classpath (testing app code against hadoop 2.4)
> 
> of course then this is really option 1 now with two jars.
> 
> ckw


Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Posted by Chris K Wensel <ch...@wensel.net>.
compile what against hadoop 2.4? Tez? Hopefully no one except Tez devs ever compiles Tez (once the Apache committers offer up pre-built binaries, I only ever do so for this reason).

if compiling application code against Tez and Hadoop 2.4, the jar won't come into play unless running tests (so i believe).

I would then enhance option two to gracefully fail if the -with-acls artifact (the Manager) is not applicable (on hadoop 2.4) but is mistakenly included in the 2.4 classpath (e.g. when testing app code against hadoop 2.4).

of course then this is really option 1 now with two jars.

ckw

> On Mar 2, 2015, at 3:05 PM, Hitesh Shah <hi...@apache.org> wrote:
> 
> Thanks for the suggestions, Chris. Filed TEZ-2168 for this. 
> 
> At this point, I am inclined to follow option 2 mainly to retain the ability for users to compile against hadoop 2.4. I am not sure if there is a simple and performant way ( without using reflection for all 2.6 specific calls ) to retain compile compatibility with option 1.
> 
> Any other comments for other folks on this issue in general or on the 2 options that Chris suggested? 
> 
> thanks
> — Hitesh

—
Chris K Wensel
chris@wensel.net





Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Posted by Hitesh Shah <hi...@apache.org>.
Thanks for the suggestions, Chris. Filed TEZ-2168 for this. 

At this point, I am inclined to follow option 2, mainly to retain the ability for users to compile against Hadoop 2.4. I am not sure if there is a simple and performant way (without using reflection for all 2.6-specific calls) to retain compile compatibility with option 1.
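
To make the reflection concern concrete, here is a minimal sketch of what each version-specific call would need under option 1: look the method up once by name, then either invoke it or fall back when the running library does not have it. VersionedCall is a hypothetical helper, and String.length stands in for a 2.6-only Hadoop API so the example runs without Hadoop on the classpath.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Sketch: call a method that may not exist in the library version present at runtime. */
final class VersionedCall {
    private final Method method; // null when the running library version lacks the method

    VersionedCall(Class<?> owner, String name, Class<?>... parameterTypes) {
        Method found;
        try {
            found = owner.getMethod(name, parameterTypes); // looked up once, then reused
        } catch (NoSuchMethodException e) {
            found = null;
        }
        this.method = found;
    }

    boolean isSupported() {
        return method != null;
    }

    /** Invokes the method if it exists; otherwise returns the supplied fallback value. */
    Object invokeOr(Object fallback, Object target, Object... args) {
        if (method == null) {
            return fallback;
        }
        try {
            return method.invoke(target, args);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Reflective call to " + method.getName() + " failed", e);
        }
    }

    public static void main(String[] args) {
        // String stands in for a Hadoop class; "noSuchMethod" stands in for a 2.6-only API.
        VersionedCall present = new VersionedCall(String.class, "length");
        VersionedCall absent = new VersionedCall(String.class, "noSuchMethod");
        System.out.println(present.invokeOr(-1, "tez")); // prints 3
        System.out.println(absent.invokeOr(-1, "tez"));  // prints -1
    }
}
```

Every 2.6-specific call site would need a wrapper like this, and the compiler can no longer check argument or return types, which is the maintenance cost weighed against option 1 here.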

Any other comments from other folks on this issue in general, or on the 2 options that Chris suggested? 

thanks
— Hitesh


On Feb 26, 2015, at 1:18 PM, Chris K Wensel <ch...@wensel.net> wrote:

> The immediate issue is having two mutually exclusive artifacts: tez-yarn-timeline-history and tez-yarn-timeline-history
> 
> outside of ATSHistoryACLPolicyManager, the code is identical. just the dependencies are changed.
> 
> TezClient attempts to load this Manager, under the assumption if it exists, it is running on hadoop 2.6. (running on 2.4 is fatal)
> 
> My recommendation would be never to change artifact names (or conditionally choose them) inside of major releases, but accreting new, optional, ones as versions progress is fine.
> 
> thus I would either:
> 
> create a single artifact tez-yarn-timeline-history compiled with a default dep of hadoop 2.6, that includes the Manager. update the TezClient code to gracefully fail if the Manager is not applicable (the runtime env is Hadoop 2.4).
> 
> or
> 
> offer tez-yarn-timeline-history-with-acls as an optional artifact for Hadoop 2.6 deployments, with the single Manager class in it, which in turn requires the tez-yarn-timeline-history artifact -- which is sufficient for a 2.4 runtime. if the user provides the additional -with-acls artifact, they are knowingly going to have problems on Hadoop 2.4.
> 
> I prefer the first as it keeps my build file simple. graceful degradation of services per environment (with appropriate logging) is a well accepted practice.
> 
> and you can now test Tez across multiple versions Hadoop/Yarn at runtime (outside of compile time).
> 
> we do this with Cascading, just simple build file modifications to verify binary compatibility (vendors fork this repo to verify their distributions, and been known to find critical bugs):
> 
> https://github.com/Cascading/cascading.compatibility
> 
> ckw


Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies

Posted by Chris K Wensel <ch...@wensel.net>.
The immediate issue is having two mutually exclusive artifacts: tez-yarn-timeline-history and tez-yarn-timeline-history-with-acls

outside of ATSHistoryACLPolicyManager, the code is identical. just the dependencies are changed.

TezClient attempts to load this Manager, under the assumption that if it exists, it is running on hadoop 2.6 (running on 2.4 is fatal).

My recommendation would be never to change artifact names (or conditionally choose them) inside of major releases, but accreting new, optional, ones as versions progress is fine.

thus I would either:

create a single artifact tez-yarn-timeline-history compiled with a default dep of hadoop 2.6, that includes the Manager. update the TezClient code to gracefully fail if the Manager is not applicable (the runtime env is Hadoop 2.4).

or

offer tez-yarn-timeline-history-with-acls as an optional artifact for Hadoop 2.6 deployments, with the single Manager class in it, which in turn requires the tez-yarn-timeline-history artifact -- which is sufficient for a 2.4 runtime. if the user provides the additional -with-acls artifact, they are knowingly going to have problems on Hadoop 2.4.

I prefer the first as it keeps my build file simple. graceful degradation of services per environment (with appropriate logging) is a well accepted practice.
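
A minimal sketch of that kind of graceful degradation, assuming the optional component is instantiated by class name: the loader below logs a warning and returns an empty Optional instead of failing when the class, or something it links against, is missing at runtime. OptionalPluginLoader and tryLoad are hypothetical names, not the actual TezClient code, and JDK classes stand in for the Manager so the example runs on its own.

```java
import java.util.Optional;
import java.util.logging.Logger;

/** Sketch: load an optional plugin class by name and degrade gracefully if it is unusable. */
final class OptionalPluginLoader {
    private static final Logger LOG = Logger.getLogger(OptionalPluginLoader.class.getName());

    private OptionalPluginLoader() {}

    /**
     * Tries to instantiate className as the given type. Returns Optional.empty() instead of
     * throwing when the class, or something it links against, is missing at runtime.
     */
    static <T> Optional<T> tryLoad(String className, Class<T> type) {
        try {
            Class<?> raw = Class.forName(className);
            return Optional.of(type.cast(raw.getDeclaredConstructor().newInstance()));
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // LinkageError covers NoClassDefFoundError, e.g. a class built against newer APIs.
            LOG.warning("Optional component " + className + " is unavailable, continuing without it: " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // JDK classes stand in for the real plugin so the sketch runs without Tez or Hadoop.
        Optional<CharSequence> present = tryLoad("java.lang.StringBuilder", CharSequence.class);
        Optional<Runnable> missing = tryLoad("org.example.DoesNotExist", Runnable.class);
        // prints "present=true, missing=false"
        System.out.println("present=" + present.isPresent() + ", missing=" + missing.isPresent());
    }
}
```

Catching LinkageError alongside the reflective exceptions matters for the scenario in this thread: a class compiled against Hadoop 2.6 can be found on a 2.4 classpath and still fail while linking.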

and you can now test Tez across multiple versions of Hadoop/YARN at runtime (outside of compile time).

we do this with Cascading, just simple build file modifications to verify binary compatibility (vendors fork this repo to verify their distributions, and have been known to find critical bugs):

https://github.com/Cascading/cascading.compatibility

ckw

> On Feb 26, 2015, at 11:03 AM, Hitesh Shah <hi...@apache.org> wrote:
> 
> Hi folks, 
> 
> Chris raised a good point earlier in terms of publishing jars for use against different versions of hadoop. For the most part, I think we have done well to ensure that the user-facing modules are version agnostic but the same does not hold for other modules which are times are needed by other applications for testing.
> 
> There aren’t really too many different options we can try.  The simplest option I can think of is just to build tez against different versions of hadoop with the tez.version set to something along the lines of “tez.version-hadoop.version”. This would imply having tez-api-0.6.0-hadoop2.4 or tez-api-0.6.0-hadoop26. For a usability point of view, depending on the option we pick, users will need to switch their dependencies to point to an appropriate version based on what version of hadoop they are using. For apps such as hive and pig, they will need to manage picking a particular version of tez based on which hadoop profile they are building against. 
> 
> Any other suggestions for publishing version dependent jars?
> 
> For binary releases, should we release only the minimal tarball? or both the minimal and full tar balls? The full tarball is the recommended deployment model as it is more robust towards compatibility on a changing cluster. It should work in most scenarios as long as the hadoop client libraries that Tez depends on are compatible with the servers running on the cluster.
> 
> General questions for the community/past release managers: 
>   - Should we retain the simple version ( i.e. plain only x.y.z ) when building against the default version of hadoop as determined by Tez? This “default.version” will have a tendency to evolve over time :) . These simple version jars would be in addition to the version specific jars. 
>   - What versions of hadoop should we compile against? 2.2, 2.4 and 2.6 or 2.2,2.3,2.4,2.5,2.6 ? Please note that I am ignoring the minor version so we should pick the latest version in each line i.e. 2.2.1 over 2.2.0 if 2.2.1 exists. 
> 
> Any other comments? 
> 
> thanks
> — Hitesh
> 
> 

—
Chris K Wensel
chris@wensel.net