Posted to dev@accumulo.apache.org by John Vines <vi...@apache.org> on 2013/05/15 00:14:07 UTC

Re: Hadoop 2 compatibility issues - tangent

On that note, I was wondering if there were any suggestions for how to deal
with the laundry list of provided dependencies that Accumulo core has?
Writing packages against it is a bit ugly if not using the accumulo script
to start. Are there any Maven utilities to automatically dissect provided
dependencies and include them?

Sent from my phone, please pardon the typos and brevity.
On May 14, 2013 6:09 PM, "Keith Turner" <ke...@deenlo.com> wrote:

> One note about option 4. When using 1.4, users have to include hadoop-core
> as a dependency in their pom. This must be done because the 1.4 Accumulo
> pom marks hadoop-core as provided. So maybe option 4 is OK if the deps in
> the profile are provided?
>
>
> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org> wrote:
>
> > So, I've run into a problem with ACCUMULO-1402 that requires a larger
> > discussion about how Accumulo 1.5.0 should support Hadoop2.
> >
> > The problem is basically that profiles should not contain
> > dependencies, because profiles don't get activated transitively. A
> > slide deck by the Maven developers points this out as a bad practice...
> > yet it's a practice we rely on for our current implementation of
> > Hadoop2 support
> > (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> > slide 80).
> >
> > What this means is that even if we go through the work of publishing
> > binary artifacts compiled against Hadoop2, neither our Hadoop1
> > binaries nor our Hadoop2 binaries will be able to transitively resolve
> > any dependencies defined in profiles. This has significant
> > implications for user code that depends on Accumulo Maven artifacts.
> > Every user will essentially have to explicitly add Hadoop dependencies
> > for every Accumulo artifact that depends on Hadoop, whether directly
> > or transitively (they'll have to peek into the profiles in our POMs
> > and copy/paste the profile into their project). This becomes more
> > complicated when we consider how users will try to use things like
> > Instamo.
> >
> > There are workarounds, but none of them are really pleasant.
> >
> > 1. The best way to support both major Hadoop APIs is to have separate
> > modules with separate dependencies directly in the POM. This is a fair
> > amount of work, and in my opinion, would be too disruptive for 1.5.0.
> > This solution also gets us separate binaries for separate supported
> > versions, which is useful.
> >
> > 2. A second option, and the preferred one I think for 1.5.0, is to put
> > a Hadoop2 patch in the branch's contrib directory
> > (branches/1.5/contrib) that patches the POM files to support building
> > against Hadoop2. (Acknowledgement to Keith for suggesting this
> > solution.)
> >
> > 3. A third option is to fork Accumulo, and maintain two separate
> > builds (a more traditional technique). This adds a merging nightmare for
> > features/patches, but gets around some reflection hacks that we may
> > have been motivated to do in the past. I'm not a fan of this option,
> > particularly because I don't want to replicate the fork nightmare that
> > has been the history of early Hadoop itself.
> >
> > 4. The last option is to do nothing and to continue to build with the
> > separate profiles as we are, and make users discover and specify
> > transitive dependencies entirely on their own. I think this is the
> > worst option, as it essentially amounts to "ignore the problem".
> >
> > At the very least, it does not seem reasonable to complete
> > ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
> >
> > Thoughts? Discussion? Vote on option?
> >
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
> >
>
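
For readers following the quoted discussion, here is a rough sketch of the
two POM shapes being described: a library POM that declares Hadoop only
inside profiles, and a user POM that has to name the Hadoop artifact itself.
The profile ids and the ${accumulo.version} / ${hadoop.version} properties
are placeholders assumed for illustration, not copied from the Accumulo
POMs.

```xml
<!-- Library POM fragment (hypothetical profile ids). Dependencies declared
     here are the ones the thread says downstream projects will not resolve
     transitively. -->
<profiles>
  <profile>
    <id>hadoop-1</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </profile>
  <profile>
    <id>hadoop-2</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>

<!-- User POM fragment: the Hadoop dependency is spelled out explicitly
     next to accumulo-core, as Keith describes for 1.4. Both version
     properties are placeholders the user would define. -->
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>${accumulo.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```

Since provided-scope dependencies are not passed on transitively in Maven,
the user POM would look the same whether the library declares hadoop-core
as provided in a profile or directly in its main dependencies section.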

Re: Hadoop 2 compatibility issues - tangent

Posted by David Medinets <da...@gmail.com>.
You can have maven generate a file with the classpath dependencies and also
make a shaded jar. I use the classpath file for normal java processes and
the shaded jar file with 'hadoop jar'.
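
A minimal sketch of that setup, assuming the build-classpath goal of the
maven-dependency-plugin and the shade goal of the maven-shade-plugin. The
output file name, execution ids, and phase bindings are illustrative
choices, and plugin versions are left out on the assumption that they are
pinned in pluginManagement.

```xml
<build>
  <plugins>
    <!-- Writes the resolved dependency classpath to a text file. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <executions>
        <execution>
          <id>write-classpath</id> <!-- hypothetical id -->
          <phase>package</phase>
          <goals>
            <goal>build-classpath</goal>
          </goals>
          <configuration>
            <!-- hypothetical file name -->
            <outputFile>${project.build.directory}/classpath.txt</outputFile>
          </configuration>
        </execution>
      </executions>
    </plugin>
    <!-- Builds a single jar that bundles the project with its dependencies. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <id>shade-jar</id> <!-- hypothetical id -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

After "mvn package", a plain java process could be started with something
like java -cp "target/classes:$(cat target/classpath.txt)" and the name of
a main class, while the shaded jar is the one handed to 'hadoop jar'.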


On Tue, May 14, 2013 at 6:14 PM, John Vines <vi...@apache.org> wrote:

> On that note, I was wondering if there were any suggestions for how to deal
> with the laundry list of provided dependencies that Accumulo core has?
> Writing packages against it is a bit ugly if not using the accumulo script
> to start. Are there any Maven utilities to automatically dissect provided
> dependencies and include them?

Re: Hadoop 2 compatibility issues - tangent

Posted by Christopher <ct...@apache.org>.
No problem. FYI, this is essentially what we do to drop the
non-provided deps into lib/ in the first place.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Wed, May 15, 2013 at 3:03 AM, John Vines <vi...@apache.org> wrote:
> Awesome Chris, thanks. I didn't know where to begin looking for that one.
>
> Sent from my phone, please pardon the typos and brevity.
> On May 14, 2013 7:11 PM, "Christopher" <ct...@apache.org> wrote:
>
>> With the right configuration, you could use the copy-dependencies goal
>> of the maven-dependency-plugin to gather your dependencies to one
>> place.
>>
>> --
>> Christopher L Tubbs II
>> http://gravatar.com/ctubbsii
>>

Re: Hadoop 2 compatibility issues - tangent

Posted by John Vines <vi...@apache.org>.
Awesome Chris, thanks. I didn't know where to begin looking for that one.

Sent from my phone, please pardon the typos and brevity.
On May 14, 2013 7:11 PM, "Christopher" <ct...@apache.org> wrote:

> With the right configuration, you could use the copy-dependencies goal
> of the maven-dependency-plugin to gather your dependencies to one
> place.
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii

Re: Hadoop 2 compatibility issues - tangent

Posted by Christopher <ct...@apache.org>.
With the right configuration, you could use the copy-dependencies goal
of the maven-dependency-plugin to gather your dependencies to one
place.
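
As a rough sketch of what "the right configuration" might look like (the
output directory, execution id, and phase are assumptions, not the actual
Accumulo build configuration):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>gather-dependencies</id> <!-- hypothetical id -->
      <phase>package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <!-- hypothetical target directory for the copied jars -->
        <outputDirectory>${project.build.directory}/lib</outputDirectory>
        <!-- Per the plugin documentation, "compile" here selects compile,
             provided, and system scoped dependencies, so the provided
             ones are copied as well. -->
        <includeScope>compile</includeScope>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Swapping includeScope for excludeScope with the value provided should give
the opposite behavior, leaving the provided jars out of the copy.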

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Tue, May 14, 2013 at 6:14 PM, John Vines <vi...@apache.org> wrote:
> On that note, I was wondering if there were any suggestions for how to deal
> with the laundry list of provided dependencies that Accumulo core has?
> Writing packages against it is a bit ugly if not using the accumulo script
> to start. Are there any Maven utilities to automatically dissect provided
> dependencies and include them?