Posted to dev@accumulo.apache.org by Christopher <ct...@apache.org> on 2013/05/14 22:40:23 UTC

Hadoop 2 compatibility issues

So, I've run into a problem with ACCUMULO-1402 that requires a larger
discussion about how Accumulo 1.5.0 should support Hadoop2.

The problem is basically that profiles should not contain
dependencies, because profiles don't get activated transitively. A
slide deck by the Maven developers points this out as a bad practice...
yet it's a practice we rely on for our current implementation of
Hadoop2 support
(http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
slide 80).
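
To make the pattern concrete, here is a minimal sketch of the kind of
profile in question; the coordinates are illustrative assumptions, not a
copy of our actual POM:

  <profile>
    <id>hadoop-2.0</id>
    <dependencies>
      <!-- A dependency declared inside a profile is invisible to
           downstream consumers, because profile activation does not
           carry over when Maven resolves us transitively. -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.0-alpha</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </profile>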

What this means is that even if we go through the work of publishing
binary artifacts compiled against Hadoop2, neither our Hadoop1
binaries nor our Hadoop2 binaries will be able to transitively resolve
any dependencies defined in profiles. This has significant
implications for user code that depends on Accumulo Maven artifacts.
Every user will essentially have to explicitly add Hadoop dependencies
for every Accumulo artifact that depends on Hadoop, whether we depend
on it directly or transitively (they'll have to peek into the profiles
in our POMs and copy/paste the profile into their project). This
becomes more complicated when we consider how users will try to use
things like Instamo.

There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate
modules with separate dependencies directly in the POM. This is a fair
amount of work, and in my opinion, would be too disruptive for 1.5.0.
This solution also gets us separate binaries for separate supported
versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put
a Hadoop2 patch in the branch's contrib directory
(branches/1.5/contrib) that patches the POM files to support building
against Hadoop2; a sketch of what such a patch would change follows
this list. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo, and maintain two separate
builds (a more traditional technique). This adds a merge nightmare for
features/patches, but gets around some reflection hacks that we may
have been motivated to do in the past. I'm not a fan of this option,
particularly because I don't want to replicate the fork nightmare that
has been the history of early Hadoop itself.

4. The last option is to do nothing and to continue to build with the
separate profiles as we are, and make users discover and specify
transitive dependencies entirely on their own. I think this is the
worst option, as it essentially amounts to "ignore the problem".
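
For concreteness, here is a hypothetical sketch of the kind of edit the
contrib patch in option 2 might make to a POM; the artifact names and
versions are assumptions for illustration, not the actual patch:

  <!-- Default dependency in the POM (Hadoop 1), sketch only: -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.4</version>
    <scope>provided</scope>
  </dependency>

  <!-- Same section after applying the hypothetical patch (Hadoop 2): -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.0-alpha</version>
    <scope>provided</scope>
  </dependency>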

At the very least, it does not seem reasonable to complete
ACCUMULO-1402 for 1.5.0, given the complexity of this issue.

Thoughts? Discussion? Vote on option?

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

Re: Hadoop 2 compatibility issues

Posted by Eric Newton <er...@gmail.com>.
Thanks Christopher for looking into this with your usual determination and
thoroughness.

-Eric

Re: Hadoop 2 compatibility issues

Posted by Christopher <ct...@apache.org>.
Yes, they should add a dependency on Hadoop, if they use it. The
problem isn't just if they use Hadoop classes, though. It is that the
dependency is required for any code path where Accumulo requires
Hadoop... and this is unknown to the user, because the dependency tree
looks like Accumulo has no dependency on Hadoop at all. They certainly
won't know which Hadoop jars to add, without deep inspection of
Accumulo's profiles.
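
As a hypothetical illustration (artifact names and versions assumed), a
downstream POM ends up needing something like:

  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.0</version>
  </dependency>
  <!-- Added by hand; it never appears in Accumulo's effective POM. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.0.4</version>
  </dependency>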

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Tue, May 14, 2013 at 4:52 PM, Sean Busbey <bu...@cloudera.com> wrote:
> If a user is referencing any of the Hadoop classes, aren't they supposed to
> add a dependency on the appropriate Hadoop artifact anyways?
>
> FWIW, option 4 is what Avro does. Their discussion:
>
> https://issues.apache.org/jira/browse/AVRO-1170

Re: Hadoop 2 compatibility issues

Posted by Sean Busbey <bu...@cloudera.com>.
If a user is referencing any of the Hadoop classes, aren't they supposed to
add a dependency on the appropriate Hadoop artifact anyways?

FWIW, option 4 is what Avro does. Their discussion:

https://issues.apache.org/jira/browse/AVRO-1170

-- 
Sean Busbey
Solutions Architect
Cloudera, Inc.
Phone: MAN-VS-BEARD

Re: Hadoop 2 compatibility issues

Posted by Josh Elser <jo...@gmail.com>.
I'm not sure what the "best" solution would be, but I'd easily assume
any worthwhile solution would extend the 1.5.0 release date even further
than I'd be happy about. So, by that stance, I'm for #4 or another quick
fix, even if it does perpetuate some sort of "hack".

On 05/14/2013 07:09 PM, Benson Margulies wrote:
> It just doesn't make very much sense to me to have two different GAV's
> for the very same .class files, just to get different dependencies in
> the poms. However, if someone really wanted that, I'd look to make
> some scripting that created this downstream from the main build.
This makes sense to me. Although I don't know exactly how one would go
about doing this, I trust Benson enough not to throw something
infeasible at us :)


Re: Hadoop 2 compatibility issues

Posted by Adam Fuchs <af...@apache.org>.
I also just snuck in that Hadoop 1/2 compatibility fix with JobContext
(ACCUMULO-1421). Not sure if that's the only change needed, but it should
be a step forward.

Adam

Re: Hadoop 2 compatibility issues

Posted by Eric Newton <er...@gmail.com>.
I've snuck some necessary changes in... doing integration testing on it
right now.

-Eric

Re: Hadoop 2 compatibility issues

Posted by John Vines <vi...@apache.org>.
I will gladly do it next week, but I'd rather not have it delay the
release. The question from there is, is doing this type of packaging change
too large to put in 1.5.1?

Re: Hadoop 2 compatibility issues

Posted by Christopher <ct...@apache.org>.
So, I think that'd be great, if it works, but who is willing to do
this work and get it in before I make another RC?
I'd like to cut RC3 tomorrow if I have time. So, feel free to patch
these in to get it to work before then... or, by the next RC if RC3
fails to pass a vote.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

Re: Hadoop 2 compatibility issues

Posted by Adam Fuchs <af...@apache.org>.
It seems like the ideal option would be to have one binary build that
determines Hadoop version and switches appropriately at runtime. Has anyone
attempted to do this yet, and do we have an enumeration of the places in
Accumulo code where the incompatibilities show up?

One of the incompatibilities is in org.apache.hadoop.mapreduce.JobContext
switching between an abstract class and an interface. This can be fixed
with something to the effect of:

  // Needs: java.lang.reflect.Method, org.apache.hadoop.conf.Configuration,
  // and org.apache.hadoop.mapreduce.JobContext.
  public static Configuration getConfiguration(JobContext context) {
    try {
      // Invoke getConfiguration reflectively so the compiled call site is
      // not bound to JobContext being a class (Hadoop 1) or an interface
      // (Hadoop 2).
      Class<?> c = TestCompatibility.class.getClassLoader()
          .loadClass("org.apache.hadoop.mapreduce.JobContext");
      Method m = c.getMethod("getConfiguration");
      return (Configuration) m.invoke(context);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

Based on a test I just ran, using that getConfiguration method instead of
just calling the getConfiguration method on context should avoid the one
incompatibility. Maybe with a couple more changes like that we can get down
to one bytecode release for all known Hadoop versions?

Adam

Re: Hadoop 2 compatibility issues

Posted by Christopher <ct...@apache.org>.
I'm very much partial to the "First" option, as it's far less effort
for approximately the same value (in my opinion, but in light of the
enthusiasm above for hadoop2, I could be very wrong on my assessment
of the value).

I'm going to upload a patch to ACCUMULO-1402 soon (tiny polishing
left), to demonstrate a way to push redundant jars, with an extra
classifier (though I still have to build twice, to avoid
maven-invoker-plugin complexity) for hadoop2-compatible binaries. If
you don't mind, I'll tag you with a request to review that patch, as
I'd like more details about the classifier issues you mention, in
context.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Tue, May 14, 2013 at 8:27 PM, Benson Margulies <bi...@gmail.com> wrote:
> Maven will malfunction in various entertaining ways if you try to
> change the GAV of the output of the build using a profile.
>
> Maven will malfunction in various entertaining ways if you use
> classifiers on real-live-JAR files that get used as
> real-live-dependencies, because it has no concept of a
> pom-per-classifier.
>
> Where does this leave you/us? (I'm not sure that I've earned an 'us'
> recently around here.)
>
> First, I note that 'Apache releases are source releases'. So, one
> resort of scoundrels here would be to support only one hadoop in the
> convenience binaries that get pushed to Maven Central, and let other
> hadoop users take the source release and build for themselves.
>
> Second, I am reduced to suggesting an elaboration of the build in
> which some tool edits poms and runs builds. The maven-invoker-plugin
> could be used to run that, but a plain old script in a plain old
> language might be less painful.
>
> I appreciate that this may not be an appealing contribution to where
> things are, but it might be the best of the evil choices.
>
>
> On Tue, May 14, 2013 at 7:50 PM, John Vines <vi...@apache.org> wrote:
>> The compiled code is compiled code. There are no concerns of dependency
>> resolution. So I see no issues in using the profile to define the gav if
>> that is feasible.
>>
>> Sent from my phone, please pardon the typos and brevity.
>> On May 14, 2013 7:47 PM, "Christopher" <ct...@apache.org> wrote:
>>
>>> Response to Benson inline, but additional note here:
>>>
>>> It should be noted that the situation will be made worse for the
>>> solution I was considering for ACCUMULO-1402, which would move the
>>> accumulo artifacts, classified by the hadoop2 variant, into the
>>> profiles... meaning they will no longer resolve transitively when they
>>> did before. Can go into details on that ticket, if needed.
>>>
>>> On Tue, May 14, 2013 at 7:41 PM, Benson Margulies <bi...@gmail.com>
>>> wrote:
>>> > On Tue, May 14, 2013 at 7:36 PM, Christopher <ct...@apache.org>
>>> wrote:
>>> >> Benson-
>>> >>
>>> >> They produce different byte-code. That's why we're even considering
>>> >> this. ACCUMULO-1402 is the ticket under which our intent is to add
>>> >> classifiers, so that they can be distinguished.
>>> >
>>> > whoops, missed that.
>>> >
>>> > Then how do people succeed in just fixing up their dependencies and
>>> using it?
>>>
>>> The specific differences are things like changes from abstract class
>>> to an interface. Apparently an import of these does not produce
>>> compatible byte-code, even though the method signature looks the same.
>>>
>>> > In any case, speaking as a Maven-maven, classifiers are absolutely,
>>> > positively, a cure worse than the disease. If you want the details
>>> > just ask.
>>>
>>> Agreed. I just don't see a good alternative here.
>>>
>>> >>
>>> >> All-
>>> >>
>>> >> To Keith's point, I think perhaps all this concern is a non-issue...
>>> >> because as Keith points out, the dependencies in question are marked
>>> >> as "provided", and dependency resolution doesn't occur for provided
>>> >> dependencies anyway... so even if we leave off the profiles, we're in
>>> >> the same boat. Maybe not the boat we should be in... but certainly not
>>> >> a sinking one as I had first imagined. It's as afloat as it was
>>> >> before, when they were not in a profile, but still marked as
>>> >> "provided".
>>> >>
>>> >> --
>>> >> Christopher L Tubbs II
>>> >> http://gravatar.com/ctubbsii
>>> >>
>>> >>
>>> >> On Tue, May 14, 2013 at 7:09 PM, Benson Margulies <
>>> bimargulies@gmail.com> wrote:
>>> >>> It just doesn't make very much sense to me to have two different GAV's
>>> >>> for the very same .class files, just to get different dependencies in
>>> >>> the poms. However, if someone really wanted that, I'd look to make
>>> >>> some scripting that created this downstream from the main build.
>>> >>>
>>> >>>
>>> >>> On Tue, May 14, 2013 at 6:16 PM, John Vines <vi...@apache.org> wrote:
>>> >>>> They're the same currently. I was requesting separate gavs for hadoop
>>> 2.
>>> >>>> It's been on the mailing list and jira.
>>> >>>>
>>> >>>> Sent from my phone, please pardon the typos and brevity.
>>> >>>> On May 14, 2013 6:14 PM, "Keith Turner" <ke...@deenlo.com> wrote:
>>> >>>>
>>> >>>>> On Tue, May 14, 2013 at 5:51 PM, Benson Margulies <
>>> bimargulies@gmail.com
>>> >>>>> >wrote:
>>> >>>>>
>>> >>>>> > I am a maven developer, and I'm offering this advice based on my
>>> >>>>> > understanding of reason why that generic advice is offered.
>>> >>>>> >
>>> >>>>> > If you have different profiles that _build different results_ but
>>> all
>>> >>>>> > deliver the same GAV, you have chaos.
>>> >>>>> >
>>> >>>>>
>>> >>>>> What GAV are we currently producing for hadoop 1 and hadoop 2?
>>> >>>>>
>>> >>>>>
>>> >>>>> >
>>> >>>>> > If you have different profiles that test against different
>>> versions of
>>> >>>>> > dependencies, but all deliver the same byte code at the end of the
>>> >>>>> > day, you don't have chaos.
>>> >>>>> >
>>> >>>>> >
>>> >>>>> >
>>> >>>>> > On Tue, May 14, 2013 at 5:48 PM, Christopher <ct...@apache.org>
>>> >>>>> wrote:
>>> >>>>> > > I think it's interesting that Option 4 seems to be most
>>> preferred...
>>> >>>>> > > because it's the *only* option that is explicitly advised
>>> against by
>>> >>>>> > > the Maven developers (from the information I've read). I can see
>>> its
>>> >>>>> > > appeal, but I really don't think that we should introduce an
>>> explicit
>>> >>>>> > > problem for users (that applies to users using even the Hadoop
>>> version
>>> >>>>> > > we directly build against... not just those using Hadoop 2... I
>>> don't
>>> >>>>> > > know if that point was clear), to only partially support a
>>> version of
>>> >>>>> > > Hadoop that is still alpha and has never had a stable release.
>>> >>>>> > >
>>> >>>>> > > BTW, Option 4 was how I had achieved a solution for
>>> >>>>> > > ACCUMULO-1402, but am reluctant to apply that patch, with this
>>> issue
>>> >>>>> > > outstanding, as it may exacerbate the problem.
>>> >>>>> > >
>>> >>>>> > > Another implication for Option 4 (the current "solution") is for
>>> >>>>> > > 1.6.0, with the planned accumulo-maven-plugin... because it
>>> means that
>>> >>>>> > > the accumulo-maven-plugin will need to be configured like this:
>>> >>>>> > > <plugin>
>>> >>>>> > >   <groupId>org.apache.accumulo</groupId>
>>> >>>>> > >   <artifactId>accumulo-maven-plugin</artifactId>
>>> >>>>> > >   <dependencies>
>>> >>>>> > >    ... all the required hadoop 1 dependencies to make the plugin
>>> work,
>>> >>>>> > > even though this version only works against hadoop 1 anyway...
>>> >>>>> > >   </dependencies>
>>> >>>>> > >   ...
>>> >>>>> > > </plugin>
>>> >>>>> > >
>>> >>>>> > > --
>>> >>>>> > > Christopher L Tubbs II
>>> >>>>> > > http://gravatar.com/ctubbsii
>>> >>>>> > >
>>> >>>>> > >
>>> >>>>> > > On Tue, May 14, 2013 at 5:42 PM, Christopher <
>>> ctubbsii@apache.org>
>>> >>>>> > wrote:
>>> >>>>> > >> I think Option 2 is the best solution for "waiting until we
>>> have the
>>> >>>>> > >> time to solve the problem correctly", as it ensures that
>>> transitive
>>> >>>>> > >> dependencies work for the stable version of Hadoop, and using
>>> Hadoop2
>>> >>>>> > >> is a very simple documentation issue for how to apply the patch
>>> and
>>> >>>>> > >> rebuild. Option 4 doesn't wait... it explicitly introduces a
>>> problem
>>> >>>>> > >> for users.
>>> >>>>> > >>
>>> >>>>> > >> Option 1 is how I'm tentatively thinking about fixing it
>>> properly in
>>> >>>>> > 1.6.0.
>>> >>>>> > >>
>>> >>>>> > >>
>>> >>>>> > >> --
>>> >>>>> > >> Christopher L Tubbs II
>>> >>>>> > >> http://gravatar.com/ctubbsii
>>> >>>>> > >>
>>> >>>>> > >>
>>> >>>>> > >> On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org>
>>> wrote:
>>> >>>>> > >>> I'm an advocate of option 4. You say that it's ignoring the
>>> problem,
>>> >>>>> > >>> whereas I think it's waiting until we have the time to solve
>>> the
>>> >>>>> > problem
>>> >>>>> > >>> correctly. Your reasoning for this is standardizing on maven
>>> >>>>> > >>> conventions, but the other options, while more 'correct' from a
>>> >>>>> > >>> maven standpoint, are a larger headache for our user base and
>>> >>>>> > >>> ourselves. In
>>> >>>>> > either
>>> >>>>> > >>> case, we're going to be breaking some sort of convention, and
>>> while
>>> >>>>> > it's
>>> >>>>> > >>> not good, we should be doing the one that's less bad for US.
>>> The
>>> >>>>> > important
>>> >>>>> > >>> thing here, now, is that the poms work and we should go with
>>> the
>>> >>>>> method
>>> >>>>> > >>> that leaves the work minimal for our end users to utilize them.
>>> >>>>> > >>>
>>> >>>>> > >>> I do agree that 1. is the correct option in the long run. More
>>> >>>>> > >>> specifically, I think it boils down to having a single module
>>> >>>>> > compatibility
>>> >>>>> > >>> layer, which is how hbase deals with this issue. But like you
>>> said,
>>> >>>>> we
>>> >>>>> > >>> don't have the time to engineer a proper solution. So let
>>> sleeping
>>> >>>>> > dogs lie
>>> >>>>> > >>> and we can revamp the whole system for 1.5.1 or 1.6.0 when we
>>> >>>>> > >>> have the cycles to do it right.

Re: Hadoop 2 compatibility issues

Posted by Benson Margulies <bi...@gmail.com>.
Maven will malfunction in various entertaining ways if you try to
change the GAV of the output of the build using a profile.

Maven will malfunction in various entertaining ways if you use
classifiers on real-live-JAR files that get used as
real-live-dependencies, because it has no concept of a
pom-per-classifier.
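
To illustrate (coordinates assumed), if a consumer declares

  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.0</version>
    <classifier>hadoop2</classifier>
  </dependency>

Maven still resolves the one and only accumulo-core POM, so the classified
jar drags in exactly the transitive dependencies of the unclassified
artifact, whatever it was actually compiled against.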

Where does this leave you/us? (I'm not sure that I've earned an 'us'
recently around here.)

First, I note that 'Apache releases are source releases'. So, one
resort of scoundrels here would be to support only one hadoop in the
convenience binaries that get pushed to Maven Central, and let other
hadoop users take the source release and build for themselves.

Second, I am reduced to suggesting an elaboration of the build in
which some tool edits poms and runs builds. The maven-invoker-plugin
could be used to run that, but a plain old script in a plain old
language might be less painful.

I appreciate that this may not be an appealing contribution to where
things are, but it might be the best of the evil choices.



Re: Hadoop 2 compatibility issues

Posted by John Vines <vi...@apache.org>.
The compiled code is compiled code. There are no concerns of dependency
resolution. So I see no issues in using the profile to define the gav if
that is feasible.

Sent from my phone, please pardon the typos and brevity.
On May 14, 2013 7:47 PM, "Christopher" <ct...@apache.org> wrote:

> Response to Benson inline, but additional note here:
>
> It should be noted that the situation will be made worse for the
> solution I was considering for ACCUMULO-1402, which would move the
> accumulo artifacts, classified by the hadoop2 variant, into the
> profiles... meaning they will no longer resolve transitively when they
> did before. Can go into details on that ticket, if needed.
>
> On Tue, May 14, 2013 at 7:41 PM, Benson Margulies <bi...@gmail.com>
> wrote:
> > On Tue, May 14, 2013 at 7:36 PM, Christopher <ct...@apache.org>
> wrote:
> >> Benson-
> >>
> >> They produce different byte-code. That's why we're even considering
> >> this. ACCUMULO-1402 is the ticket under which our intent is to add
> >> classifiers, so that they can be distinguished.
> >
> > whoops, missed that.
> >
> > Then how do people succeed in just fixing up their dependencies and
> using it?
>
> The specific differences are things like changes from abstract class
> to an interface. Apparently an import of these do not produce
> compatible byte-code, even though the method signature looks the same.
>
> > In any case, speaking as a Maven-maven, classifiers are absolutely,
> > positively, a cure worse than the disease. If you want the details
> > just ask.
>
> Agreed. I just don't see a good alternative here.
>
> >>
> >> All-
> >>
> >> To Keith's point, I think perhaps all this concern is a non-issue...
> >> because as Keith points out, the dependencies in question are marked
> >> as "provided", and dependency resolution doesn't occur for provided
> >> dependencies anyway... so even if we leave off the profiles, we're in
> >> the same boat. Maybe not the boat we should be in... but certainly not
> >> a sinking one as I had first imagined. It's as afloat as it was
> >> before, when they were not in a profile, but still marked as
> >> "provided".
> >>
> >> --
> >> Christopher L Tubbs II
> >> http://gravatar.com/ctubbsii

Re: Hadoop 2 compatibility issues

Posted by Christopher <ct...@apache.org>.
Response to Benson inline, but additional note here:

It should be noted that the situation will be made worse by the
solution I was considering for ACCUMULO-1402, which would move the
accumulo artifacts, classified by the hadoop2 variant, into the
profiles... meaning they will no longer resolve transitively, as they
did before. I can go into details on that ticket, if needed.
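
To illustrate (a sketch only -- the hadoop2 classifier is the change
proposed in ACCUMULO-1402, nothing we publish today, and the version is
approximate), the dependency would end up buried in a profile like this:

<profile>
  <id>hadoop2</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.accumulo</groupId>
      <artifactId>accumulo-core</artifactId>
      <version>1.5.0</version>
      <!-- hypothetical classifier from the ACCUMULO-1402 proposal -->
      <classifier>hadoop2</classifier>
    </dependency>
  </dependencies>
</profile>

Since downstream POMs never activate our profiles, a dependency declared
this way is invisible to them; the classified jar resolves only if they
copy the declaration out of the profile into their own POM.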

On Tue, May 14, 2013 at 7:41 PM, Benson Margulies <bi...@gmail.com> wrote:
> On Tue, May 14, 2013 at 7:36 PM, Christopher <ct...@apache.org> wrote:
>> Benson-
>>
>> They produce different byte-code. That's why we're even considering
>> this. ACCUMULO-1402 is the ticket under which our intent is to add
>> classifiers, so that they can be distinguished.
>
> whoops, missed that.
>
> Then how do people succeed in just fixing up their dependencies and using it?

The specific differences are things like a change from an abstract class
to an interface (org.apache.hadoop.mapreduce.TaskAttemptContext, for
example, is a class in Hadoop 1 but an interface in Hadoop 2). Code
compiled against one does not produce byte-code compatible with the
other, even though the method signature looks the same: the compiler
emits an invokevirtual instruction for a class method but
invokeinterface for an interface method, so a Hadoop 1 build fails at
runtime on Hadoop 2 with an IncompatibleClassChangeError.

> In any case, speaking as a Maven-maven, classifiers are absolutely,
> positively, a cure worse than the disease. If you want the details
> just ask.

Agreed. I just don't see a good alternative here.


Re: Hadoop 2 compatibility issues

Posted by John Vines <vi...@apache.org>.
We've written the code such that it works with either version, and then we
have profiles which set the hadoop.version for convenience. The profiles
also alternate between using hadoop-client and hadoop-core, but as I
mentioned above, that is unnecessary.
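
Roughly, the profiles look like this (a sketch from memory; the profile
ids and versions are approximate, not copied from the real POM):

<profiles>
  <profile>
    <id>hadoop-1.0</id>
    <properties>
      <hadoop.version>1.0.4</hadoop.version>
    </properties>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </profile>
  <profile>
    <id>hadoop-2.0</id>
    <properties>
      <hadoop.version>2.0.4-alpha</hadoop.version>
    </properties>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>

The compiled code is meant to run against whichever one you activate; the
profile only pins the version and picks the artifact.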

Sent from my phone, please pardon the typos and brevity.
On May 14, 2013 7:42 PM, "Benson Margulies" <bi...@gmail.com> wrote:

> On Tue, May 14, 2013 at 7:36 PM, Christopher <ct...@apache.org> wrote:
> > Benson-
> >
> > They produce different byte-code. That's why we're even considering
> > this. ACCUMULO-1402 is the ticket under which our intent is to add
> > classifiers, so that they can be distinguished.
>
> whoops, missed that.
>
> Then how do people succeed in just fixing up their dependencies and using
> it?
>
> In any case, speaking as a Maven-maven, classifiers are absolutely,
> positively, a cure worse than the disease. If you want the details
> just ask.

Re: Hadoop 2 compatibility issues

Posted by Benson Margulies <bi...@gmail.com>.
On Tue, May 14, 2013 at 7:36 PM, Christopher <ct...@apache.org> wrote:
> Benson-
>
> They produce different byte-code. That's why we're even considering
> this. ACCUMULO-1402 is the ticket under which our intent is to add
> classifiers, so that they can be distinguished.

whoops, missed that.

Then how do people succeed in just fixing up their dependencies and using it?

In any case, speaking as a Maven-maven, classifiers are absolutely,
positively, a cure worse than the disease. If you want the details
just ask.


Re: Hadoop 2 compatibility issues

Posted by Christopher <ct...@apache.org>.
Benson-

They produce different byte-code. That's why we're even considering
this. ACCUMULO-1402 is the ticket under which our intent is to add
classifiers, so that they can be distinguished.

All-

To Keith's point, I think perhaps all this concern is a non-issue...
because as Keith points out, the dependencies in question are marked
as "provided", and dependency resolution doesn't occur for provided
dependencies anyway... so even if we leave off the profiles, we're in
the same boat. Maybe not the boat we should be in... but certainly not
a sinking one as I had first imagined. It's as afloat as it was
before, when they were not in a profile, but still marked as
"provided".

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Tue, May 14, 2013 at 7:09 PM, Benson Margulies <bi...@gmail.com> wrote:
> It just doesn't make very much sense to me to have two different GAVs
> for the very same .class files, just to get different dependencies in
> the poms. However, if someone really wanted that, I'd look to make
> some scripting that created this downstream from the main build.

Re: Hadoop 2 compatibility issues

Posted by John Vines <vi...@apache.org>.
Sorry for the dupe Benson, meant to reply all

Oh no Benson, the compiled code is different. The fundamental issue is that
some interfaces got changed to abstract classes or vice versa. The source
is the same, but the class files are different.
Sent from my phone, please pardon the typos and brevity.
On May 14, 2013 7:09 PM, "Benson Margulies" <bi...@gmail.com> wrote:

> It just doesn't make very much sense to me to have two different GAVs
> for the very same .class files, just to get different dependencies in
> the poms. However, if someone really wanted that, I'd look to make
> some scripting that created this downstream from the main build.

Re: Hadoop 2 compatibility issues

Posted by Benson Margulies <bi...@gmail.com>.
It just doesn't make very much sense to me to have two different GAVs
for the very same .class files, just to get different dependencies in
the poms. However, if someone really wanted that, I'd look to make
some scripting that created this downstream from the main build.


On Tue, May 14, 2013 at 6:16 PM, John Vines <vi...@apache.org> wrote:
> They're the same currently. I was requesting separate gavs for hadoop 2.
> It's been on the mailing list and jira.
>
> Sent from my phone, please pardon the typos and brevity.
> On May 14, 2013 6:14 PM, "Keith Turner" <ke...@deenlo.com> wrote:
>
>> On Tue, May 14, 2013 at 5:51 PM, Benson Margulies <bimargulies@gmail.com
>> >wrote:
>>
>> > I am a maven developer, and I'm offering this advice based on my
>> > understanding of reason why that generic advice is offered.
>> >
>> > If you have different profiles that _build different results_ but all
>> > deliver the same GAV, you have chaos.
>> >
>>
>> What GAV are we currently producing for hadoop 1 and hadoop 2?
>>
>>
>> >
>> > If you have different profiles that test against different versions of
>> > dependencies, but all deliver the same byte code at the end of the
>> > day, you don't have chaos.
>> >
>> >
>> >
>> > On Tue, May 14, 2013 at 5:48 PM, Christopher <ct...@apache.org>
>> wrote:
>> > > I think it's interesting that Option 4 seems to be most preferred...
>> > > because it's the *only* option that is explicitly advised against by
>> > > the Maven developers (from the information I've read). I can see its
>> > > appeal, but I really don't think that we should introduce an explicit
>> > > problem for users (that applies to users using even the Hadoop version
>> > > we directly build against... not just those using Hadoop 2... I don't
>> > > know if that point was clear), to only partially support a version of
>> > > Hadoop that is still alpha and has never had a stable release.
>> > >
>> > > BTW, Option 4 was how I had have achieved a solution for
>> > > ACCUMULO-1402, but am reluctant to apply that patch, with this issue
>> > > outstanding, as it may exacerbate the problem.
>> > >
>> > > Another implication for Option 4 (the current "solution") is for
>> > > 1.6.0, with the planned accumulo-maven-plugin... because it means that
>> > > the accumulo-maven-plugin will need to be configured like this:
>> > > <plugin>
>> > >   <groupId>org.apache.accumulo</groupId>
>> > >   <artifactId>accumulo-maven-plugin</artifactId>
>> > >   <dependencies>
>> > >    ... all the required hadoop 1 dependencies to make the plugin work,
>> > > even though this version only works against hadoop 1 anyway...
>> > >   </dependencies>
>> > >   ...
>> > > </plugin>
>> > >
>> > > --
>> > > Christopher L Tubbs II
>> > > http://gravatar.com/ctubbsii
>> > >
>> > >
>> > > On Tue, May 14, 2013 at 5:42 PM, Christopher <ct...@apache.org>
>> > wrote:
>> > >> I think Option 2 is the best solution for "waiting until we have the
>> > >> time to solve the problem correctly", as it ensures that transitive
>> > >> dependencies work for the stable version of Hadoop, and using Hadoop2
>> > >> is a very simple documentation issue for how to apply the patch and
>> > >> rebuild. Option 4 doesn't wait... it explicitly introduces a problem
>> > >> for users.
>> > >>
>> > >> Option 1 is how I'm tentatively thinking about fixing it properly in
>> > 1.6.0.
>> > >>
>> > >>
>> > >> --
>> > >> Christopher L Tubbs II
>> > >> http://gravatar.com/ctubbsii
>> > >>
>> > >>
>> > >> On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
>> > >>> I'm an advocate of option 4. You say that it's ignoring the problem,
>> > >>> whereas I think it's waiting until we have the time to solve the
>> > problem
>> > >>> correctly. Your reasoning for this is for standardizing for maven
>> > >>> conventions, but the other options, while more 'correct' from a maven
>> > >>> standpoint or a larger headache for our user base and ourselves. In
>> > either
>> > >>> case, we're going to be breaking some sort of convention, and while
>> > it's
>> > >>> not good, we should be doing the one that's less bad for US. The
>> > important
>> > >>> thing here, now, is that the poms work and we should go with the
>> method
>> > >>> that leaves the work minimal for our end users to utilize them.
>> > >>>
>> > >>> I do agree that 1. is the correct option in the long run. More
>> > >>> specifically, I think it boils down to having a single module
>> > compatibility
>> > >>> layer, which is how hbase deals with this issue. But like you said,
>> we
>> > >>> don't have the time to engineer a proper solution. So let sleeping
>> > dogs lie
>> > >>> and we can revamp the whole system for 1.5.1 or 1.6.0 when we have
>> the
>> > >>> cycles to do it right.
>> > >>>
>> > >>>
>> > >>> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org>
>> > wrote:
>> > >>>
>> > >>>> So, I've run into a problem with ACCUMULO-1402 that requires a
>> larger
>> > >>>> discussion about how Accumulo 1.5.0 should support Hadoop2.
>> > >>>>
>> > >>>> The problem is basically that profiles should not contain
>> > >>>> dependencies, because profiles don't get activated transitively. A
>> > >>>> slide deck by the Maven developers point this out as a bad
>> practice...
>> > >>>> yet it's a practice we rely on for our current implementation of
>> > >>>> Hadoop2 support
>> > >>>> (
>> http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
>> > >>>> slide 80).
>> > >>>>
>> > >>>> What this means is that even if we go through the work of publishing
>> > >>>> binary artifacts compiled against Hadoop2, neither our Hadoop1
>> > >>>> binaries or our Hadoop2 binaries will be able to transitively
>> resolve
>> > >>>> any dependencies defined in profiles. This has significant
>> > >>>> implications to user code that depends on Accumulo Maven artifacts.
>> > >>>> Every user will essentially have to explicitly add Hadoop
>> dependencies
>> > >>>> for every Accumulo artifact that has dependencies on Hadoop, either
>> > >>>> because we directly or transitively depend on Hadoop (they'll have
>> to
>> > >>>> peek into the profiles in our POMs and copy/paste the profile into
>> > >>>> their project). This becomes more complicated when we consider how
>> > >>>> users will try to use things like Instamo.
>> > >>>>
>> > >>>> There are workarounds, but none of them are really pleasant.
>> > >>>>
>> > >>>> 1. The best way to support both major Hadoop APIs is to have
>> separate
>> > >>>> modules with separate dependencies directly in the POM. This is a
>> fair
>> > >>>> amount of work, and in my opinion, would be too disruptive for
>> 1.5.0.
>> > >>>> This solution also gets us separate binaries for separate supported
>> > >>>> versions, which is useful.
>> > >>>>
>> > >>>> 2. A second option, and the preferred one I think for 1.5.0, is to
>> put
>> > >>>> a Hadoop2 patch in the branch's contrib directory
>> > >>>> (branches/1.5/contrib) that patches the POM files to support
>> building
>> > >>>> against Hadoop2. (Acknowledgement to Keith for suggesting this
>> > >>>> solution.)
>> > >>>>
>> > >>>> 3. A third option is to fork Accumulo, and maintain two separate
>> > >>>> builds (a more traditional technique). This adds merging nightmare
>> for
>> > >>>> features/patches, but gets around some reflection hacks that we may
>> > >>>> have been motivated to do in the past. I'm not a fan of this option,
>> > >>>> particularly because I don't want to replicate the fork nightmare
>> that
>> > >>>> has been the history of early Hadoop itself.
>> > >>>>
>> > >>>> 4. The last option is to do nothing and to continue to build with
>> the
>> > >>>> separate profiles as we are, and make users discover and specify
>> > >>>> transitive dependencies entirely on their own. I think this is the
>> > >>>> worst option, as it essentially amounts to "ignore the problem".
>> > >>>>
>> > >>>> At the very least, it does not seem reasonable to complete
>> > >>>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
>> > >>>>
>> > >>>> Thoughts? Discussion? Vote on option?
>> > >>>>
>> > >>>> --
>> > >>>> Christopher L Tubbs II
>> > >>>> http://gravatar.com/ctubbsii
>> > >>>>
>> >
>>

Re: Hadoop 2 compatibility issues

Posted by John Vines <vi...@apache.org>.
They're the same currently. I was requesting separate GAVs for Hadoop 2.
It's been on the mailing list and JIRA.

Sent from my phone, please pardon the typos and brevity.
On May 14, 2013 6:14 PM, "Keith Turner" <ke...@deenlo.com> wrote:

> On Tue, May 14, 2013 at 5:51 PM, Benson Margulies <bimargulies@gmail.com
> >wrote:
>
> > I am a maven developer, and I'm offering this advice based on my
> > understanding of reason why that generic advice is offered.
> >
> > If you have different profiles that _build different results_ but all
> > deliver the same GAV, you have chaos.
> >
>
> What GAV are we currently producing for hadoop 1 and hadoop 2?
>
>
> >
> > If you have different profiles that test against different versions of
> > dependencies, but all deliver the same byte code at the end of the
> > day, you don't have chaos.
> >
> >
> >
> > On Tue, May 14, 2013 at 5:48 PM, Christopher <ct...@apache.org>
> wrote:
> > > I think it's interesting that Option 4 seems to be most preferred...
> > > because it's the *only* option that is explicitly advised against by
> > > the Maven developers (from the information I've read). I can see its
> > > appeal, but I really don't think that we should introduce an explicit
> > > problem for users (that applies to users using even the Hadoop version
> > > we directly build against... not just those using Hadoop 2... I don't
> > > know if that point was clear), to only partially support a version of
> > > Hadoop that is still alpha and has never had a stable release.
> > >
> > > BTW, Option 4 was how I had have achieved a solution for
> > > ACCUMULO-1402, but am reluctant to apply that patch, with this issue
> > > outstanding, as it may exacerbate the problem.
> > >
> > > Another implication for Option 4 (the current "solution") is for
> > > 1.6.0, with the planned accumulo-maven-plugin... because it means that
> > > the accumulo-maven-plugin will need to be configured like this:
> > > <plugin>
> > >   <groupId>org.apache.accumulo</groupId>
> > >   <artifactId>accumulo-maven-plugin</artifactId>
> > >   <dependencies>
> > >    ... all the required hadoop 1 dependencies to make the plugin work,
> > > even though this version only works against hadoop 1 anyway...
> > >   </dependencies>
> > >   ...
> > > </plugin>
> > >
> > > --
> > > Christopher L Tubbs II
> > > http://gravatar.com/ctubbsii
> > >
> > >
> > > On Tue, May 14, 2013 at 5:42 PM, Christopher <ct...@apache.org>
> > wrote:
> > >> I think Option 2 is the best solution for "waiting until we have the
> > >> time to solve the problem correctly", as it ensures that transitive
> > >> dependencies work for the stable version of Hadoop, and using Hadoop2
> > >> is a very simple documentation issue for how to apply the patch and
> > >> rebuild. Option 4 doesn't wait... it explicitly introduces a problem
> > >> for users.
> > >>
> > >> Option 1 is how I'm tentatively thinking about fixing it properly in
> > 1.6.0.
> > >>
> > >>
> > >> --
> > >> Christopher L Tubbs II
> > >> http://gravatar.com/ctubbsii
> > >>
> > >>
> > >> On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
> > >>> I'm an advocate of option 4. You say that it's ignoring the problem,
> > >>> whereas I think it's waiting until we have the time to solve the
> > problem
> > >>> correctly. Your reasoning for this is for standardizing for maven
> > >>> conventions, but the other options, while more 'correct' from a maven
> > >>> standpoint or a larger headache for our user base and ourselves. In
> > either
> > >>> case, we're going to be breaking some sort of convention, and while
> > it's
> > >>> not good, we should be doing the one that's less bad for US. The
> > important
> > >>> thing here, now, is that the poms work and we should go with the
> method
> > >>> that leaves the work minimal for our end users to utilize them.
> > >>>
> > >>> I do agree that 1. is the correct option in the long run. More
> > >>> specifically, I think it boils down to having a single module
> > compatibility
> > >>> layer, which is how hbase deals with this issue. But like you said,
> we
> > >>> don't have the time to engineer a proper solution. So let sleeping
> > dogs lie
> > >>> and we can revamp the whole system for 1.5.1 or 1.6.0 when we have
> the
> > >>> cycles to do it right.
> > >>>
> > >>>
> > >>> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org>
> > wrote:
> > >>>
> > >>>> So, I've run into a problem with ACCUMULO-1402 that requires a
> larger
> > >>>> discussion about how Accumulo 1.5.0 should support Hadoop2.
> > >>>>
> > >>>> The problem is basically that profiles should not contain
> > >>>> dependencies, because profiles don't get activated transitively. A
> > >>>> slide deck by the Maven developers point this out as a bad
> practice...
> > >>>> yet it's a practice we rely on for our current implementation of
> > >>>> Hadoop2 support
> > >>>> (
> http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> > >>>> slide 80).
> > >>>>
> > >>>> What this means is that even if we go through the work of publishing
> > >>>> binary artifacts compiled against Hadoop2, neither our Hadoop1
> > >>>> binaries or our Hadoop2 binaries will be able to transitively
> resolve
> > >>>> any dependencies defined in profiles. This has significant
> > >>>> implications to user code that depends on Accumulo Maven artifacts.
> > >>>> Every user will essentially have to explicitly add Hadoop
> dependencies
> > >>>> for every Accumulo artifact that has dependencies on Hadoop, either
> > >>>> because we directly or transitively depend on Hadoop (they'll have
> to
> > >>>> peek into the profiles in our POMs and copy/paste the profile into
> > >>>> their project). This becomes more complicated when we consider how
> > >>>> users will try to use things like Instamo.
> > >>>>
> > >>>> There are workarounds, but none of them are really pleasant.
> > >>>>
> > >>>> 1. The best way to support both major Hadoop APIs is to have
> separate
> > >>>> modules with separate dependencies directly in the POM. This is a
> fair
> > >>>> amount of work, and in my opinion, would be too disruptive for
> 1.5.0.
> > >>>> This solution also gets us separate binaries for separate supported
> > >>>> versions, which is useful.
> > >>>>
> > >>>> 2. A second option, and the preferred one I think for 1.5.0, is to
> put
> > >>>> a Hadoop2 patch in the branch's contrib directory
> > >>>> (branches/1.5/contrib) that patches the POM files to support
> building
> > >>>> against Hadoop2. (Acknowledgement to Keith for suggesting this
> > >>>> solution.)
> > >>>>
> > >>>> 3. A third option is to fork Accumulo, and maintain two separate
> > >>>> builds (a more traditional technique). This adds merging nightmare
> for
> > >>>> features/patches, but gets around some reflection hacks that we may
> > >>>> have been motivated to do in the past. I'm not a fan of this option,
> > >>>> particularly because I don't want to replicate the fork nightmare
> that
> > >>>> has been the history of early Hadoop itself.
> > >>>>
> > >>>> 4. The last option is to do nothing and to continue to build with
> the
> > >>>> separate profiles as we are, and make users discover and specify
> > >>>> transitive dependencies entirely on their own. I think this is the
> > >>>> worst option, as it essentially amounts to "ignore the problem".
> > >>>>
> > >>>> At the very least, it does not seem reasonable to complete
> > >>>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
> > >>>>
> > >>>> Thoughts? Discussion? Vote on option?
> > >>>>
> > >>>> --
> > >>>> Christopher L Tubbs II
> > >>>> http://gravatar.com/ctubbsii
> > >>>>
> >
>

Re: Hadoop 2 compatibility issues

Posted by Keith Turner <ke...@deenlo.com>.
On Tue, May 14, 2013 at 5:51 PM, Benson Margulies <bi...@gmail.com>wrote:

> I am a maven developer, and I'm offering this advice based on my
> understanding of reason why that generic advice is offered.
>
> If you have different profiles that _build different results_ but all
> deliver the same GAV, you have chaos.
>

What GAV are we currently producing for Hadoop 1 and Hadoop 2?


>
> If you have different profiles that test against different versions of
> dependencies, but all deliver the same byte code at the end of the
> day, you don't have chaos.
>
>
>
> On Tue, May 14, 2013 at 5:48 PM, Christopher <ct...@apache.org> wrote:
> > I think it's interesting that Option 4 seems to be most preferred...
> > because it's the *only* option that is explicitly advised against by
> > the Maven developers (from the information I've read). I can see its
> > appeal, but I really don't think that we should introduce an explicit
> > problem for users (that applies to users using even the Hadoop version
> > we directly build against... not just those using Hadoop 2... I don't
> > know if that point was clear), to only partially support a version of
> > Hadoop that is still alpha and has never had a stable release.
> >
> > BTW, Option 4 was how I had have achieved a solution for
> > ACCUMULO-1402, but am reluctant to apply that patch, with this issue
> > outstanding, as it may exacerbate the problem.
> >
> > Another implication for Option 4 (the current "solution") is for
> > 1.6.0, with the planned accumulo-maven-plugin... because it means that
> > the accumulo-maven-plugin will need to be configured like this:
> > <plugin>
> >   <groupId>org.apache.accumulo</groupId>
> >   <artifactId>accumulo-maven-plugin</artifactId>
> >   <dependencies>
> >    ... all the required hadoop 1 dependencies to make the plugin work,
> > even though this version only works against hadoop 1 anyway...
> >   </dependencies>
> >   ...
> > </plugin>
> >
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
> >
> >
> > On Tue, May 14, 2013 at 5:42 PM, Christopher <ct...@apache.org>
> wrote:
> >> I think Option 2 is the best solution for "waiting until we have the
> >> time to solve the problem correctly", as it ensures that transitive
> >> dependencies work for the stable version of Hadoop, and using Hadoop2
> >> is a very simple documentation issue for how to apply the patch and
> >> rebuild. Option 4 doesn't wait... it explicitly introduces a problem
> >> for users.
> >>
> >> Option 1 is how I'm tentatively thinking about fixing it properly in
> 1.6.0.
> >>
> >>
> >> --
> >> Christopher L Tubbs II
> >> http://gravatar.com/ctubbsii
> >>
> >>
> >> On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
> >>> I'm an advocate of option 4. You say that it's ignoring the problem,
> >>> whereas I think it's waiting until we have the time to solve the
> problem
> >>> correctly. Your reasoning for this is for standardizing for maven
> >>> conventions, but the other options, while more 'correct' from a maven
> >>> standpoint or a larger headache for our user base and ourselves. In
> either
> >>> case, we're going to be breaking some sort of convention, and while
> it's
> >>> not good, we should be doing the one that's less bad for US. The
> important
> >>> thing here, now, is that the poms work and we should go with the method
> >>> that leaves the work minimal for our end users to utilize them.
> >>>
> >>> I do agree that 1. is the correct option in the long run. More
> >>> specifically, I think it boils down to having a single module
> compatibility
> >>> layer, which is how hbase deals with this issue. But like you said, we
> >>> don't have the time to engineer a proper solution. So let sleeping
> dogs lie
> >>> and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the
> >>> cycles to do it right.
> >>>
> >>>
> >>> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org>
> wrote:
> >>>
> >>>> So, I've run into a problem with ACCUMULO-1402 that requires a larger
> >>>> discussion about how Accumulo 1.5.0 should support Hadoop2.
> >>>>
> >>>> The problem is basically that profiles should not contain
> >>>> dependencies, because profiles don't get activated transitively. A
> >>>> slide deck by the Maven developers point this out as a bad practice...
> >>>> yet it's a practice we rely on for our current implementation of
> >>>> Hadoop2 support
> >>>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> >>>> slide 80).
> >>>>
> >>>> What this means is that even if we go through the work of publishing
> >>>> binary artifacts compiled against Hadoop2, neither our Hadoop1
> >>>> binaries or our Hadoop2 binaries will be able to transitively resolve
> >>>> any dependencies defined in profiles. This has significant
> >>>> implications to user code that depends on Accumulo Maven artifacts.
> >>>> Every user will essentially have to explicitly add Hadoop dependencies
> >>>> for every Accumulo artifact that has dependencies on Hadoop, either
> >>>> because we directly or transitively depend on Hadoop (they'll have to
> >>>> peek into the profiles in our POMs and copy/paste the profile into
> >>>> their project). This becomes more complicated when we consider how
> >>>> users will try to use things like Instamo.
> >>>>
> >>>> There are workarounds, but none of them are really pleasant.
> >>>>
> >>>> 1. The best way to support both major Hadoop APIs is to have separate
> >>>> modules with separate dependencies directly in the POM. This is a fair
> >>>> amount of work, and in my opinion, would be too disruptive for 1.5.0.
> >>>> This solution also gets us separate binaries for separate supported
> >>>> versions, which is useful.
> >>>>
> >>>> 2. A second option, and the preferred one I think for 1.5.0, is to put
> >>>> a Hadoop2 patch in the branch's contrib directory
> >>>> (branches/1.5/contrib) that patches the POM files to support building
> >>>> against Hadoop2. (Acknowledgement to Keith for suggesting this
> >>>> solution.)
> >>>>
> >>>> 3. A third option is to fork Accumulo, and maintain two separate
> >>>> builds (a more traditional technique). This adds merging nightmare for
> >>>> features/patches, but gets around some reflection hacks that we may
> >>>> have been motivated to do in the past. I'm not a fan of this option,
> >>>> particularly because I don't want to replicate the fork nightmare that
> >>>> has been the history of early Hadoop itself.
> >>>>
> >>>> 4. The last option is to do nothing and to continue to build with the
> >>>> separate profiles as we are, and make users discover and specify
> >>>> transitive dependencies entirely on their own. I think this is the
> >>>> worst option, as it essentially amounts to "ignore the problem".
> >>>>
> >>>> At the very least, it does not seem reasonable to complete
> >>>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
> >>>>
> >>>> Thoughts? Discussion? Vote on option?
> >>>>
> >>>> --
> >>>> Christopher L Tubbs II
> >>>> http://gravatar.com/ctubbsii
> >>>>
>

Re: Hadoop 2 compatibility issues

Posted by Benson Margulies <bi...@gmail.com>.
I am a Maven developer, and I'm offering this advice based on my
understanding of the reason why that generic advice is offered.

If you have different profiles that _build different results_ but all
deliver the same GAV, you have chaos.

If you have different profiles that test against different versions of
dependencies, but all deliver the same byte code at the end of the
day, you don't have chaos.
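
For concreteness, a minimal sketch of the second, chaos-free pattern:
the dependency lives in the main POM, and the profile only swaps the
version being tested (property name, profile id, and versions here are
illustrative):

<properties>
  <hadoop.version>1.0.4</hadoop.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
<profiles>
  <profile>
    <!-- Changes only what is tested against; the published bytecode
         and the GAV stay the same. -->
    <id>hadoop-2</id>
    <properties>
      <hadoop.version>2.0.4-alpha</hadoop.version>
    </properties>
  </profile>
</profiles>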



On Tue, May 14, 2013 at 5:48 PM, Christopher <ct...@apache.org> wrote:
> I think it's interesting that Option 4 seems to be most preferred...
> because it's the *only* option that is explicitly advised against by
> the Maven developers (from the information I've read). I can see its
> appeal, but I really don't think that we should introduce an explicit
> problem for users (that applies to users using even the Hadoop version
> we directly build against... not just those using Hadoop 2... I don't
> know if that point was clear), to only partially support a version of
> Hadoop that is still alpha and has never had a stable release.
>
> BTW, Option 4 was how I had have achieved a solution for
> ACCUMULO-1402, but am reluctant to apply that patch, with this issue
> outstanding, as it may exacerbate the problem.
>
> Another implication for Option 4 (the current "solution") is for
> 1.6.0, with the planned accumulo-maven-plugin... because it means that
> the accumulo-maven-plugin will need to be configured like this:
> <plugin>
>   <groupId>org.apache.accumulo</groupId>
>   <artifactId>accumulo-maven-plugin</artifactId>
>   <dependencies>
>    ... all the required hadoop 1 dependencies to make the plugin work,
> even though this version only works against hadoop 1 anyway...
>   </dependencies>
>   ...
> </plugin>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Tue, May 14, 2013 at 5:42 PM, Christopher <ct...@apache.org> wrote:
>> I think Option 2 is the best solution for "waiting until we have the
>> time to solve the problem correctly", as it ensures that transitive
>> dependencies work for the stable version of Hadoop, and using Hadoop2
>> is a very simple documentation issue for how to apply the patch and
>> rebuild. Option 4 doesn't wait... it explicitly introduces a problem
>> for users.
>>
>> Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0.
>>
>>
>> --
>> Christopher L Tubbs II
>> http://gravatar.com/ctubbsii
>>
>>
>> On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
>>> I'm an advocate of option 4. You say that it's ignoring the problem,
>>> whereas I think it's waiting until we have the time to solve the problem
>>> correctly. Your reasoning for this is for standardizing for maven
>>> conventions, but the other options, while more 'correct' from a maven
>>> standpoint or a larger headache for our user base and ourselves. In either
>>> case, we're going to be breaking some sort of convention, and while it's
>>> not good, we should be doing the one that's less bad for US. The important
>>> thing here, now, is that the poms work and we should go with the method
>>> that leaves the work minimal for our end users to utilize them.
>>>
>>> I do agree that 1. is the correct option in the long run. More
>>> specifically, I think it boils down to having a single module compatibility
>>> layer, which is how hbase deals with this issue. But like you said, we
>>> don't have the time to engineer a proper solution. So let sleeping dogs lie
>>> and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the
>>> cycles to do it right.
>>>
>>>
>>> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org> wrote:
>>>
>>>> So, I've run into a problem with ACCUMULO-1402 that requires a larger
>>>> discussion about how Accumulo 1.5.0 should support Hadoop2.
>>>>
>>>> The problem is basically that profiles should not contain
>>>> dependencies, because profiles don't get activated transitively. A
>>>> slide deck by the Maven developers point this out as a bad practice...
>>>> yet it's a practice we rely on for our current implementation of
>>>> Hadoop2 support
>>>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
>>>> slide 80).
>>>>
>>>> What this means is that even if we go through the work of publishing
>>>> binary artifacts compiled against Hadoop2, neither our Hadoop1
>>>> binaries or our Hadoop2 binaries will be able to transitively resolve
>>>> any dependencies defined in profiles. This has significant
>>>> implications to user code that depends on Accumulo Maven artifacts.
>>>> Every user will essentially have to explicitly add Hadoop dependencies
>>>> for every Accumulo artifact that has dependencies on Hadoop, either
>>>> because we directly or transitively depend on Hadoop (they'll have to
>>>> peek into the profiles in our POMs and copy/paste the profile into
>>>> their project). This becomes more complicated when we consider how
>>>> users will try to use things like Instamo.
>>>>
>>>> There are workarounds, but none of them are really pleasant.
>>>>
>>>> 1. The best way to support both major Hadoop APIs is to have separate
>>>> modules with separate dependencies directly in the POM. This is a fair
>>>> amount of work, and in my opinion, would be too disruptive for 1.5.0.
>>>> This solution also gets us separate binaries for separate supported
>>>> versions, which is useful.
>>>>
>>>> 2. A second option, and the preferred one I think for 1.5.0, is to put
>>>> a Hadoop2 patch in the branch's contrib directory
>>>> (branches/1.5/contrib) that patches the POM files to support building
>>>> against Hadoop2. (Acknowledgement to Keith for suggesting this
>>>> solution.)
>>>>
>>>> 3. A third option is to fork Accumulo, and maintain two separate
>>>> builds (a more traditional technique). This adds merging nightmare for
>>>> features/patches, but gets around some reflection hacks that we may
>>>> have been motivated to do in the past. I'm not a fan of this option,
>>>> particularly because I don't want to replicate the fork nightmare that
>>>> has been the history of early Hadoop itself.
>>>>
>>>> 4. The last option is to do nothing and to continue to build with the
>>>> separate profiles as we are, and make users discover and specify
>>>> transitive dependencies entirely on their own. I think this is the
>>>> worst option, as it essentially amounts to "ignore the problem".
>>>>
>>>> At the very least, it does not seem reasonable to complete
>>>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
>>>>
>>>> Thoughts? Discussion? Vote on option?
>>>>
>>>> --
>>>> Christopher L Tubbs II
>>>> http://gravatar.com/ctubbsii
>>>>

Re: Hadoop 2 compatibility issues

Posted by John Vines <vi...@apache.org>.
You're so quick to dismiss Hadoop 2, but you really need to keep in mind how
pervasive it is. Even from our own software we can see how much people love
to run off of trunk, let alone alpha releases. And one of the most popular
distributions, CDH, is more in line with it as well. Something to keep in
mind is that CDH3u5+ has a hell of a lot in common with Hadoop 2 rather than
Hadoop 1, with regard to API compatibilities. I'm sorry, but that's a user
base I would rather support with some "unconventional" build code than
create an unnecessary headache for them and ourselves.

Sent from my phone, please pardon the typos and brevity.
On May 14, 2013 5:48 PM, "Christopher" <ct...@apache.org> wrote:

> I think it's interesting that Option 4 seems to be most preferred...
> because it's the *only* option that is explicitly advised against by
> the Maven developers (from the information I've read). I can see its
> appeal, but I really don't think that we should introduce an explicit
> problem for users (that applies to users using even the Hadoop version
> we directly build against... not just those using Hadoop 2... I don't
> know if that point was clear), to only partially support a version of
> Hadoop that is still alpha and has never had a stable release.
>
> BTW, Option 4 was how I had have achieved a solution for
> ACCUMULO-1402, but am reluctant to apply that patch, with this issue
> outstanding, as it may exacerbate the problem.
>
> Another implication for Option 4 (the current "solution") is for
> 1.6.0, with the planned accumulo-maven-plugin... because it means that
> the accumulo-maven-plugin will need to be configured like this:
> <plugin>
>   <groupId>org.apache.accumulo</groupId>
>   <artifactId>accumulo-maven-plugin</artifactId>
>   <dependencies>
>    ... all the required hadoop 1 dependencies to make the plugin work,
> even though this version only works against hadoop 1 anyway...
>   </dependencies>
>   ...
> </plugin>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Tue, May 14, 2013 at 5:42 PM, Christopher <ct...@apache.org> wrote:
> > I think Option 2 is the best solution for "waiting until we have the
> > time to solve the problem correctly", as it ensures that transitive
> > dependencies work for the stable version of Hadoop, and using Hadoop2
> > is a very simple documentation issue for how to apply the patch and
> > rebuild. Option 4 doesn't wait... it explicitly introduces a problem
> > for users.
> >
> > Option 1 is how I'm tentatively thinking about fixing it properly in
> 1.6.0.
> >
> >
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
> >
> >
> > On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
> >> I'm an advocate of option 4. You say that it's ignoring the problem,
> >> whereas I think it's waiting until we have the time to solve the problem
> >> correctly. Your reasoning for this is for standardizing for maven
> >> conventions, but the other options, while more 'correct' from a maven
> >> standpoint or a larger headache for our user base and ourselves. In
> either
> >> case, we're going to be breaking some sort of convention, and while it's
> >> not good, we should be doing the one that's less bad for US. The
> important
> >> thing here, now, is that the poms work and we should go with the method
> >> that leaves the work minimal for our end users to utilize them.
> >>
> >> I do agree that 1. is the correct option in the long run. More
> >> specifically, I think it boils down to having a single module
> compatibility
> >> layer, which is how hbase deals with this issue. But like you said, we
> >> don't have the time to engineer a proper solution. So let sleeping dogs
> lie
> >> and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the
> >> cycles to do it right.
> >>
> >>
> >> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org>
> wrote:
> >>
> >>> So, I've run into a problem with ACCUMULO-1402 that requires a larger
> >>> discussion about how Accumulo 1.5.0 should support Hadoop2.
> >>>
> >>> The problem is basically that profiles should not contain
> >>> dependencies, because profiles don't get activated transitively. A
> >>> slide deck by the Maven developers point this out as a bad practice...
> >>> yet it's a practice we rely on for our current implementation of
> >>> Hadoop2 support
> >>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> >>> slide 80).
> >>>
> >>> What this means is that even if we go through the work of publishing
> >>> binary artifacts compiled against Hadoop2, neither our Hadoop1
> >>> binaries or our Hadoop2 binaries will be able to transitively resolve
> >>> any dependencies defined in profiles. This has significant
> >>> implications to user code that depends on Accumulo Maven artifacts.
> >>> Every user will essentially have to explicitly add Hadoop dependencies
> >>> for every Accumulo artifact that has dependencies on Hadoop, either
> >>> because we directly or transitively depend on Hadoop (they'll have to
> >>> peek into the profiles in our POMs and copy/paste the profile into
> >>> their project). This becomes more complicated when we consider how
> >>> users will try to use things like Instamo.
> >>>
> >>> There are workarounds, but none of them are really pleasant.
> >>>
> >>> 1. The best way to support both major Hadoop APIs is to have separate
> >>> modules with separate dependencies directly in the POM. This is a fair
> >>> amount of work, and in my opinion, would be too disruptive for 1.5.0.
> >>> This solution also gets us separate binaries for separate supported
> >>> versions, which is useful.
> >>>
> >>> 2. A second option, and the preferred one I think for 1.5.0, is to put
> >>> a Hadoop2 patch in the branch's contrib directory
> >>> (branches/1.5/contrib) that patches the POM files to support building
> >>> against Hadoop2. (Acknowledgement to Keith for suggesting this
> >>> solution.)
> >>>
> >>> 3. A third option is to fork Accumulo, and maintain two separate
> >>> builds (a more traditional technique). This adds merging nightmare for
> >>> features/patches, but gets around some reflection hacks that we may
> >>> have been motivated to do in the past. I'm not a fan of this option,
> >>> particularly because I don't want to replicate the fork nightmare that
> >>> has been the history of early Hadoop itself.
> >>>
> >>> 4. The last option is to do nothing and to continue to build with the
> >>> separate profiles as we are, and make users discover and specify
> >>> transitive dependencies entirely on their own. I think this is the
> >>> worst option, as it essentially amounts to "ignore the problem".
> >>>
> >>> At the very least, it does not seem reasonable to complete
> >>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
> >>>
> >>> Thoughts? Discussion? Vote on option?
> >>>
> >>> --
> >>> Christopher L Tubbs II
> >>> http://gravatar.com/ctubbsii
> >>>
>

Re: Hadoop 2 compatibility issues

Posted by John Vines <vi...@apache.org>.
We can easily fix the break in the Hadoop dependencies by making the switch
to hadoop-client and relying on hadoop.version to set/override the version.
The Hadoop 2 profile is then only needed to bring in additional dependencies
and possibly to set the Hadoop version for convenience.
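
A rough sketch of that arrangement (the profile id, versions, and the
extra test artifact are illustrative assumptions, not the actual 1.5
POM):

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
...
<profile>
  <id>hadoop-2.0</id>
  <properties>
    <hadoop.version>2.0.4-alpha</hadoop.version>
  </properties>
  <dependencies>
    <!-- Illustrative: whatever extra artifacts a Hadoop 2 build needs,
         e.g. the minicluster for tests. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-minicluster</artifactId>
      <version>${hadoop.version}</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</profile>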

Sent from my phone, please pardon the typos and brevity.
On May 14, 2013 5:48 PM, "Christopher" <ct...@apache.org> wrote:

> I think it's interesting that Option 4 seems to be most preferred...
> because it's the *only* option that is explicitly advised against by
> the Maven developers (from the information I've read). I can see its
> appeal, but I really don't think that we should introduce an explicit
> problem for users (that applies to users using even the Hadoop version
> we directly build against... not just those using Hadoop 2... I don't
> know if that point was clear), to only partially support a version of
> Hadoop that is still alpha and has never had a stable release.
>
> BTW, Option 4 was how I had have achieved a solution for
> ACCUMULO-1402, but am reluctant to apply that patch, with this issue
> outstanding, as it may exacerbate the problem.
>
> Another implication for Option 4 (the current "solution") is for
> 1.6.0, with the planned accumulo-maven-plugin... because it means that
> the accumulo-maven-plugin will need to be configured like this:
> <plugin>
>   <groupId>org.apache.accumulo</groupId>
>   <artifactId>accumulo-maven-plugin</artifactId>
>   <dependencies>
>    ... all the required hadoop 1 dependencies to make the plugin work,
> even though this version only works against hadoop 1 anyway...
>   </dependencies>
>   ...
> </plugin>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Tue, May 14, 2013 at 5:42 PM, Christopher <ct...@apache.org> wrote:
> > I think Option 2 is the best solution for "waiting until we have the
> > time to solve the problem correctly", as it ensures that transitive
> > dependencies work for the stable version of Hadoop, and using Hadoop2
> > is a very simple documentation issue for how to apply the patch and
> > rebuild. Option 4 doesn't wait... it explicitly introduces a problem
> > for users.
> >
> > Option 1 is how I'm tentatively thinking about fixing it properly in
> 1.6.0.
> >
> >
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
> >
> >
> > On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
> >> I'm an advocate of option 4. You say that it's ignoring the problem,
> >> whereas I think it's waiting until we have the time to solve the problem
> >> correctly. Your reasoning for this is for standardizing for maven
> >> conventions, but the other options, while more 'correct' from a maven
> >> standpoint or a larger headache for our user base and ourselves. In
> either
> >> case, we're going to be breaking some sort of convention, and while it's
> >> not good, we should be doing the one that's less bad for US. The
> important
> >> thing here, now, is that the poms work and we should go with the method
> >> that leaves the work minimal for our end users to utilize them.
> >>
> >> I do agree that 1. is the correct option in the long run. More
> >> specifically, I think it boils down to having a single module
> compatibility
> >> layer, which is how hbase deals with this issue. But like you said, we
> >> don't have the time to engineer a proper solution. So let sleeping dogs
> lie
> >> and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the
> >> cycles to do it right.
> >>
> >>
> >> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org>
> wrote:
> >>
> >>> So, I've run into a problem with ACCUMULO-1402 that requires a larger
> >>> discussion about how Accumulo 1.5.0 should support Hadoop2.
> >>>
> >>> The problem is basically that profiles should not contain
> >>> dependencies, because profiles don't get activated transitively. A
> >>> slide deck by the Maven developers point this out as a bad practice...
> >>> yet it's a practice we rely on for our current implementation of
> >>> Hadoop2 support
> >>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> >>> slide 80).
> >>>
> >>> What this means is that even if we go through the work of publishing
> >>> binary artifacts compiled against Hadoop2, neither our Hadoop1
> >>> binaries or our Hadoop2 binaries will be able to transitively resolve
> >>> any dependencies defined in profiles. This has significant
> >>> implications to user code that depends on Accumulo Maven artifacts.
> >>> Every user will essentially have to explicitly add Hadoop dependencies
> >>> for every Accumulo artifact that has dependencies on Hadoop, either
> >>> because we directly or transitively depend on Hadoop (they'll have to
> >>> peek into the profiles in our POMs and copy/paste the profile into
> >>> their project). This becomes more complicated when we consider how
> >>> users will try to use things like Instamo.
> >>>
> >>> There are workarounds, but none of them are really pleasant.
> >>>
> >>> 1. The best way to support both major Hadoop APIs is to have separate
> >>> modules with separate dependencies directly in the POM. This is a fair
> >>> amount of work, and in my opinion, would be too disruptive for 1.5.0.
> >>> This solution also gets us separate binaries for separate supported
> >>> versions, which is useful.
> >>>
> >>> 2. A second option, and the preferred one I think for 1.5.0, is to put
> >>> a Hadoop2 patch in the branch's contrib directory
> >>> (branches/1.5/contrib) that patches the POM files to support building
> >>> against Hadoop2. (Acknowledgement to Keith for suggesting this
> >>> solution.)
> >>>
> >>> 3. A third option is to fork Accumulo, and maintain two separate
> >>> builds (a more traditional technique). This adds merging nightmare for
> >>> features/patches, but gets around some reflection hacks that we may
> >>> have been motivated to do in the past. I'm not a fan of this option,
> >>> particularly because I don't want to replicate the fork nightmare that
> >>> has been the history of early Hadoop itself.
> >>>
> >>> 4. The last option is to do nothing and to continue to build with the
> >>> separate profiles as we are, and make users discover and specify
> >>> transitive dependencies entirely on their own. I think this is the
> >>> worst option, as it essentially amounts to "ignore the problem".
> >>>
> >>> At the very least, it does not seem reasonable to complete
> >>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
> >>>
> >>> Thoughts? Discussion? Vote on option?
> >>>
> >>> --
> >>> Christopher L Tubbs II
> >>> http://gravatar.com/ctubbsii
> >>>
>

Re: Hadoop 2 compatibility issues

Posted by Christopher <ct...@apache.org>.
I think it's interesting that Option 4 seems to be most preferred...
because it's the *only* option that is explicitly advised against by
the Maven developers (from the information I've read). I can see its
appeal, but I really don't think that we should introduce an explicit
problem for users (that applies to users using even the Hadoop version
we directly build against... not just those using Hadoop 2... I don't
know if that point was clear), to only partially support a version of
Hadoop that is still alpha and has never had a stable release.

BTW, Option 4 was how I had achieved a solution for
ACCUMULO-1402, but I am reluctant to apply that patch while this issue
is outstanding, as it may exacerbate the problem.

Another implication for Option 4 (the current "solution") is for
1.6.0, with the planned accumulo-maven-plugin... because it means that
the accumulo-maven-plugin will need to be configured like this:
<plugin>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-maven-plugin</artifactId>
  <dependencies>
   ... all the required hadoop 1 dependencies to make the plugin work,
even though this version only works against hadoop 1 anyway...
  </dependencies>
  ...
</plugin>
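
For illustration, the user's configuration might then have to look
something like the sketch below, where hadoop-client and its version
stand in for the full list of required dependencies:

<plugin>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-maven-plugin</artifactId>
  <dependencies>
    <!-- Illustrative stand-in: the user restates the Hadoop deps that
         our profiles hide from transitive resolution. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>1.0.4</version>
    </dependency>
  </dependencies>
</plugin>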

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Tue, May 14, 2013 at 5:42 PM, Christopher <ct...@apache.org> wrote:
> I think Option 2 is the best solution for "waiting until we have the
> time to solve the problem correctly", as it ensures that transitive
> dependencies work for the stable version of Hadoop, and using Hadoop2
> is a very simple documentation issue for how to apply the patch and
> rebuild. Option 4 doesn't wait... it explicitly introduces a problem
> for users.
>
> Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0.
>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
>> I'm an advocate of option 4. You say that it's ignoring the problem,
>> whereas I think it's waiting until we have the time to solve the problem
>> correctly. Your reasoning for this is for standardizing for maven
>> conventions, but the other options, while more 'correct' from a maven
>> standpoint or a larger headache for our user base and ourselves. In either
>> case, we're going to be breaking some sort of convention, and while it's
>> not good, we should be doing the one that's less bad for US. The important
>> thing here, now, is that the poms work and we should go with the method
>> that leaves the work minimal for our end users to utilize them.
>>
>> I do agree that 1. is the correct option in the long run. More
>> specifically, I think it boils down to having a single module compatibility
>> layer, which is how hbase deals with this issue. But like you said, we
>> don't have the time to engineer a proper solution. So let sleeping dogs lie
>> and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the
>> cycles to do it right.
>>
>>
>> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org> wrote:
>>
>>> So, I've run into a problem with ACCUMULO-1402 that requires a larger
>>> discussion about how Accumulo 1.5.0 should support Hadoop2.
>>>
>>> The problem is basically that profiles should not contain
>>> dependencies, because profiles don't get activated transitively. A
>>> slide deck by the Maven developers point this out as a bad practice...
>>> yet it's a practice we rely on for our current implementation of
>>> Hadoop2 support
>>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
>>> slide 80).
>>>
>>> What this means is that even if we go through the work of publishing
>>> binary artifacts compiled against Hadoop2, neither our Hadoop1
>>> binaries or our Hadoop2 binaries will be able to transitively resolve
>>> any dependencies defined in profiles. This has significant
>>> implications to user code that depends on Accumulo Maven artifacts.
>>> Every user will essentially have to explicitly add Hadoop dependencies
>>> for every Accumulo artifact that has dependencies on Hadoop, either
>>> because we directly or transitively depend on Hadoop (they'll have to
>>> peek into the profiles in our POMs and copy/paste the profile into
>>> their project). This becomes more complicated when we consider how
>>> users will try to use things like Instamo.
>>>
>>> There are workarounds, but none of them are really pleasant.
>>>
>>> 1. The best way to support both major Hadoop APIs is to have separate
>>> modules with separate dependencies directly in the POM. This is a fair
>>> amount of work, and in my opinion, would be too disruptive for 1.5.0.
>>> This solution also gets us separate binaries for separate supported
>>> versions, which is useful.
>>>
>>> 2. A second option, and the preferred one I think for 1.5.0, is to put
>>> a Hadoop2 patch in the branch's contrib directory
>>> (branches/1.5/contrib) that patches the POM files to support building
>>> against Hadoop2. (Acknowledgement to Keith for suggesting this
>>> solution.)
>>>
>>> 3. A third option is to fork Accumulo, and maintain two separate
>>> builds (a more traditional technique). This adds merging nightmare for
>>> features/patches, but gets around some reflection hacks that we may
>>> have been motivated to do in the past. I'm not a fan of this option,
>>> particularly because I don't want to replicate the fork nightmare that
>>> has been the history of early Hadoop itself.
>>>
>>> 4. The last option is to do nothing and to continue to build with the
>>> separate profiles as we are, and make users discover and specify
>>> transitive dependencies entirely on their own. I think this is the
>>> worst option, as it essentially amounts to "ignore the problem".
>>>
>>> At the very least, it does not seem reasonable to complete
>>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
>>>
>>> Thoughts? Discussion? Vote on option?
>>>
>>> --
>>> Christopher L Tubbs II
>>> http://gravatar.com/ctubbsii
>>>

Re: Hadoop 2 compatibility issues

Posted by Christopher <ct...@apache.org>.
I think Option 2 is the best solution for "waiting until we have the
time to solve the problem correctly", as it ensures that transitive
dependencies work for the stable version of Hadoop, and using Hadoop2
becomes a simple documentation issue: how to apply the patch and
rebuild. Option 4 doesn't wait... it explicitly introduces a problem
for users.

Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0.
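
To make option 2 concrete, the documented workflow could be as small as
this sketch (the patch file name is an assumption, and the
hadoop.version override presumes the patched POMs expose such a
property):

# Hypothetical workflow: build 1.5 against Hadoop 2 from source.
cd branches/1.5
patch -p0 < contrib/hadoop2-compat.patch
mvn clean package -Dhadoop.version=2.0.4-alpha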


--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
> I'm an advocate of option 4. You say that it's ignoring the problem,
> whereas I think it's waiting until we have the time to solve the problem
> correctly. Your reasoning for this is for standardizing for maven
> conventions, but the other options, while more 'correct' from a maven
> standpoint or a larger headache for our user base and ourselves. In either
> case, we're going to be breaking some sort of convention, and while it's
> not good, we should be doing the one that's less bad for US. The important
> thing here, now, is that the poms work and we should go with the method
> that leaves the work minimal for our end users to utilize them.
>
> I do agree that 1. is the correct option in the long run. More
> specifically, I think it boils down to having a single module compatibility
> layer, which is how hbase deals with this issue. But like you said, we
> don't have the time to engineer a proper solution. So let sleeping dogs lie
> and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the
> cycles to do it right.
>
>
> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org> wrote:
>
>> So, I've run into a problem with ACCUMULO-1402 that requires a larger
>> discussion about how Accumulo 1.5.0 should support Hadoop2.
>>
>> The problem is basically that profiles should not contain
>> dependencies, because profiles don't get activated transitively. A
>> slide deck by the Maven developers point this out as a bad practice...
>> yet it's a practice we rely on for our current implementation of
>> Hadoop2 support
>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
>> slide 80).
>>
>> What this means is that even if we go through the work of publishing
>> binary artifacts compiled against Hadoop2, neither our Hadoop1
>> binaries or our Hadoop2 binaries will be able to transitively resolve
>> any dependencies defined in profiles. This has significant
>> implications to user code that depends on Accumulo Maven artifacts.
>> Every user will essentially have to explicitly add Hadoop dependencies
>> for every Accumulo artifact that has dependencies on Hadoop, either
>> because we directly or transitively depend on Hadoop (they'll have to
>> peek into the profiles in our POMs and copy/paste the profile into
>> their project). This becomes more complicated when we consider how
>> users will try to use things like Instamo.
>>
>> There are workarounds, but none of them are really pleasant.
>>
>> 1. The best way to support both major Hadoop APIs is to have separate
>> modules with separate dependencies directly in the POM. This is a fair
>> amount of work, and in my opinion, would be too disruptive for 1.5.0.
>> This solution also gets us separate binaries for separate supported
>> versions, which is useful.
>>
>> 2. A second option, and the preferred one I think for 1.5.0, is to put
>> a Hadoop2 patch in the branch's contrib directory
>> (branches/1.5/contrib) that patches the POM files to support building
>> against Hadoop2. (Acknowledgement to Keith for suggesting this
>> solution.)
>>
>> 3. A third option is to fork Accumulo, and maintain two separate
>> builds (a more traditional technique). This adds merging nightmare for
>> features/patches, but gets around some reflection hacks that we may
>> have been motivated to do in the past. I'm not a fan of this option,
>> particularly because I don't want to replicate the fork nightmare that
>> has been the history of early Hadoop itself.
>>
>> 4. The last option is to do nothing and to continue to build with the
>> separate profiles as we are, and make users discover and specify
>> transitive dependencies entirely on their own. I think this is the
>> worst option, as it essentially amounts to "ignore the problem".
>>
>> At the very least, it does not seem reasonable to complete
>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
>>
>> Thoughts? Discussion? Vote on option?
>>
>> --
>> Christopher L Tubbs II
>> http://gravatar.com/ctubbsii
>>

Re: Hadoop 2 compatibility issues

Posted by Adam Fuchs <af...@apache.org>.
I tend to agree with Sean, John, and Benson. Option 4 works for now, and
until we can define something that works better (e.g. runtime compatibility
with both Hadoop 1 and 2 using reflection and crazy class loaders) we
should not delay the release. Good docs are always helpful where
engineering is less than ideal (egad, I hope I didn't just volunteer!).

Adam
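
As a taste of what that reflection approach involves, here is a minimal,
hypothetical shim (not Accumulo code, though the JobContext
class-versus-interface split it keys off is a real Hadoop 1 vs. Hadoop 2
difference):

import java.lang.reflect.Method;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.JobContext;

public class HadoopCompat {
  // JobContext is a class in Hadoop 1 and an interface in Hadoop 2,
  // which is one of the binary incompatibilities between the lines.
  public static final boolean ON_HADOOP_2 = JobContext.class.isInterface();

  // getConfiguration() exists in both lines, but code compiled directly
  // against one line fails on the other (invokevirtual vs.
  // invokeinterface), so we resolve the call reflectively.
  public static Configuration getConfiguration(JobContext context) {
    try {
      Method m = context.getClass().getMethod("getConfiguration");
      return (Configuration) m.invoke(context);
    } catch (Exception e) {
      throw new IllegalStateException("Unsupported Hadoop version?", e);
    }
  }
}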


On Tue, May 14, 2013 at 5:16 PM, Benson Margulies <bi...@gmail.com>wrote:

> CXF does (4) for the various competing JAX-WS implementations.
>
> The different options are API-compatible, and the profiles just switch
> the deps around.
>
> There would be slightly more Maven correctness in marking the deps
> optional, forcing each user to pick one explicitly.
>
> However, (4) with good doc on what to put in the POM is really not a
> cause for shame. Maven is weak in this area, and it's all tradeoffs.
>
>
>
> On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
> > I'm an advocate of option 4. You say that it's ignoring the problem,
> > whereas I think it's waiting until we have the time to solve the problem
> > correctly. Your reasoning for this is for standardizing for maven
> > conventions, but the other options, while more 'correct' from a maven
> > standpoint or a larger headache for our user base and ourselves. In
> either
> > case, we're going to be breaking some sort of convention, and while it's
> > not good, we should be doing the one that's less bad for US. The
> important
> > thing here, now, is that the poms work and we should go with the method
> > that leaves the work minimal for our end users to utilize them.
> >
> > I do agree that 1. is the correct option in the long run. More
> > specifically, I think it boils down to having a single module
> compatibility
> > layer, which is how hbase deals with this issue. But like you said, we
> > don't have the time to engineer a proper solution. So let sleeping dogs
> lie
> > and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the
> > cycles to do it right.
> >
> >
> > On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org>
> wrote:
> >
> >> So, I've run into a problem with ACCUMULO-1402 that requires a larger
> >> discussion about how Accumulo 1.5.0 should support Hadoop2.
> >>
> >> The problem is basically that profiles should not contain
> >> dependencies, because profiles don't get activated transitively. A
> >> slide deck by the Maven developers point this out as a bad practice...
> >> yet it's a practice we rely on for our current implementation of
> >> Hadoop2 support
> >> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> >> slide 80).
> >>
> >> What this means is that even if we go through the work of publishing
> >> binary artifacts compiled against Hadoop2, neither our Hadoop1
> >> binaries or our Hadoop2 binaries will be able to transitively resolve
> >> any dependencies defined in profiles. This has significant
> >> implications to user code that depends on Accumulo Maven artifacts.
> >> Every user will essentially have to explicitly add Hadoop dependencies
> >> for every Accumulo artifact that has dependencies on Hadoop, either
> >> because we directly or transitively depend on Hadoop (they'll have to
> >> peek into the profiles in our POMs and copy/paste the profile into
> >> their project). This becomes more complicated when we consider how
> >> users will try to use things like Instamo.
> >>
> >> There are workarounds, but none of them are really pleasant.
> >>
> >> 1. The best way to support both major Hadoop APIs is to have separate
> >> modules with separate dependencies directly in the POM. This is a fair
> >> amount of work, and in my opinion, would be too disruptive for 1.5.0.
> >> This solution also gets us separate binaries for separate supported
> >> versions, which is useful.
> >>
> >> 2. A second option, and the preferred one I think for 1.5.0, is to put
> >> a Hadoop2 patch in the branch's contrib directory
> >> (branches/1.5/contrib) that patches the POM files to support building
> >> against Hadoop2. (Acknowledgement to Keith for suggesting this
> >> solution.)
> >>
> >> 3. A third option is to fork Accumulo, and maintain two separate
> >> builds (a more traditional technique). This adds a merging nightmare for
> >> features/patches, but gets around some reflection hacks that we may
> >> have been motivated to do in the past. I'm not a fan of this option,
> >> particularly because I don't want to replicate the fork nightmare that
> >> has been the history of early Hadoop itself.
> >>
> >> 4. The last option is to do nothing and to continue to build with the
> >> separate profiles as we are, and make users discover and specify
> >> transitive dependencies entirely on their own. I think this is the
> >> worst option, as it essentially amounts to "ignore the problem".
> >>
> >> At the very least, it does not seem reasonable to complete
> >> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
> >>
> >> Thoughts? Discussion? Vote on option?
> >>
> >> --
> >> Christopher L Tubbs II
> >> http://gravatar.com/ctubbsii
> >>
>

Re: Hadoop 2 compatibility issues

Posted by Benson Margulies <bi...@gmail.com>.
CXF does (4) for the various competing JAX-WS implementations.

The different options are API-compatible, and the profiles just switch
the deps around.

There would be slightly more Maven correctness in marking the deps
optional, forcing each user to pick one explicitly.

However, (4) with good doc on what to put in the POM is really not a
cause for shame. Maven is weak in this area, and it's all tradeoffs.
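
For example (a sketch only, with illustrative coordinates and a made-up
version, not the actual Accumulo POM), the "optional" variant would
declare:

    <!-- In the Accumulo POM: optional deps never propagate transitively,
         so each consumer is forced to make an explicit choice. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.0.4-alpha</version>
      <optional>true</optional>
    </dependency>

and the documented user-side counterpart would be an explicit dependency
on whichever Hadoop line the user actually runs.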



On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
> I'm an advocate of option 4. You say that it's ignoring the problem,
> whereas I think it's waiting until we have the time to solve the problem
> correctly. Your reasoning is about standardizing on Maven conventions,
> but the other options, while more 'correct' from a Maven standpoint, are
> a larger headache for our user base and ourselves. In either case, we're
> going to be breaking some sort of convention, and while that's not good,
> we should pick the one that's less bad for US. The important thing here,
> now, is that the POMs work, and we should go with the method that leaves
> the least work for our end users.
>
> I do agree that option 1 is the correct option in the long run. More
> specifically, I think it boils down to having a single-module
> compatibility layer, which is how HBase deals with this issue. But like
> you said, we don't have the time to engineer a proper solution. So let
> sleeping dogs lie; we can revamp the whole system for 1.5.1 or 1.6.0 when
> we have the cycles to do it right.
>
>
> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org> wrote:
>
>> So, I've run into a problem with ACCUMULO-1402 that requires a larger
>> discussion about how Accumulo 1.5.0 should support Hadoop2.
>>
>> The problem is basically that profiles should not contain
>> dependencies, because profiles don't get activated transitively. A
>> slide deck by the Maven developers points this out as a bad practice...
>> yet it's a practice we rely on for our current implementation of
>> Hadoop2 support
>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
>> slide 80).
>>
>> What this means is that even if we go through the work of publishing
>> binary artifacts compiled against Hadoop2, neither our Hadoop1
>> binaries nor our Hadoop2 binaries will be able to transitively resolve
>> any dependencies defined in profiles. This has significant
>> implications for user code that depends on Accumulo Maven artifacts.
>> Every user will essentially have to explicitly add Hadoop dependencies
>> for every Accumulo artifact that has dependencies on Hadoop, either
>> because we directly or transitively depend on Hadoop (they'll have to
>> peek into the profiles in our POMs and copy/paste the profile into
>> their project). This becomes more complicated when we consider how
>> users will try to use things like Instamo.
>>
>> There are workarounds, but none of them are really pleasant.
>>
>> 1. The best way to support both major Hadoop APIs is to have separate
>> modules with separate dependencies directly in the POM. This is a fair
>> amount of work, and in my opinion, would be too disruptive for 1.5.0.
>> This solution also gets us separate binaries for separate supported
>> versions, which is useful.
>>
>> 2. A second option, and the preferred one I think for 1.5.0, is to put
>> a Hadoop2 patch in the branch's contrib directory
>> (branches/1.5/contrib) that patches the POM files to support building
>> against Hadoop2. (Acknowledgement to Keith for suggesting this
>> solution.)
>>
>> 3. A third option is to fork Accumulo, and maintain two separate
>> builds (a more traditional technique). This adds a merging nightmare for
>> features/patches, but gets around some reflection hacks that we may
>> have been motivated to do in the past. I'm not a fan of this option,
>> particularly because I don't want to replicate the fork nightmare that
>> has been the history of early Hadoop itself.
>>
>> 4. The last option is to do nothing and to continue to build with the
>> separate profiles as we are, and make users discover and specify
>> transitive dependencies entirely on their own. I think this is the
>> worst option, as it essentially amounts to "ignore the problem".
>>
>> At the very least, it does not seem reasonable to complete
>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
>>
>> Thoughts? Discussion? Vote on option?
>>
>> --
>> Christopher L Tubbs II
>> http://gravatar.com/ctubbsii
>>

Re: Hadoop 2 compatibility issues

Posted by John Vines <vi...@apache.org>.
I'm an advocate of option 4. You say that it's ignoring the problem,
whereas I think it's waiting until we have the time to solve the problem
correctly. Your reasoning is about standardizing on Maven conventions,
but the other options, while more 'correct' from a Maven standpoint, are
a larger headache for our user base and ourselves. In either case, we're
going to be breaking some sort of convention, and while that's not good,
we should pick the one that's less bad for US. The important thing here,
now, is that the POMs work, and we should go with the method that leaves
the least work for our end users.

I do agree that option 1 is the correct option in the long run. More
specifically, I think it boils down to having a single-module
compatibility layer, which is how HBase deals with this issue. But like
you said, we don't have the time to engineer a proper solution. So let
sleeping dogs lie; we can revamp the whole system for 1.5.1 or 1.6.0 when
we have the cycles to do it right.
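
As a rough sketch of that long-term shape (the module names here are
purely hypothetical, not anything HBase or Accumulo actually ships), the
parent POM would list one compatibility module per Hadoop line, each
declaring its Hadoop dependency directly rather than in a profile, so it
resolves transitively:

    <!-- Hypothetical parent-POM excerpt for option 1: -->
    <modules>
      <module>accumulo-compat-hadoop1</module>
      <module>accumulo-compat-hadoop2</module>
    </modules>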


On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org> wrote:

> So, I've run into a problem with ACCUMULO-1402 that requires a larger
> discussion about how Accumulo 1.5.0 should support Hadoop2.
>
> The problem is basically that profiles should not contain
> dependencies, because profiles don't get activated transitively. A
> slide deck by the Maven developers points this out as a bad practice...
> yet it's a practice we rely on for our current implementation of
> Hadoop2 support
> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> slide 80).
>
> What this means is that even if we go through the work of publishing
> binary artifacts compiled against Hadoop2, neither our Hadoop1
> binaries nor our Hadoop2 binaries will be able to transitively resolve
> any dependencies defined in profiles. This has significant
> implications for user code that depends on Accumulo Maven artifacts.
> Every user will essentially have to explicitly add Hadoop dependencies
> for every Accumulo artifact that has dependencies on Hadoop, either
> because we directly or transitively depend on Hadoop (they'll have to
> peek into the profiles in our POMs and copy/paste the profile into
> their project). This becomes more complicated when we consider how
> users will try to use things like Instamo.
>
> There are workarounds, but none of them are really pleasant.
>
> 1. The best way to support both major Hadoop APIs is to have separate
> modules with separate dependencies directly in the POM. This is a fair
> amount of work, and in my opinion, would be too disruptive for 1.5.0.
> This solution also gets us separate binaries for separate supported
> versions, which is useful.
>
> 2. A second option, and the preferred one I think for 1.5.0, is to put
> a Hadoop2 patch in the branch's contrib directory
> (branches/1.5/contrib) that patches the POM files to support building
> against Hadoop2. (Acknowledgement to Keith for suggesting this
> solution.)
>
> 3. A third option is to fork Accumulo, and maintain two separate
> builds (a more traditional technique). This adds a merging nightmare for
> features/patches, but gets around some reflection hacks that we may
> have been motivated to do in the past. I'm not a fan of this option,
> particularly because I don't want to replicate the fork nightmare that
> has been the history of early Hadoop itself.
>
> 4. The last option is to do nothing and to continue to build with the
> separate profiles as we are, and make users discover and specify
> transitive dependencies entirely on their own. I think this is the
> worst option, as it essentially amounts to "ignore the problem".
>
> At the very least, it does not seem reasonable to complete
> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
>
> Thoughts? Discussion? Vote on option?
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>

Re: Hadoop 2 compatibility issues

Posted by Sean Busbey <bu...@cloudera.com>.
This is part of my thinking. All of the dependencies included in the
profiles for Avro are marked provided. Provided scope, by definition, is
not transitive. Thus, it doesn't really matter that they are *also*
non-transitive because of the profile.
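
Concretely, that arrangement looks something like this (the coordinates
and version are illustrative, not copied from Avro's actual POM):

    <!-- Even when this profile is active, a provided-scope dependency is
         not passed on to downstream consumers, so the fact that profiles
         also don't activate transitively costs nothing extra. -->
    <profile>
      <id>hadoop2</id>
      <dependencies>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>2.0.4-alpha</version>
          <scope>provided</scope>
        </dependency>
      </dependencies>
    </profile>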

Is Accumulo including anything other than things provided by either Hadoop
1 or 2?



On Tue, May 14, 2013 at 6:08 PM, Keith Turner <ke...@deenlo.com> wrote:

> One note about option 4: when using 1.4, users have to include hadoop-core
> as a dependency in their POM. This must be done because the 1.4 Accumulo
> POM marks hadoop-core as provided. So maybe option 4 is OK if the deps in
> the profile are provided?
>
>
> On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org> wrote:
>
> > So, I've run into a problem with ACCUMULO-1402 that requires a larger
> > discussion about how Accumulo 1.5.0 should support Hadoop2.
> >
> > The problem is basically that profiles should not contain
> > dependencies, because profiles don't get activated transitively. A
> > slide deck by the Maven developers points this out as a bad practice...
> > yet it's a practice we rely on for our current implementation of
> > Hadoop2 support
> > (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> > slide 80).
> >
> > What this means is that even if we go through the work of publishing
> > binary artifacts compiled against Hadoop2, neither our Hadoop1
> > binaries nor our Hadoop2 binaries will be able to transitively resolve
> > any dependencies defined in profiles. This has significant
> > implications for user code that depends on Accumulo Maven artifacts.
> > Every user will essentially have to explicitly add Hadoop dependencies
> > for every Accumulo artifact that has dependencies on Hadoop, either
> > because we directly or transitively depend on Hadoop (they'll have to
> > peek into the profiles in our POMs and copy/paste the profile into
> > their project). This becomes more complicated when we consider how
> > users will try to use things like Instamo.
> >
> > There are workarounds, but none of them are really pleasant.
> >
> > 1. The best way to support both major Hadoop APIs is to have separate
> > modules with separate dependencies directly in the POM. This is a fair
> > amount of work, and in my opinion, would be too disruptive for 1.5.0.
> > This solution also gets us separate binaries for separate supported
> > versions, which is useful.
> >
> > 2. A second option, and the preferred one I think for 1.5.0, is to put
> > a Hadoop2 patch in the branch's contrib directory
> > (branches/1.5/contrib) that patches the POM files to support building
> > against Hadoop2. (Acknowledgement to Keith for suggesting this
> > solution.)
> >
> > 3. A third option is to fork Accumulo, and maintain two separate
> > builds (a more traditional technique). This adds a merging nightmare for
> > features/patches, but gets around some reflection hacks that we may
> > have been motivated to do in the past. I'm not a fan of this option,
> > particularly because I don't want to replicate the fork nightmare that
> > has been the history of early Hadoop itself.
> >
> > 4. The last option is to do nothing and to continue to build with the
> > separate profiles as we are, and make users discover and specify
> > transitive dependencies entirely on their own. I think this is the
> > worst option, as it essentially amounts to "ignore the problem".
> >
> > At the very least, it does not seem reasonable to complete
> > ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
> >
> > Thoughts? Discussion? Vote on option?
> >
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
> >
>



-- 
Sean

Re: Hadoop 2 compatibility issues

Posted by Keith Turner <ke...@deenlo.com>.
One note about option 4: when using 1.4, users have to include hadoop-core
as a dependency in their POM. This must be done because the 1.4 Accumulo
POM marks hadoop-core as provided. So maybe option 4 is OK if the deps in
the profile are provided?
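
In other words, a downstream 1.4 project's POM already carries something
like the following (the version shown is just an example):

    <!-- Declared by the user because Accumulo 1.4 marks hadoop-core
         as provided: -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>1.0.4</version>
    </dependency>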


On Tue, May 14, 2013 at 4:40 PM, Christopher <ct...@apache.org> wrote:

> So, I've run into a problem with ACCUMULO-1402 that requires a larger
> discussion about how Accumulo 1.5.0 should support Hadoop2.
>
> The problem is basically that profiles should not contain
> dependencies, because profiles don't get activated transitively. A
> slide deck by the Maven developers points this out as a bad practice...
> yet it's a practice we rely on for our current implementation of
> Hadoop2 support
> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> slide 80).
>
> What this means is that even if we go through the work of publishing
> binary artifacts compiled against Hadoop2, neither our Hadoop1
> binaries nor our Hadoop2 binaries will be able to transitively resolve
> any dependencies defined in profiles. This has significant
> implications for user code that depends on Accumulo Maven artifacts.
> Every user will essentially have to explicitly add Hadoop dependencies
> for every Accumulo artifact that has dependencies on Hadoop, either
> because we directly or transitively depend on Hadoop (they'll have to
> peek into the profiles in our POMs and copy/paste the profile into
> their project). This becomes more complicated when we consider how
> users will try to use things like Instamo.
>
> There are workarounds, but none of them are really pleasant.
>
> 1. The best way to support both major Hadoop APIs is to have separate
> modules with separate dependencies directly in the POM. This is a fair
> amount of work, and in my opinion, would be too disruptive for 1.5.0.
> This solution also gets us separate binaries for separate supported
> versions, which is useful.
>
> 2. A second option, and the preferred one I think for 1.5.0, is to put
> a Hadoop2 patch in the branch's contrib directory
> (branches/1.5/contrib) that patches the POM files to support building
> against Hadoop2. (Acknowledgement to Keith for suggesting this
> solution.)
>
> 3. A third option is to fork Accumulo, and maintain two separate
> builds (a more traditional technique). This adds a merging nightmare for
> features/patches, but gets around some reflection hacks that we may
> have been motivated to do in the past. I'm not a fan of this option,
> particularly because I don't want to replicate the fork nightmare that
> has been the history of early Hadoop itself.
>
> 4. The last option is to do nothing and to continue to build with the
> separate profiles as we are, and make users discover and specify
> transitive dependencies entirely on their own. I think this is the
> worst option, as it essentially amounts to "ignore the problem".
>
> At the very least, it does not seem reasonable to complete
> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
>
> Thoughts? Discussion? Vote on option?
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>