Posted to dev@hbase.apache.org by "张铎 (Duo Zhang)" <pa...@gmail.com> on 2019/05/29 13:40:17 UTC

[DISCUSS] Publishing hbase binaries with different hadoop release lines

See the comments here

https://issues.apache.org/jira/browse/HBASE-22394

Although we claim that hbase 2.1.x can work with hadoop 3.1.x, in practice
we require users to build the hbase binary against hadoop 3.1.x themselves
if they really want to use hbase with hadoop 3.1.x clients.

The problem in HBASE-22394 is that our asyncfswal references some
IA.Private hadoop classes, and those classes have changed between minor
releases. It can be worked around with reflection, but in general, since
hadoop also follows semantic versioning, I do not think it is safe to do a
drop-in replacement between minor releases. So maybe a better way is to
publish an hbase binary for each hadoop minor release line?
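
For illustration only, a minimal sketch of what the reflection approach
could look like. The class and method names here are placeholders for
whatever hadoop internal asyncfswal needs, not the actual HBASE-22394 patch:

import java.lang.reflect.Method;

/**
 * Sketch: resolve a private hadoop method by name once at startup instead
 * of linking against it, so a signature change between minor releases
 * fails fast with a clear error instead of a NoSuchMethodError at runtime.
 */
public final class HadoopInternalShim {

  private final Method createMethod;

  public HadoopInternalShim() {
    this.createMethod =
      resolve("org.apache.hadoop.hdfs.protocol.ClientProtocol", "create");
  }

  private static Method resolve(String className, String methodName) {
    try {
      for (Method m : Class.forName(className).getMethods()) {
        if (m.getName().equals(methodName)) {
          return m; // whichever overload this hadoop version exposes
        }
      }
      throw new NoSuchMethodException(methodName);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(
        "Incompatible hadoop on the classpath: " + className + "#" + methodName, e);
    }
  }
}

The caller still has to adapt the argument list to whichever overload was
resolved, so this trades compile-time checking for tolerance of signature
drift.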

Thanks.

Re: [DISCUSS] Publishing hbase binaries with different hadoop release lines

Posted by Josh Elser <el...@apache.org>.
Totally acknowledge and understand the scope of the problem, but I worry 
about expanding our test-matrix more. I also worry that until we get 
some LimitedPrivate/Public API for what asyncfswal needs out of Hadoop, 
we'll be chasing our tail trying to maintain compatibility.

Is it possible for us to choose a Hadoop version (or two) that we 
generally expect people to use and target that, instead of targeting every 
minor version?

For example, we could publish with Hadoop 2.8.latest and 3.1.latest. We 
may work with versions beyond that, but we provide (as a service to our 
users) some basic versions of Hadoop that they should be able to pull 
off the shelf.

This isn't really different from what we do today, other than setting 
different defaults for hadoop-two.version and hadoop-three.version. I 
think it would be more about "communicating" to our downstream users that 
what we provide as a convenience is expected to work against specific 
Hadoop versions.

On 5/29/19 9:40 AM, 张铎(Duo Zhang) wrote:
> See the comments here
> 
> https://issues.apache.org/jira/browse/HBASE-22394
> 
> Although we claim that hbase 2.1.x can work with hadoop 3.1.x, in practice
> we require users to build the hbase binary against hadoop 3.1.x themselves
> if they really want to use hbase with hadoop 3.1.x clients.
> 
> The problem in HBASE-22394 is that our asyncfswal references some
> IA.Private hadoop classes, and those classes have changed between minor
> releases. It can be worked around with reflection, but in general, since
> hadoop also follows semantic versioning, I do not think it is safe to do a
> drop-in replacement between minor releases. So maybe a better way is to
> publish an hbase binary for each hadoop minor release line?
> 
> Thanks.
> 

Re: [DISCUSS] Publishing hbase binaries with different hadoop release lines

Posted by Artem Ervits <ar...@gmail.com>.
I can't comment on whether we need every Hadoop release, but at a minimum
we need 2.8.5, as we recently switched to it from 2.7.7. I ran into issues
with 2.8.5 and 2.1.5rc0 and used the workaround in
https://issues.apache.org/jira/browse/HBASE-22052 to overcome them. I guess
if 2.1 will not live past 2.1.6 then it's irrelevant. At a minimum we need
to document the steps to build an HBase binary against an explicit Hadoop
version. Yes, it is there, but we need to do a better job of bringing this
information forward.

On a separate note, I ran into issues with 2.2rc4 and Hadoop 2.9.2 and plan
to improve the documentation for workarounds; initial thoughts are in
https://issues.apache.org/jira/browse/HBASE-22465. I want to emphasize both
sides of the coin, with strict durability and without. Thoughts welcome on
the jira.



On Wed, May 29, 2019, 9:41 AM 张铎(Duo Zhang) <pa...@gmail.com> wrote:

> See the comments here
>
> https://issues.apache.org/jira/browse/HBASE-22394
>
> Although we claim that hbase 2.1.x can work with hadoop 3.1.x, in practice
> we require users to build the hbase binary against hadoop 3.1.x themselves
> if they really want to use hbase with hadoop 3.1.x clients.
>
> The problem in HBASE-22394 is that our asyncfswal references some
> IA.Private hadoop classes, and those classes have changed between minor
> releases. It can be worked around with reflection, but in general, since
> hadoop also follows semantic versioning, I do not think it is safe to do a
> drop-in replacement between minor releases. So maybe a better way is to
> publish an hbase binary for each hadoop minor release line?
>
> Thanks.
>

Re: [DISCUSS] Publishing hbase binaries with different hadoop release lines

Posted by Andrew Purtell <ap...@apache.org>.
The separate compatibility modules repo idea is appealing, especially if it
can provide a single jar that also shades and includes Hadoop classes and
their specific dependencies. This would simplify version-specific
deployment: unpack the HBase tarball, remove the included hbase-hadoop-*.jar,
download the appropriate hbase-hadoop-*.jar from the hbase-hadoop-compat
artifact, put it into lib/, and you are good to go. No need to worry about
multiple Hadoop component jars or their dependencies like guava, jackson,
and whatnot.

The build and assembly files for such an uber compat module would also
serve as a template for anyone who wants to roll their own for a new Hadoop
version or an API-compatible set of alternative jars.

On Thu, May 30, 2019 at 6:33 AM Raymond Lau <rl...@attivio.com> wrote:

> The idea of a separate repo with compatibility modules sounds appealing.
> This refinement has several advantages, especially if we separate out the
> "compatibility API" from the actual compatibility module implementation:
>
> 1. We can choose certain major release lines of Hadoop and provide modules
> for those lines.
> 2. Customers wishing to support other Hadoop release lines can then create
> their own compatibility module. As long as they implement the API interfaces
> (or extend the API abstract classes), and get it to compile/work against
> their release of Hadoop, all is good.  The API can be versioned along with
> hbase as a separate artifact.
>
> -----Original Message-----
> From: Sean Busbey <bu...@apache.org>
> Sent: Thursday, May 30, 2019 9:06 AM
> To: user@hbase.apache.org
> Cc: dev <de...@hbase.apache.org>
> Subject: Re: [DISCUSS] Publishing hbase binaries with different hadoop
> release lines
>
>
> What about moving back to having a per-major-version-of-hadoop
> compatibility module again that builds against one needed major version all
> the time? (Presumably with some shell script magic to pick the right one?)
> That would be preferable, imho, to e.g. producing main project binary
> tarballs per Hadoop version.
>
> Or! We could move stuff that relies on brittle Hadoop internals into its
> own repo (or one of our existing repos) and build _that_ with binaries for
> our supported Hadoop versions. Then in the main project we can include the
> appropriate artifact for the version of Hadoop we happen to build with
> (essentially leaving the main repo how it is) and update our "replace the
> version of Hadoop!" note to include replacing this "HBase stuff that's
> closely tied to Hadoop internals" jar as well.
>
> On Wed, May 29, 2019, 08:41 张铎(Duo Zhang) <pa...@gmail.com> wrote:
>
>

-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

RE: [DISCUSS] Publishing hbase binaries with different hadoop release lines

Posted by Raymond Lau <rl...@attivio.com>.
The idea of a separate repo with compatibility modules sounds appealing.  This refinement has several advantages, especially if we separate out the "compatibility API" from the actual compatibility module implementation:

1. We can choose certain major release lines of Hadoop and provide modules for those lines.
2. Customers wishing to support other Hadoop release lines can then create their own compatibility module. As long as they implement the API interfaces (or extend the API abstract classes) and get it to compile and work against their release of Hadoop, all is good.  The API can be versioned along with hbase as a separate artifact (see the sketch below).
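
As a rough sketch of that split (all names below are hypothetical, not existing HBase modules or classes):

// HadoopCompat.java -- the versioned "compatibility API" artifact.
public interface HadoopCompat {

  /** Hadoop release line this implementation targets, e.g. "3.1". */
  String hadoopReleaseLine();

  /** Wraps whatever internal call asyncfswal needs from that release line. */
  Object createOutputStream(Object dfsClient, String path) throws Exception;
}

// Hadoop31Compat.java -- one version-specific implementation module,
// compiled against a single Hadoop release line and shipped as its own jar.
public class Hadoop31Compat implements HadoopCompat {

  @Override
  public String hadoopReleaseLine() {
    return "3.1";
  }

  @Override
  public Object createOutputStream(Object dfsClient, String path) throws Exception {
    // Would call Hadoop 3.1 internals directly; only this module needs
    // rebuilding when those internals change.
    throw new UnsupportedOperationException("illustrative stub only");
  }
}

A customer targeting some other Hadoop line would implement the same interface against their release and ship it as their own jar, exactly as point 2 describes.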

-----Original Message-----
From: Sean Busbey <bu...@apache.org> 
Sent: Thursday, May 30, 2019 9:06 AM
To: user@hbase.apache.org
Cc: dev <de...@hbase.apache.org>
Subject: Re: [DISCUSS] Publishing hbase binaries with different hadoop release lines


What about moving back to having a per-major-version-of-hadoop compatibility module again that builds against one needed major version all the time? (Presumably with some shell script magic to pick the right one?) That would be preferable, imho, to e.g. producing main project binary tarballs per Hadoop version.

Or! We could move stuff that relies on brittle Hadoop internals into its own repo (or one of our existing repos) and build _that_ with binaries for our supported Hadoop versions. Then in the main project we can include the appropriate artifact for the version of Hadoop we happen to build with (essentially leaving the main repo how it is) and update our "replace the version of Hadoop!" note to include replacing this "HBase stuff that's closely tied to Hadoop internals" jar as well.

On Wed, May 29, 2019, 08:41 张铎(Duo Zhang) <pa...@gmail.com> wrote:


Re: [DISCUSS] Publishing hbase binaries with different hadoop release lines

Posted by Sean Busbey <bu...@apache.org>.
What about moving back to having a per-major-version-of-hadoop
compatibility module again that builds against one needed major version all
the time? (Presumably with some shell script magic to pick the right one?)
That would be preferable, imho, to e.g. producing main project binary
tarballs per Hadoop version.

Or! We could move stuff that relies on brittle Hadoop internals into its
own repo (or one of our existing repos) and build _that_ with binaries for
our supported Hadoop versions. Then in the main project we can include the
appropriate artifact for the version of Hadoop we happen to build with
(essentially leaving the main repo how it is) and update our "replace the
version of Hadoop!" note to include replacing this "HBase stuff that's
closely tied to Hadoop internals" jar as well.
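
For illustration, the "pick the right one" step could also happen at runtime
instead of in the build scripts, e.g. via java.util.ServiceLoader. The names
below are hypothetical; the assumption is that the compat module defines some
HadoopCompat interface and each version-specific jar registers its
implementation under META-INF/services:

import java.util.ServiceLoader;

// Hypothetical interface the compat module would define.
public interface HadoopCompat {
  String hadoopReleaseLine();
}

public final class HadoopCompatLoader {

  /** Returns the single HadoopCompat implementation found on the classpath. */
  public static HadoopCompat load() {
    HadoopCompat found = null;
    for (HadoopCompat candidate : ServiceLoader.load(HadoopCompat.class)) {
      if (found != null) {
        throw new IllegalStateException("Multiple HadoopCompat jars on the classpath");
      }
      found = candidate;
    }
    if (found == null) {
      throw new IllegalStateException(
        "No HadoopCompat implementation found; add the jar matching your Hadoop version to lib/");
    }
    return found;
  }
}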

On Wed, May 29, 2019, 08:41 张铎(Duo Zhang) <pa...@gmail.com> wrote:

> See the comments here
>
> https://issues.apache.org/jira/browse/HBASE-22394
>
> Although we claim that hbase 2.1.x can work with hadoop 3.1.x, in practice
> we require users to build the hbase binary against hadoop 3.1.x themselves
> if they really want to use hbase with hadoop 3.1.x clients.
>
> The problem in HBASE-22394 is that our asyncfswal references some
> IA.Private hadoop classes, and those classes have changed between minor
> releases. It can be worked around with reflection, but in general, since
> hadoop also follows semantic versioning, I do not think it is safe to do a
> drop-in replacement between minor releases. So maybe a better way is to
> publish an hbase binary for each hadoop minor release line?
>
> Thanks.
>
