Posted to user@hbase.apache.org by Bryan Beaudreault <bb...@hubspot.com.INVALID> on 2021/05/19 12:49:13 UTC

Upgrading cdh5.16.2 to apache hbase 2.4 using replication

We are running about 40 HBase clusters, with over 5000 regionservers total.
These are all running cdh5.16.2. We also have thousands of clients (from
APIs to kafka workers to hadoop jobs, etc) hitting these various clusters,
also running cdh5.16.2.

We are starting to plan an upgrade to hbase 2.x and hadoop 3.x. I've read
through the docs on https://hbase.apache.org/book.html#_upgrade_paths, and
am starting to plan our approach. More than a few seconds of downtime is
not an option, but rolling upgrade also seems risky (if not impossible for
our version).

One thought I had is whether replication is compatible between these two
versions. If so, we probably would consider swapping onto upgraded clusters
using backup/restore + replication. If we were to go this route we'd
probably want to consider bi-directional replication so that we can roll
back to the old cluster if there's a regression.

Does anyone have any experience with this approach? Is the replication
protocol compatible across these versions? Any concerns, tips or other
considerations to keep in mind? We do the backup/restore + replication
approach pretty regularly to move tables between clusters.

Thanks!

Re: Upgrading cdh5.16.2 to apache hbase 2.4 using replication

Posted by Anoop John <an...@gmail.com>.
Yes, the enable_table_replication command checks whether the table exists
in the peer cluster and, if so, compares the CFs. You are right: the
difference in the table descriptors makes this command fail. You can enable
replication at the source using the alter table command. We can fix this
issue. WDYT @Wellington?

Anoop

On Thu, May 20, 2021 at 5:07 PM bq zhao <zb...@gmail.com> wrote:

> We tested replication between Apache HBase 1.4 and Apache HBase 2.2. We
> found that if you use the 'enable_table_replication' command to enable
> replication, it compares the table schemas before starting. But HBase 2 has
> more default parameters than HBase 1, which causes the schema comparison to
> fail. However, if you use an alter command like "alter
> 'test:test_replication', {NAME => 'f', REPLICATION_SCOPE => '1'}",
> replication starts successfully!
>
> There is also a related jira tracking this issue:
> https://issues.apache.org/jira/browse/HBASE-24353

Re: Upgrading cdh5.16.2 to apache hbase 2.4 using replication

Posted by bq zhao <zb...@gmail.com>.
We tested replication between Apache HBase 1.4 and Apache HBase 2.2. We
found that if you use the 'enable_table_replication' command to enable
replication, it compares the table schemas before starting. But HBase 2 has
more default parameters than HBase 1, which causes the schema comparison to
fail. However, if you use an alter command like "alter
'test:test_replication', {NAME => 'f', REPLICATION_SCOPE => '1'}",
replication starts successfully!

There is also a related jira tracking this issue:
https://issues.apache.org/jira/browse/HBASE-24353
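
To illustrate why the descriptor comparison can fail even when both tables
were created the same way, here is a toy Ruby model (Ruby because the HBase
shell is JRuby-based). It is only a sketch under stated assumptions: the
hashes stand in for column family descriptors, EXTRA_DEFAULT_ATTRIBUTE is a
made-up placeholder for whatever defaults the newer version adds, and
strict_match? / shared_attributes_match? are hypothetical helpers, not HBase
APIs and not the actual check performed by enable_table_replication.

```ruby
# Toy model only: these hashes stand in for column family descriptors.
# Attribute names and values are illustrative, not a dump from a real cluster.
HBASE1_CF = {
  'NAME' => 'f',
  'VERSIONS' => '1',
  'REPLICATION_SCOPE' => '1'
}.freeze

# Hypothetical newer-version descriptor: same settings plus one extra default.
HBASE2_CF = HBASE1_CF.merge('EXTRA_DEFAULT_ATTRIBUTE' => 'false').freeze

# Strict comparison: an attribute present on only one side is a mismatch.
def strict_match?(a, b)
  a == b
end

# Lenient comparison: compare only the attributes that both sides define.
def shared_attributes_match?(a, b)
  (a.keys & b.keys).all? { |key| a[key] == b[key] }
end

puts "strict:  #{strict_match?(HBASE1_CF, HBASE2_CF)}"
puts "lenient: #{shared_attributes_match?(HBASE1_CF, HBASE2_CF)}"
```

In this model the strict comparison reports a mismatch although every
attribute the two sides share is equal. The alter command works around this
by never running the comparison, not by making the descriptors equal.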

On Thu, May 20, 2021 at 5:07 PM, Wellington Chevreuil <we...@gmail.com> wrote:

> Yes, replication interfaces are compatible between these two major
> versions.
>
> > So I created two clusters in AWS and tried to enable replication between
> > HBase 1.4.13 and 2.2.5, but got the error "table exists but descriptors
> > are not the same" (I will put a screenshot in the attachment, but I am not
> > sure it will work here).
>
> Can you describe in detail which steps (commands/configurations) you
> executed to get this error? After which exact action did you see this
> message? And are the table schemas identical in both clusters?

Re: Upgrading cdh5.16.2 to apache hbase 2.4 using replication

Posted by Wellington Chevreuil <we...@gmail.com>.
Yes, replication interfaces are compatible between these two major
versions.

> So I created two clusters in AWS and tried to enable replication between
> HBase 1.4.13 and 2.2.5, but got the error "table exists but descriptors are
> not the same" (I will put a screenshot in the attachment, but I am not sure
> it will work here).

Can you describe in detail which steps (commands/configurations) you
executed to get this error? After which exact action did you see this
message? And are the table schemas identical in both clusters?

On Wed, May 19, 2021 at 21:00, Sergey Semenoff <box4semenoff@gmail.com> wrote:

> We are thinking about a similar issue. Our clusters are much smaller - 4 by
> 100 RS - however we need to process data continuously too. So I created two
> clusters in AWS and tried to enable replication between HBase 1.4.13 and
> 2.2.5, but got the error "table exists but descriptors are not the same" (I
> will put a screenshot in the attachment, but I am not sure it will work
> here).
>
> I have some ideas on how to do the upgrade another way and would be glad to
> discuss them with you. You can write to me at box4semenoff@gmail.com to dig
> into the details.

Re: Upgrading cdh5.16.2 to apache hbase 2.4 using replication

Posted by Sergey Semenoff <bo...@gmail.com>.
We are thinking about a similar issue. Our clusters are much smaller - 4 by
100 RS - however we need to process data continuously too. So I created two
clusters in AWS and tried to enable replication between HBase 1.4.13 and
2.2.5, but got the error "table exists but descriptors are not the same" (I
will put a screenshot in the attachment, but I am not sure it will work
here).

I have some ideas on how to do the upgrade another way and would be glad to
discuss them with you. You can write to me at box4semenoff@gmail.com to dig
into the details.

On Wed, May 19, 2021 at 15:50, Bryan Beaudreault
<bb...@hubspot.com.invalid> wrote:

> We are running about 40 HBase clusters, with over 5000 regionservers total.
> These are all running cdh5.16.2. We also have thousands of clients (from
> APIs to kafka workers to hadoop jobs, etc) hitting these various clusters,
> also running cdh5.16.2.
>
> We are starting to plan an upgrade to hbase 2.x and hadoop 3.x. I've read
> through the docs on https://hbase.apache.org/book.html#_upgrade_paths, and
> am starting to plan our approach. More than a few seconds of downtime is
> not an option, but rolling upgrade also seems risky (if not impossible for
> our version).
>
> One thought I had is whether replication is compatible between these two
> versions. If so, we probably would consider swapping onto upgraded clusters
> using backup/restore + replication. If we were to go this route we'd
> probably want to consider bi-directional replication so that we can roll
> back to the old cluster if there's a regression.
>
> Does anyone have any experience with this approach? Is the replication
> protocol compatible across these versions? Any concerns, tips or other
> considerations to keep in mind? We do the backup/restore + replication
> approach pretty regularly to move tables between clusters.
>
> Thanks!
>

Re: Upgrading cdh5.16.2 to apache hbase 2.4 using replication

Posted by Bryan Beaudreault <bb...@hubspot.com.INVALID>.
We are not paying for CDH -- our older version of CDH (5.16.2) was
pre-licensing. We've never used CM. We are planning to migrate off of CDH
onto apache, and have 10+ years of experience working with HBase internals
and operating HBase at scale. I'm curious if anyone has knowledge of any
incompatibilities in the replication layer between these 2 versions, as
that is not very well covered in the public docs afaict. I'm aware this
will likely be a multi-month or year+ long project for us, and am just
starting the investigation phase :) It honestly looks like it might be an
easier project than the pre-0.96 to 1.x upgrade we undertook years ago,
though we're at a different scale today.

On Wed, May 19, 2021 at 9:17 AM Marc Hoppins <ma...@eset.sk> wrote:

> If you are paying for CDH then just upgrade via Cloudera Manager. If you
> are not paying for it then I think you will find it a huge problem.
>
> The upgrade may have to be done via version 6 and then a newer version to
> get to a suitable HBase/Hadoop version.
>
> We are currently on CDH6.3.2 but the HBase is an extremely useless version
> (2.1.0) and we are not in the business of generating income from the data
> so cannot justify the exorbitant cost per node that Cloudera are asking for
> later versions.

RE: Upgrading cdh5.16.2 to apache hbase 2.4 using replication

Posted by Marc Hoppins <ma...@eset.sk>.
If you are paying for CDH then just upgrade via Cloudera Manager. If you are not paying for it then I think you will find it a huge problem.

The upgrade may have to be done via version 6 and then a newer version to get to a suitable HBase/Hadoop version.

We are currently on CDH6.3.2 but the HBase is an extremely useless version (2.1.0) and we are not in the business of generating income from the data so cannot justify the exorbitant cost per node that Cloudera are asking for later versions.

-----Original Message-----
From: Bryan Beaudreault <bb...@hubspot.com.INVALID> 
Sent: Wednesday, May 19, 2021 2:49 PM
To: user@hbase.apache.org
Subject: Upgrading cdh5.16.2 to apache hbase 2.4 using replication

We are running about 40 HBase clusters, with over 5000 regionservers total.
These are all running cdh5.16.2. We also have thousands of clients (from APIs to kafka workers to hadoop jobs, etc) hitting these various clusters, also running cdh5.16.2.

We are starting to plan an upgrade to hbase 2.x and hadoop 3.x. I've read through the docs on https://hbase.apache.org/book.html#_upgrade_paths, and am starting to plan our approach. More than a few seconds of downtime is not an option, but rolling upgrade also seems risky (if not impossible for our version).

One thought I had is whether replication is compatible between these two versions. If so, we probably would consider swapping onto upgraded clusters using backup/restore + replication. If we were to go this route we'd probably want to consider bi-directional replication so that we can roll back to the old cluster if there's a regression.

Does anyone have any experience with this approach? Is the replication protocol compatible across these versions? Any concerns, tips or other considerations to keep in mind? We do the backup/restore + replication approach pretty regularly to move tables between clusters.

Thanks!