Posted to user@hbase.apache.org by Shuai Lin <li...@gmail.com> on 2019/11/11 06:47:25 UTC

HBase table snapshots compatibility between 1.0 and 2.1

Hi all,

TL;DR: Can table snapshots taken in HBase 1.0 be used in HBase 2.1?

We have an existing production HBase 1.0 cluster (CDH 5.4), and we're
setting up a new cluster with HBase 2.1 (CDH 6.3). Let's call the old
cluster C1 and the new one C2.

To migrate the existing data from C1 to C2, we plan to use the "snapshot +
replication" approach (the snapshot captures the existing data, and
replication handles the incremental part). However, when I tested the
feasibility of this approach locally, I found that the snapshot could be
exported to C2 successfully, but the restored table on C2 has no data.

Here is a minimal reproducible example:

1. on C1: take the snapshot and export it to C2

hbase shell:
    create "t1", {"NAME"=>"f1", "REPLICATION_SCOPE" => 1}
    put "t1", "r1", "f1:c1", "value"
    snapshot 't1', 't1s1'

sudo -u hdfs hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
            -snapshot t1s1 \
            -copy-to hdfs://c2:8020/hbase -mappers 1

2. Then on C2 restore the table

hbase shell:
    create "t1", {"NAME"=>"f1", "REPLICATION_SCOPE" => 1}
    disable "t1"
    restore_snapshot "t1s1"
    enable "t1"
    scan "t1"

All of these steps succeed, except that the final "scan" command shows no
data at all. It is also worth noting that the master web UI on C2 shows
that table t1 has two regions, one of which is not assigned. It should
obviously have only one region.
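
One way to narrow this down is to inspect the exported snapshot on C2
before restoring it. A minimal sketch, assuming the SnapshotInfo tool that
ships with HBase is available on C2; "inspect_snapshot" is a hypothetical
helper name, not something from this thread:

```shell
# Sketch only: print the files and size statistics recorded for a snapshot.
# Assumption: run on a host configured for the destination cluster (C2),
# as a user allowed to read the HBase root directory.
inspect_snapshot() {
  # $1 = snapshot name, e.g. t1s1
  hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot "$1" -files -stats
}
```

Running "inspect_snapshot t1s1" on C2 should list the snapshot's store
files; missing entries there would point at the export step rather than
the restore step.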

So my question is: can table snapshots taken in HBase 1.0 be used in
HBase 2.1?
- If yes, is there anything I'm doing wrong here?
- If no, is there any workaround (e.g. performing some preprocessing on
the snapshot data on the HBase 2.1 side before restoring it)?

If this can't work, the only alternative way to migrate the data is to
install HBase 1.0 on C2 (so it can use the snapshot from C1) and upgrade
it to HBase 2.1 after restoring the snapshot. I'd like to avoid going
this way if at all possible because it would be too cumbersome.

Any information would be much appreciated, thx!

Re: HBase table snapshots compatibility between 1.0 and 2.1

Posted by Shuai Lin <li...@gmail.com>.
It turned out to be https://issues.apache.org/jira/browse/HBASE-19893,
which isn't included in CDH 6.3.0. CDH 6.3.1 includes this fix, and
now it works like a charm.

Thanks all for the help!

Re: HBase table snapshots compatibility between 1.0 and 2.1

Posted by Sean Busbey <bu...@apache.org>.
Snapshots should work between HBase 1 and HBase 2. There are only
three gotchas I know about for releases from the project.

1) Minimum HFile version bump: HBase 2 can still read HFile V2 and
HFile V3, and it can only write HFile V3. It can't read HFile V1. I
have seen clusters with old snapshots that still have HFile V1 files
present.
2) Removal of PREFIX_TREE data block encoding.

AFAIK both of the above can be checked with the check added in HBASE-20649

hbase pre-upgrade validate-hfile

http://hbase.apache.org/2.1/book.html#_hfile_content_validation

3) If exporting a snapshot into the cluster takes too long, the
cleaner in the destination will destroy the HFiles. See HBASE-23202,
"ExportSnapshot (import) will fail if copying files to root directory
takes longer than cleaner TTL". AFAIK this is still open, and the
workaround is probably to disable the cleaner while you're doing the
snapshot export.
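
The cleaner workaround in 3) can be sketched as follows. This assumes the
destination's HBase shell provides the "cleaner_chore_switch" command
(worth confirming with "help" in the shell on your version);
"set_cleaner_chore" is a hypothetical helper name, not part of HBase:

```shell
# Sketch only: turn the master's cleaner chore off or on from a script.
# Assumption: run on a host configured for the destination cluster.
set_cleaner_chore() {
  # $1 must be "true" (enable) or "false" (disable).
  case "$1" in
    true|false) ;;
    *) echo "usage: set_cleaner_chore true|false" >&2; return 2 ;;
  esac
  # -n runs the HBase shell non-interactively, reading commands from stdin.
  echo "cleaner_chore_switch $1" | hbase shell -n
}
```

The idea would be to run "set_cleaner_chore false" on the destination
before starting ExportSnapshot, and "set_cleaner_chore true" once the
export has finished, so that the cleaner is not left disabled.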

Re: HBase table snapshots compatibility between 1.0 and 2.1

Posted by Josh Elser <el...@apache.org>.
Hey Shuai,

You're likely to get some more traction with this question via 
contacting Cloudera's customer support channels. We try to keep this 
forum focused on Apache HBase versions.

If you are not seeing records after restoring, it sounds like there is 
some (missing?) metadata in the old version which is not handled in the 
newer versions.

As far as your procedure goes, you could combine your
create/disable/restore_snapshot steps into a single clone_snapshot.
However, if there is some incompatibility, this is of little consequence.
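
A minimal sketch of the clone_snapshot route, assuming the snapshot has
already been exported to the destination cluster and that the target table
does not exist there yet; "clone_from_snapshot" is a hypothetical helper
name used only for illustration:

```shell
# Sketch only: create a new table directly from an existing snapshot.
# Assumption: run on a host configured for the destination cluster.
clone_from_snapshot() {
  # $1 = snapshot name, $2 = name of the table to create
  # -n runs the HBase shell non-interactively, reading commands from stdin.
  echo "clone_snapshot '$1', '$2'" | hbase shell -n
}
```

For the example in this thread, that would be "clone_from_snapshot t1s1 t1"
in place of the create, disable, restore_snapshot and enable steps.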

You could try to use CopyTable instead of snapshots.
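
A minimal sketch of the CopyTable route, under assumptions that are not
from this thread: the table already exists on the destination with the
same column families, the destination's ZooKeeper quorum is passed via
--peer.adr in quorum:port:znode form, and the source cluster's client can
talk to the destination cluster (compatibility across these two versions
is not confirmed anywhere in this thread). "copy_table_to_peer" is a
hypothetical helper name:

```shell
# Sketch only: copy a table's rows to another cluster with a MapReduce job.
# Assumption: run on a host configured for the source cluster.
copy_table_to_peer() {
  # $1 = table name
  # $2 = destination ZooKeeper address, quorum:port:znode
  hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr="$2" "$1"
}
```

A call might look like "copy_table_to_peer t1 c2-zk:2181:/hbase", where
the ZooKeeper address is a placeholder for your own quorum.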
