Posted to user@hbase.apache.org by Subash K <su...@ericsson.com> on 2017/12/21 04:36:53 UTC

Hbase Snapshot Export Data storage

We have a use case to transfer data from one cluster to another. We currently use CopyTable, but it puts load on the region servers and takes a long time to complete the transfer.

So we are exploring the HBase ExportSnapshot feature and plan to proceed with the steps below.


  1.  Take a snapshot of the table on the source cluster.
  2.  Run the ExportSnapshot job to send the snapshot to the destination cluster.
  3.  Restore the snapshot on the destination cluster.
  4.  Access the data on the destination cluster.
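
The steps above can be sketched as the commands each one would run. This is a minimal sketch, not a tested procedure: the table name `orders`, the snapshot name `orders_snap`, the destination URL, and the mapper count are hypothetical placeholders, and the helper only builds the command strings without executing anything.

```python
import shlex


def snapshot_transfer_commands(table, snapshot, dest_root, mappers=4):
    """Return the command for each step as a string; nothing is executed."""
    # Step 1, HBase shell on the source cluster: take the snapshot.
    take = f"snapshot '{table}', '{snapshot}'"
    # Step 2, OS shell on the source cluster: run the ExportSnapshot job.
    export = shlex.join([
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", dest_root,
        "-mappers", str(mappers),
    ])
    # Step 3, HBase shell on the destination cluster: restore the snapshot.
    restore = f"restore_snapshot '{snapshot}'"
    return [take, export, restore]


if __name__ == "__main__":
    # All three arguments are hypothetical example values.
    for command in snapshot_transfer_commands(
        "orders", "orders_snap", "hdfs://dest-nn:8020/hbase"
    ):
        print(command)
```

If the table already exists on the destination, `restore_snapshot` generally expects it to be disabled first; `clone_snapshot 'orders_snap', 'orders'` is an alternative that creates a new table from the snapshot.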

We want to understand how the data is handled on the destination after the snapshot is restored, because we can still see the data under the /hbase/archive/data directory in HDFS, while /hbase/data/ holds only references.

Can someone help us understand:

  1.  When will the data under /hbase/archive/data be removed?
  2.  When new data is inserted into the table, where is it stored: in /hbase/archive/data or in /hbase/data?
  3.  I deleted the snapshot and ran a major compaction on the table, and the data moved from /hbase/archive/data to /hbase/data. So is a major compaction always required after restoring a snapshot to move the data to its proper location?
  4.  I can see data being stored in the archive even when there is no snapshot. In what other scenarios is data stored in /hbase/archive/data/?

Regards,
Subash Kunjupillai


Re: Hbase Snapshot Export Data storage

Posted by Jerry He <je...@gmail.com>.
Keeping the data files in /hbase/archive is how snapshots (and exported
snapshots) work.
Reference links point to these files, so they are not cleaned up or
deleted.
A restored snapshot answers client queries by following the
reference links to these data files.
You do not need to run a compaction just to rewrite the data files back to
/hbase/data in order to access the data.
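
The reference-link behaviour described above can be illustrated with a toy model. This is only a conceptual sketch, not HBase code: the function and the file, snapshot, and table names are all hypothetical, and the real cleaner applies additional checks (such as a time-to-live) that are ignored here.

```python
def cleanable(archived_files, snapshot_refs, table_links):
    """Return the archived files that nothing references any more."""
    still_referenced = set()
    # A snapshot keeps every file it lists alive.
    for files in snapshot_refs.values():
        still_referenced |= set(files)
    # A restored table keeps every file its links point at alive.
    for files in table_links.values():
        still_referenced |= set(files)
    return sorted(set(archived_files) - still_referenced)


archive = ["hfile-a", "hfile-b"]
snapshots = {"orders_snap": ["hfile-a", "hfile-b"]}
links = {"orders": ["hfile-a", "hfile-b"]}

# Right after the restore, the snapshot and the table both hold references.
after_restore = cleanable(archive, snapshots, links)
# Deleting the snapshot alone is not enough: the table still links to the files.
after_snapshot_delete = cleanable(archive, {}, links)
# Once a major compaction has rewritten the data, the links are gone as well.
after_compaction = cleanable(archive, {}, {"orders": []})
```

In this model the archived files become removable only in the last case, which matches the observation in the original question that the data moved after the snapshot was deleted and a major compaction was run.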

Thanks,

Jerry





On Thu, Jan 4, 2018 at 7:57 PM, Subash Kunjupillai <su...@ericsson.com>
wrote:

> Hi,
>
> I was just trying to understand on what basis this best practice is advised
> to run major compaction immediately after restoring snapshot?
>
> Because in our system, we have a scheduled Major Compaction which runs
> every
> day for this table. If new data is going to fall in /hbase/data in the
> destination cluster, at the end of the day scheduled major compaction can
> move the data brought in with snapshot from /hbase/archive to /hbase/data
> and compact new data loaded on that particular day.
>
> Please let me know how immediate major compaction is going to benefit the
> cluster?
>
> Regards,
> Subash Kunjupillai
>
>
>
> --
> Sent from: http://apache-hbase.679495.n3.nabble.com/HBase-User-
> f4020416.html
>

RE: Hbase Snapshot Export Data storage

Posted by Subash Kunjupillai <su...@ericsson.com>.
Hi,

I am trying to understand the basis for the best practice of running a
major compaction immediately after restoring a snapshot.

In our system, a scheduled major compaction runs every day for this table.
If new data lands in /hbase/data on the destination cluster, the scheduled
major compaction at the end of the day can move the data brought in with
the snapshot from /hbase/archive to /hbase/data and compact the new data
loaded that day.

How does an immediate major compaction benefit the cluster?

Regards,
Subash Kunjupillai




RE: Hbase Snapshot Export Data storage

Posted by Subash Kunjupillai <su...@ericsson.com>.
Hi @Ted Yu and @sawant,

Thanks for your guidance.






RE: Hbase Snapshot Export Data storage

Posted by "Sawant, Chandramohan " <ch...@citi.com.INVALID>.
Yes, we have to follow the best practice of running a major compaction after restoring the snapshot, so that the data moves from the archive to the default data folder, /hbase/data.


Regards,
CM
+1 201 763 1656


-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Wednesday, December 20, 2017 11:42 PM
To: user@hbase.apache.org
Subject: Re: Hbase Snapshot Export Data storage

For #1, major compaction would produce data files under /hbase/data,
releasing archived data.

For #2, /hbase/data

For #3, you can access your data before major compaction is performed. You
should follow best practice for major compaction on restored table.


Re: Hbase Snapshot Export Data storage

Posted by Ted Yu <yu...@gmail.com>.
For #1, a major compaction writes new data files under /hbase/data,
which releases the archived data.

For #2, /hbase/data.

For #3, you can access your data before a major compaction is performed. You
should follow best practice for major compaction on the restored table.
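
To watch the effect described in #1, one could compare the size of the live and archived directories before and after a major compaction. The sketch below only builds the commands and assumes the usual /hbase/data/<namespace>/<table> and /hbase/archive/data/<namespace>/<table> layout; the table name `orders` and the helper itself are hypothetical.

```python
def post_restore_commands(table, namespace="default", hbase_root="/hbase"):
    """Build (but do not run) commands to compact a table and inspect its files."""
    # Tables outside the default namespace are addressed as 'namespace:table'.
    shell_name = table if namespace == "default" else f"{namespace}:{table}"
    live = f"{hbase_root}/data/{namespace}/{table}"
    archived = f"{hbase_root}/archive/data/{namespace}/{table}"
    return {
        # HBase shell: request a major compaction of the restored table.
        "hbase_shell": f"major_compact '{shell_name}'",
        # OS shell: summarize the space used under each directory.
        "inspect": [
            f"hdfs dfs -du -s -h {live}",
            f"hdfs dfs -du -s -h {archived}",
        ],
    }


if __name__ == "__main__":
    commands = post_restore_commands("orders")
    print(commands["hbase_shell"])
    for line in commands["inspect"]:
        print(line)
```

The compaction request runs in the background, so the sizes may change only some time after the command returns.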
