Posted to user@hbase.apache.org by thib <th...@gresearch.co.uk> on 2016/05/18 10:45:05 UTC

Snapshot performance and helper script

Hi,

I am thinking of implementing regular snapshots on HBase to protect against
user mistakes, e.g. if something bad happens, go back to the previous
snapshot.
I am thinking of keeping something like one snapshot per week for four weeks, and
one snapshot a day for 7 days, so always having about 11 snapshots.  Then each
time a new snapshot is created, an old one would be deleted.
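
A minimal sketch of the retention policy described above, assuming one snapshot is taken per day and the weekly snapshot is the one taken on Sunday. The function names, the `<table>-YYYYMMDD` naming scheme and the choice of Sunday are illustrative assumptions, not part of the thread. The script only builds HBase shell command strings (`snapshot` and `delete_snapshot`); it does not talk to a cluster.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, daily=7, weekly=4, weekly_weekday=6):
    """Return the set of snapshot dates to retain.

    Keeps the newest `daily` snapshots plus the newest `weekly` snapshots
    taken on `weekly_weekday` (0 = Monday ... 6 = Sunday).
    """
    dates = sorted({d for d in snapshot_dates if d <= today}, reverse=True)
    keep = set(dates[:daily])
    weeklies = [d for d in dates if d.weekday() == weekly_weekday]
    keep.update(weeklies[:weekly])
    return keep


def plan_commands(table, snapshot_dates, today):
    """Build HBase shell commands: take today's snapshot, delete expired ones.

    The snapshot name format "<table>-YYYYMMDD" is a hypothetical convention.
    """
    def name(d):
        return f"{table}-{d:%Y%m%d}"

    all_dates = set(snapshot_dates) | {today}
    keep = snapshots_to_keep(all_dates, today)
    commands = []
    if today not in snapshot_dates:
        commands.append(f"snapshot '{table}', '{name(today)}'")
    for d in sorted(all_dates - keep):
        commands.append(f"delete_snapshot '{name(d)}'")
    return commands


if __name__ == "__main__":
    today = date(2016, 5, 18)
    existing = [today - timedelta(days=i) for i in range(1, 12)]
    for cmd in plan_commands("mytable", existing, today):
        print(cmd)
```

Because the most recent weekly snapshot usually falls inside the 7-day daily window, this variant retains 10 distinct snapshots in steady state rather than 11. The printed commands could be piped into `hbase shell` from a cron job, but that wiring is left out here.
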

From reading the doc I get the impression that snapshots are quite light to
take, and have zero on-going performance impact, i.e. HBase will be just as
fast with 11 snapshots as with none.
Is that right?

Am I also right to believe that the extra disk usage would be very low in our
setup, where we never delete any data and only add more?

Finally, is anyone aware of a tool / helper script to implement such a
snapshot strategy, before I spend time writing my own?

Thank you,
Thibault.



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Snapshot-performance-and-helper-script-tp4080073.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: Snapshot performance and helper script

Posted by Vladimir Rodionov <vl...@gmail.com>.
Snapshots are light when you take them, but not that light when you export
them. If you do not export and only need to protect against
user errors, fine. Otherwise, bear in mind that exporting a snapshot is an M/R job
that materializes (copies) all your data to another location.

Another possible problem with snapshots is eventual data duplication.
Snapshots store references to store files. Store files usually get
compacted quite often: new files are created, and old files get archived and
later deleted. If you have a snapshot, all store files it refers to will be
kept in the archive until you delete that snapshot.
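
To make the duplication effect concrete, here is a toy model (not HBase code) that computes how many bytes stay on disk only because a snapshot still references them. The function name, file names and sizes are made up for illustration, and compaction is simplified to "replace all live files with one merged file".

```python
def snapshot_disk_overhead(live_files, snapshots, sizes):
    """Bytes held on disk only because some snapshot still references them.

    live_files: set of store file names the table currently uses.
    snapshots:  dict mapping snapshot name -> set of referenced store files.
    sizes:      dict mapping store file name -> size in bytes.
    """
    referenced = set().union(*snapshots.values())
    archived_only = referenced - live_files
    return sum(sizes[f] for f in archived_only)


if __name__ == "__main__":
    sizes = {"a": 100, "b": 100, "c": 200, "d": 100, "e": 300}

    # Snapshot s1 is taken while the table consists of files a and b.
    snapshots = {"s1": {"a", "b"}}
    # A compaction rewrites a + b into c; a and b move to the archive.
    live = {"c"}
    print(snapshot_disk_overhead(live, snapshots, sizes))

    # More data is appended (file d), snapshot s2 is taken, then another
    # compaction rewrites c + d into e.
    snapshots["s2"] = {"c", "d"}
    live = {"e"}
    print(snapshot_disk_overhead(live, snapshots, sizes))

    # Deleting both snapshots releases the archived files.
    print(snapshot_disk_overhead(live, {}, sizes))
```

In this model no data is ever deleted by the user, yet the retained bytes grow with every compaction that happens between snapshots, which is why an append-only workload does not by itself guarantee low overhead.
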

-Vlad


Re: Snapshot performance and helper script

Posted by Huaxiang Sun <hs...@cloudera.com>.
Hi Thibault,

    Yes, snapshots are very light to take, since taking one does not copy the HFiles. As for disk space, per my understanding, the usage may not be low. Once compaction happens, the snapshot will hold on to the old HFiles that would otherwise be cleaned up.
    The HBase master web page can tell you how much extra disk space the snapshots take. Please see the following links for more info. Just my 2 cents; experts here may correct me or provide more information.

https://issues.apache.org/jira/secure/attachment/12802250/master-snapshot.png
https://issues.apache.org/jira/browse/HBASE-15415

   Thanks,
   Huaxiang

