Posted to user@hbase.apache.org by Lex Toumbourou <le...@scrunch.com> on 2017/11/21 03:45:28 UTC

Deleting and cleaning old snapshots exported to S3

Hi all,

Wondering if I could get some help figuring out how to clean out old
snapshots that have been exported to S3?

We've been exporting snapshots to S3 using the export snapshot command:

bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
some-snapshot -copy-to s3a://some-bucket/hbase


Now the size of the S3 bucket is getting a little out of control, and I'd
like to remove the old snapshots and clean up the HFiles that are no
longer referenced by any remaining snapshot.

One idea I had was to spin up an entirely new cluster that uses the S3
bucket as its hbase.rootdir, delete the snapshots as normal, and maybe use
cleaner_run to clean up the old files, but it feels like overkill to spin
up an entire cluster just for that.
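
(By "delete the snapshots as normal" I just mean the usual shell command,
something like:

hbase> delete_snapshot 'some-snapshot'

and then triggering the cleaner chore to reclaim unreferenced files.)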

So my question is: what's the best approach for deleting snapshots exported
to an S3 bucket and cleaning up old store files that are no longer
referenced? We are using HBase 1.3.1 on EMR.

Thanks!

Lex Toumbourou
CTO at scrunch.com <http://scrunch.com/>

Re: Deleting and cleaning old snapshots exported to S3

Posted by Timothy Brown <ti...@siftscience.com>.
Hi Lex,

We had a similar issue with our S3 bucket growing in size, so we wrote our
own cleaner. The cleaner first collects the HFiles required by the current
snapshots. We then figure out which snapshots we no longer want (for
example, snapshots older than a week, or whatever rules you want). Then we
find the HFiles that are referenced only by these unwanted snapshots and
delete those HFiles from S3, as sketched below.
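
To make that concrete, here's a rough sketch of the overall loop
(listSnapshots, getHFilesForSnapshot and isExpired are hypothetical
placeholders; the middle one is exactly what the two options below
implement):

// Sketch only: listSnapshots(), getHFilesForSnapshot() and isExpired() are
// hypothetical placeholders for the lookups described above.
Set<String> retained = new HashSet<>();   // HFiles needed by snapshots we keep
Set<String> unwanted = new HashSet<>();   // HFiles referenced by expired snapshots
for (SnapshotDescription snapshot : listSnapshots(fs, rootDir)) {
  Set<String> files = getHFilesForSnapshot(snapshot);
  if (isExpired(snapshot)) {
    unwanted.addAll(files);
  } else {
    retained.addAll(files);
  }
}
unwanted.removeAll(retained); // keep anything a live snapshot still needs
// 'unwanted' now holds the HFiles that are safe to delete from S3.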

The tricky part is finding the HFiles for a given snapshot. There are two
ways to do this.

1) Use:

SnapshotDescription snapshotDesc =
    SnapshotDescriptionUtils.readSnapshotInfo(fs, snapshotDir);
SnapshotReferenceUtil.visitReferencedFiles(conf, fs, snapshotDir,
    snapshotDesc, snapshotVisitor);

where snapshotVisitor is an implementation of the SnapshotVisitor interface
found here:
https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.11.1/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotReferenceUtil.java#L63
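
For example, a visitor that just collects the referenced store file names
could look like this (an untested sketch against the 1.2/1.3 internal API,
so double-check the signatures against your version):

final Set<String> referencedFiles = new HashSet<>();
SnapshotReferenceUtil.visitReferencedFiles(conf, fs, snapshotDir, snapshotDesc,
    new SnapshotReferenceUtil.SnapshotVisitor() {
      @Override
      public void storeFile(HRegionInfo regionInfo, String familyName,
          SnapshotRegionManifest.StoreFile storeFile) throws IOException {
        // The manifest stores the bare HFile name; note that a snapshot can
        // also reference a file that is itself a link to another file.
        referencedFiles.add(storeFile.getName());
      }
    });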

2) The ExportSnapshot class has a private method that does this for you. We
ended up using reflection to call ExportSnapshot#getSnapshotFiles (see
https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.11.1/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#L539).
For example:

Path snapshotPath =
    SnapshotDescriptionUtils.getCompletedSnapshotDir(snapshotName, rootDir);
Method method = ExportSnapshot.class.getDeclaredMethod("getSnapshotFiles",
    Configuration.class, FileSystem.class, Path.class);
method.setAccessible(true); // bypass the private access check
@SuppressWarnings("unchecked")
List<Pair<SnapshotFileInfo, Long>> snapshotFiles =
    (List<Pair<SnapshotFileInfo, Long>>)
        method.invoke(null, conf, fs, snapshotPath); // static, so null receiver
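
Once you have the file lists from either option, the cleanup itself is just
a FileSystem delete against the bucket. A rough sketch (toExportedPath is a
hypothetical helper that maps a SnapshotFileInfo to its location under the
export root; ExportSnapshot mirrors the archive layout, so exported HFiles
land under <root>/archive/data/...):

FileSystem s3 = FileSystem.get(URI.create("s3a://some-bucket/hbase"), conf);
for (Pair<SnapshotFileInfo, Long> file : filesOnlyInUnwantedSnapshots) {
  Path exported = toExportedPath(file.getFirst()); // hypothetical mapping
  if (s3.exists(exported)) {
    s3.delete(exported, false); // single file, non-recursive
  }
}
// Also remove the snapshot's metadata directory,
// <root>/.hbase-snapshot/<snapshotName>, so the snapshot itself goes away.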

I would love to know how other people are tackling this issue as well.

-Tim
