Posted to user@hbase.apache.org by Afroz Ahmad <ah...@gmail.com> on 2015/04/09 20:35:09 UTC

Export Hbase Snapshot

We are planning to use the snapshot feature to back up a table with 1.2 TB of
data. The plan is to export the data using ExportSnapshot and periodically copy
the resulting files to an NFS mount.
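
For concreteness, the workflow we have in mind is roughly the following (table
name, snapshot name, mapper count, and the NFS path are placeholders, and the
file:// destination assumes the NFS mount is visible at the same path on every
node that runs the export mappers):

  # Take the snapshot (metadata operation; no table data is copied yet)
  echo "snapshot 'my_table', 'my_table_snap_20150409'" | hbase shell

  # Export the snapshot's HFiles and metadata to the NFS mount
  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot my_table_snap_20150409 \
    -copy-to file:///mnt/backup_nfs/hbase \
    -mappers 16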

Our infrastructure team is very concerned about the amount of data that
will be going over the wire and how long it will take.

This is just one table. There may be other tables in the future that we
want to back up.

So I wanted to get a sense of what others are doing with ExportSnapshot:
how large are the tables you back up, and are the concerns raised by our
infra team valid?


Thanks

Afroz

Re: Export Hbase Snapshot

Posted by Esteban Gutierrez <es...@cloudera.com>.
Another practice is to send the snapshots to S3. That works great for
disaster recovery, especially if you are running your HBase cluster on
EC2 or if you have the means to use AWS Direct Connect from your private
infrastructure.
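
As a rough sketch, the export could look something like this (bucket name and
snapshot name are placeholders, and it assumes the S3 filesystem connector and
credentials are already configured on the cluster; older Hadoop versions would
use s3n:// instead of s3a://):

  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot my_table_snap_20150409 \
    -copy-to s3a://my-backup-bucket/hbase \
    -mappers 16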

cheers,
esteban.





--
Cloudera, Inc.



Re: Export Hbase Snapshot

Posted by anil gupta <an...@gmail.com>.
If you are not using compression in your HBase tables, you could enable
compression on that table to reduce data storage requirements. SNAPPY
compression is supported out of the box.
Also, HDFS usually keeps 3 replicas of your data. You could store this data on
non-NFS storage, which would be much cheaper.
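
As a sketch, enabling SNAPPY on an existing column family could look like this
(table and column family names are placeholders; already-written data is only
rewritten in compressed form after a major compaction):

  echo "alter 'my_table', {NAME => 'cf', COMPRESSION => 'SNAPPY'}" | hbase shell
  echo "major_compact 'my_table'" | hbase shell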




-- 
Thanks & Regards,
Anil Gupta

Re: Export Hbase Snapshot

Posted by Nick Dimiduk <nd...@gmail.com>.
Indeed you will be sending 1.2 TB over the wire. I think the common practice
is to export a snapshot from local HDFS to remote HDFS (or an HDFS-alike, such
as S3). The idea is that you get full bi-directional bandwidth (modulo
top-of-rack switching) between all peers in both clusters.
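
As a sketch, an HDFS-to-HDFS export might look like the following (namenode
address, snapshot name, and mapper count are placeholders; the -bandwidth
option, where your HBase version supports it, caps MB/s per mapper and can
help address the over-the-wire concern):

  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot my_table_snap_20150409 \
    -copy-to hdfs://backup-cluster-nn:8020/hbase \
    -mappers 16 \
    -bandwidth 50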


Re: Export Hbase Snapshot

Posted by Serega Sheypak <se...@gmail.com>.
Hi,
what is the reason to back up HDFS? It's distributed, reliable,
fault-tolerant, etc.
NFS is likely to be expensive for keeping TBs of data.


What problem are you trying to solve?

