Posted to user@hadoop.apache.org by Juan Martin Pampliega <jp...@gmail.com> on 2013/11/28 14:47:10 UTC

HDFS snapshots restore

Hi,

I have read the documentation about HDFS snapshots for Hadoop 2 (
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html)
but it is still not clear to me how to use these snapshots to restore data.

Let's say I have a directory with the data for a Hive table that I want
to back up. I take a snapshot today, and tomorrow I find out that the
modifications made to the table/directory after the snapshot are wrong
and I want to revert the directory to the snapshot state. How do I achieve
this?

Also, can I extract the snapshot from HDFS, save it in external storage,
and later use it to restore this directory in a new, empty cluster?
If not, what is the recommended way to do this?


Thanks,
Juan.

RE: HDFS snapshots restore

Posted by Bennie Schut <bs...@ebuddy.com>.
Hi Juan,

In addition to Binglin Chang's reply: whether you snapshot or manually copy the data, you need to understand a little about how Hive works to be able to do a correct restore.
Hive keeps its metadata in a separate database. For example, if you have a table with a date partition, Hive uses the metadata to know which partitions exist. Say you have these partitions on HDFS:
/user/hive/warehouse/table/logindate=2013-11-26
/user/hive/warehouse/table/logindate=2013-11-27
/user/hive/warehouse/table/logindate=2013-11-28

If you drop partition "2013-11-27", Hive also removes the metadata reference. So if you restore the data, the partition will exist on HDFS, but you still need to run some "add partition" commands before Hive knows the partition exists.
It's usually a good idea to snapshot the metadata at the same time you snapshot the HDFS data, so you get one consistent view that you can trust to be correct.
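
As a rough illustration of the "add partition" step, here is a minimal sketch that prints one ALTER TABLE ... ADD PARTITION statement per restored partition directory. The function name and the table name `mytable` are made up for this example, and it assumes each directory name has the form key=value with a string-typed partition key, as in the paths above.

```shell
#!/bin/sh
# Print one Hive "add partition" statement per restored partition directory.
# Arguments: table name, then one or more partition directory paths.
# Nothing is executed against Hive; the statements are only written to stdout.
add_partition_statements() {
    table="$1"
    shift
    for dir in "$@"; do
        dir="${dir%/}"        # drop a trailing slash, if any
        spec="${dir##*/}"     # last path component, e.g. logindate=2013-11-27
        key="${spec%%=*}"     # partition column name
        value="${spec#*=}"    # partition value
        printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s') LOCATION '%s';\n" \
            "$table" "$key" "$value" "$dir"
    done
}
```

You could redirect the output to a file, check it, and then run it with `hive -f`, for example `add_partition_statements mytable /user/hive/warehouse/table/logindate=2013-11-27 > add_partitions.hql`. Depending on your Hive version, `MSCK REPAIR TABLE` may also be able to pick up partition directories that exist on HDFS but are missing from the metadata.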

Bennie.

From: Binglin Chang [mailto:decstery@gmail.com]
Sent: Thursday, November 28, 2013 4:27 PM
To: user@hadoop.apache.org
Subject: Re: HDFS snapshots restore

The snapshot restore feature is not implemented yet. Currently you can use distcp to copy the snapshot directory to your new cluster. Suppose your Hive directory is /user/hive/ and the snapshot directory is /user/hive/.snapshot/sn0; then you can run:
 hadoop distcp hdfs://oldcluster:8020/user/hive/.snapshot/sn0 hdfs://newcluster:8020/somedir



Re: HDFS snapshots restore

Posted by Binglin Chang <de...@gmail.com>.
The snapshot restore feature is not implemented yet. Currently you can use
distcp to copy the snapshot directory to your new cluster. Suppose your Hive
directory is /user/hive/ and the snapshot directory is
/user/hive/.snapshot/sn0; then you can run:
 hadoop distcp hdfs://oldcluster:8020/user/hive/.snapshot/sn0 hdfs://newcluster:8020/somedir
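
For the other part of the question (reverting a directory inside the same cluster), the same copy-based idea can be applied locally, since a snapshot is readable under <dir>/.snapshot/<name>. The sketch below only prints the commands so they can be reviewed first. The function name, the ".bad" suffix and the subpath warehouse/table are assumptions made for this example, not something prescribed by HDFS.

```shell
#!/bin/sh
# Print (do not run) the commands for a copy-based restore from a snapshot.
# Arguments: snapshottable directory, snapshot name, subpath to restore.
print_restore_commands() {
    dir="${1%/}"                              # drop a trailing slash, if any
    snapshot="$2"
    subpath="$3"
    src="$dir/.snapshot/$snapshot/$subpath"   # read-only snapshot copy
    dst="$dir/$subpath"                       # live path to be reverted
    # Move the bad data aside instead of deleting it, then copy the old state back.
    echo "hdfs dfs -mv $dst $dst.bad"
    echo "hdfs dfs -cp $src $dst"
}
```

For example, `print_restore_commands /user/hive sn0 warehouse/table` prints an `hdfs dfs -mv` followed by an `hdfs dfs -cp` from /user/hive/.snapshot/sn0/warehouse/table. For large directories, `hadoop distcp` from the snapshot path is likely a better fit than `hdfs dfs -cp`, because the copy then runs as a distributed job.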



