Posted to user@hadoop.apache.org by Arthur Chan <ar...@gmail.com> on 2015/09/09 08:04:21 UTC

How to backup and Restore Hadoop 2.x ?

Hi,

Any idea how to back up and restore Hadoop 2.x? Use tape, form a new
Hadoop cluster, or are there other options?

I use Hadoop 2.6 with HBase and Hive.

Thanks

Regards

Re: How to backup and Restore Hadoop 2.x ?

Posted by Chetna C <ch...@gmail.com>.

We recently did a backup from one cluster to another using our in-house
tool Blueshift (https://github.com/flipkart-incubator/BlueShift). You can
try this tool.

Thanks,
Chetna Chaudhari

On 9 September 2015 at 12:12, James Bond <bo...@gmail.com> wrote:

> One way is to create a backup (secondary) cluster.
> 1. Ingest data into both clusters in parallel, that is, run the same jobs
> on both clusters. This gives you a backup and also lets you switch over to
> the backup cluster when the primary cluster has trouble. This setup
> usually makes sense when you have two data centers, one primary and one
> backup.
> 2. Have a primary cluster and a secondary that is kept in sync with the
> primary, usually with distcp-type jobs. Cloudera provides a front end to
> manage this replication, but it essentially runs distcp in the background.
> 3. If your data ingestion goes through Flume, Kafka, etc., you can use it
> to write to both the primary and secondary clusters.
>
> I am not sure whether anybody uses tape or an archive to back up a Hadoop
> cluster. I guess somebody who does can comment.
>
>
> On Wed, Sep 9, 2015 at 11:34 AM, Arthur Chan <ar...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Any idea how to back up and restore Hadoop 2.x? Use tape, form a new
>> Hadoop cluster, or are there other options?
>>
>> I use Hadoop 2.6 with HBase and Hive.
>>
>> Thanks
>>
>> Regards
>>
>>
>

Re: How to backup and Restore Hadoop 2.x ?

Posted by James Bond <bo...@gmail.com>.

One way is to create a backup (secondary) cluster.
1. Ingest data into both clusters in parallel, that is, run the same jobs
on both clusters. This gives you a backup and also lets you switch over to
the backup cluster when the primary cluster has trouble. This setup
usually makes sense when you have two data centers, one primary and one
backup.
2. Have a primary cluster and a secondary that is kept in sync with the
primary, usually with distcp-type jobs. Cloudera provides a front end to
manage this replication, but it essentially runs distcp in the background.
3. If your data ingestion goes through Flume, Kafka, etc., you can use it
to write to both the primary and secondary clusters.

I am not sure whether anybody uses tape or an archive to back up a Hadoop
cluster. I guess somebody who does can comment.
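
For option 2, a minimal sketch of what a scheduled sync job might look
like, assuming the Hadoop client is installed on the host that runs it.
The NameNode addresses and the path are hypothetical, and the helper
build_distcp_command is only an illustration, not part of Hadoop. It
builds a "hadoop distcp -update" command line, optionally with "-delete",
and prints it rather than running it.

```python
from typing import List


def build_distcp_command(src_uri: str, dst_uri: str,
                         delete_extra: bool = False) -> List[str]:
    """Return the argv for an incremental DistCp run from src_uri to dst_uri."""
    # -update copies only files that are missing or differ on the target.
    cmd = ["hadoop", "distcp", "-update"]
    if delete_extra:
        # -delete removes target files that no longer exist at the source;
        # DistCp only accepts it together with -update or -overwrite.
        cmd.append("-delete")
    cmd += [src_uri, dst_uri]
    return cmd


# Hypothetical NameNode addresses and path, for illustration only.
command = build_distcp_command(
    "hdfs://primary-nn:8020/data/warehouse",
    "hdfs://backup-nn:8020/data/warehouse",
    delete_extra=True,
)
print(" ".join(command))

# To actually launch the copy on a host with the Hadoop client:
#   import subprocess
#   subprocess.run(command, check=True)
```

Note that "-delete" makes the backup mirror deletions on the primary, so
it does not protect against accidental deletes; leave it off if that
matters to you.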

On Wed, Sep 9, 2015 at 11:34 AM, Arthur Chan <ar...@gmail.com>
wrote:

> Hi,
>
> Any idea how to back up and restore Hadoop 2.x? Use tape, form a new
> Hadoop cluster, or are there other options?
>
> I use Hadoop 2.6 with HBase and Hive.
>
> Thanks
>
> Regards
>
>
