Posted to hdfs-user@hadoop.apache.org by Steve Edison <se...@gmail.com> on 2013/01/25 00:29:11 UTC

How to Backup HDFS data ?

Folks,

It's been a year and my HDFS / Solr / Hive setup has been working flawlessly.
The data logs that were meaningless to my business all of a sudden became
precious, to the extent that our management wants to back up this data. I am
talking about 20 TB of active HDFS data growing at 2 TB/month. We would like
to keep weekly and monthly backups for up to 12 months.

Any ideas how to do this?

-- Steve

Re: How to Backup HDFS data ?

Posted by Mahesh Balija <ba...@gmail.com>.
Hi Steve,

            On top of Harsh's answer: besides plain backups, there is a feature
called snapshots, offered by some third-party vendors like MapR.
            Though not really a backup, a snapshot is a point in time to which
you can revert at any time.

Best,
Mahesh Balija,
CalsoftLabs.

On Fri, Jan 25, 2013 at 11:53 AM, Harsh J <ha...@cloudera.com> wrote:

> You need some form of spare capacity on the backup cluster that can
> absorb it. Lower replication (<3) may also be an option there to
> save yourself some disks/nodes?
>
> On Fri, Jan 25, 2013 at 5:04 AM, Steve Edison <se...@gmail.com> wrote:
> > Backup to disk is what we do right now. Distcp would copy across HDFS
> > clusters, meaning I will have to build another 12-node cluster? Is that
> > correct?
> >
> >
> > On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts
> > <ma...@gmail.com> wrote:
> >>
> >> Backup on tape or on disk?
> >>
> >> On disk, have another Hadoop cluster and do regular distcp.
> >>
> >> On tape, make sure you have a backup program which can back up streams
> >> so you don't have to materialize your TB files outside of your Hadoop
> >> cluster first... (I know Simpana can't do that :-().
> >>
> >> On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <se...@gmail.com>
> >> wrote:
> >> > Folks,
> >> >
> >> > It's been a year and my HDFS / Solr / Hive setup has been working
> >> > flawlessly. The data logs that were meaningless to my business all
> >> > of a sudden became precious, to the extent that our management wants
> >> > to back up this data. I am talking about 20 TB of active HDFS data
> >> > growing at 2 TB/month. We would like to keep weekly and monthly
> >> > backups for up to 12 months.
> >> >
> >> > Any ideas how to do this?
> >> >
> >> > -- Steve
> >
> >
>
>
>
> --
> Harsh J
>
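
[Editor's note: for reference, upstream HDFS itself gained native snapshots in
later releases (the 2.1 line), shortly after this thread; at the time only
vendor distributions offered them. Below is a minimal sketch of those native
commands. The /data/logs path is a hypothetical example, and the hadoop
invocations are echoed as a dry run; drop the echos on a real cluster.]

```shell
# Dry-run sketch of native HDFS snapshots (added to upstream HDFS in the
# 2.1 line, after this thread was written). Path is hypothetical.
DIR=/data/logs                     # directory to make snapshottable
NAME="weekly-$(date +%G-W%V)"      # e.g. weekly-2013-W04

echo hdfs dfsadmin -allowSnapshot "$DIR"       # one-time admin step
echo hdfs dfs -createSnapshot "$DIR" "$NAME"   # take the point-in-time view
echo hdfs dfs -ls "$DIR/.snapshot/$NAME"       # snapshots are read-only here
```

As Mahesh notes, this protects against accidental deletes or bad jobs, not
against losing the cluster itself, so it complements rather than replaces a
distcp or tape copy.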

Re: How to Backup HDFS data ?

Posted by Ted Dunning <td...@maprtech.com>.
Incremental backups are nice because they avoid copying all your data again.

You can implement these at the application layer if you have clean
partitioning and track changes correctly.

You can also use platform-level capabilities such as those provided by the
MapR distribution.

On Fri, Jan 25, 2013 at 3:23 PM, Harsh J <ha...@cloudera.com> wrote:

> You need some form of spare capacity on the backup cluster that can
> absorb it. Lower replication (<3) may also be an option there to
> save yourself some disks/nodes?
>
> On Fri, Jan 25, 2013 at 5:04 AM, Steve Edison <se...@gmail.com> wrote:
> > Backup to disk is what we do right now. Distcp would copy across HDFS
> > clusters, meaning I will have to build another 12-node cluster? Is that
> > correct?
> >
> >
> > On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts
> > <ma...@gmail.com> wrote:
> >>
> >> Backup on tape or on disk?
> >>
> >> On disk, have another Hadoop cluster and do regular distcp.
> >>
> >> On tape, make sure you have a backup program which can back up streams
> >> so you don't have to materialize your TB files outside of your Hadoop
> >> cluster first... (I know Simpana can't do that :-().
> >>
> >> On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <se...@gmail.com>
> >> wrote:
> >> > Folks,
> >> >
> >> > It's been a year and my HDFS / Solr / Hive setup has been working
> >> > flawlessly. The data logs that were meaningless to my business all
> >> > of a sudden became precious, to the extent that our management wants
> >> > to back up this data. I am talking about 20 TB of active HDFS data
> >> > growing at 2 TB/month. We would like to keep weekly and monthly
> >> > backups for up to 12 months.
> >> >
> >> > Any ideas how to do this?
> >> >
> >> > -- Steve
> >
> >
>
>
>
> --
> Harsh J
>
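
[Editor's note: a sketch of the application-layer approach Ted describes,
assuming a hypothetical monthly partition layout (/data/logs/YYYY-MM) and
hypothetical prod-nn/backup-nn namenodes. Each run copies only partitions
the backup cluster doesn't already have; the distcp is echoed as a dry run.]

```shell
# Application-layer incremental backup sketch: copy only missing monthly
# partitions. Layout and namenode addresses are hypothetical placeholders.
SRC=hdfs://prod-nn:8020/data/logs
DST=hdfs://backup-nn:8020/backups/logs

copy_partition() {
    part=$1
    # hdfs dfs -test -d exits 0 if the directory already exists on the
    # backup side; in that case the partition was copied on a prior run.
    if hdfs dfs -test -d "$DST/$part" 2>/dev/null; then
        echo "skip $part (already backed up)"
    else
        echo hadoop distcp "$SRC/$part" "$DST/$part"
    fi
}

copy_partition "2013-01"
```

Because closed monthly partitions never change, each run only moves the new
~2 TB, not the full 20 TB.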

Re: How to Backup HDFS data ?

Posted by Harsh J <ha...@cloudera.com>.
You need some form of spare capacity on the backup cluster that can
absorb it. Lower replication (<3) may also be an option there to
save yourself some disks/nodes?

On Fri, Jan 25, 2013 at 5:04 AM, Steve Edison <se...@gmail.com> wrote:
> Backup to disk is what we do right now. Distcp would copy across HDFS
> clusters, meaning I will have to build another 12-node cluster? Is that
> correct?
>
>
> On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts
> <ma...@gmail.com> wrote:
>>
>> Backup on tape or on disk?
>>
>> On disk, have another Hadoop cluster and do regular distcp.
>>
>> On tape, make sure you have a backup program which can back up streams
>> so you don't have to materialize your TB files outside of your Hadoop
>> cluster first... (I know Simpana can't do that :-().
>>
>> On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <se...@gmail.com>
>> wrote:
>> > Folks,
>> >
>> > It's been a year and my HDFS / Solr / Hive setup has been working
>> > flawlessly. The data logs that were meaningless to my business all
>> > of a sudden became precious, to the extent that our management wants
>> > to back up this data. I am talking about 20 TB of active HDFS data
>> > growing at 2 TB/month. We would like to keep weekly and monthly
>> > backups for up to 12 months.
>> >
>> > Any ideas how to do this?
>> >
>> > -- Steve
>
>



-- 
Harsh J
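
[Editor's note: a sketch of the lower-replication idea, with hypothetical
cluster addresses; both commands are echoed as a dry run. Passing
-D dfs.replication=2 should make distcp create the backup files at
replication 2, and -setrep lowers it on files already copied.]

```shell
# Dry-run sketch: keep backup data at replication 2 instead of the default 3,
# saving roughly a third of the backup cluster's disk. Addresses hypothetical.
# Option 1: write new backup files at replication 2 during the copy.
CMD_COPY="hadoop distcp -D dfs.replication=2 hdfs://prod-nn:8020/data hdfs://backup-nn:8020/backups/data"

# Option 2: lower replication recursively on data already backed up.
CMD_SETREP="hdfs dfs -setrep -R 2 /backups/data"

echo "$CMD_COPY"
echo "$CMD_SETREP"
```

Alternatively, setting dfs.replication=2 in the backup cluster's
hdfs-site.xml makes it the default for everything written there.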

Re: How to Backup HDFS data ?

Posted by Steve Edison <se...@gmail.com>.
Backup to disk is what we do right now. Distcp would copy across HDFS
clusters, meaning I will have to build another 12-node cluster? Is that
correct?


On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts
<mathias.herberts@gmail.com> wrote:

> Backup on tape or on disk?
>
> On disk, have another Hadoop cluster and do regular distcp.
>
> On tape, make sure you have a backup program which can back up streams
> so you don't have to materialize your TB files outside of your Hadoop
> cluster first... (I know Simpana can't do that :-().
>
> On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <se...@gmail.com>
> wrote:
> > Folks,
> >
> > It's been a year and my HDFS / Solr / Hive setup has been working
> > flawlessly. The data logs that were meaningless to my business all
> > of a sudden became precious, to the extent that our management wants
> > to back up this data. I am talking about 20 TB of active HDFS data
> > growing at 2 TB/month. We would like to keep weekly and monthly
> > backups for up to 12 months.
> >
> > Any ideas how to do this?
> >
> > -- Steve
>

Re: How to Backup HDFS data ?

Posted by Mathias Herberts <ma...@gmail.com>.
Backup on tape or on disk?

On disk, have another Hadoop cluster and do regular distcp.

On tape, make sure you have a backup program which can back up streams
so you don't have to materialize your TB files outside of your Hadoop
cluster first... (I know Simpana can't do that :-().

On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <se...@gmail.com> wrote:
> Folks,
>
> It's been a year and my HDFS / Solr / Hive setup has been working
> flawlessly. The data logs that were meaningless to my business all of a
> sudden became precious, to the extent that our management wants to back
> up this data. I am talking about 20 TB of active HDFS data growing at
> 2 TB/month. We would like to keep weekly and monthly backups for up to
> 12 months.
>
> Any ideas how to do this?
>
> -- Steve
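
[Editor's note: a minimal sketch of the "second cluster + regular distcp"
setup, with hypothetical namenode addresses. The command is echoed as a dry
run; on a real pair of clusters you would run it directly, e.g. weekly from
cron.]

```shell
# Dry-run sketch of a recurring distcp backup between two clusters.
# Namenode addresses are hypothetical placeholders.
SRC=hdfs://prod-nn:8020/data
DST=hdfs://backup-nn:8020/backups/data

# -update skips files that already exist unchanged on the destination,
# so after the first full copy each run moves only the new data.
CMD="hadoop distcp -update $SRC $DST"
echo "$CMD"
```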
