Posted to user@accumulo.apache.org by Aji Janis <aj...@gmail.com> on 2012/08/13 18:31:18 UTC

Hardware failure and data protection

I am very new to Hadoop and Accumulo. I need some information on how data
is backed up or guaranteed against system failures (if it is). I am
considering setting up a Hadoop cluster consisting of 5 nodes where each
node has 3 internal hard drives. I understand HDFS has a configurable
redundancy feature but what happens if an entire drive crashes (physically)
for whatever reason? How does Hadoop recover, if it can, from this
situation? More specifically, I am assuming Accumulo uses HDFS redundancy
to make backups of the data.

One, is this assumption true?
Two, if I had a copy of the hard drive and duplicated it to a new drive
and popped it in where the old/crashed drive used to be, would this work?

I apologize if this is a really stupid question. But I highly appreciate
any help, pointers and suggestions! Thanks in advance.

Re: Hardware failure and data protection

Posted by Billie Rinaldi <bi...@apache.org>.
On Mon, Aug 13, 2012 at 12:31 PM, Aji Janis <aj...@gmail.com> wrote:

> I am very new to Hadoop and Accumulo. I need some information on how data
> is backed up or guaranteed against system failures (if it is). I am
> considering setting up a Hadoop cluster consisting of 5 nodes where each
> node has 3 internal hard drives. I understand HDFS has a configurable
> redundancy feature but what happens if an entire drive crashes (physically)
> for whatever reason? How does Hadoop recover, if it can, from this
> situation? More specifically, I am assuming Accumulo uses HDFS redundancy
> to make backups of the data.
>
> One, is this assumption true?
>

Yes, Accumulo uses HDFS replication to preserve data in the presence of
failures.  HDFS stores N exact copies of each data block, where N is the
configurable replication factor (3 by default), with each copy being
stored on a different server.  If a drive crashes, HDFS notices that the
blocks it held are under-replicated, and copies those blocks from the
surviving replicas to an available drive.  Thus a block can survive the
simultaneous loss of up to N-1 of the servers holding it.
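
To make the N-1 claim concrete, here is a minimal sketch in Python of the
idea described above.  It is a toy model, not HDFS code: the function names
(place_blocks, fail_servers, re_replicate), the node names, and the random
placement are all assumptions made for illustration.  Real HDFS placement
is rack-aware and is driven by the NameNode.  The sketch also treats the
loss of a whole server as the failure, which is the worst case compared to
losing a single drive.

```python
import random

def place_blocks(num_blocks, servers, replication, rng):
    """Toy placement: put each block on `replication` distinct servers."""
    return {b: set(rng.sample(servers, replication)) for b in range(num_blocks)}

def fail_servers(placement, failed):
    """Remove every replica that lived on a failed server."""
    failed = set(failed)
    return {b: holders - failed for b, holders in placement.items()}

def re_replicate(placement, live_servers, replication, rng):
    """Copy under-replicated blocks to live servers that lack a copy."""
    repaired = {}
    for b, holders in placement.items():
        holders = set(holders)
        missing = replication - len(holders)
        # A block with no surviving replica cannot be copied from anywhere.
        if holders and missing > 0:
            candidates = [s for s in live_servers if s not in holders]
            holders.update(rng.sample(candidates, min(missing, len(candidates))))
        repaired[b] = holders
    return repaired

if __name__ == "__main__":
    rng = random.Random(0)
    servers = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical names
    placement = place_blocks(100, servers, 3, rng)

    # Lose N-1 = 2 servers at once: every block keeps at least one copy.
    degraded = fail_servers(placement, ["node1", "node2"])
    print("fewest copies after failure:", min(len(h) for h in degraded.values()))

    # The survivors are then used to restore the replication factor.
    live = ["node3", "node4", "node5"]
    repaired = re_replicate(degraded, live, 3, rng)
    print("fewest copies after repair:", min(len(h) for h in repaired.values()))
```

Because each block's copies sit on distinct servers, removing any two of
the five servers can remove at most two of a block's three copies, which is
why at least one replica always remains to copy from.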


> Two, if I had a copy of the hard drive and I duplicate that to a new drive
> and pop it in where the old/crashed drive used to be would this work?
>

Since that drive's data would have been replicated to other drives, you
should not need a copy of it.  You should just be able to put in a fresh
hard drive, and HDFS will start using it.

Billie


>
> I apologize if this is a really stupid question. But I highly appreciate
> any help, pointers and suggestions! Thanks in advance.
>