Posted to hdfs-dev@hadoop.apache.org by Brian Bockelman <bb...@cse.unl.edu> on 2009/09/10 04:27:45 UTC

Tracking Replication errors

Hey everyone,

We're going through a review of our usage of HDFS (it's a good thing!  
- we're trying to get "official").  One reviewer asked a good question  
that I don't know the answer to - could you help?  To quote,

"What steps do you take to ensure the block rebalancing produces
non-corrupted files?  Do you have to wait 2 weeks before you discover this?"

I believe the correct answer is:

"""
When a block is replicated from one node to another, only the  
resulting block size is checked.  The checksums on the source and  
destination are not compared.  Therefore, if there's any corruption  
that occurs, it would take until the next block verification to detect  
it.
"""

If you look at TCP error rates and random memory corruptions, it  
wouldn't be surprising to see silent errors in copying between nodes,  
especially on multi-hundred-TB or PB scale installs.

Any comments?

Brian

Re: Tracking Replication errors

Posted by Dhruba Borthakur <dh...@gmail.com>.
The sender datanode sends the crc along with the data. This allows the
receiver datanode to detect corrupt data. The original crc was created by the
client that first wrote the data in the block. The crc is not kept in the
namenode. To facilitate random access, there is a crc per 512 bytes of
data, which is too much metadata for the NN to hold in memory.
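
As a rough illustration of the per-chunk scheme described above (a sketch only, not the actual datanode code), the Python below computes one CRC per 512 bytes, lets a receiver locate the first bad chunk, and estimates how much checksum metadata a given amount of block data implies. The function names are hypothetical, zlib's CRC-32 stands in for whatever polynomial HDFS uses, and the 4-byte checksum size is an assumption.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # chunk size mentioned in this thread
CHECKSUM_SIZE = 4         # assumption: one 32-bit CRC per chunk


def chunk_crcs(data, chunk_size=BYTES_PER_CHECKSUM):
    """Compute one CRC-32 per chunk_size bytes (the last chunk may be shorter)."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def first_corrupt_chunk(data, expected_crcs, chunk_size=BYTES_PER_CHECKSUM):
    """Return the index of the first chunk whose CRC mismatches, or -1 if all match."""
    actual = chunk_crcs(data, chunk_size)
    for index, (expected, got) in enumerate(zip(expected_crcs, actual)):
        if expected != got:
            return index
    if len(actual) != len(expected_crcs):
        # Data was truncated or extended relative to the checksum list.
        return min(len(actual), len(expected_crcs))
    return -1


def checksum_bytes(data_bytes, chunk_size=BYTES_PER_CHECKSUM,
                   checksum_size=CHECKSUM_SIZE):
    """Bytes of checksum metadata needed to cover data_bytes of block data."""
    chunks = -(-data_bytes // chunk_size)  # ceiling division
    return chunks * checksum_size
```

For example, with `block = bytes(range(256)) * 8` (2048 bytes, 4 chunks) and `crcs = chunk_crcs(block)`, flipping one bit at offset 1030 makes `first_corrupt_chunk` return 2, since 1030 // 512 == 2. Under the 4-byte assumption, `checksum_bytes(10**15)` is 7812500000000, i.e. roughly 7.8 TB of checksums per PB of block data, which gives a sense of why this metadata stays on the datanodes.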

dhruba


On Wed, Sep 9, 2009 at 8:33 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:

>
> On Sep 9, 2009, at 10:25 PM, Dhruba Borthakur wrote:
>
>> when a block is being received by a datanode (either because of a
>> replication request or from a client write), the datanode verifies crc.
>>
>
> Ah, so I'm wrong and the answer is better than I expected.  Never have I
> been so happy to be wrong :)
>
> Where is the "master" crc kept?  The sending datanode?  I assume this means
> that the first datanode to write the block is the "master".
>
> It's late, but I vaguely remember that 0.21.0 does some elaborate
> gymnastics to determine who has the master copy of the block.
>
> Is the CRC kept in the NN?  Any specific reason why not, beyond decreasing
> the memory footprint?
>
> Brian
>
>
>> Also, there is a thread in the datanode that periodically verifies crc
>> of existing blocks.
>>
>> dhruba
>

Re: Tracking Replication errors

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Sep 9, 2009, at 10:25 PM, Dhruba Borthakur wrote:

> when a block is being received by a datanode (either because of a
> replication request or from a client write), the datanode verifies crc.

Ah, so I'm wrong and the answer is better than I expected.  Never have  
I been so happy to be wrong :)

Where is the "master" crc kept?  The sending datanode?  I assume this  
means that the first datanode to write the block is the "master".

It's late, but I vaguely remember that 0.21.0 does some elaborate  
gymnastics to determine who has the master copy of the block.

Is the CRC kept in the NN?  Any specific reason why not, beyond  
decreasing the memory footprint?

Brian

> Also, there is a thread in the datanode that periodically verifies crc
> of existing blocks.
>
> dhruba


Re: Tracking Replication errors

Posted by Dhruba Borthakur <dh...@gmail.com>.
When a block is being received by a datanode (either because of a
replication request or from a client write), the datanode verifies crc.
Also, there is a thread in the datanode that periodically verifies crc
of existing blocks.
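
The two checks described here can be sketched roughly as follows. This is an illustrative Python model, not the datanode implementation: `BlockStore`, `receive`, `scan_once`, and `start_periodic_scan` are hypothetical names, blocks are held in memory rather than on disk, zlib's CRC-32 is a stand-in checksum, and the scan interval is simply a parameter.

```python
import threading
import zlib

CHUNK = 512  # bytes covered by each CRC, as mentioned in this thread


def chunk_crcs(data, chunk_size=CHUNK):
    """Compute one CRC-32 per chunk_size bytes of data."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


class BlockStore:
    """Hypothetical in-memory stand-in for a datanode's blocks and stored CRCs."""

    def __init__(self):
        self._blocks = {}  # block_id -> (data bytes, list of CRCs)
        self._lock = threading.Lock()

    def receive(self, block_id, data, crcs):
        """Check 1: verify the sender's CRCs before accepting the block."""
        if chunk_crcs(data) != list(crcs):
            raise ValueError("checksum mismatch while receiving %s" % block_id)
        with self._lock:
            self._blocks[block_id] = (bytes(data), list(crcs))

    def scan_once(self):
        """Check 2: re-verify every stored block; return IDs that fail."""
        with self._lock:
            snapshot = list(self._blocks.items())
        return sorted(block_id for block_id, (data, crcs) in snapshot
                      if chunk_crcs(data) != crcs)


def start_periodic_scan(store, interval_seconds, on_corrupt, stop_event):
    """Run scan_once every interval_seconds until stop_event is set."""
    def run():
        # Event.wait returns False on timeout, True once the event is set.
        while not stop_event.wait(interval_seconds):
            for block_id in store.scan_once():
                on_corrupt(block_id)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

In this model a block corrupted in transit is rejected by `receive` right away, while corruption that appears later in stored data is only reported at the next `scan_once` pass, so the scan interval bounds how long such damage can go unnoticed.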

dhruba

