Posted to user@hadoop.apache.org by Koji Noguchi <kn...@apache.org> on 2020/10/21 14:51:03 UTC

Re: [E] Re: HDFS Missing Blocks / Corrupt Blocks Logic: What are the specific checks done to determine a block is bad and needs to be replicated?

Each datanode's BlockScanner periodically verifies the checksums of all the
blocks stored on that datanode.
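
For reference, the scan interval and I/O throttle for that scanner are set in
hdfs-site.xml; the values below are the usual defaults, but it is worth
checking the hdfs-default.xml shipped with your release:

  <property>
    <!-- How often each block replica is re-verified; 504 hours = 3 weeks. -->
    <name>dfs.datanode.scan.period.hours</name>
    <value>504</value>
  </property>
  <property>
    <!-- Per-volume scan throttle, in bytes per second. -->
    <name>dfs.block.scanner.volume.bytes.per.second</name>
    <value>1048576</value>
  </property>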

Koji

On Wed, Oct 21, 2020 at 10:26 AM संजीव (Sanjeev Tripurari) <
sanjeevtripurari@gmail.com> wrote:

> Hi Tom
>
> Therefore, if I write a file to HDFS but access it two years later, then
> the checksum will be computed only twice, at the beginning of the two years
> and again at the end when a client connects?  Correct?  As long as no
> process ever accesses the file between now and two years from now, the
> checksum is never redone and compared to the two-year-old checksum in the
> fsimage?
>
> Yes, exactly: unless the data is read, the checksum is not verified (it is
> computed when the data is written and verified when the data is read).
> If the checksum mismatches and no healthy replica is left to copy from,
> there is no way to correct it; you will have to re-write that file.
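>
> A minimal Java sketch (the path and class name are invented for
> illustration) of how a client read triggers verification and how a
> checksum can be fetched explicitly:
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FSDataInputStream;
>   import org.apache.hadoop.fs.FileChecksum;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class ChecksumOnRead {
>     public static void main(String[] args) throws Exception {
>       FileSystem fs = FileSystem.get(new Configuration());
>       Path file = new Path("/data/example.txt");  // hypothetical path
>
>       // Reading the stream verifies the stored CRCs chunk by chunk;
>       // a mismatch surfaces as org.apache.hadoop.fs.ChecksumException.
>       try (FSDataInputStream in = fs.open(file)) {
>         byte[] buf = new byte[8192];
>         while (in.read(buf) != -1) {
>           // just consume the data; verification happens inside the client
>         }
>       }
>
>       // Ask the datanodes for a file-level checksum without copying the
>       // data back to the client.
>       FileChecksum sum = fs.getFileChecksum(file);
>       System.out.println(file + ": " + sum);
>     }
>   }
>
> On the command line, "hdfs dfs -checksum /data/example.txt" does roughly
> the same thing as that last call.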
>
> When a datanode is added back in, there is no real read operation on the
> files themselves.  The datanode just reports the blocks but doesn't really
> read the blocks that are there to re-verify the files and ensure
> consistency?
>
> Yes, exactly: the datanode maintains a list of the block files it stores,
> which it reports, along with total disk size and used size.
> The namenode only has the list of blocks; unless a datanode is connected,
> it won't know where that datanode's blocks are stored.
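>
> To see what the namenode currently knows about a file's blocks and which
> datanodes have reported them, something like this (the path is only an
> example):
>
>   hdfs fsck /data/example.txt -files -blocks -locations
>   hdfs dfsadmin -report
>
> The first command lists block IDs and the datanodes holding each replica;
> the second shows per-datanode capacity and used space as reported to the
> namenode.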
>
> Regards
> -Sanjeev
>
>
> On Wed, 21 Oct 2020 at 18:31, TomK <to...@mdevsys.com> wrote:
>
>> Hey Sanjeev,
>>
>> Thank you very much again.  This confirms my suspicion.
>>
>> Therefore, if I write a file to HDFS but access it two years later, then
>> the checksum will be computed only twice, at the beginning of the two years
>> and again at the end when a client connects?  Correct?  As long as no
>> process ever accesses the file between now and two years from now, the
>> checksum is never redone and compared to the two-year-old checksum in the
>> fsimage?
>>
>> When a datanode is added back in, there is no real read operation on the
>> files themselves.  The datanode just reports the blocks but doesn't really
>> read the blocks that are there to re-verify the files and ensure
>> consistency?
>>
>> Thx,
>> TK
>>
>>
>>
>> On 10/21/2020 12:38 AM, संजीव (Sanjeev Tripurari) wrote:
>>
>> Hi Tom,
>>
>> Every datanode sends heartbeats to the namenode and periodically reports
>> the list of blocks it has.
>>
>> A datanode that has been disconnected for a while will, after reconnecting,
>> send a heartbeat and a block report to the namenode with the list of blocks
>> it has (until then the namenode will show those blocks as under-replicated).
>> As soon as the datanode is connected to the namenode again, the
>> under-replicated blocks are cleared.
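>>
>> The cadence of those heartbeats and full block reports is configurable in
>> hdfs-site.xml; the values below are the usual defaults (worth confirming
>> against the hdfs-default.xml of your release):
>>
>>   <property>
>>     <!-- Heartbeat interval, in seconds. -->
>>     <name>dfs.heartbeat.interval</name>
>>     <value>3</value>
>>   </property>
>>   <property>
>>     <!-- Full block report interval, in milliseconds (6 hours). -->
>>     <name>dfs.blockreport.intervalMsec</name>
>>     <value>21600000</value>
>>   </property>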
>>
>> *When a client connects to read or write a file, it will run a checksum to
>> validate the file.*
>>
>> There is no independent process running to do checksums, as it would be a
>> heavy process on each node.
>>
>> Regards
>> -Sanjeev
>>
>> On Wed, 21 Oct 2020 at 00:18, Tom <tk...@mdevsys.com> wrote:
>>
>>> Thank you.  That part I understand and am Ok with it.
>>>
>>> What I would like to know next is when the CRC32C checksum is run again
>>> and checked against the fsimage to confirm that the block file has not
>>> changed or become corrupted?
>>>
>>> For example, if I take a datanode out and, within 15 minutes, plug it
>>> back in, does HDFS rerun the CRC32C on all data disks on that node to make
>>> sure the blocks are OK?
>>>
>>> Cheers,
>>> TK
>>>
>>> Sent from my iPhone
>>>
>>> On Oct 20, 2020, at 1:39 PM, संजीव (Sanjeev Tripurari) <
>>> sanjeevtripurari@gmail.com> wrote:
>>>
>>> It's done as soon as a file is stored on disk.
>>>
>>> Sanjeev
>>>
>>> On Tuesday, 20 October 2020, TomK <to...@mdevsys.com> wrote:
>>>
>>>> Thanks again.
>>>>
>>>> At what points is the checksum validated (checked) after that?  For
>>>> example, is it done on a daily basis or is it done only when the file is
>>>> accessed?
>>>>
>>>> Thx,
>>>> TK
>>>>
>>>> On 10/20/2020 10:18 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>
>>>> As soon as the file is written for the first time, a checksum is
>>>> calculated and stored with each block replica (the namenode records the
>>>> new block in its edit log and later in the fsimage), and the same happens
>>>> on the other replicas.
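>>>>
>>>> The per-block CRCs themselves are kept in a .meta file stored next to each
>>>> block replica in the datanode's data directories, and a single replica can
>>>> be checked by hand with the debug tool; the paths below are invented for
>>>> illustration:
>>>>
>>>>   hdfs debug verifyMeta \
>>>>     -block /data/1/dn/current/<bpid>/current/finalized/subdir0/subdir0/blk_1073741825 \
>>>>     -meta /data/1/dn/current/<bpid>/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta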
>>>>
>>>>
>>>>
>>>> On Tue, 20 Oct 2020 at 19:15, TomK <to...@mdevsys.com> wrote:
>>>>
>>>>> Hi Sanjeev,
>>>>>
>>>>> Thank you.  It does help.
>>>>>
>>>>> At what points is the checksum calculated?
>>>>>
>>>>> Thx,
>>>>> TK
>>>>>
>>>>> On 10/20/2020 3:03 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>
>>>>> For missing blocks and corrupted blocks, first check that all the
>>>>> datanode services are up, that none of the disks where HDFS data is
>>>>> stored is inaccessible or has issues, and that the hosts are reachable
>>>>> from the namenode.
>>>>>
>>>>> If you are able to re-generate the data and write it again, great;
>>>>> otherwise Hadoop cannot correct itself.
>>>>>
>>>>> Could you please elaborate on this?  Does it mean I have to
>>>>> continuously access a file for HDFS to be able to detect corrupt blocks and
>>>>> correct itself?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *"Does HDFS check that the data node is up, data disk is mounted, path
>>>>> to the file exists and file can be read?"*
>>>>> -- yes, only after it fails it will say missing blocks.
>>>>>
>>>>>
>>>>> *Or does it also do a filesystem check on that data disk as well as
>>>>> perhaps a checksum to ensure block integrity?*
>>>>> -- Yes; a checksum is maintained for every file and cross-checked, and
>>>>> if it fails the block is reported as corrupt.
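>>>>>
>>>>> To list which files the namenode currently considers corrupt or missing,
>>>>> fsck is the usual tool:
>>>>>
>>>>>   hdfs fsck / -list-corruptfileblocks
>>>>>
>>>>> The summary at the end of a plain "hdfs fsck /" run also reports missing,
>>>>> corrupt, and under-replicated block counts.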
>>>>>
>>>>> hope this helps.
>>>>>
>>>>> -Sanjeev
>>>>>
>>>>>
>>>>> On Tue, 20 Oct 2020 at 09:52, TomK <to...@mdevsys.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> HDFS Missing Blocks / Corrupt Blocks Logic:  What are the specific
>>>>>> checks done to determine a block is bad and needs to be replicated?
>>>>>>
>>>>>> Does HDFS check that the data node is up, data disk is mounted, path
>>>>>> to
>>>>>> the file exists and file can be read?
>>>>>>
>>>>>> Or does it also do a filesystem check on that data disk as well as
>>>>>> perhaps a checksum to ensure block integrity?
>>>>>>
>>>>>> I've googled on this quite a bit.  I don't see the exact answer I'm
>>>>>> looking for.  I would like to know exactly what happens during file
>>>>>> integrity verification that then constitutes missing blocks or
>>>>>> corrupt
>>>>>> blocks in the reports.
>>>>>>
>>>>>> --
>>>>>> Thank  You,
>>>>>> TK.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
>>>>>> For additional commands, e-mail: user-help@hadoop.apache.org
>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> Thx,
>>>> TK.
>>>>
>>>
>> --
>> Thx,
>> TK.
>>
>