Posted to mapreduce-user@hadoop.apache.org by Vinayak Borkar <vi...@gmail.com> on 2014/10/05 05:45:01 UTC

HDFS openforwrite CORRUPT -> HEALTHY

Hi,


I was experimenting with HDFS to push its boundaries on fault tolerance. 
Here is what I observed.

I am using HDFS from Hadoop 2.2. I started the NameNode and then a 
single DataNode, and began writing to a DFS file from a Java client that 
periodically calls hsync(). After some time, I powered off the machine 
running this test (not a clean shutdown, just an abrupt power-off).
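
For context, a minimal sketch of such a writer (hypothetical and 
simplified; the path, record format, and timing are made up, and it 
assumes the usual Hadoop client configuration on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HsyncWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/test/test.log");        // hypothetical test path
        try (FSDataOutputStream out = fs.create(path, true)) {
            for (int i = 0; i < 1000000; i++) {
                out.writeBytes("record " + i + "\n");
                out.hsync();                           // force the bytes to the DataNode's disk
                Thread.sleep(10);                      // write "periodically"
            }
        }
    }
}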

Once the system was back up, the HDFS processes were running, and HDFS 
was out of safe mode, I ran fsck on the DFS filesystem (with the 
-openforwrite -files -blocks options). Here is the output:


/test/test.log 388970 bytes, 1 block(s), OPENFORWRITE:  MISSING 1 blocks 
of total size 388970 B
0. 
BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2420{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-e5bed5ae-1fa9-45ed-8d4c-8006919b4d9c:NORMAL|RWR]]} 
len=388970 MISSING!

Status: CORRUPT
  Total size:	7214119 B
  Total dirs:	54
  Total files:	232
  Total symlinks:		0
  Total blocks (validated):	214 (avg. block size 33710 B)
   ********************************
   CORRUPT FILES:	1
   MISSING BLOCKS:	1
   MISSING SIZE:		388970 B
   ********************************
  Minimally replicated blocks:	213 (99.53271 %)
  Over-replicated blocks:	0 (0.0 %)
  Under-replicated blocks:	213 (99.53271 %)
  Mis-replicated blocks:		0 (0.0 %)
  Default replication factor:	3
  Average block replication:	0.9953271
  Corrupt blocks:		0
  Missing replicas:		426 (66.35514 %)
  Number of data-nodes:		1
  Number of racks:		1
FSCK ended at Sat Oct 04 23:09:40 EDT 2014 in 47 milliseconds


I let the system sit for a while (about 15-20 minutes) and reran fsck, 
and surprisingly the output was very different. The corruption was 
magically gone:

/test/test.log 1859584 bytes, 1 block(s):  Under replicated 
BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421. Target 
Replicas is 3 but found 1 replica(s).
0. BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421 
len=1859584 repl=1

Status: HEALTHY
  Total size:	8684733 B
  Total dirs:	54
  Total files:	232
  Total symlinks:		0
  Total blocks (validated):	214 (avg. block size 40582 B)
  Minimally replicated blocks:	214 (100.0 %)
  Over-replicated blocks:	0 (0.0 %)
  Under-replicated blocks:	214 (100.0 %)
  Mis-replicated blocks:		0 (0.0 %)
  Default replication factor:	3
  Average block replication:	1.0
  Corrupt blocks:		0
  Missing replicas:		428 (66.666664 %)
  Number of data-nodes:		1
  Number of racks:		1
FSCK ended at Sat Oct 04 23:24:23 EDT 2014 in 63 milliseconds


The filesystem under path '/' is HEALTHY



So my question is this: What just happened? How did the NameNode recover 
that missing block, and why did it take 15 minutes or so? Is there some 
kind of lease on the file (because it was still open for write) that 
expired after the 15-20 minutes? Can someone with knowledge of HDFS 
internals shed some light on what is going on, or point me to the 
sections of the code that could answer my questions? Also, is there a way 
to speed this process up, for example by triggering the expiration of the 
lease (assuming it is a lease)?

Thanks,
Vinayak
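
One way to avoid waiting for the lease to expire on its own (the "speed 
this process up" question above) is to ask the NameNode to recover it 
explicitly. A minimal sketch, assuming the DistributedFileSystem.recoverLease() 
API available in Hadoop 2.x (the path is the test file from above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class RecoverLease {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/test/test.log");   // file left open by the crashed writer
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        // recoverLease() returns true once the file is closed; the recovery
        // itself is asynchronous, so poll until it completes.
        boolean closed = dfs.recoverLease(path);
        while (!closed) {
            Thread.sleep(1000);
            closed = dfs.recoverLease(path);
        }
        System.out.println("Lease recovered, file closed: " + path);
    }
}

Newer Hadoop releases also ship an "hdfs debug recoverLease -path <file>" 
command for the same purpose, but it may not be present in 2.2, so the 
Java call above is the safer route there.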

Re: HDFS openforwrite CORRUPT -> HEALTHY

Posted by Vinayak Borkar <vi...@gmail.com>.
Hi Ulul,

Thanks for trying. I will try the dev list to see if they can help me 
with this.

Thanks,
Vinayak


On 10/11/14, 5:33 AM, Ulul wrote:
> Hi Vinayak,
> Sorry, this is beyond my understanding. I would need to test further to
> try and understand the problem.
> Hope you'll find help from someone else.
> Ulul
>
> On 08/10/2014 07:18, Vinayak Borkar wrote:
>> Hi Ulul,
>>
>> I think I can explain why the sizes differ and the block names vary.
>> There is no client interaction. My client writes data and calls hsync,
>> and then writes more data to the same file. My understanding is that
>> under such circumstances, the file size is not reflected accurately in
>> HDFS until the file is actually closed. So the namenode's view of the
>> file size will be lower than the actual size of the data in the block.
>> If you look at the block closely, you will see that the block number
>> is the same for the two blocks. The part that is different is the
>> version number - this is consistent with HDFS's behavior when hsyncing
>> the output stream and then continuing to write more. It looks like the
>> name node is informed much later about the last block that the
>> datanode actually wrote.
>>
>> My client was not started when the machine came back up. So all
>> changes seen in the FSCK output were owing to HDFS.
>>
>>
>> Vinayak
>>
>>
>> On 10/7/14, 2:37 PM, Ulul wrote:
>>>
>>> Hi Vinayak
>>>
>>> I find it strange that the file should have a different size and the block
>>> a different name.
>>> Are you sure your writing client wasn't interfering?
>>>
>>> Ulul
>>>
>>> On 07/10/2014 19:41, Vinayak Borkar wrote:
>>>> Trying again since I did not get a reply. Please let me know if I
>>>> should use a different forum to ask this question.
>>>>
>>>> Thanks,
>>>> Vinayak
>>>>
>>>>
>>>>
>>>> On 10/4/14, 8:45 PM, Vinayak Borkar wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> I was experimenting with HDFS to push its boundaries on fault
>>>>> tolerance.
>>>>> Here is what I observed.
>>>>>
>>>>> I am using HDFS from Hadoop 2.2. I started the NameNode and then a
>>>>> single DataNode. I started writing to a DFS file from a Java client
>>>>> periodically calling hsync(). After some time, I powered off the
>>>>> machine
>>>>> that was running this test (not shutdown, just abruptly powered off).
>>>>>
>>>>> When the system came back up, and HDFS processes were up and HDFS was
>>>>> out of safe mode, I ran fsck on the DFS filesystem (with -openforwrite
>>>>> -files -blocks) options and here is the output:
>>>>>
>>>>>
>>>>> /test/test.log 388970 bytes, 1 block(s), OPENFORWRITE: MISSING 1
>>>>> blocks
>>>>> of total size 388970 B
>>>>> 0.
>>>>> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2420{blockUCState=UNDER_CONSTRUCTION,
>>>>>
>>>>>
>>>>> primaryNodeIndex=-1,
>>>>> replicas=[ReplicaUnderConstruction[[DISK]DS-e5bed5ae-1fa9-45ed-8d4c-8006919b4d9c:NORMAL|RWR]]}
>>>>>
>>>>>
>>>>> len=388970 MISSING!
>>>>>
>>>>> Status: CORRUPT
>>>>>   Total size:    7214119 B
>>>>>   Total dirs:    54
>>>>>   Total files:    232
>>>>>   Total symlinks:        0
>>>>>   Total blocks (validated):    214 (avg. block size 33710 B)
>>>>>    ********************************
>>>>>    CORRUPT FILES:    1
>>>>>    MISSING BLOCKS:    1
>>>>>    MISSING SIZE:        388970 B
>>>>>    ********************************
>>>>>   Minimally replicated blocks:    213 (99.53271 %)
>>>>>   Over-replicated blocks:    0 (0.0 %)
>>>>>   Under-replicated blocks:    213 (99.53271 %)
>>>>>   Mis-replicated blocks:        0 (0.0 %)
>>>>>   Default replication factor:    3
>>>>>   Average block replication:    0.9953271
>>>>>   Corrupt blocks:        0
>>>>>   Missing replicas:        426 (66.35514 %)
>>>>>   Number of data-nodes:        1
>>>>>   Number of racks:        1
>>>>> FSCK ended at Sat Oct 04 23:09:40 EDT 2014 in 47 milliseconds
>>>>>
>>>>>
>>>>> I just let the system sit for some time and reran fsck (after about
>>>>> 15-20 mins) and surprisingly the output was very different. The
>>>>> corruption was magically gone:
>>>>>
>>>>> /test/test.log 1859584 bytes, 1 block(s):  Under replicated
>>>>> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421. Target
>>>>> Replicas is 3 but found 1 replica(s).
>>>>> 0. BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421
>>>>> len=1859584 repl=1
>>>>>
>>>>> Status: HEALTHY
>>>>>   Total size:    8684733 B
>>>>>   Total dirs:    54
>>>>>   Total files:    232
>>>>>   Total symlinks:        0
>>>>>   Total blocks (validated):    214 (avg. block size 40582 B)
>>>>>   Minimally replicated blocks:    214 (100.0 %)
>>>>>   Over-replicated blocks:    0 (0.0 %)
>>>>>   Under-replicated blocks:    214 (100.0 %)
>>>>>   Mis-replicated blocks:        0 (0.0 %)
>>>>>   Default replication factor:    3
>>>>>   Average block replication:    1.0
>>>>>   Corrupt blocks:        0
>>>>>   Missing replicas:        428 (66.666664 %)
>>>>>   Number of data-nodes:        1
>>>>>   Number of racks:        1
>>>>> FSCK ended at Sat Oct 04 23:24:23 EDT 2014 in 63 milliseconds
>>>>>
>>>>>
>>>>> The filesystem under path '/' is HEALTHY
>>>>>
>>>>>
>>>>>
>>>>> So my question is this: What just happened? How did the NameNode
>>>>> recover
>>>>> that missing block and why did it take 15 mins or so? Is there some
>>>>> kind
>>>>> of a lease on the file (because of the open nature) that expired after
>>>>> the 15-20 mins? Can someone with knowledge of HDFS internals please
>>>>> shed
>>>>> some light on what could possibly be going on or point me to
>>>>> sections of
>>>>> the code that could answer my questions? Also is there a way to speed
>>>>> this process up? Like say trigger the expiration of the lease
>>>>> (assuming
>>>>> it is a lease).
>>>>>
>>>>> Thanks,
>>>>> Vinayak
>>>>
>>>
>>>
>>
>


Re: HDFS openforwrite CORRUPT -> HEALTHY

Posted by Ulul <ha...@ulul.org>.
Hi Vinayak,
Sorry, this is beyond my understanding. I would need to test further to 
try and understand the problem.
Hope you'll find help from someone else.
Ulul

On 08/10/2014 07:18, Vinayak Borkar wrote:
> Hi Ulul,
>
> I think I can explain why the sizes differ and the block names vary. 
> There is no client interaction. My client writes data and calls hsync, 
> and then writes more data to the same file. My understanding is that 
> under such circumstances, the file size is not reflected accurately in 
> HDFS until the file is actually closed. So the namenode's view of the 
> file size will be lower than the actual size of the data in the block. 
> If you look at the block closely, you will see that the block number 
> is the same for the two blocks. The part that is different is the 
> version number - this is consistent with HDFS's behavior when hsyncing 
> the output stream and then continuing to write more. It looks like the 
> name node is informed much later about the last block that the 
> datanode actually wrote.
>
> My client was not started when the machine came back up. So all 
> changes seen in the FSCK output were owing to HDFS.
>
>
> Vinayak
>
>
> On 10/7/14, 2:37 PM, Ulul wrote:
>>
>> Hi Vinayak
>>
>> I find it strange that the file should have a different size and the block
>> a different name.
>> Are you sure your writing client wasn't interfering?
>>
>> Ulul
>>
>> On 07/10/2014 19:41, Vinayak Borkar wrote:
>>> Trying again since I did not get a reply. Please let me know if I
>>> should use a different forum to ask this question.
>>>
>>> Thanks,
>>> Vinayak
>>>
>>>
>>>
>>> On 10/4/14, 8:45 PM, Vinayak Borkar wrote:
>>>> Hi,
>>>>
>>>>
>>>> I was experimenting with HDFS to push its boundaries on fault 
>>>> tolerance.
>>>> Here is what I observed.
>>>>
>>>> I am using HDFS from Hadoop 2.2. I started the NameNode and then a
>>>> single DataNode. I started writing to a DFS file from a Java client
>>>> periodically calling hsync(). After some time, I powered off the 
>>>> machine
>>>> that was running this test (not shutdown, just abruptly powered off).
>>>>
>>>> When the system came back up, and HDFS processes were up and HDFS was
>>>> out of safe mode, I ran fsck on the DFS filesystem (with -openforwrite
>>>> -files -blocks) options and here is the output:
>>>>
>>>>
>>>> /test/test.log 388970 bytes, 1 block(s), OPENFORWRITE: MISSING 1 
>>>> blocks
>>>> of total size 388970 B
>>>> 0.
>>>> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2420{blockUCState=UNDER_CONSTRUCTION, 
>>>>
>>>>
>>>> primaryNodeIndex=-1,
>>>> replicas=[ReplicaUnderConstruction[[DISK]DS-e5bed5ae-1fa9-45ed-8d4c-8006919b4d9c:NORMAL|RWR]]} 
>>>>
>>>>
>>>> len=388970 MISSING!
>>>>
>>>> Status: CORRUPT
>>>>   Total size:    7214119 B
>>>>   Total dirs:    54
>>>>   Total files:    232
>>>>   Total symlinks:        0
>>>>   Total blocks (validated):    214 (avg. block size 33710 B)
>>>>    ********************************
>>>>    CORRUPT FILES:    1
>>>>    MISSING BLOCKS:    1
>>>>    MISSING SIZE:        388970 B
>>>>    ********************************
>>>>   Minimally replicated blocks:    213 (99.53271 %)
>>>>   Over-replicated blocks:    0 (0.0 %)
>>>>   Under-replicated blocks:    213 (99.53271 %)
>>>>   Mis-replicated blocks:        0 (0.0 %)
>>>>   Default replication factor:    3
>>>>   Average block replication:    0.9953271
>>>>   Corrupt blocks:        0
>>>>   Missing replicas:        426 (66.35514 %)
>>>>   Number of data-nodes:        1
>>>>   Number of racks:        1
>>>> FSCK ended at Sat Oct 04 23:09:40 EDT 2014 in 47 milliseconds
>>>>
>>>>
>>>> I just let the system sit for some time and reran fsck (after about
>>>> 15-20 mins) and surprisingly the output was very different. The
>>>> corruption was magically gone:
>>>>
>>>> /test/test.log 1859584 bytes, 1 block(s):  Under replicated
>>>> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421. Target
>>>> Replicas is 3 but found 1 replica(s).
>>>> 0. BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421
>>>> len=1859584 repl=1
>>>>
>>>> Status: HEALTHY
>>>>   Total size:    8684733 B
>>>>   Total dirs:    54
>>>>   Total files:    232
>>>>   Total symlinks:        0
>>>>   Total blocks (validated):    214 (avg. block size 40582 B)
>>>>   Minimally replicated blocks:    214 (100.0 %)
>>>>   Over-replicated blocks:    0 (0.0 %)
>>>>   Under-replicated blocks:    214 (100.0 %)
>>>>   Mis-replicated blocks:        0 (0.0 %)
>>>>   Default replication factor:    3
>>>>   Average block replication:    1.0
>>>>   Corrupt blocks:        0
>>>>   Missing replicas:        428 (66.666664 %)
>>>>   Number of data-nodes:        1
>>>>   Number of racks:        1
>>>> FSCK ended at Sat Oct 04 23:24:23 EDT 2014 in 63 milliseconds
>>>>
>>>>
>>>> The filesystem under path '/' is HEALTHY
>>>>
>>>>
>>>>
>>>> So my question is this: What just happened? How did the NameNode 
>>>> recover
>>>> that missing block and why did it take 15 mins or so? Is there some 
>>>> kind
>>>> of a lease on the file (because of the open nature) that expired after
>>>> the 15-20 mins? Can someone with knowledge of HDFS internals please 
>>>> shed
>>>> some light on what could possibly be going on or point me to 
>>>> sections of
>>>> the code that could answer my questions? Also is there a way to speed
>>>> this process up? Like say trigger the expiration of the lease 
>>>> (assuming
>>>> it is a lease).
>>>>
>>>> Thanks,
>>>> Vinayak
>>>
>>
>>
>


Re: HDFS openforwrite CORRUPT -> HEALTHY

Posted by Vinayak Borkar <vi...@gmail.com>.
Hi Ulul,

I think I can explain why the sizes differ and the block names vary. 
There is no client interaction. My client writes data and calls hsync(), 
and then writes more data to the same file. My understanding is that 
under such circumstances, the file size is not reflected accurately in 
HDFS until the file is actually closed. So the NameNode's view of the 
file size will be lower than the actual size of the data in the block. 
If you look closely, you will see that the block number is the same for 
the two blocks. The part that differs is the version number (the 
generation stamp) - this is consistent with HDFS's behavior when hsyncing 
the output stream and then continuing to write more. It looks like the 
NameNode is informed much later about the last block that the DataNode 
actually wrote.
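
A simple way to see this discrepancy from a client is to compare the 
length the NameNode reports with the bytes that are actually readable. A 
minimal sketch (standard FileSystem API; the path is hypothetical and the 
numbers will of course vary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VisibleLengthCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/test/test.log");

        // Length according to the NameNode's metadata; for a file that is
        // still open for write this can lag behind what has been hsync'd.
        long metadataLength = fs.getFileStatus(path).getLen();

        // Bytes actually readable from the DataNode: read to EOF and count.
        long readableLength = 0;
        byte[] buffer = new byte[8192];
        try (FSDataInputStream in = fs.open(path)) {
            int n;
            while ((n = in.read(buffer)) > 0) {
                readableLength += n;
            }
        }
        System.out.println("NameNode length = " + metadataLength
                + ", readable length = " + readableLength);
    }
}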

My client was not started when the machine came back up, so all the 
changes seen in the FSCK output were due to HDFS itself.


Vinayak


On 10/7/14, 2:37 PM, Ulul wrote:
>
> Hi Vinayak
>
> I find it strange that the file should have a different size and the block
> a different name.
> Are you sure your writing client wasn't interfering?
>
> Ulul
>
> On 07/10/2014 19:41, Vinayak Borkar wrote:
>> Trying again since I did not get a reply. Please let me know if I
>> should use a different forum to ask this question.
>>
>> Thanks,
>> Vinayak
>>
>>
>>
>> On 10/4/14, 8:45 PM, Vinayak Borkar wrote:
>>> Hi,
>>>
>>>
>>> I was experimenting with HDFS to push its boundaries on fault tolerance.
>>> Here is what I observed.
>>>
>>> I am using HDFS from Hadoop 2.2. I started the NameNode and then a
>>> single DataNode. I started writing to a DFS file from a Java client
>>> periodically calling hsync(). After some time, I powered off the machine
>>> that was running this test (not shutdown, just abruptly powered off).
>>>
>>> When the system came back up, and HDFS processes were up and HDFS was
>>> out of safe mode, I ran fsck on the DFS filesystem (with -openforwrite
>>> -files -blocks) options and here is the output:
>>>
>>>
>>> /test/test.log 388970 bytes, 1 block(s), OPENFORWRITE:  MISSING 1 blocks
>>> of total size 388970 B
>>> 0.
>>> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2420{blockUCState=UNDER_CONSTRUCTION,
>>>
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUnderConstruction[[DISK]DS-e5bed5ae-1fa9-45ed-8d4c-8006919b4d9c:NORMAL|RWR]]}
>>>
>>> len=388970 MISSING!
>>>
>>> Status: CORRUPT
>>>   Total size:    7214119 B
>>>   Total dirs:    54
>>>   Total files:    232
>>>   Total symlinks:        0
>>>   Total blocks (validated):    214 (avg. block size 33710 B)
>>>    ********************************
>>>    CORRUPT FILES:    1
>>>    MISSING BLOCKS:    1
>>>    MISSING SIZE:        388970 B
>>>    ********************************
>>>   Minimally replicated blocks:    213 (99.53271 %)
>>>   Over-replicated blocks:    0 (0.0 %)
>>>   Under-replicated blocks:    213 (99.53271 %)
>>>   Mis-replicated blocks:        0 (0.0 %)
>>>   Default replication factor:    3
>>>   Average block replication:    0.9953271
>>>   Corrupt blocks:        0
>>>   Missing replicas:        426 (66.35514 %)
>>>   Number of data-nodes:        1
>>>   Number of racks:        1
>>> FSCK ended at Sat Oct 04 23:09:40 EDT 2014 in 47 milliseconds
>>>
>>>
>>> I just let the system sit for some time and reran fsck (after about
>>> 15-20 mins) and surprisingly the output was very different. The
>>> corruption was magically gone:
>>>
>>> /test/test.log 1859584 bytes, 1 block(s):  Under replicated
>>> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421. Target
>>> Replicas is 3 but found 1 replica(s).
>>> 0. BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421
>>> len=1859584 repl=1
>>>
>>> Status: HEALTHY
>>>   Total size:    8684733 B
>>>   Total dirs:    54
>>>   Total files:    232
>>>   Total symlinks:        0
>>>   Total blocks (validated):    214 (avg. block size 40582 B)
>>>   Minimally replicated blocks:    214 (100.0 %)
>>>   Over-replicated blocks:    0 (0.0 %)
>>>   Under-replicated blocks:    214 (100.0 %)
>>>   Mis-replicated blocks:        0 (0.0 %)
>>>   Default replication factor:    3
>>>   Average block replication:    1.0
>>>   Corrupt blocks:        0
>>>   Missing replicas:        428 (66.666664 %)
>>>   Number of data-nodes:        1
>>>   Number of racks:        1
>>> FSCK ended at Sat Oct 04 23:24:23 EDT 2014 in 63 milliseconds
>>>
>>>
>>> The filesystem under path '/' is HEALTHY
>>>
>>>
>>>
>>> So my question is this: What just happened? How did the NameNode recover
>>> that missing block and why did it take 15 mins or so? Is there some kind
>>> of a lease on the file (because of the open nature) that expired after
>>> the 15-20 mins? Can someone with knowledge of HDFS internals please shed
>>> some light on what could possibly be going on or point me to sections of
>>> the code that could answer my questions? Also is there a way to speed
>>> this process up? Like say trigger the expiration of the lease (assuming
>>> it is a lease).
>>>
>>> Thanks,
>>> Vinayak
>>
>
>


Re: HDFS openforwrite CORRUPT -> HEALTHY

Posted by Vinayak Borkar <vi...@gmail.com>.
Hi Ulul,

I think I can explain why the sizes differ and the block names vary. 
There is no client interaction. My client writes data and calls hsync, 
and then writes more data to the same file. My understanding is that 
under such circumstances, the file size is not reflected accurately in 
HDFS until the file is actually closed. So the namenode's view of the 
file size will be lower than the actual size of the data in the block. 
If you look at the block closely, you will see that the block number is 
the same for the two blocks. The part that is different is the version 
number - this is consistent with HDFS's behavior when hsyncing the 
output stream and then continuing to write more. It looks like the name 
node is informed much later about the last block that the datanode 
actually wrote.

My client was not started when the machine came back up. So all changes 
seen in the FSCK output were owing to HDFS.


Vinayak


On 10/7/14, 2:37 PM, Ulul wrote:
>
> Hi Vinayak
>
> I find strange that the file should have a different size and the block
> a different name.
> Are you sure your writing client wasn't interfering ?
>
> Ulul
>
> Le 07/10/2014 19:41, Vinayak Borkar a écrit :
>> Trying again since I did not get a reply. Please let me know if I
>> should use a different forum to ask this question.
>>
>> Thanks,
>> Vinayak
>>
>>
>>
>> On 10/4/14, 8:45 PM, Vinayak Borkar wrote:
>>> Hi,
>>>
>>>
>>> I was experimenting with HDFS to push its boundaries on fault tolerance.
>>> Here is what I observed.
>>>
>>> I am using HDFS from Hadoop 2.2. I started the NameNode and then a
>>> single DataNode. I started writing to a DFS file from a Java client
>>> periodically calling hsync(). After some time, I powered off the machine
>>> that was running this test (not shutdown, just abruptly powered off).
>>>
>>> When the system came back up, and HDFS processes were up and HDFS was
>>> out of safe mode, I ran fsck on the DFS filesystem (with -openforwrite
>>> -files -blocks) options and here is the output:
>>>
>>>
>>> /test/test.log 388970 bytes, 1 block(s), OPENFORWRITE:  MISSING 1 blocks
>>> of total size 388970 B
>>> 0.
>>> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2420{blockUCState=UNDER_CONSTRUCTION,
>>>
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUnderConstruction[[DISK]DS-e5bed5ae-1fa9-45ed-8d4c-8006919b4d9c:NORMAL|RWR]]}
>>>
>>> len=388970 MISSING!
>>>
>>> Status: CORRUPT
>>>   Total size:    7214119 B
>>>   Total dirs:    54
>>>   Total files:    232
>>>   Total symlinks:        0
>>>   Total blocks (validated):    214 (avg. block size 33710 B)
>>>    ********************************
>>>    CORRUPT FILES:    1
>>>    MISSING BLOCKS:    1
>>>    MISSING SIZE:        388970 B
>>>    ********************************
>>>   Minimally replicated blocks:    213 (99.53271 %)
>>>   Over-replicated blocks:    0 (0.0 %)
>>>   Under-replicated blocks:    213 (99.53271 %)
>>>   Mis-replicated blocks:        0 (0.0 %)
>>>   Default replication factor:    3
>>>   Average block replication:    0.9953271
>>>   Corrupt blocks:        0
>>>   Missing replicas:        426 (66.35514 %)
>>>   Number of data-nodes:        1
>>>   Number of racks:        1
>>> FSCK ended at Sat Oct 04 23:09:40 EDT 2014 in 47 milliseconds
>>>
>>>
>>> I just let the system sit for some time and reran fsck (after about
>>> 15-20 mins) and surprisingly the output was very different. The
>>> corruption was magically gone:
>>>
>>> /test/test.log 1859584 bytes, 1 block(s):  Under replicated
>>> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421. Target
>>> Replicas is 3 but found 1 replica(s).
>>> 0. BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421
>>> len=1859584 repl=1
>>>
>>> Status: HEALTHY
>>>   Total size:    8684733 B
>>>   Total dirs:    54
>>>   Total files:    232
>>>   Total symlinks:        0
>>>   Total blocks (validated):    214 (avg. block size 40582 B)
>>>   Minimally replicated blocks:    214 (100.0 %)
>>>   Over-replicated blocks:    0 (0.0 %)
>>>   Under-replicated blocks:    214 (100.0 %)
>>>   Mis-replicated blocks:        0 (0.0 %)
>>>   Default replication factor:    3
>>>   Average block replication:    1.0
>>>   Corrupt blocks:        0
>>>   Missing replicas:        428 (66.666664 %)
>>>   Number of data-nodes:        1
>>>   Number of racks:        1
>>> FSCK ended at Sat Oct 04 23:24:23 EDT 2014 in 63 milliseconds
>>>
>>>
>>> The filesystem under path '/' is HEALTHY
>>>
>>>
>>>
>>> So my question is this: What just happened? How did the NameNode recover
>>> that missing block and why did it take 15 mins or so? Is there some kind
>>> of a lease on the file (because of the open nature) that expired after
>>> the 15-20 mins? Can someone with knowledge of HDFS internals please shed
>>> some light on what could possibly be going on or point me to sections of
>>> the code that could answer my questions? Also is there a way to speed
>>> this process up? Like say trigger the expiration of the lease (assuming
>>> it is a lease).
>>>
>>> Thanks,
>>> Vinayak
>>
>
>


Re: HDFS openforwrite CORRUPT -> HEALTHY

Posted by Ulul <ha...@ulul.org>.
Hi Vinayak

I find it strange that the file should have a different size and the 
block a different name.
Are you sure your writing client wasn't interfering?

Ulul

Le 07/10/2014 19:41, Vinayak Borkar a écrit :
> Trying again since I did not get a reply. Please let me know if I 
> should use a different forum to ask this question.
>
> Thanks,
> Vinayak
>
>
>
> On 10/4/14, 8:45 PM, Vinayak Borkar wrote:
>> Hi,
>>
>>
>> I was experimenting with HDFS to push its boundaries on fault tolerance.
>> Here is what I observed.
>>
>> I am using HDFS from Hadoop 2.2. I started the NameNode and then a
>> single DataNode. I started writing to a DFS file from a Java client
>> periodically calling hsync(). After some time, I powered off the machine
>> that was running this test (not shutdown, just abruptly powered off).
>>
>> When the system came back up, and HDFS processes were up and HDFS was
>> out of safe mode, I ran fsck on the DFS filesystem (with -openforwrite
>> -files -blocks) options and here is the output:
>>
>>
>> /test/test.log 388970 bytes, 1 block(s), OPENFORWRITE:  MISSING 1 blocks
>> of total size 388970 B
>> 0.
>> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2420{blockUCState=UNDER_CONSTRUCTION, 
>>
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUnderConstruction[[DISK]DS-e5bed5ae-1fa9-45ed-8d4c-8006919b4d9c:NORMAL|RWR]]} 
>>
>> len=388970 MISSING!
>>
>> Status: CORRUPT
>>   Total size:    7214119 B
>>   Total dirs:    54
>>   Total files:    232
>>   Total symlinks:        0
>>   Total blocks (validated):    214 (avg. block size 33710 B)
>>    ********************************
>>    CORRUPT FILES:    1
>>    MISSING BLOCKS:    1
>>    MISSING SIZE:        388970 B
>>    ********************************
>>   Minimally replicated blocks:    213 (99.53271 %)
>>   Over-replicated blocks:    0 (0.0 %)
>>   Under-replicated blocks:    213 (99.53271 %)
>>   Mis-replicated blocks:        0 (0.0 %)
>>   Default replication factor:    3
>>   Average block replication:    0.9953271
>>   Corrupt blocks:        0
>>   Missing replicas:        426 (66.35514 %)
>>   Number of data-nodes:        1
>>   Number of racks:        1
>> FSCK ended at Sat Oct 04 23:09:40 EDT 2014 in 47 milliseconds
>>
>>
>> I just let the system sit for some time and reran fsck (after about
>> 15-20 mins) and surprisingly the output was very different. The
>> corruption was magically gone:
>>
>> /test/test.log 1859584 bytes, 1 block(s):  Under replicated
>> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421. Target
>> Replicas is 3 but found 1 replica(s).
>> 0. BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421
>> len=1859584 repl=1
>>
>> Status: HEALTHY
>>   Total size:    8684733 B
>>   Total dirs:    54
>>   Total files:    232
>>   Total symlinks:        0
>>   Total blocks (validated):    214 (avg. block size 40582 B)
>>   Minimally replicated blocks:    214 (100.0 %)
>>   Over-replicated blocks:    0 (0.0 %)
>>   Under-replicated blocks:    214 (100.0 %)
>>   Mis-replicated blocks:        0 (0.0 %)
>>   Default replication factor:    3
>>   Average block replication:    1.0
>>   Corrupt blocks:        0
>>   Missing replicas:        428 (66.666664 %)
>>   Number of data-nodes:        1
>>   Number of racks:        1
>> FSCK ended at Sat Oct 04 23:24:23 EDT 2014 in 63 milliseconds
>>
>>
>> The filesystem under path '/' is HEALTHY
>>
>>
>>
>> So my question is this: What just happened? How did the NameNode recover
>> that missing block and why did it take 15 mins or so? Is there some kind
>> of a lease on the file (because of the open nature) that expired after
>> the 15-20 mins? Can someone with knowledge of HDFS internals please shed
>> some light on what could possibly be going on or point me to sections of
>> the code that could answer my questions? Also is there a way to speed
>> this process up? Like say trigger the expiration of the lease (assuming
>> it is a lease).
>>
>> Thanks,
>> Vinayak
>


Re: HDFS openforwrite CORRUPT -> HEALTHY

Posted by Vinayak Borkar <vi...@gmail.com>.
Trying again since I did not get a reply. Please let me know if I should 
use a different forum to ask this question.

Thanks,
Vinayak



On 10/4/14, 8:45 PM, Vinayak Borkar wrote:
> Hi,
>
>
> I was experimenting with HDFS to push its boundaries on fault tolerance.
> Here is what I observed.
>
> I am using HDFS from Hadoop 2.2. I started the NameNode and then a
> single DataNode. I started writing to a DFS file from a Java client
> periodically calling hsync(). After some time, I powered off the machine
> that was running this test (not shutdown, just abruptly powered off).
>
> When the system came back up, and HDFS processes were up and HDFS was
> out of safe mode, I ran fsck on the DFS filesystem (with  -openforwrite
> -files -blocks) options and here is the output:
>
>
> /test/test.log 388970 bytes, 1 block(s), OPENFORWRITE:  MISSING 1 blocks
> of total size 388970 B
> 0.
> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2420{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[[DISK]DS-e5bed5ae-1fa9-45ed-8d4c-8006919b4d9c:NORMAL|RWR]]}
> len=388970 MISSING!
>
> Status: CORRUPT
>   Total size:    7214119 B
>   Total dirs:    54
>   Total files:    232
>   Total symlinks:        0
>   Total blocks (validated):    214 (avg. block size 33710 B)
>    ********************************
>    CORRUPT FILES:    1
>    MISSING BLOCKS:    1
>    MISSING SIZE:        388970 B
>    ********************************
>   Minimally replicated blocks:    213 (99.53271 %)
>   Over-replicated blocks:    0 (0.0 %)
>   Under-replicated blocks:    213 (99.53271 %)
>   Mis-replicated blocks:        0 (0.0 %)
>   Default replication factor:    3
>   Average block replication:    0.9953271
>   Corrupt blocks:        0
>   Missing replicas:        426 (66.35514 %)
>   Number of data-nodes:        1
>   Number of racks:        1
> FSCK ended at Sat Oct 04 23:09:40 EDT 2014 in 47 milliseconds
>
>
> I just let the system sit for some time and reran fsck (after about
> 15-20 mins) and surprisingly the output was very different. The
> corruption was magically gone:
>
> /test/test.log 1859584 bytes, 1 block(s):  Under replicated
> BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421. Target
> Replicas is 3 but found 1 replica(s).
> 0. BP-1471648347-10.211.55.100-1412458980748:blk_1073743243_2421
> len=1859584 repl=1
>
> Status: HEALTHY
>   Total size:    8684733 B
>   Total dirs:    54
>   Total files:    232
>   Total symlinks:        0
>   Total blocks (validated):    214 (avg. block size 40582 B)
>   Minimally replicated blocks:    214 (100.0 %)
>   Over-replicated blocks:    0 (0.0 %)
>   Under-replicated blocks:    214 (100.0 %)
>   Mis-replicated blocks:        0 (0.0 %)
>   Default replication factor:    3
>   Average block replication:    1.0
>   Corrupt blocks:        0
>   Missing replicas:        428 (66.666664 %)
>   Number of data-nodes:        1
>   Number of racks:        1
> FSCK ended at Sat Oct 04 23:24:23 EDT 2014 in 63 milliseconds
>
>
> The filesystem under path '/' is HEALTHY
>
>
>
> So my question is this: What just happened? How did the NameNode recover
> that missing block and why did it take 15 mins or so? Is there some kind
> of a lease on the file (because of the open nature) that expired after
> the 15-20 mins? Can someone with knowledge of HDFS internals please shed
> some light on what could possibly be going on or point me to sections of
> the code that could answer my questions? Also is there a way to speed
> this process up? Like say trigger the expiration of the lease (assuming
> it is a lease).
>
> Thanks,
> Vinayak

