Posted to common-user@hadoop.apache.org by sam liu <sa...@gmail.com> on 2014/11/13 06:43:16 UTC

Both hadoop fsck and dfsadmin can not detect missing replica in time?

Hi Experts,

In my hdfs, there is a file named /tmp/test.txt belonging to 1 block with 2
replicas. The block id is blk_1073742304_1480 and the 2 replicas reside on
datanode1 and datanode2.

Today I manually removed the block file on datanode2:
./current/BP-1640683473-9.181.64.230-1415757100604/current/finalized/subdir52/blk_1073742304.
After that, I failed to read the hdfs file /tmp/test.txt from datanode2 and
encountered an exception: "IOException: Got error for OP_READ_BLOCK...". That
makes sense, as I had already removed one replica from datanode2.

However, both 'hadoop fsck /tmp/test.txt -files -blocks -locations' and
'hadoop dfsadmin -report' say hdfs is healthy and no replicas are missing.
Even after waiting several minutes (I think the datanode sends heartbeats to
the namenode to report its recent status), the fsck/dfsadmin tools still did
not detect the missing replica. Why?

Thanks!
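For context on why fsck and dfsadmin stay quiet: both answer from the NameNode's metadata, and the NameNode only learns which blocks a datanode actually holds from periodic block reports (dfs.blockreport.intervalMsec, 6 hours by default); heartbeats (3 seconds by default) carry liveness and capacity information, not block lists. A minimal illustrative sketch of that distinction, in Python rather than Hadoop's actual code:

```python
# Illustrative sketch (not Hadoop internals): why deleting a block file on
# disk is not noticed immediately. The NameNode's view of replicas comes
# from periodic block reports, while heartbeats only signal liveness.

HEARTBEAT_INTERVAL_S = 3            # dfs.heartbeat.interval default
BLOCK_REPORT_INTERVAL_S = 6 * 3600  # dfs.blockreport.intervalMsec default

class NameNodeView:
    def __init__(self):
        self.block_locations = {}   # block_id -> set of datanodes

    def receive_heartbeat(self, datanode):
        # Heartbeats carry liveness/capacity only -- no per-block info,
        # so a deleted block file changes nothing here.
        pass

    def receive_block_report(self, datanode, blocks_on_disk):
        # Only a block report reconciles disk state with metadata.
        for block_id, nodes in self.block_locations.items():
            if datanode in nodes and block_id not in blocks_on_disk:
                nodes.discard(datanode)

nn = NameNodeView()
nn.block_locations["blk_1073742304"] = {"datanode1", "datanode2"}

# Block file deleted on datanode2; heartbeats keep arriving:
nn.receive_heartbeat("datanode2")
print(sorted(nn.block_locations["blk_1073742304"]))  # ['datanode1', 'datanode2']

# Hours later, the next block report from datanode2 omits the block:
nn.receive_block_report("datanode2", blocks_on_disk=set())
print(sorted(nn.block_locations["blk_1073742304"]))  # ['datanode1']
```

Until the next block report (or a restart, which triggers one), the NameNode keeps listing the deleted replica as present, so fsck and dfsadmin report a healthy file.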

Re: Both hadoop fsck and dfsadmin can not detect missing replica in time?

Posted by Serge Blazhievsky <ha...@gmail.com>.
It might take some time for Hadoop to realize that blocks are missing. If
you restart the cluster, does it detect the missing blocks?

On Thu, Nov 13, 2014 at 9:55 PM, sam liu <sa...@gmail.com> wrote:

> I manually removed the block replica file on datanode and the removed file
> path is '${dfs.datanode.data.dir}/current/BP-1640683473-9.181.
> 64.230-1415757100604/current/finalized/subdir52/blk_1073742304'.
>
> 2014-11-14 11:15 GMT+08:00 daemeon reiydelle <da...@gmail.com>:
>
>> Exactly HOW did you manually remove the block?
>>
>> sent from my mobile
>> Daemeon C.M. Reiydelle
>> USA 415.501.0198
>> London +44.0.20.8144.9872
>> On Nov 12, 2014 9:45 PM, "sam liu" <sa...@gmail.com> wrote:
>>
>>> Hi Experts,
>>>
>>> In my hdfs, there is a file named /tmp/test.txt belonging to 1 block
>>> with 2 replicas. The block id is blk_1073742304_1480 and the 2 replicas
>>> reside on datanode1 and datanode2.
>>>
>>> Today I manually removed the block file on datanode2:
>>> ./current/BP-1640683473-9.181.64.230-1415757100604/current/finalized/subdir52/blk_1073742304.
>>> After that, I failed to read the hdfs file /tmp/test.txt from datanode2 and
>>> encountered an exception: "IOException: Got error for OP_READ_BLOCK...". That
>>> makes sense, as I had already removed one replica from datanode2.
>>>
>>> However, both 'hadoop fsck /tmp/test.txt -files -blocks -locations' and
>>> 'hadoop dfsadmin -report' say hdfs is healthy and no replicas are missing.
>>> Even after waiting several minutes (I think the datanode sends heartbeats to
>>> the namenode to report its recent status), the fsck/dfsadmin tools still did
>>> not detect the missing replica. Why?
>>>
>>> Thanks!
>>>
>>
>
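Serge's restart suggestion works because a freshly started DataNode rescans its storage directories and sends a full block report right away instead of waiting for the next periodic one. A rough sketch of such a startup scan (illustrative Python, not Hadoop code; the blk_*/.meta naming follows the directory layout shown in this thread):

```python
# Illustrative sketch (not Hadoop code) of why a restart detects the
# deletion: on startup a DataNode rescans its data directories and
# reports exactly the block files it finds on disk.

import os
import tempfile

def build_block_report(data_dir):
    """Collect block files found on disk, as a startup scan would."""
    report = set()
    for _root, _dirs, files in os.walk(data_dir):
        for name in files:
            # Block files are named blk_<id>; checksum files end in .meta.
            if name.startswith("blk_") and not name.endswith(".meta"):
                report.add(name)
    return report

# Demo: a data dir holding one block file plus its checksum .meta file.
with tempfile.TemporaryDirectory() as d:
    for name in ("blk_1073742304", "blk_1073742304_1480.meta"):
        open(os.path.join(d, name), "w").close()
    print(sorted(build_block_report(d)))   # ['blk_1073742304']
```

After the manual deletion, a report built this way would no longer contain blk_1073742304, so the NameNode would mark that replica gone as soon as the restarted DataNode checks in.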

Re: Both hadoop fsck and dfsadmin can not detect missing replica in time?

Posted by sam liu <sa...@gmail.com>.
I manually removed the block replica file on datanode and the removed file
path is '${dfs.datanode.data.dir}/current/BP-1640683473-9.181.
64.230-1415757100604/current/finalized/subdir52/blk_1073742304'.

2014-11-14 11:15 GMT+08:00 daemeon reiydelle <da...@gmail.com>:

> Exactly HOW did you manually remove the block?
>
> sent from my mobile
> Daemeon C.M. Reiydelle
> USA 415.501.0198
> London +44.0.20.8144.9872
> On Nov 12, 2014 9:45 PM, "sam liu" <sa...@gmail.com> wrote:
>
>> Hi Experts,
>>
>> In my hdfs, there is a file named /tmp/test.txt belonging to 1 block with
>> 2 replicas. The block id is blk_1073742304_1480 and the 2 replicas reside on
>> datanode1 and datanode2.
>>
>> Today I manually removed the block file on datanode2:
>> ./current/BP-1640683473-9.181.64.230-1415757100604/current/finalized/subdir52/blk_1073742304.
>> After that, I failed to read the hdfs file /tmp/test.txt from datanode2 and
>> encountered an exception: "IOException: Got error for OP_READ_BLOCK...". That
>> makes sense, as I had already removed one replica from datanode2.
>>
>> However, both 'hadoop fsck /tmp/test.txt -files -blocks -locations' and
>> 'hadoop dfsadmin -report' say hdfs is healthy and no replicas are missing.
>> Even after waiting several minutes (I think the datanode sends heartbeats to
>> the namenode to report its recent status), the fsck/dfsadmin tools still did
>> not detect the missing replica. Why?
>>
>> Thanks!
>>
>

Re: Both hadoop fsck and dfsadmin can not detect missing replica in time?

Posted by daemeon reiydelle <da...@gmail.com>.
Exactly HOW did you manually remove the block?

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Nov 12, 2014 9:45 PM, "sam liu" <sa...@gmail.com> wrote:

> Hi Experts,
>
> In my hdfs, there is a file named /tmp/test.txt belonging to 1 block with
> 2 replicas. The block id is blk_1073742304_1480 and the 2 replicas reside on
> datanode1 and datanode2.
>
> Today I manually removed the block file on datanode2:
> ./current/BP-1640683473-9.181.64.230-1415757100604/current/finalized/subdir52/blk_1073742304.
> After that, I failed to read the hdfs file /tmp/test.txt from datanode2 and
> encountered an exception: "IOException: Got error for OP_READ_BLOCK...". That
> makes sense, as I had already removed one replica from datanode2.
>
> However, both 'hadoop fsck /tmp/test.txt -files -blocks -locations' and
> 'hadoop dfsadmin -report' say hdfs is healthy and no replicas are missing.
> Even after waiting several minutes (I think the datanode sends heartbeats to
> the namenode to report its recent status), the fsck/dfsadmin tools still did
> not detect the missing replica. Why?
>
> Thanks!
>

Re: Both hadoop fsck and dfsadmin can not detect missing replica in time?

Posted by Abirami V <ab...@gmail.com>.
Hi,

When Hadoop finds a missing replica of a block, it will re-replicate that
block on some other node (it needs at least one good replica to copy from,
which you still have on datanode1). So you will not see it in fsck. If you
want to test this, delete the block file from both datanodes; then fsck
will report the missing block.

Thanks
Abirami
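Abirami's point about needing at least one surviving replica can be sketched as follows (illustrative Python, not Hadoop's actual replication monitor; the function and its return shape are made up for the example):

```python
# Illustrative sketch of the re-replication decision described above
# (not Hadoop's actual code): once the NameNode learns a replica is gone,
# it can re-replicate only if at least one live replica remains.

def replication_work(live_replicas, target_replication):
    """Return (datanodes to copy from, how many new copies are needed)."""
    missing = target_replication - len(live_replicas)
    if missing <= 0:
        return [], 0                     # fully replicated, nothing to do
    if not live_replicas:
        return [], missing               # block is lost -- fsck reports it missing/corrupt
    return list(live_replicas), missing  # copy from a surviving replica

# One replica of two deleted: recoverable from datanode1.
print(replication_work(["datanode1"], 2))    # (['datanode1'], 1)

# Both replicas deleted: nothing to copy from; fsck would show a missing block.
print(replication_work([], 2))               # ([], 2)
```

This is why deleting only the datanode2 copy eventually heals silently, while deleting both copies is the scenario fsck actually flags.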

Re: Both hadoop fsck and dfsadmin can not detect missing replica in time?

Posted by sam liu <sa...@gmail.com>.
Is that a bug in hadoop fsck or dfsadmin? They really did not detect the
missing replica data on a datanode.

2014-11-13 13:43 GMT+08:00 sam liu <sa...@gmail.com>:

> Hi Experts,
>
> In my hdfs, there is a file named /tmp/test.txt belonging to 1 block with
> 2 replicas. The block id is blk_1073742304_1480 and the 2 replicas reside on
> datanode1 and datanode2.
>
> Today I manually removed the block file on datanode2:
> ./current/BP-1640683473-9.181.64.230-1415757100604/current/finalized/subdir52/blk_1073742304.
> After that, I failed to read the hdfs file /tmp/test.txt from datanode2 and
> encountered an exception: "IOException: Got error for OP_READ_BLOCK...". That
> makes sense, as I had already removed one replica from datanode2.
>
> However, both 'hadoop fsck /tmp/test.txt -files -blocks -locations' and
> 'hadoop dfsadmin -report' say hdfs is healthy and no replicas are missing.
> Even after waiting several minutes (I think the datanode sends heartbeats to
> the namenode to report its recent status), the fsck/dfsadmin tools still did
> not detect the missing replica. Why?
>
> Thanks!
>
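On whether this is a bug: it appears to be working as designed, just on slow timers. A block file deleted behind the DataNode's back is first noticed by the DataNode's own directory scanner (dfs.datanode.directoryscan.interval, 6 hours by default), which reconciles disk contents with the in-memory replica map; the NameNode then learns of the loss via a block report. As a rough, assumption-laden upper bound under default settings:

```python
# Rough upper-bound estimate (illustrative only; default intervals assumed):
# worst case, the deletion happens just after a directory scan and just
# after a block report, so both full intervals must elapse before the
# NameNode sees the replica as missing.

DIRECTORY_SCAN_INTERVAL_H = 6  # dfs.datanode.directoryscan.interval (21600 s)
BLOCK_REPORT_INTERVAL_H = 6    # dfs.blockreport.intervalMsec (21600000 ms)

def worst_case_detection_hours():
    return DIRECTORY_SCAN_INTERVAL_H + BLOCK_REPORT_INTERVAL_H

print(worst_case_detection_hours())  # 12
```

That is why waiting "several minutes" shows nothing. A cluster restart shortcuts both timers, and later Hadoop releases add an `hdfs dfsadmin -triggerBlockReport` command to force a report on demand.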
