Posted to user@hbase.apache.org by Stanley Xu <we...@gmail.com> on 2011/05/16 06:10:07 UTC

How does the region server know if a block is moved from one datanode to another?

Dear all,

We have been tracing an issue with our HBase cluster. We are almost sure
it is a network issue, since the problem seems to have disappeared after we
disabled ip_forward on all the machines and gave them all the same route
configuration. But we don't really know how these settings could affect
the cluster.

The problem we ran into is described at the following link:
http://search-hadoop.com/m/ZpgJ623GoyU1/.META.+inconsistency&subj=The+META+data+inconsistency+issue
(The title does not actually describe the issue well.)

While tracing the logs from the region server, data node and name node, I
also found something suspicious, both after we thought the issue was fixed
and before the issue appeared.

In a region server, I can still find log entries showing that the
RegionServer tried to get a block from a data node that no longer serves
that block.

I see the following log in the region server for block 5056551999889621449:
http://pastebin.com/epEt37JK

And the following log on the data node that the region server tried to get
the block from:
http://pastebin.com/pnif75rX

And the following log on the name node, which told the data node to delete
the block:
http://pastebin.com/rQ4QjUcS

And if I use fsck to check the file on HDFS, it shows 4 replicas, and the
list still includes the data node that should have deleted the block:
http://pastebin.com/2DecD9GD

But if I check the data node's local file system, I can see that the block
no longer exists there.

But after 6-7 hours, when I re-ran fsck, the data node that should have
deleted the block was no longer listed:
http://pastebin.com/014h3qNE

I am wondering whether this is correct behavior for Hadoop and HBase. I am
using hadoop branch-0.20-append and hbase 0.20.6.

I am also wondering whether, short of reading all the code, there is a
document or tutorial that describes how Hadoop and HBase keep their data
synchronized, in more detail than the HBase book or the official
documentation.

Best wishes,
Stanley Xu