Posted to user@hbase.apache.org by java8964 <ja...@hotmail.com> on 2014/02/24 19:59:45 UTC

Why are all map tasks of my MR job on HBase rack-local?

Hi, 
I have a 10-node cluster; 8 of the nodes each run a datanode, a task node, and an HBase region server.
I have an HBase table with one column family and 1.5 TB of data, spread across 55 regions on these 8 region servers. When I run a test scan MR job, it generates 55 map tasks (matching the 55 regions), but all of them are rack-local; not a single one is data-local. The cluster has been running for weeks, and I ran a major compaction before the MR job. I have run the job several times, and every time I get 55 rack-local map tasks and no data-local ones. I think something is wrong with my cluster/HBase settings, but I am not sure what.
All 8 child boxes run the datanode, task node, and HBase region server. All 10 boxes are in one rack.
Here are some differences I observed.
For the MR job running on the HBase table, here is one example task:
  Task Attempt:          attempt_201402131137_0469_m_000000_0
  Machine:               /default-rack/10.xx.xx.xx
  Status:                SUCCEEDED
  Progress:              100.00%
  Start Time:            24-Feb-2014 09:58:23
  Finish Time:           24-Feb-2014 10:31:41 (33mins, 18sec)
  Counters:              13
  Input Split Locations: /default-rack/real_hostname


As you can see, the input split location shows the real hostname of the box, while the machine listed for the task attempt is the real IP address of the machine running the task, which is NOT the same as the input split location.
On the other hand, if I run an MR job over HDFS files in this cluster, 30 of the 32 mappers are data-local tasks. Here is the output for one of them:
  Task Attempt:          attempt_201402131137_0467_m_000000_0
  Machine:               /default-rack/10.xx.xx.133
  Status:                SUCCEEDED
  Progress:              100.00%
  Start Time:            24-Feb-2014 09:49:58
  Finish Time:           24-Feb-2014 09:50:29 (30sec)
  Counters:              20
  Input Split Locations: /default-rack/10.xx.xx.133
                         /default-rack/10.xx.xx.135
                         /default-rack/10.xx.xx.140


The difference I see here is that the input split locations in the MR job on HDFS files are shown as real IP addresses, instead of host names as in the HBase job. Could this be the reason I get 0 data-local map tasks in the HBase MR job? If not, what else could it be?
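To make my suspicion concrete, here is a minimal Java sketch of the mismatch I have in mind. It assumes the scheduler decides locality by a plain string comparison between the task's machine name and the split locations; I have not verified that this is what Hadoop actually does. The class `LocalityCheck` and the method `classify` are hypothetical names of my own, not part of Hadoop or HBase, and the host values are the masked placeholders from the output above.

```java
import java.util.Arrays;
import java.util.List;

public class LocalityCheck {

    // Hypothetical helper (not a Hadoop API): reports "data-local" only when
    // the task's machine string exactly equals one of the split locations.
    // ASSUMPTION: locality is decided by exact string match.
    static String classify(String taskHost, List<String> splitLocations) {
        for (String location : splitLocations) {
            if (location.equals(taskHost)) {
                return "data-local";
            }
        }
        // Every box is in /default-rack, so anything else counts as rack-local.
        return "rack-local";
    }

    public static void main(String[] args) {
        // HBase job: the split reports a hostname, the task reports an IP.
        List<String> hbaseSplit = Arrays.asList("real_hostname");
        System.out.println("HBase job: " + classify("10.xx.xx.133", hbaseSplit));

        // HDFS job: both the split and the task report IP addresses.
        List<String> hdfsSplit =
                Arrays.asList("10.xx.xx.133", "10.xx.xx.135", "10.xx.xx.140");
        System.out.println("HDFS job:  " + classify("10.xx.xx.133", hdfsSplit));
    }
}
```

Under that assumption, the HBase case can never match even when the region is served from the same box that runs the task, which would be consistent with seeing 55 rack-local tasks out of 55.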
Thanks

Yong