Posted to dev@hbase.apache.org by "Amandeep Khurana (JIRA)" <ji...@apache.org> on 2009/07/19 08:27:14 UTC

[jira] Created: (HBASE-1672) Map tasks not local to RS

Map tasks not local to RS
-------------------------

                 Key: HBASE-1672
                 URL: https://issues.apache.org/jira/browse/HBASE-1672
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.19.3
         Environment: DN, TT and RS running on the same nodes.
            Reporter: Amandeep Khurana


The number of data local map tasks is only about 10% of the total map tasks...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HBASE-1672) Map tasks not local to RS

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-1672.
--------------------------

    Resolution: Cannot Reproduce

I ran a rowcounter job against a 100 region table of ~20M rows.  Cluster was small (4 regionservers).  Tasktrackers ran beside the RS.  Every task was scheduled on the TT that was local to the RS ("Input Split Locations" always had same value as "Machine" in the taskdetails page).
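The check described above, comparing "Input Split Locations" with "Machine" for each task, can be sketched in a few lines of Java. This is only an illustration: the class name, the method names and the sample hostnames are hypothetical, and the rows are assumed to have been copied by hand from the task details page. The last line applies the same percentage calculation to the figures from the report (25 local tasks out of 280).

```java
final class LocalityReport {
    // Counts the tasks whose split location matches the machine that ran them.
    // Each row is {inputSplitLocation, machine}; hostnames compare case-insensitively.
    static int countLocal(String[][] rows) {
        int local = 0;
        for (String[] row : rows) {
            if (row[0].equalsIgnoreCase(row[1])) {
                local++;
            }
        }
        return local;
    }

    // Share of local tasks as a percentage; 0 when there are no tasks at all.
    static double localPercent(int local, int total) {
        return total == 0 ? 0.0 : 100.0 * local / total;
    }

    public static void main(String[] args) {
        // Hypothetical rows, not taken from any real job.
        String[][] rows = {
            {"rs1.example.org", "rs1.example.org"},
            {"rs2.example.org", "rs2.example.org"},
            {"rs3.example.org", "rs1.example.org"},
        };
        System.out.println(countLocal(rows) + " of " + rows.length + " tasks were local");
        // Figures from the issue description: 25 data-local tasks out of 280.
        System.out.printf("Reported case: %.1f%% local%n", localPercent(25, 280));
    }
}
```

An exact string comparison is used on the assumption that both columns show the same form of the hostname; if one is fully qualified and the other is not, the rows would need normalizing first.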

> Map tasks not local to RS
> -------------------------
>
>                 Key: HBASE-1672
>                 URL: https://issues.apache.org/jira/browse/HBASE-1672
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: mapred, master, regionserver
>    Affects Versions: 0.20.0, 0.19.3
>         Environment: DN, TT and RS running on the same nodes.
>            Reporter: Amandeep Khurana
>             Fix For: 0.20.0, 0.19.4
>
>
> The number of data local map tasks while scanning a table is only about 10% of the total map tasks...
> My table had 280 regions and 13M records... The number of map tasks in the scan job was equal to the number of regions (280). Only 25 of them were data local tasks.



[jira] Commented: (HBASE-1672) Map tasks not local to RS

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733774#action_12733774 ] 

stack commented on HBASE-1672:
------------------------------

So, what is the indicator in the MR UI measuring?  TT+DN locality?  Or is it TT+RS?  If the latter, and the TT mapper is local to the region-hosting server only 10% of the time, then our TT+RS locality would seem to be broken -- or ineffective (either would be good to know).



[jira] Commented: (HBASE-1672) Map tasks not local to RS

Posted by "Amandeep Khurana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735302#action_12735302 ] 

Amandeep Khurana commented on HBASE-1672:
-----------------------------------------

I had this issue in 0.19. Not facing the problem in 0.20 though.



[jira] Updated: (HBASE-1672) Map tasks not local to RS

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1672:
---------------------------------

    Affects Version/s: 0.20.0
        Fix Version/s: 0.19.4
                       0.20.0

Bringing in to 0.20.0 so someone can verify whether this works in trunk or not.  I can do it later this week if no one else does.



[jira] Updated: (HBASE-1672) Map tasks not local to RS

Posted by "Amandeep Khurana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amandeep Khurana updated HBASE-1672:
------------------------------------

    Component/s: regionserver
                 master
                 mapred

> Map tasks not local to RS
> -------------------------
>
>                 Key: HBASE-1672
>                 URL: https://issues.apache.org/jira/browse/HBASE-1672
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: mapred, master, regionserver
>    Affects Versions: 0.19.3
>         Environment: DN, TT and RS running on the same nodes.
>            Reporter: Amandeep Khurana
>
> The number of data local map tasks is only about 10% of the total map tasks...



[jira] Updated: (HBASE-1672) Map tasks not local to RS

Posted by "Amandeep Khurana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amandeep Khurana updated HBASE-1672:
------------------------------------

    Description: 
The number of data local map tasks while scanning a table is only about 10% of the total map tasks...
My table had 280 regions and 13M records... The number of map tasks in the scan job was equal to the number of regions (280). Only 25 of them were data local tasks.

  was:The number of data local map tasks is only about 10% of the total map tasks...


> Map tasks not local to RS
> -------------------------
>
>                 Key: HBASE-1672
>                 URL: https://issues.apache.org/jira/browse/HBASE-1672
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: mapred, master, regionserver
>    Affects Versions: 0.19.3
>         Environment: DN, TT and RS running on the same nodes.
>            Reporter: Amandeep Khurana
>
> The number of data local map tasks while scanning a table is only about 10% of the total map tasks...
> My table had 280 regions and 13M records... The number of map tasks in the scan job was equal to the number of regions (280). Only 25 of them were data local tasks.



[jira] Commented: (HBASE-1672) Map tasks not local to RS

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735207#action_12735207 ] 

Jonathan Gray commented on HBASE-1672:
--------------------------------------

Thank you for researching, stack.

Next week we'll have a ton of MR running on trunk so will report if we find anything strange.



[jira] Commented: (HBASE-1672) Map tasks not local to RS

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733651#action_12733651 ] 

Jean-Daniel Cryans commented on HBASE-1672:
-------------------------------------------

We already do this inside TableInputFormatBase:

{code}
// Look up the server currently hosting the region that starts at this key.
String regionLocation = table.getRegionLocation(startKeys[startPos]).
  getServerAddress().getHostname();
// Build the split, passing that hostname as its location.
splits[i] = new TableSplit(this.table.getTableName(),
  startKeys[startPos], ((i + 1) < realNumSplits) ? startKeys[lastPos]:
  HConstants.EMPTY_START_ROW, regionLocation);
LOG.info("split: " + i + "->" + splits[i]);
{code}

I don't know if we can do anything more than that. One difference between HBase and mapred on HDFS is that a region is on only one node, whereas an HDFS block is on three nodes under the default replication factor. So getting the right map task onto the right RS at the right moment may be difficult for the JobTracker.
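To make the one-node-versus-three point concrete, here is a minimal Java sketch that counts how many hosts in a cluster could run a given split as a data-local task. The class name, method name and node names are hypothetical, and this is not how the JobTracker is implemented; it only illustrates that a split advertising one location has a third as many candidate hosts as a split advertising three.

```java
import java.util.Arrays;
import java.util.List;

final class SplitLocationSketch {
    // Number of cluster hosts that appear among the split's advertised locations,
    // i.e. the hosts on which the task would count as data-local.
    static long localHosts(String[] splitLocations, List<String> clusterHosts) {
        List<String> locations = Arrays.asList(splitLocations);
        return clusterHosts.stream().filter(locations::contains).count();
    }

    public static void main(String[] args) {
        // Hypothetical six-node cluster.
        List<String> hosts = Arrays.asList(
            "node1", "node2", "node3", "node4", "node5", "node6");

        // A region-backed split names only the server hosting the region.
        String[] regionSplit = {"node2"};
        // An HDFS-backed split names one host per replica (three by default).
        String[] hdfsSplit = {"node2", "node4", "node5"};

        System.out.println("region split local on "
            + localHosts(regionSplit, hosts) + " of " + hosts.size() + " hosts");
        System.out.println("HDFS split local on "
            + localHosts(hdfsSplit, hosts) + " of " + hosts.size() + " hosts");
    }
}
```

With fewer candidate hosts per split, a free slot on any particular TaskTracker is less likely to find a pending split that is local to it, which is consistent with the difficulty described above.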



[jira] Commented: (HBASE-1672) Map tasks not local to RS

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733256#action_12733256 ] 

Jonathan Gray commented on HBASE-1672:
--------------------------------------

This needs to be tested on trunk; I thought we had fixed this.
