Posted to mapreduce-issues@hadoop.apache.org by "Hong Tang (JIRA)" <ji...@apache.org> on 2010/07/07 09:55:50 UTC

[jira] Commented: (MAPREDUCE-1922) Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885874#action_12885874 ] 

Hong Tang commented on MAPREDUCE-1922:
--------------------------------------

I should mention that this is related to MAPREDUCE-1698: both issues have the same origin.

> Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1922
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1922
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Arun C Murthy
>
> As more and more applications use the combine file input format (to reduce the number of mappers), formats with column groups implemented as separate HDFS files (Zebra, HBase), and composite input formats (map-side joins), data-locality and rack-locality lose their meaning. (A map task that reads only one column group, say 20% of its input, locally and the other 80% remotely still gets flagged as a data-local map.)
> So, my suggestion is to drop these counters and replace them with HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, and HDFS_TOTAL_BYTES_READ. These counters will make it easier to reason about read performance for maps.
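
For illustration only, here is a minimal Java sketch of how byte-level counters of this kind could be tallied for the 20%/80% example in the description. It is not Hadoop code: the class LocalityBytesSketch, the Locality enum, and recordRead() are hypothetical names, and only the three counter names come from the issue. It also assumes that HDFS_RACK_BYTES_READ counts bytes read from another node on the same rack and excludes node-local bytes, which the issue does not specify.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; none of these types are part of the Hadoop API.
public class LocalityBytesSketch {
    // Where a single read was served from, relative to the map task (assumed model).
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    // Counter names as proposed in the issue description.
    enum ReadCounter { HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, HDFS_TOTAL_BYTES_READ }

    private final Map<ReadCounter, Long> counters = new EnumMap<>(ReadCounter.class);

    LocalityBytesSketch() {
        for (ReadCounter c : ReadCounter.values()) {
            counters.put(c, 0L);
        }
    }

    // Record one read: every read adds to the total; local and rack reads
    // also add to their own counter. Off-rack bytes appear only in the total.
    void recordRead(Locality where, long bytes) {
        counters.merge(ReadCounter.HDFS_TOTAL_BYTES_READ, bytes, Long::sum);
        if (where == Locality.NODE_LOCAL) {
            counters.merge(ReadCounter.HDFS_LOCAL_BYTES_READ, bytes, Long::sum);
        } else if (where == Locality.RACK_LOCAL) {
            counters.merge(ReadCounter.HDFS_RACK_BYTES_READ, bytes, Long::sum);
        }
    }

    long get(ReadCounter c) {
        return counters.get(c);
    }

    // Fraction of all bytes that were read node-locally (0.0 if nothing was read).
    double localFraction() {
        long total = get(ReadCounter.HDFS_TOTAL_BYTES_READ);
        return total == 0 ? 0.0 : (double) get(ReadCounter.HDFS_LOCAL_BYTES_READ) / total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        LocalityBytesSketch task = new LocalityBytesSketch();
        task.recordRead(Locality.NODE_LOCAL, 20 * mb); // one column group, read locally
        task.recordRead(Locality.OFF_RACK, 80 * mb);   // remaining input, read remotely
        System.out.println("local bytes = " + task.get(ReadCounter.HDFS_LOCAL_BYTES_READ));
        System.out.println("total bytes = " + task.get(ReadCounter.HDFS_TOTAL_BYTES_READ));
        System.out.println("local fraction = " + task.localFraction());
    }
}
```

Under a per-task flag this task would count as one data-local map; with byte counters the same task shows that only a fifth of its input was read locally.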

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.