Posted to mapreduce-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/03/16 07:29:29 UTC

[jira] Commented: (MAPREDUCE-2388) Map tasks with no locality hints should not always be considered off-rack

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007362#comment-13007362 ] 

Todd Lipcon commented on MAPREDUCE-2388:
----------------------------------------

It seems there are a few classes of tasks that have no locality info:

a) tasks that truly have no input at all (e.g. "sleep job" or teragen)
b) tasks with input splits pointing to an external system (e.g. S3, or DBInputFormat)
c) tasks pointing to HDFS files where the InputFormat implementer hasn't properly provided split locations

Are there other examples I'm not thinking of?

The tradeoff of considering these types of tasks local is that they might take a slot where some other task could actually get some benefit from being local. But considering them non-local slows down their scheduling, and may even cause starvation if there are always local tasks available as slots open up.
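
To make the tradeoff concrete, here is a rough sketch in Java. It is not the JobTracker code: the class LocalitySketch, its methods, and the treatNoHintsAsLocal switch are all hypothetical names made up for illustration. It assumes the behaviour described in the issue, namely that a scheduler hands out at most one off-rack task per heartbeat, and it assumes node-local tasks can fill every free slot in a heartbeat.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names exist in the Hadoop code base.
final class LocalitySketch {

    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    /**
     * Classifies a task relative to the tracker asking for work.
     * splitHosts is empty when the InputFormat gave no locations
     * (cases a, b and c above all look the same from here).
     */
    static Locality classify(List<String> splitHosts,
                             String trackerHost,
                             String trackerRack,
                             Map<String, String> hostToRack,
                             boolean treatNoHintsAsLocal) {
        if (splitHosts.isEmpty()) {
            // The policy question in this issue: which bucket do these go in?
            return treatNoHintsAsLocal ? Locality.NODE_LOCAL : Locality.OFF_RACK;
        }
        if (splitHosts.contains(trackerHost)) {
            return Locality.NODE_LOCAL;
        }
        for (String host : splitHosts) {
            if (trackerRack.equals(hostToRack.get(host))) {
                return Locality.RACK_LOCAL;
            }
        }
        return Locality.OFF_RACK;
    }

    /**
     * Heartbeats needed to hand out `tasks` hint-less tasks to one tracker
     * that reports `freeSlots` free slots on every heartbeat.
     * Assumption: at most one off-rack task is assigned per heartbeat,
     * while node-local tasks may fill every free slot.
     */
    static int heartbeatsToAssign(int tasks, int freeSlots, boolean treatNoHintsAsLocal) {
        if (freeSlots <= 0) {
            throw new IllegalArgumentException("freeSlots must be positive");
        }
        int perHeartbeat = treatNoHintsAsLocal ? freeSlots : 1;
        return (tasks + perHeartbeat - 1) / perHeartbeat; // ceiling division
    }
}
```

Under these assumptions, 100 hint-less tasks offered to a tracker with 4 free slots per heartbeat need 100 heartbeats when treated as off-rack and 25 when treated as node-local. The sketch does not model the other side of the tradeoff, where a hint-less task takes a slot that a task with real locality could have used.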


> Map tasks with no locality hints should not always be considered off-rack
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2388
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2388
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Todd Lipcon
>
> When a map task has no locality hints, it's currently always considered "off-rack". This has a couple of side effects which aren't great, most notably that most schedulers will only assign one off-rack task per heartbeat. This limits the scheduling throughput of these jobs.
> It's unclear that it's always correct to schedule them as "node-local" either, though most examples I can think of are probably better treated this way.
