Posted to yarn-issues@hadoop.apache.org by "Lantao Jin (JIRA)" <ji...@apache.org> on 2019/02/28 05:40:00 UTC

[jira] [Comment Edited] (YARN-9332) RackResolver tool should accept multiple hosts

    [ https://issues.apache.org/jira/browse/YARN-9332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780125#comment-16780125 ] 

Lantao Jin edited comment on YARN-9332 at 2/28/19 5:39 AM:
-----------------------------------------------------------

[~cheersyang] Yes, Spark has to invoke this method inside nested loops, once per preferred location of every task.
{code}
for (i <- (0 until numTasks).reverse) {
    addPendingTask(i)
}
...
private[spark] def addPendingTask(index: Int) {
    for (loc <- tasks(index).preferredLocations) {
      ...
      for (rack <- sched.getRackForHost(loc.host)) {      //<---- invoke here
        ...
      }
    }
  ...
}
{code}
I am preparing a ticket for Spark; it could save 15 to 20 seconds when launching a large task set.
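
To make the cost difference concrete, here is a minimal, self-contained sketch of the two call patterns. It does not use the real Hadoop or Spark classes: {{CountingResolver}} and {{RackLookupSketch}} are hypothetical names, the rack naming rule is made up, and each call to {{resolve}} merely stands in for one run of the topology script. The only thing it is meant to show is how the number of resolver invocations changes when the hosts are collected first and passed in as one list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a script-backed resolver: every call to
// resolve() models one (expensive) invocation of the topology script.
class CountingResolver {
    int invocations = 0;

    // List in, list out: the same shape the issue describes for the mapping.
    List<String> resolve(List<String> hosts) {
        invocations++;
        List<String> racks = new ArrayList<>();
        for (String host : hosts) {
            // Made-up rule for this sketch: derive a rack from the host name.
            racks.add("/rack-" + Math.floorMod(host.hashCode(), 4));
        }
        return racks;
    }
}

class RackLookupSketch {
    // One resolver call per host, as in the nested loop above.
    static int resolveOneByOne(CountingResolver resolver, List<String> hosts) {
        for (String host : hosts) {
            resolver.resolve(Collections.singletonList(host));
        }
        return resolver.invocations;
    }

    // Deduplicate the hosts first, then resolve all of them in a single call.
    static int resolveInBatch(CountingResolver resolver, List<String> hosts) {
        Set<String> unique = new LinkedHashSet<>(hosts);
        resolver.resolve(new ArrayList<>(unique));
        return resolver.invocations;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1", "host2", "host1", "host3");
        System.out.println("one by one: " + resolveOneByOne(new CountingResolver(), hosts));
        System.out.println("batched:    " + resolveInBatch(new CountingResolver(), hosts));
    }
}
```

With the four sample hosts, the one-by-one variant calls the resolver four times and the batched variant calls it once. How much wall-clock time that saves in practice depends on the cost of the script and on any caching in front of it, which this sketch does not model.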


> RackResolver tool should accept multiple hosts
> ----------------------------------------------
>
>                 Key: YARN-9332
>                 URL: https://issues.apache.org/jira/browse/YARN-9332
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.9.2, 3.0.3, 2.8.5, 2.7.7, 3.1.2
>            Reporter: Lantao Jin
>            Assignee: Lantao Jin
>            Priority: Minor
>         Attachments: YARN-9332.001.patch
>
>
> RackResolver, as a public rack resolver tool, only offers the method {{public static Node resolve(String hostName)}}, which accepts one host at a time. The underlying {{DNSToSwitchMapping}} implementation, however, always accepts a list of hosts as input and returns a list of resolved racks. As a result, a caller such as Spark takes a long time to resolve rack info when handling a large number of tasks, because each loop iteration executes the script again to resolve a single host.


