Posted to yarn-issues@hadoop.apache.org by "Sidharta Seethana (JIRA)" <ji...@apache.org> on 2015/03/10 19:18:47 UTC

[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers

    [ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355377#comment-14355377 ] 

Sidharta Seethana commented on YARN-2140:
-----------------------------------------

You are right - there are several areas to think about here, and we definitely need to put more thought into scheduling. To schedule network resources effectively, we would need to understand:
a) the overall network topology of the cluster in question, i.e. the characteristics of the ‘route’ between any two nodes in the cluster: the number of hops required and the available/max bandwidth at each point along the route;
b) application characteristics w.r.t. network utilization: internal vs. external traffic, latency vs. bandwidth sensitivity, etc.

Inbound traffic is harder: we currently do not have a good way to manage it effectively. By the time inbound packets are ‘examined’ on a given node, they have already consumed bandwidth along the way, and the only options we have are to drop them immediately (we cannot queue on the inbound side) or to let them through. The design document mentions these limitations. One possible approach could be to let applications provide ‘hints’ for inbound network utilization (not all applications might be able to do this) and use this information purely for scheduling purposes. This, of course, adds more complexity to scheduling.
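
To illustrate what "use the hints purely for scheduling" could mean, here is a minimal, hypothetical sketch: each node tracks an assumed inbound link capacity and the sum of inbound hints already placed on it, and a container with a hint goes to the feasible node with the tightest fit. Nothing is enforced on the inbound path; the hint only affects placement. The Node class, its fields and place_container are illustrative names, not part of YARN or the design document, and the sketch ignores topology (point a above) entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    inbound_capacity_mbps: float        # assumed: max inbound bandwidth of the node's link
    inbound_reserved_mbps: float = 0.0  # sum of inbound hints of containers placed so far

    def headroom(self) -> float:
        return self.inbound_capacity_mbps - self.inbound_reserved_mbps


def place_container(nodes: List[Node], inbound_hint_mbps: Optional[float]) -> Optional[Node]:
    """Choose a node using an optional inbound-bandwidth hint (placement only).

    Containers without a hint go to the node with the most headroom and reserve
    nothing; containers with a hint use best fit among nodes that can absorb it.
    Returns None if no node has enough headroom for the hint.
    """
    if inbound_hint_mbps is None:
        return max(nodes, key=lambda n: n.headroom(), default=None)
    candidates = [n for n in nodes if n.headroom() >= inbound_hint_mbps]
    if not candidates:
        return None
    # Best fit: leave the least unused headroom on the chosen node.
    chosen = min(candidates, key=lambda n: n.headroom() - inbound_hint_mbps)
    chosen.inbound_reserved_mbps += inbound_hint_mbps
    return chosen
```

Best fit is just one simple choice here; it keeps large headroom available on other nodes for later containers with big hints, at the cost of packing hinted containers tightly.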

Needless to say, there are hard problems to solve here, and the (network) scheduling requirements (and potential approaches for implementation) will need further investigation. As a first step, though, I think it makes sense to focus on classification of outbound traffic (net_cls) and perhaps basic isolation/enforcement plus collection of metrics. Once we have this in place, we could look at real utilization patterns and decide what the next steps should be.
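
For a concrete picture of outbound classification with net_cls, here is a minimal dry-run sketch in Python. A cgroup's net_cls.classid file holds a 32-bit value of the form 0xAAAABBBB, where AAAA is the tc major handle and BBBB the minor handle (both hex), so class 10:1 corresponds to 0x100001. The helper also emits the HTB qdisc/class and cgroup-filter commands used in the Linux kernel's net_cls documentation. The device name eth0, the 40mbit rate and all helper names are assumptions for illustration; this is not how YARN implements it, and the commands are only built as strings, not executed.

```python
from typing import List


def tc_handle_to_classid(handle: str) -> int:
    """Convert a tc class handle 'MAJOR:MINOR' (hex, as tc uses) into the
    0xAAAABBBB value written to a cgroup's net_cls.classid file."""
    major_str, minor_str = handle.split(":")
    major = int(major_str, 16)
    minor = int(minor_str or "0", 16)
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFFFF):
        raise ValueError(f"handle out of range: {handle}")
    return (major << 16) | minor


def classid_to_tc_handle(classid: int) -> str:
    """Inverse of tc_handle_to_classid, e.g. 0x100001 -> '10:1'."""
    return f"{classid >> 16:x}:{classid & 0xFFFF:x}"


def htb_setup_commands(dev: str, root_major: str, class_handles: List[str], rate: str) -> List[str]:
    """Build (but do not run) tc commands: an HTB root qdisc, one rate-limited
    class per handle, and a single cgroup filter that steers each packet to the
    class named by its sending cgroup's net_cls.classid."""
    cmds = [f"tc qdisc add dev {dev} root handle {root_major}: htb"]
    for handle in class_handles:
        cmds.append(f"tc class add dev {dev} parent {root_major}: classid {handle} htb rate {rate}")
    cmds.append(f"tc filter add dev {dev} parent {root_major}: protocol ip prio 10 handle 1: cgroup")
    return cmds
```

Usage: writing hex(tc_handle_to_classid("10:1")), i.e. 0x100001, into a container's net_cls.classid would tag that container's outbound packets for class 10:1. Metrics could then, for example, be read from per-class tc statistics, though that part is left out here.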

> Add support for network IO isolation/scheduling for containers
> --------------------------------------------------------------
>
>                 Key: YARN-2140
>                 URL: https://issues.apache.org/jira/browse/YARN-2140
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>         Attachments: NetworkAsAResourceDesign.pdf
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)