Posted to yarn-issues@hadoop.apache.org by "Naganarasimha G R (JIRA)" <ji...@apache.org> on 2016/12/01 19:13:59 UTC

[jira] [Commented] (YARN-3409) Add constraint node labels

    [ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712803#comment-15712803 ] 

Naganarasimha G R commented on YARN-3409:
-----------------------------------------

Thanks for the support [~devaraj.k] & [~kkaranasos].
It's encouraging to see that this feature would be useful for other scenarios too.
bq. Can NodeManagers have attribute names same as some label/partition name in the cluster? 
This is one of the reasons I want to separate these into two different sets (partitions and constraints). That way, even if the same name is used in both, they will not overlap, as the partition and constraint expressions differ.
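As a toy illustration of the point above (the node layout and names are purely hypothetical, not YARN code), keeping partitions and constraints in separate sets means the same string can appear in both without any ambiguity:

```python
# Hypothetical node view: partition membership and constraint attributes
# live in separate namespaces, so "gpu" below names two different things.
node = {
    "partition": "gpu",               # partition (exclusive label) namespace
    "constraints": {"gpu": True},     # constraint namespace; same name, no clash
}

def in_partition(n, name):
    """A partition expression only ever consults the partition field."""
    return n["partition"] == name

def satisfies(n, name, value):
    """A constraint expression only ever consults the constraints map."""
    return n["constraints"].get(name) == value
```

Looking up {{gpu}} as a partition and as a constraint goes through different maps, so the two expression types never collide.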
bq. Did you think about having one expression(existing) which handles node label expression and constraints expression without delimiter between label and constraints expressions, constraints expression support implementation can be added without any new configurations/interfaces.
Yes Deva, I had considered this option and have described it in the section {{Topics for discussion -> ResourceRequest modifications -> Option 3}}. I hope you have taken a look at it; if required we can discuss it further.
bq. Can we have some details about how the NodeManager report these attributes to ResourceManager?
Here we are discussing two things on the NM side. One is supporting scripts or configs on the NM side, which is already supported as part of YARN-2495; for the latest documentation you can refer to [features|http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-yarn/hadoop-yarn-site/NodeLabel.html#Features].
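For reference, the YARN-2495 distributed setup is driven by NM-side configuration. A sketch of the relevant yarn-site.xml properties might look like the following (property names as documented for the Hadoop 3.0.0-alpha1 release linked above; please verify them against your version):

```xml
<!-- Illustrative yarn-site.xml fragment for distributed node labels;
     verify property names against the docs for your Hadoop version. -->
<property>
  <name>yarn.node-labels.configuration-type</name>
  <value>distributed</value>
</property>
<property>
  <name>yarn.nodemanager.node-labels.provider</name>
  <value>script</value>
</property>
<property>
  <name>yarn.nodemanager.node-labels.provider.script.path</name>
  <value>/etc/hadoop/node-labels.sh</value>
</property>
```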

bq. I also (strongly ) suggest to use these ConstraintNodeLabels in the context of YARN-1042 for supporting (anti-)affinity constraints as well. I think it will greatly avoid duplicate effort and simplify the code.
Agreed. I presume it should be pluggable enough, either on the NM or the RM side, for these app-specific constraints on the nodes. Maybe when my initial prototype is ready we can discuss this further.

bq. On a similar note, can these ContraintNodeLabels be added/removed dynamically? For example, when a container starts its execution, it might contain some attributes (to be added – I know such attributes cannot be specified at the moment). Those attributes will then be added to the node's labels, for the time the container is running. This can be useful for (anti-)affinity constraints.
Yeah, we were looking at some scenarios where NM-side resource stats like load average, swap, etc. could supply dynamic constraint values. But what you specified is also an interesting use case which can easily fit into the proposed scheme of things (though we need to be careful about the security implications of such constraints).

bq. so we might consider using a name that denotes that 
Well, actually the labels (partitions) could have been named pools and the constraints labels, but given the existing naming I am fine with either {{ConstraintExpression}} or {{AttributeExpression}}, though I would prefer the former as it is already in use to some extent.

bq.  It might also be that the implementation of ConstraintNodeLabels will be easier at some places than that of NodeLabels/Partitions
Yes, agreed: there are fewer modifications in the scheduler node, hence it is relatively simple, but we may have to modify multiple places to make it usable, which is why we proposed a branch. Once I upload the WIP patches I will create the branch and umbrella. I thought of cloning this JIRA so we can continue the discussion here and track the work in the other.

bq. Can you please give an example of a cluster-level constraint?
A cluster-level constraint is similar to the existing {{ClusterNodeLabels}}: it is the superset of all the available node labels, which the RM uses for validating expressions and for scheduling. As we plan to support different types of constraints, we need to maintain this superset so that constraints reported for one node do not conflict with those of another (if the types are different).
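A hypothetical sketch of the cluster-wide superset the RM could maintain, rejecting type conflicts and unknown names (class and method names are illustrative, not actual YARN code):

```python
# Sketch only: the RM-side registry of all constraint labels reported
# cluster-wide, used to validate expressions and reject type conflicts.

class ClusterConstraintRegistry:
    """Tracks the superset of constraint labels reported by all nodes."""

    def __init__(self):
        self._types = {}  # constraint name -> value type (str, int, bool, ...)

    def register(self, name, value_type):
        existing = self._types.get(name)
        if existing is not None and existing is not value_type:
            # A node report whose type conflicts with the cluster view is rejected.
            raise ValueError(f"constraint '{name}' is {existing.__name__}, "
                             f"got {value_type.__name__}")
        self._types[name] = value_type

    def unknown_names(self, names):
        """Expression validation: names not known cluster-wide are errors."""
        return [n for n in names if n not in self._types]

registry = ClusterConstraintRegistry()
registry.register("HAS_GPU", bool)
registry.register("NUM_DISKS", int)
unknown = registry.unknown_names(["HAS_GPU", "JDK_VERSION"])
# unknown -> ["JDK_VERSION"]
```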

bq. Making sure I understand.. Why do we need this constraint? I think they are orthogonal, right? Unless you mean that if the user specifies a constraint, it has to be taken into account too, which I understand.
Agreed, it's orthogonal, but a partition is like a logical cluster: when the user specifies a partition, the nodes under that partition satisfying the constraint labels need to be picked, to ensure compatibility with the current partition feature (partition capacity planning, headroom, etc.). Currently we only support a partition expression with a single partition; I envisage something like partition1||partition2 with a constraint expression of (HAS_GPU && !Windows), so that all the nodes of partition1 and partition2 satisfying the constraint expression will be picked for scheduling the container.
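As a rough sketch of that selection (the node layout and attribute names are my own assumptions, not YARN APIs), the scheduler would intersect partition membership with constraint satisfaction:

```python
# Illustrative only: combining a partition expression (partition1||partition2)
# with a constraint expression (HAS_GPU && !Windows) when picking nodes.

nodes = [
    {"host": "n1", "partition": "partition1", "HAS_GPU": True,  "OS": "linux"},
    {"host": "n2", "partition": "partition1", "HAS_GPU": True,  "OS": "windows"},
    {"host": "n3", "partition": "partition2", "HAS_GPU": False, "OS": "linux"},
    {"host": "n4", "partition": "partition2", "HAS_GPU": True,  "OS": "linux"},
    {"host": "n5", "partition": "default",    "HAS_GPU": True,  "OS": "linux"},
]

def candidates(nodes, partitions, constraint):
    """A node must be in one of the requested partitions AND satisfy the constraint."""
    return [n["host"] for n in nodes
            if n["partition"] in partitions and constraint(n)]

hosts = candidates(nodes,
                   partitions={"partition1", "partition2"},
                   constraint=lambda n: n["HAS_GPU"] and n["OS"] != "windows")
# hosts -> ["n1", "n4"]
```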

bq. We assumed that the ConstraintNodeLabels are following the hierarchy of the cluster. That is, a rack was inheriting the ConstraintNodeLabels of all its nodes. A detail here is that we considered only existential ConstraintNodeLabels 
Maybe I did not fully understand what you mentioned. What we were discussing is that the constraint expression could differ for different ResourceRequests; each locality level (node/rack/any) could support different constraint labels. For example: if I get node locality, I don't want any constraints; if I get rack-local nodes, pick nodes satisfying, say, (HAS_SSD || NUM_DISKS>4); and for any, pick nodes with a different expression, say (NUM_SSD_DISK > 4).
But I was not able to completely understand the need to have constraints on the rack itself. IIUC, even now (or with the global scheduler) scheduling is done based on the node itself and not one layer above, i.e. the rack. Yes, when selecting a node we do check its rack, but since we have the constraints of the node itself, we need not worry about the rack, right?
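A minimal sketch of per-locality constraint expressions on a single ResourceRequest, along the lines described above (the locality keys and attribute names are illustrative assumptions):

```python
# Sketch: a ResourceRequest carrying a different constraint expression
# per locality level, as in the node/rack/any example above.

request_constraints = {
    "node": None,  # strict locality: no constraints
    "rack": lambda n: n.get("HAS_SSD") or n.get("NUM_DISKS", 0) > 4,
    "any":  lambda n: n.get("NUM_SSD_DISK", 0) > 4,
}

def acceptable(node_attrs, locality):
    """Evaluate the constraint registered for the given locality level."""
    constraint = request_constraints[locality]
    return True if constraint is None else bool(constraint(node_attrs))

node = {"HAS_SSD": False, "NUM_DISKS": 6, "NUM_SSD_DISK": 2}
# acceptable(node, "node") -> True   (no constraint at node locality)
# acceptable(node, "rack") -> True   (NUM_DISKS > 4)
# acceptable(node, "any")  -> False  (NUM_SSD_DISK <= 4)
```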
bq. For instance, group nodes that belong to the same upgrade domain (being upgraded at the same time – we see this use case a lot in our clusters).
With the proposed approach we could achieve this with a new String-based label like {{upgrade_group}}, with per-node values like *group1, group2 ...*
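For example, a hedged sketch of how such an {{upgrade_group}} label could drive anti-affinity across upgrade domains (all names and the placement helper are illustrative):

```python
# Sketch: avoid placing two replicas in the same upgrade domain by
# excluding nodes whose "upgrade_group" value is already in use.

nodes = {
    "n1": {"upgrade_group": "group1"},
    "n2": {"upgrade_group": "group1"},
    "n3": {"upgrade_group": "group2"},
}

def anti_affinity_pick(nodes, already_placed):
    """Return hosts whose upgrade_group differs from all placed replicas'."""
    used = {nodes[h]["upgrade_group"] for h in already_placed}
    return sorted(h for h, attrs in nodes.items()
                  if attrs["upgrade_group"] not in used)

# One replica already on n1 (group1): only group2 nodes remain eligible.
eligible = anti_affinity_pick(nodes, already_placed=["n1"])
# eligible -> ["n3"]
```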

Sorry for the delay; I have a WIP patch with test cases for expression evaluation (with a custom Constraint) almost ready. I am just updating the final nits of the expression doc and will upload it by tomorrow. Next week I will upload the WIP on scheduling and configuration. This should help clarify how we view this feature so we can think more about further optimizations.



> Add constraint node labels
> --------------------------
>
>                 Key: YARN-3409
>                 URL: https://issues.apache.org/jira/browse/YARN-3409
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, capacityscheduler, client
>            Reporter: Wangda Tan
>            Assignee: Naganarasimha G R
>         Attachments: Constraint-Node-Labels-Requirements-Design-doc_v1.pdf
>
>
> Specifying only one label for each node (in other words, partitioning a cluster) is a way to determine how the resources of a particular set of nodes can be shared by a group of entities (like teams, departments, etc.). Partitions of a cluster have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACLs/priority can apply to a partition (e.g., only the market team has permission/priority to use the partition).
> - Capacity percentages can apply to a partition (e.g., the market team has 40% minimum capacity and the dev team has 60% minimum capacity of the partition).
> Constraints are orthogonal to partitions; they describe attributes of a node's hardware/software, just for affinity. Some examples of constraints:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (windows, linux, etc.)
> With this, an application can ask for resources satisfying (glibc.version >= 2.20 && JDK.version >= 8u20 && x86_64).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org