Posted to yarn-issues@hadoop.apache.org by "Tarun Parimi (JIRA)" <ji...@apache.org> on 2019/01/18 06:56:00 UTC
[jira] [Updated] (YARN-9209) When nodePartition is not set in
Placement Constraints, containers are allocated only in default partition
[ https://issues.apache.org/jira/browse/YARN-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tarun Parimi updated YARN-9209:
-------------------------------
Description:
When an application sets a placement constraint without specifying a nodePartition, the default partition is always chosen as the constraint when allocating containers. This can be a problem when an application is submitted to a queue which doesn't have enough capacity available on the default partition.
This is a common scenario when node labels are configured for a particular queue. The sample sleeper service below cannot get even a single container allocated when it is submitted to a "labeled_queue", even though enough capacity is available on the label/partition configured for the queue. Only the AM container runs.
{code:java}
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "queue": "labeled_queue",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 90000",
      "resource": {
        "cpus": 1,
        "memory": "4096"
      },
      "placement_policy": {
        "constraints": [
          {
            "type": "ANTI_AFFINITY",
            "scope": "NODE",
            "target_tags": [
              "sleeper"
            ]
          }
        ]
      }
    }
  ]
}
{code}
It runs fine if I specify the node_partition explicitly in the constraints, as below.
{code:java}
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "queue": "labeled_queue",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 90000",
      "resource": {
        "cpus": 1,
        "memory": "4096"
      },
      "placement_policy": {
        "constraints": [
          {
            "type": "ANTI_AFFINITY",
            "scope": "NODE",
            "target_tags": [
              "sleeper"
            ],
            "node_partition": [
              "label"
            ]
          }
        ]
      }
    }
  ]
}
{code}
The problem seems to be that only the default partition "" is considered when the node_partition constraint is not specified, as seen in the RM log below.
{code:java}
2019-01-17 16:51:59,921 INFO placement.SingleConstraintAppPlacementAllocator (SingleConstraintAppPlacementAllocator.java:validateAndSetSchedulingRequest(367)) - Successfully added SchedulingRequest to app=appattempt_1547734161165_0010_000001 targetAllocationTags=[sleeper]. nodePartition=
{code}
However, I think it makes more sense to consider "*" when no node_partition is specified in the placement constraint, since not specifying any node_partition should ideally mean we don't enforce placement constraints on any node_partition. Instead, we currently enforce the default partition.
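To make the difference between the two behaviours concrete, here is a minimal standalone sketch. It is not the actual YARN code: the class PartitionDefaults and its methods are hypothetical names invented for illustration, and taking the first list entry as the partition is an assumption of the sketch. It only shows which partition string a constraint would resolve to when node_partition is missing, under the behaviour described above versus the "*" behaviour suggested here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; this class does not exist in the YARN code base. */
public final class PartitionDefaults {
    /** The default partition, which the RM log above prints as an empty string. */
    public static final String DEFAULT_PARTITION = "";
    /** The "*" value suggested in this issue for "no partition specified". */
    public static final String ANY_PARTITION = "*";

    private PartitionDefaults() {}

    /** Behaviour described in this issue: a missing node_partition becomes "". */
    public static String resolveCurrent(List<String> nodePartitions) {
        return pick(nodePartitions, DEFAULT_PARTITION);
    }

    /** Suggested behaviour: a missing node_partition becomes "*". */
    public static String resolveProposed(List<String> nodePartitions) {
        return pick(nodePartitions, ANY_PARTITION);
    }

    /** Assumption of this sketch: only the first listed partition is used. */
    private static String pick(List<String> nodePartitions, String fallback) {
        if (nodePartitions == null || nodePartitions.isEmpty()) {
            return fallback;
        }
        return nodePartitions.get(0);
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        System.out.println("current, unset:   [" + resolveCurrent(none) + "]");
        System.out.println("proposed, unset:  [" + resolveProposed(none) + "]");
        System.out.println("proposed, label:  ["
            + resolveProposed(Arrays.asList("label")) + "]");
    }
}
```

In both variants an explicitly specified node_partition is returned unchanged, so only services that omit node_partition would see a difference.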
> When nodePartition is not set in Placement Constraints, containers are allocated only in default partition
> ----------------------------------------------------------------------------------------------------------
>
> Key: YARN-9209
> URL: https://issues.apache.org/jira/browse/YARN-9209
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, scheduler
> Affects Versions: 3.1.0
> Reporter: Tarun Parimi
> Priority: Major
>