Posted to yarn-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/11/03 19:13:00 UTC
[jira] [Commented] (YARN-11608) QueueCapacityVectorInfo NPE when accesible labels config is used

    [ https://issues.apache.org/jira/browse/YARN-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782720#comment-17782720 ] 

ASF GitHub Bot commented on YARN-11608:
---------------------------------------

brumi1024 opened a new pull request, #6250:
URL: https://github.com/apache/hadoop/pull/6250

   ### Description of PR
   
   Added a null check to avoid the NPE when accessible labels config is used.
   
   ### How was this patch tested?
   
   Unit test + brought up a cluster.
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




> QueueCapacityVectorInfo NPE when accesible labels config is used
> ----------------------------------------------------------------
>
>                 Key: YARN-11608
>                 URL: https://issues.apache.org/jira/browse/YARN-11608
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>
> YARN-11514 extended the REST API to contain CapacityVectors for each configured node label. There is an edge case, however: during initialization, each queue's capacities map is filled with 0 capacities for labels that are accessible but unconfigured (there is no configured capacity for the label, but the queue has access to it through the accessible-node-labels property). A very basic example configuration for this is the following:
> {code:java}
> "yarn.scheduler.capacity.root.queues": "a, b"
> "yarn.scheduler.capacity.root.a.capacity": "50"
> "yarn.scheduler.capacity.root.a.accessible-node-labels": "root-a-default-label"
> "yarn.scheduler.capacity.root.a.maximum-capacity": "50"
> "yarn.scheduler.capacity.root.b.capacity": "50"
> {code}
> root.a has access to root-a-default-label, but there is no configured capacity for it. The capacityVectors are parsed based on the configuredCapacity map (created from the "accessible-node-labels.<label>.capacity" configs). When the scheduler info is requested, the capacityVectors are collected per label, and the labels used for this come from the keySet of the capacities map:
> {code:java}
>     for (String partitionName : capacities.getExistingNodeLabels()) {
>       QueueCapacityVector queueCapacityVector = 
>           queue.getConfiguredCapacityVector(partitionName);
>       queueCapacityVectorInfo = queueCapacityVector == null ?
>               new QueueCapacityVectorInfo(new QueueCapacityVector()) :
>               new QueueCapacityVectorInfo(queue.getConfiguredCapacityVector(partitionName));
> {code}
> {code:java}
> public Set<String> getExistingNodeLabels() {
>   readLock.lock();
>   try {
>     return new HashSet<String>(capacitiesMap.keySet());
>   } finally {
>     readLock.unlock();
>   }
> }
> {code}
> If the capacitiesMap contains entries that are not "configured", this will result in an NPE, breaking the UI and the REST API:
> {code:java}
> INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> 	at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.QueueCapacityVectorInfo.<init>(QueueCapacityVectorInfo.java:39)
> 	at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.QueueCapacitiesInfo.<init>(QueueCapacitiesInfo.java:61)
> 	at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo.populateQueueCapacities(CapacitySchedulerLeafQueueInfo.java:108)
> 	at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerQueueInfo.<init>(CapacitySchedulerQueueInfo.java:137)
> 	at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo.<init>(CapacitySchedulerLeafQueueInfo.java:66)
> 	at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.getQueues(CapacitySchedulerInfo.java:197)
> 	at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.<init>(CapacitySchedulerInfo.java:94)
> 	at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getSchedulerInfo(RMWebServices.java:399)
> {code}
> There is no need to create capacityVectors for the unconfigured labels, so a null check should solve this issue on the API side.
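
To illustrate the mismatch described in the issue, here is a minimal, self-contained sketch using plain Java collections. It is not the actual patch: `CapacityVectorNullCheckSketch`, `VectorInfo`, and `collect` are hypothetical names, the two maps only stand in for the queue's capacities map and its configured capacity vectors, and skipping the unconfigured label is one possible reading of "a null check".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class CapacityVectorNullCheckSketch {

  // Hypothetical stand-in for the DAO object; not the real QueueCapacityVectorInfo.
  static final class VectorInfo {
    final String label;
    final String configuredCapacity;

    VectorInfo(String label, String configuredCapacity) {
      this.label = label;
      // Dereferences its argument, so passing null here throws an NPE,
      // which mirrors the failure mode in the stack trace.
      this.configuredCapacity = configuredCapacity.trim();
    }
  }

  // Iterates over every label in the capacities map, but only builds an
  // info object for labels that actually have a configured vector.
  static List<VectorInfo> collect(Map<String, Float> capacities,
                                  Map<String, String> configuredVectors) {
    List<VectorInfo> result = new ArrayList<>();
    // TreeSet only makes the iteration order deterministic for this demo.
    for (String label : new TreeSet<>(capacities.keySet())) {
      String vector = configuredVectors.get(label);
      if (vector == null) {
        // Accessible but unconfigured label: nothing to report.
        continue;
      }
      result.add(new VectorInfo(label, vector));
    }
    return result;
  }

  public static void main(String[] args) {
    // Assumption: the empty string stands for the default partition.
    Map<String, Float> capacities = new HashMap<>();
    capacities.put("", 50f);
    capacities.put("root-a-default-label", 0f); // filled in with 0 at init

    Map<String, String> configuredVectors = new HashMap<>();
    configuredVectors.put("", "50"); // only the default partition is configured

    List<VectorInfo> infos = collect(capacities, configuredVectors);
    System.out.println("labels with a vector: " + infos.size());
  }
}
```

Without the `vector == null` guard, the loop would pass null into the `VectorInfo` constructor for root-a-default-label and fail in the same way as the trace above.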



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
