Posted to yarn-issues@hadoop.apache.org by "Szilard Nemeth (JIRA)" <ji...@apache.org> on 2018/11/20 13:58:01 UTC

[jira] [Commented] (YARN-8951) Defining default queue placement rule in allocations file with create="false" throws an NPE

    [ https://issues.apache.org/jira/browse/YARN-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693266#comment-16693266 ] 

Szilard Nemeth commented on YARN-8951:
--------------------------------------

Thanks [~wilfreds] for your comment!
I debugged the test code a bit more and found that the call to scheduler.init() eventually calls FS.initScheduler(), so in this respect the scheduler is properly initialized.
During further offline debugging together, we realized the following:
1. When {{AllocationFileLoaderService#reloadAllocations}} is called, it creates the queue placement policy by calling {{getQueuePlacementPolicy(allocationFileParser, queueProperties, conf)}}, which in turn calls {{QueuePlacementPolicy.fromXml()}} and eventually creates the {{QueuePlacementPolicy}} object. {{AllocationFileLoaderService#getQueuePlacementPolicy}} passes in the configured queues via {{queueProperties.getConfiguredQueues()}}, so they reflect only the config file.
So the queues come from the config file, regardless of what the {{QueueManager}} has. In other words, {{QueuePlacementPolicy}} works with a separate (and different) set of queues than the {{QueueManager}}.
This could cause several issues.
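
To illustrate the mismatch, here is a minimal, self-contained sketch. It does not use the real Hadoop classes: {{placeWithDefaultRule}} is a hypothetical stand-in for a default rule, and the assumption that the runtime view also contains {{root.default}} is mine, for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class QueueViewMismatch {
    // Hypothetical stand-in for a "default" placement rule: returns the queue
    // name if the rule may place the app there, or null if it may not.
    static String placeWithDefaultRule(String queue, boolean create, Set<String> knownQueues) {
        if (create || knownQueues.contains(queue)) {
            return queue;
        }
        return null; // the rule cannot place the app
    }

    public static void main(String[] args) {
        // Queues as parsed from the allocations file (the placement policy's view).
        Set<String> configuredQueues = new LinkedHashSet<>();
        configuredQueues.add("root");
        configuredQueues.add("root.parentq");

        // Queues as held at runtime (the queue manager's view); assumed here
        // to additionally contain root.default.
        Set<String> runtimeQueues = new LinkedHashSet<>(configuredQueues);
        runtimeQueues.add("root.default");

        // Same rule, same target queue, different answer depending on the view.
        System.out.println(placeWithDefaultRule("root.default", false, configuredQueues));
        System.out.println(placeWithDefaultRule("root.default", false, runtimeQueues));
    }
}
```

With create="false", the same rule rejects {{root.default}} when it consults the config-file view and accepts it when it consults the runtime view, which is the kind of inconsistency described above.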
As [~wilfreds] said, the code changes needed to fix this likely have a lot in common with YARN-7769, so this issue is on hold until that one is fixed.

> Defining default queue placement rule in allocations file with create="false" throws an NPE
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-8951
>                 URL: https://issues.apache.org/jira/browse/YARN-8951
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: default-placement-rule-with-create-false.patch
>
>
> If the default queue placement rule is defined with {{create="false"}} and a scheduling request is created for queue {{"root.default"}}, then {{FairScheduler#assignToQueue}} throws an NPE while trying to construct an error message in the catch block for {{IllegalStateException}}: the code assumes that {{rmApp}} is not null, but it is.
> Example of such a config file:
> {code:xml}
> <?xml version="1.0"?>
> <allocations>
> 	<queue name="parentq" type="parent">
> 		<minResources>1024mb,0vcores</minResources>
> 	</queue>
> 	<queuePlacementPolicy>
> 		<rule name="default" create="false"/>
> 	</queuePlacementPolicy>
> </allocations>
> {code}
> This is suspicious, as there are some null checks for {{rmApp}} in the same method.
> Not sure if this is a special case for the tests or whether it is reproducible in a cluster; this needs further investigation.
> In any case, it's not good that we try to dereference {{rmApp}} when it is null.
> On the other hand, I'm not sure if the default queue placement rule with {{create="false"}} makes sense at all. Looking at the documentation ([https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html]):
> {quote}default: the app is placed into the queue specified in the ‘queue’ attribute of the default rule. *If ‘queue’ attribute is not specified, the app is placed into ‘root.default’ queue.*
> A queuePlacementPolicy element: which contains a list of rule elements that tell the scheduler how to place incoming apps into queues. Rules are applied in the order that they are listed. Rules may take arguments. *All rules accept the “create” argument, which indicates whether the rule can create a new queue. “Create” defaults to true; if set to false and the rule would place the app in a queue that is not configured in the allocations file, we continue on to the next rule.* The last rule must be one that can never issue a continue....
> {quote}
> In this case, the rule omits the queue attribute, so apps should be placed into the {{root.default}} queue (which is undefined according to the config file), and create is false, meaning that the queue {{root.default}} cannot be created at all.
> *This looks like an invalid queue configuration file to me.*
> [~jlowe], [~leftnoteasy]: What is your take on this?
>  
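
The NPE described in the quoted issue comes from building an error message out of an object that may be null. A minimal sketch of a null-safe version follows; the {{App}} class and {{buildRejectionMessage}} method are hypothetical stand-ins, not the actual {{FairScheduler}} code, and the message wording is made up for illustration.

```java
public class NullSafeErrorMessage {
    /** Hypothetical stand-in for the application object that may be null. */
    static final class App {
        private final String id;

        App(String id) {
            this.id = id;
        }

        String getId() {
            return id;
        }
    }

    // Builds the rejection message without dereferencing a null app.
    static String buildRejectionMessage(App app, String queue) {
        String appId = (app == null) ? "<unknown application>" : app.getId();
        return "Application " + appId + " rejected: cannot be placed in queue " + queue;
    }

    public static void main(String[] args) {
        // The null case is the one that would otherwise throw inside the catch block.
        System.out.println(buildRejectionMessage(null, "root.default"));
        System.out.println(buildRejectionMessage(new App("app_1"), "root.default"));
    }
}
```

Guarding the dereference keeps the original failure (the placement problem) visible instead of masking it with a secondary NPE raised while reporting it.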


