Posted to yarn-issues@hadoop.apache.org by "Hadoop QA (Jira)" <ji...@apache.org> on 2021/12/03 14:39:00 UTC

[jira] [Commented] (YARN-10787) Queue submit ACL check is wrong when CS queue is ambiguous

    [ https://issues.apache.org/jira/browse/YARN-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453053#comment-17453053 ] 

Hadoop QA commented on YARN-10787:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  7m 27s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-3.3 Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 31s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 52s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 41s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 58s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 24s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 43s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 19m 55s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 49s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 28s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 26s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 26s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 31s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 4 new + 18 unchanged - 2 fixed = 22 total (was 20) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 29s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  5m 10s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/artifact/out/patch-shadedclient.txt{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 35s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} spotbugs {color} | {color:red}  0m 30s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/artifact/out/patch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 31s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 33s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/artifact/out/patch-asflicense-problems.txt{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 34s{color} | {color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/artifact/out/Dockerfile |
| JIRA Issue | YARN-10787 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13026251/YARN-10787.branch-3.3.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle spotbugs |
| uname | Linux 7166ea6332bb 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | branch-3.3 / 67eaf5aa9f5 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 |
|  Test Results | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/testReport/ |
| Max. process+thread count | 536 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1256/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Queue submit ACL check is wrong when CS queue is ambiguous
> ----------------------------------------------------------
>
>                 Key: YARN-10787
>                 URL: https://issues.apache.org/jira/browse/YARN-10787
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Szilard Nemeth
>            Assignee: Gergely Pollák
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-10787.001.patch, YARN-10787.branch-3.3.001.patch
>
>
> Let's suppose we have a Capacity Scheduler configuration with 2 or more leaf queues with the same name in the queue hierarchy. That's what we call an ambiguous queue name.
>  Let's also enable ACL checks and set the acl_submit_applications / acl_administer_queue configs so that the submitting user's name is included in the ACL value.
> Here's a minimal YARN + CS config:
> h2. 1. YARN config snippet:
> {code:java}
> <property><name>yarn.acl.enable</name><value>true</value></property>
> {code}
> h2. 2. CS config snippet:
> {code:java}
> <property>
> 	<name>yarn.scheduler.capacity.root.someparent1.queues</name>
> 	<value>anyotherqueue1,somequeue,anyotherqueue2</value>
> </property>
> <property>
> 	<name>yarn.scheduler.capacity.root.someparent2.queues</name>
> 	<value>anyotherqueue3,somequeue,anyotherqueue4</value>
> </property>
> <property>
> 	<name>yarn.scheduler.capacity.root.someparent1.somequeue.acl_submit_applications</name>
> 	<value>someuser1 </value>
> </property>
> <property>
> 	<name>yarn.scheduler.capacity.root.someparent2.somequeue.acl_submit_applications</name>
> 	<value>someuser1 </value>
> </property>
> <property>
> 	<name>yarn.scheduler.capacity.root.someparent1.somequeue.acl_administer_queue</name>
> 	<value>someuser1 </value>
> </property>
> <property>
> 	<name>yarn.scheduler.capacity.root.someparent2.somequeue.acl_administer_queue</name>
> 	<value>someuser1 </value>
> </property>
> {code}
> So in this case, we have an ambiguous queue named "somequeue" under 2 different paths:
>  - root.someparent1.somequeue
>  - root.someparent2.somequeue
> When a user submits an application correctly with the full queue path, e.g. root.someparent1.somequeue, YARN still fails to place the application in that queue: when ACL checking is enabled, it uses the short queue name instead.
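To make the ambiguity concrete, here is a minimal, hypothetical sketch of a name lookup over the two paths above. It is not the CapacityScheduler implementation: `QueueRegistrySketch`, `add` and `resolve` are names invented for illustration. The only behavior taken from this issue is that an ambiguous short name resolves to null, while the full path stays unambiguous.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a queue registry; not the real CapacityScheduler API. */
class QueueRegistrySketch {
    // Short (leaf) name -> every full path that ends with that name.
    private final Map<String, List<String>> byShortName = new HashMap<>();

    /** Registers a queue by its full path, e.g. "root.someparent1.somequeue". */
    void add(String fullPath) {
        String shortName = fullPath.substring(fullPath.lastIndexOf('.') + 1);
        byShortName.computeIfAbsent(shortName, k -> new ArrayList<>()).add(fullPath);
    }

    /** Returns the full path for a full or short name, or null if unknown or ambiguous. */
    String resolve(String name) {
        if (name.contains(".")) {
            // A dotted name is treated as a full path and must match exactly.
            String shortName = name.substring(name.lastIndexOf('.') + 1);
            List<String> paths = byShortName.get(shortName);
            return paths != null && paths.contains(name) ? name : null;
        }
        // A short name is only usable if exactly one queue carries it.
        List<String> paths = byShortName.get(name);
        return paths != null && paths.size() == 1 ? paths.get(0) : null;
    }
}
```

With both `root.someparent1.somequeue` and `root.someparent2.somequeue` registered, `resolve("somequeue")` yields null in this sketch, while `resolve("root.someparent1.somequeue")` returns the path unchanged.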
> h2. 3. LOG SNIPPET
> {code:java}
> 2021-05-20 22:04:32,031 DEBUG org.apache.hadoop.yarn.server.resourcemanager.placement.CSMappingPlacementRule: Placement final result 'root.someparent1.somequeue' for application 'application_1621540945412_0001'
>  2021-05-20 22:04:32,031 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Placed application with ID application_1621540945412_0001 in queue: somequeue, original submission queue was: root.someparent1.somequeue
>  2021-05-20 22:04:32,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Ambiguous queue reference: somequeue please use full queue path instead.
>  2021-05-20 22:04:32,031 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application 'application_1621540945412_0001' is submitted without priority hence considering default queue/cluster priority: 0
>  2021-05-20 22:04:32,032 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Priority '0' is acceptable in queue : somequeue for application: application_1621540945412_0001
>  2021-05-20 22:04:32,993 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Exception in submitting application_1621540945412_0001
>  org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue
>  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> {code}
> h2. 4. FULL STACKTRACE:
> {code:java}
>  org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue
> 	at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:433)
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:330)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:650)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> 	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> Caused by: org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:436)
> 	... 12 more
> 2021-05-20 22:04:32,994 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=someuser1	IP=172.17.61.133	OPERATION=Submit Application Request	TARGET=ClientRMService	RESULT=FAILURE	DESCRIPTION=Exception in submitting application	PERMISSIONS=org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue	APPID=application_1621540945412_0001	QUEUENAME=somequeue
> {code}
> h1. DETAILS:
> *1. The whole thing happens in RMAppManager#createAndPopulateNewRMApp:*
>  Class / method: org.apache.hadoop.yarn.server.resourcemanager.RMAppManager#createAndPopulateNewRMApp
> [LINK|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L407]
> *2. RMAppManager#copyPlacementQueueToSubmissionContext is called* for new applications, that is, applications submitted in the normal way rather than being recovered:
>  Class / method: org.apache.hadoop.yarn.server.resourcemanager.RMAppManager#copyPlacementQueueToSubmissionContext
> [Called at|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L420]
> [Method link|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L991]
> The problem is that copyPlacementQueueToSubmissionContext sets the queue of context (an ApplicationSubmissionContext object) from placementContext.getQueue (placementContext is an ApplicationPlacementContext object). If placementContext holds the queue name in the short form, this overrides the originally submitted queue value, which in this scenario was the full queue path.
>  An example of a generated log from this method:
> {code:java}
>  2021-05-20 22:04:32,031 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Placed application with ID application_1621540945412_0001 in queue: somequeue, original submission queue was: root.someparent1.somequeue
> {code}
> *3. The problematic code block is here:* [Code block|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L446-L475]
> 3.1 First, the short queuename will be gathered from submissionContext, as it was overridden by 'copyPlacementQueueToSubmissionContext': [Link|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L448]
>  This is bad design, because the code here relies on the queue name having been overridden in the submission context object.
> 3.2 Since the queue name is in the short form and is ambiguous, the call to [scheduler.getQueue()|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L450] returns null. This is by design: if the queue name is ambiguous, it returns null.
> 3.3 The condition of checking if csqueue is null AND placementContext is not null will evaluate to true [here|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L452]
> *3.4. The Parent queue will be queried from CS* by the parent queue name of the placement context: [Link|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L456]
> *3.5 Finally, the ACL check fails* because csqueue is now the queue object of the parent of 'root.someparent1.somequeue', that is, 'root.someparent1'.
>  In this case, the user does not have a submission ACL set for the parent queue, only for the leaf queue, so the ACL check fails.
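Steps 3.1 to 3.5 can be sketched as follows. This is a simplified, hypothetical model and not the RMAppManager code: `AclFlowSketch`, `SUBMIT_ACL`, `getQueue` and `canSubmit` are invented names, the ACL table is a plain map mirroring the config in this issue, and denying access when no queue resolves at all is an assumption of the sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the lookup-then-fallback flow described in steps 3.1 to 3.5. */
class AclFlowSketch {
    // Full queue path -> users allowed to submit (mirrors the CS config in this issue).
    static final Map<String, Set<String>> SUBMIT_ACL = new HashMap<>();
    static {
        SUBMIT_ACL.put("root.someparent1.somequeue", Collections.singleton("someuser1"));
        SUBMIT_ACL.put("root.someparent2.somequeue", Collections.singleton("someuser1"));
        // The parent queues carry no submit ACL for someuser1.
        SUBMIT_ACL.put("root.someparent1", Collections.<String>emptySet());
        SUBMIT_ACL.put("root.someparent2", Collections.<String>emptySet());
    }

    /** Stand-in for scheduler.getQueue(): returns null for the ambiguous short name. */
    static String getQueue(String name) {
        if (name.equals("somequeue")) {
            return null; // two leaf queues share this short name
        }
        return SUBMIT_ACL.containsKey(name) ? name : null;
    }

    /** Looks up the queue by the context's queue name, falls back to the parent, then checks the ACL. */
    static boolean canSubmit(String user, String contextQueue, String placementParent) {
        String queue = getQueue(contextQueue);          // 3.1 / 3.2: short name gives null
        if (queue == null && placementParent != null) { // 3.3: condition is true
            queue = getQueue(placementParent);          // 3.4: parent queue is used instead
        }
        // 3.5: the ACL of whichever queue was found decides the outcome.
        return queue != null && SUBMIT_ACL.get(queue).contains(user);
    }
}
```

In this model, `canSubmit("someuser1", "somequeue", "root.someparent1")` is false because the check lands on the parent queue, whereas passing the full path `root.someparent1.somequeue` as the context queue gives true.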
> h2. LIST OF THINGS TO FIX / DO:
>  - Add a unit testcase that replicates the above config and the issue.
>  - Rename copyPlacementQueueToSubmissionContext: This method does not really copy anything; it simply overrides the queue value.
>  - Add Debug log to print csqueue object before the authorization code: [Auth code block|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L459-L475]
>  - Fix log messages: Because 'copyPlacementQueueToSubmissionContext' overrides (rather than copies) the original queue name with the queue name from the PlacementContext, all calls to submissionContext.getQueue() return the short queue name. This results in very misleading log messages, including the exception message itself:
> {code:java}
>  org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue
> {code}
> All log messages should print the original submission queue, if possible.
>  - Actual code fix for the issue: Use full queue path to get the queue object.
>  Again, this is the code block where the fix should happen: [LINK|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L447-L458]
> 'queueName' should have the value set from: *org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext#getFullQueuePath.*
> The equivalent of this in the linked code block:
> {code:java}
> placementContext.getFullQueuePath()
> {code}
> This should happen only if placementContext is not null.
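A hedged sketch of the proposed selection logic follows. `PlacementContextSketch` is a hypothetical stand-in for ApplicationPlacementContext, and the way it builds the full path (parent, a dot, then the queue) is an assumption made for illustration; `QueueNameFixSketch` and `queueNameForLookup` are invented names. Only the rule itself comes from this issue: use the full queue path when placementContext is not null.

```java
/** Hypothetical stand-in for ApplicationPlacementContext; not the real class. */
class PlacementContextSketch {
    private final String queue;
    private final String parentQueue; // may be null when no parent is recorded

    PlacementContextSketch(String queue, String parentQueue) {
        this.queue = queue;
        this.parentQueue = parentQueue;
    }

    String getQueue() {
        return queue;
    }

    /** Assumed behavior: parent path plus leaf name, or the leaf name alone. */
    String getFullQueuePath() {
        return parentQueue == null ? queue : parentQueue + "." + queue;
    }
}

class QueueNameFixSketch {
    /** Picks the name used for the queue lookup that precedes the ACL check. */
    static String queueNameForLookup(String submissionQueue,
                                     PlacementContextSketch placementContext) {
        // Prefer the unambiguous full path whenever a placement context exists.
        return placementContext != null
            ? placementContext.getFullQueuePath()
            : submissionQueue;
    }
}
```

With a placement context of queue `somequeue` and parent `root.someparent1`, the lookup name becomes `root.someparent1.somequeue`; without a placement context the submission queue is used unchanged.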
> h2. LONG TERM FIX:
> Investigate if it's possible to eliminate copyPlacementQueueToSubmissionContext.
>  This could introduce nasty backward-incompatibility issues with recovery, so it should be thought through very carefully.


