Posted to yarn-issues@hadoop.apache.org by "Prabhu Joseph (Jira)" <ji...@apache.org> on 2020/05/22 13:35:00 UTC

[jira] [Updated] (YARN-10287) Update scheduler-conf corrupts the CS configuration when removing queue which is referred in queue mapping

     [ https://issues.apache.org/jira/browse/YARN-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-10287:
---------------------------------
    Description: 
Updating the configuration through scheduler-conf corrupts the CapacityScheduler (CS) configuration when removing a queue that is referred to in a queue mapping. The deletion fails with the error message below, yet the queue is still removed and job submission fails, while the queue is not removed from the ZKConfigurationStore. On a subsequent modification through scheduler-conf, the queue reappears from the ZKConfigurationStore.

{code}
2020-05-22 12:38:38,252 ERROR org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: Exception thrown when modifying configuration.
java.io.IOException: Failed to re-init queues : mapping contains invalid or non-leaf queue Prod
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:478)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:430)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2389)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2377)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.updateSchedulerConfiguration(RMWebServices.java:2377)
{code}
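The exception is raised in CapacityScheduler.reinitialize while it validates the queue mappings: once root.dummy is gone, the mapping g:hadoop:dummy points at a queue that no longer exists. Below is a minimal, hypothetical Java sketch of that kind of check. The class QueueMappingCheck and its method are made up for illustration and are not the CapacityScheduler code; the sketch assumes mappings of the form u|g:<name>:<queue> with short leaf-queue names, and skips placeholder targets such as %user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, NOT the CapacityScheduler implementation: it mimics the
// kind of check behind "mapping contains invalid or non-leaf queue".
class QueueMappingCheck {

    /**
     * Returns every mapping (assumed format "u|g:<name>:<queue>") whose target
     * queue is not one of the given leaf queues. Placeholder targets such as
     * "%user" are skipped because they are only resolved at submission time.
     */
    static List<String> invalidMappings(String queueMappings, Set<String> leafQueues) {
        List<String> invalid = new ArrayList<>();
        if (queueMappings == null || queueMappings.isBlank()) {
            return invalid;
        }
        for (String raw : queueMappings.split(",")) {
            String mapping = raw.trim();
            String[] parts = mapping.split(":");
            String target = parts.length == 3 ? parts[2] : "";
            if (target.startsWith("%")) {
                continue; // placeholder, cannot be checked statically here
            }
            if (!leafQueues.contains(target)) {
                invalid.add(mapping);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        String mappings = "g:hadoop:dummy"; // step 1 of the repro
        // Leaf queues before and after removing root.dummy (short names).
        System.out.println(invalidMappings(mappings, Set.of("default", "dummy")));
        System.out.println(invalidMappings(mappings, Set.of("default")));
    }
}
```

Run as is, it prints [] and then [g:hadoop:dummy]: the same mapping is valid before the removal and invalid after it, which is why the removal request is rejected.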

*Repro:*

{code}
1. Set up the queue mapping

yarn.scheduler.capacity.root.queues=default,dummy
yarn.scheduler.capacity.queue-mappings=g:hadoop:dummy

2. Stop the root.dummy queue

<sched-conf>
  <update-queue>
    <queue-name>root.dummy</queue-name>
    <params>
      <entry>
        <key>state</key>
        <value>STOPPED</value>
      </entry>
    </params>
  </update-queue>
</sched-conf>

3. Delete the root.dummy queue by sending abc.xml (contents below)

curl --negotiate -u : -X PUT -d @abc.xml -H "Content-type: application/xml" 'http://<RM_IP>:8088/ws/v1/cluster/scheduler-conf?user.name=yarn'

<sched-conf>
  <update-queue>
    <queue-name>root.default</queue-name>
    <params>
      <entry>
        <key>capacity</key>
        <value>100</value>
      </entry>
    </params>
  </update-queue>
  <remove-queue>root.dummy</remove-queue>
</sched-conf>
{code}
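The report implies that a rejected scheduler-conf update should leave both the persisted configuration and the live scheduler unchanged. The toy model below sketches that validate-then-commit behavior under stated assumptions: SchedConfModel, its store map (standing in for the ZKConfigurationStore) and liveLeafQueues (standing in for the scheduler's queues) are all hypothetical, and the sketch is not meant to describe the actual ResourceManager code or the fix for YARN-10287.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, NOT ResourceManager code. "store" stands in for the
// persisted ZKConfigurationStore, "liveLeafQueues" for the scheduler's queues.
class SchedConfModel {
    final Map<String, String> store = new HashMap<>();
    final Set<String> liveLeafQueues = new LinkedHashSet<>();

    SchedConfModel(List<String> leafQueues, String queueMappings) {
        liveLeafQueues.addAll(leafQueues);
        store.put("queues", String.join(",", liveLeafQueues));
        store.put("queue-mappings", queueMappings);
    }

    /**
     * Removes a leaf queue all-or-nothing: the request is validated first, and
     * only then are the store and the live queues updated together.
     */
    boolean removeQueue(String queue) {
        for (String raw : store.get("queue-mappings").split(",")) {
            String[] parts = raw.trim().split(":");
            if (parts.length == 3 && parts[2].equals(queue)) {
                return false; // rejected: a mapping still targets this queue
            }
        }
        Set<String> proposed = new LinkedHashSet<>(liveLeafQueues);
        proposed.remove(queue);
        // Commit to both copies only after validation has passed.
        store.put("queues", String.join(",", proposed));
        liveLeafQueues.clear();
        liveLeafQueues.addAll(proposed);
        return true;
    }

    public static void main(String[] args) {
        SchedConfModel model = new SchedConfModel(List.of("default", "dummy"), "g:hadoop:dummy");
        System.out.println(model.removeQueue("dummy"));
        System.out.println(model.store.get("queues") + " " + model.liveLeafQueues);
    }
}
```

With the repro's mapping, main prints false and then default,dummy [default, dummy]: the rejected removal touches neither copy, so the two can never disagree the way the store and the scheduler do in this report.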



> Update scheduler-conf corrupts the CS configuration when removing queue which is referred in queue mapping
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10287
>                 URL: https://issues.apache.org/jira/browse/YARN-10287
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>    Affects Versions: 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
