Posted to yarn-issues@hadoop.apache.org by "Jonathan Hung (JIRA)" <ji...@apache.org> on 2016/12/09 03:07:59 UTC

[jira] [Comment Edited] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

    [ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734123#comment-15734123 ] 

Jonathan Hung edited comment on YARN-5734 at 12/9/16 3:07 AM:
--------------------------------------------------------------

Thanks for the detailed points, [~leftnoteasy]. 
bq. How to handle bad configuration update?
The idea of calling scheduler#reinitialize mostly makes sense to me, a couple questions/thoughts:
* If reinitialization with the new configuration fails (i.e. scheduler.reinitialize(X+1) throws), then we will need to call scheduler.reinitialize(X) to roll back. In this case we need to call reinitialize twice. Is this acceptable?
* I think we will still need some sort of PluggablePolicy, but in this case it is just an authorization policy so we can leverage YarnAuthorizationProvider.
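To make the double-reinitialize flow concrete, here is a hedged sketch (MockScheduler and applyUpdate are illustrative stand-ins, not actual YARN classes; the validation rule is a placeholder for CapacityScheduler's real checks):

```java
import java.util.HashMap;
import java.util.Map;

public class ReinitRollbackSketch {
    /** Minimal stand-in for a scheduler whose reinitialize validates the config. */
    static class MockScheduler {
        Map<String, String> activeConf = new HashMap<>();

        void reinitialize(Map<String, String> conf) {
            // Placeholder validation: a real CapacityScheduler#reinitialize
            // would throw on an invalid queue hierarchy, bad capacities, etc.
            if (conf.containsValue("bad")) {
                throw new IllegalArgumentException("invalid configuration");
            }
            activeConf = conf;
        }
    }

    /** Try config X+1; on failure call reinitialize a second time with X. */
    static boolean applyUpdate(MockScheduler s, Map<String, String> confX,
                               Map<String, String> confXplus1) {
        try {
            s.reinitialize(confXplus1); // first call: validate and apply X+1
            return true;                // success: X+1 can now be persisted
        } catch (IllegalArgumentException e) {
            s.reinitialize(confX);      // second call: roll back to X
            return false;
        }
    }
}
```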

bq. By using ConfigurationProvider, it can either get a new CapacitySchedulerConfiguration (CS) or a new AllocationConfiguration (FS). 
Not sure if this is what you meant, but could we have MutableConfigurationManager extend ConfigurationProvider? Then MutableConfigurationManager would expose the X+1 configuration while it is being validated, and either un-expose it (if reinitialization failed) or keep it exposed and persist it to the backing store (if reinitialization succeeded).
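A rough sketch of that expose/un-expose lifecycle (class and method names here are hypothetical, not the actual ConfigurationProvider API):

```java
import java.util.HashMap;
import java.util.Map;

public class MutableConfManagerSketch {
    private Map<String, String> persisted = new HashMap<>(); // version X, in the backing store
    private Map<String, String> pending;                     // version X+1, exposed for validation

    /** Expose X+1 so the scheduler can validate it via reinitialize. */
    public void stage(Map<String, String> newConf) {
        pending = newConf;
    }

    /** The configuration currently exposed to the scheduler. */
    public Map<String, String> getConfiguration() {
        return pending != null ? pending : persisted;
    }

    /** Reinitialize succeeded: keep X+1 exposed and persist it. */
    public void commit() {
        persisted = pending;
        pending = null;
    }

    /** Reinitialize failed: un-expose X+1 and fall back to X. */
    public void rollback() {
        pending = null;
    }
}
```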
bq. If file-based solution is specified, no dynamic update queue operation will be allowed. If store-based solution is specified, no refreshQueue CLI will be allowed.
I agree.
bq. So I would prefer to add an option to yarn-site.xml to explicitly specify which config source the scheduler will use.
I am thinking we can add a scheduler-specific ConfigurationProvider option in yarn-site.xml and infer the config source from it: if the scheduler-specific ConfigurationProvider is MutableConfigurationManager, the scheduler will use the store; otherwise, it will use the file.
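For instance, such a yarn-site.xml entry might look like the following (the property name and class FQN here are purely illustrative, not agreed-upon keys):

```xml
<!-- Hypothetical key: the actual property name would be decided in this JIRA. -->
<property>
  <name>yarn.scheduler.configuration.provider-class</name>
  <value>org.apache.hadoop.yarn.MutableConfigurationManager</value>
</property>
```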
bq. If admin want to load configuration file from xml while setting the cluster, or want to switch from xml-file based config to store-based config, we can provide a CLI to load a XML file and save it to store.
Not sure what you mean by loading the configuration from an xml file while setting up the cluster, can you elaborate on that? Do you mean the store is enabled and the admin wants to wipe it and load a new conf from a file into the store? Do we plan on supporting that?
For switching from xml-based to store-based, I was thinking we could just manually change the scheduler's configuration provider in yarn-site.xml and then restart the RM. Otherwise, if we allow admins to do this via CLI, yarn-site.xml becomes inconsistent with RM behavior (since yarn-site will still say it is file-based while the RM is store-based).



> OrgQueue for easy CapacityScheduler queue configuration management
> ------------------------------------------------------------------
>
>                 Key: YARN-5734
>                 URL: https://issues.apache.org/jira/browse/YARN-5734
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Min Shen
>            Assignee: Min Shen
>         Attachments: OrgQueue_API-Based_Config_Management_v1.pdf, OrgQueue_Design_v0.pdf
>
>
> The current xml-based configuration mechanism in CapacityScheduler makes it very inconvenient to apply any changes to the queue configurations. We saw two main drawbacks in the file-based configuration mechanism:
> # It makes it very inconvenient to automate queue configuration updates. For example, in our cluster setup, we leverage the queue mapping feature from YARN-2411 to route users to their dedicated organization queues. It is extremely cumbersome to keep updating the config file to manage the very dynamic mapping between users and organizations.
> # Even if a user has admin permission on a specific queue, that user is unable to make queue configuration changes such as resizing subqueues, changing queue ACLs, or creating new queues. All of these operations need to be performed in a centralized manner by the cluster administrators.
> With these limitations, we realized the need for a more flexible configuration mechanism that allows queue configurations to be stored and managed more dynamically. We developed this feature internally at LinkedIn; it introduces the concept of a MutableConfigurationProvider, which essentially provides a set of configuration mutation REST APIs that allow queue configurations to be updated externally. When performing queue configuration changes, the queue ACLs are honored, which means only queue administrators can make configuration changes to a given queue. MutableConfigurationProvider is implemented as a pluggable interface, and we have one implementation of this interface based on the Derby embedded database.
> This feature has been deployed on LinkedIn's Hadoop cluster for a year now and has gone through several iterations of gathering feedback from users and improving accordingly. With this feature, cluster administrators are able to automate many of the queue configuration management tasks, such as setting queue capacities to adjust cluster resources between queues based on established resource consumption patterns, or updating the user-to-queue mappings. We have attached our design documentation to this ticket and would like to receive feedback from the community regarding how to best integrate it with the latest version of YARN.
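For context, the YARN-2411 queue mapping mentioned in the issue description is driven by a capacity-scheduler.xml property along these lines (the user, group, and queue names below are examples only):

```xml
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <!-- u:<user>:<queue> maps a user, g:<group>:<queue> maps a group;
       names here are illustrative -->
  <value>u:alice:orgqueueA,g:datascience:orgqueueB</value>
</property>
```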



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
