Posted to common-dev@hadoop.apache.org by "Vivek Ratan (JIRA)" <ji...@apache.org> on 2008/10/27 05:48:44 UTC

[jira] Created: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Capacity Scheduler needs to re-read its configuration
-----------------------------------------------------

                 Key: HADOOP-4522
                 URL: https://issues.apache.org/jira/browse/HADOOP-4522
             Project: Hadoop Core
          Issue Type: New Feature
            Reporter: Vivek Ratan


An external application (an Ops script, or some CLI-based tool) can change the configuration of the Capacity Scheduler (change the capacities of various queues, for example) by updating its config file. This application then needs to tell the Capacity Scheduler that its config has changed, which causes the Scheduler to re-read its configuration. It's possible that the Capacity Scheduler may need to interact with external applications in other similar ways. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647775#action_12647775 ] 

dhruba borthakur commented on HADOOP-4522:
------------------------------------------

One other case for re-reading the config file is JT or NN failover. Suppose the machine on which my JT was running dies, and I want to make the running TaskTrackers start communicating with the new JT on a different machine. This requires the TTs to re-read the config file.



[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647920#action_12647920 ] 

dhruba borthakur commented on HADOOP-4522:
------------------------------------------

The main problem with DNS is that it is not instantaneous (because of caching), and DNS operations are usually done by an administrator who is different from the Hadoop admin.



[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647461#action_12647461 ] 

dhruba borthakur commented on HADOOP-4522:
------------------------------------------

Will the guarantee be that new values of parameters that were re-read from the config file take effect for new job launches (and not for existing jobs)? Also, what if critical parameters (e.g. mapred.system.dir) have new values?



[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648093#action_12648093 ] 

Vivek Ratan commented on HADOOP-4522:
-------------------------------------

bq. is there some part of the system other than the scheduler that needs to look at queues? 
bq. Beyond scheduling, has there been a need to have the system re-read its configuration?

Access control for queues (whether queues can accept jobs from particular users) is system-wide and independent of schedulers. I think we'll want to support dynamic updates to access control sooner rather than later. You'll certainly want to add new users or remove users from a queue's access control fairly often. You probably don't want to restart the JT each time you want to make such a change.




[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647425#action_12647425 ] 

Owen O'Malley commented on HADOOP-4522:
---------------------------------------

{quote}
We'd thought of the same approach for the Capacity Scheduler, but one big problem is that the Scheduler may read the config file while it is in the middle of being changed.
{quote}

+1

I think the best trade-off is having an admin command that gets the job tracker to re-read all of its config files, including a call down to the scheduler to re-read its config file. Maybe something like:

{code}
bin/hadoop mapred-admin -reconfigure
{code}





[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647910#action_12647910 ] 

Matei Zaharia commented on HADOOP-4522:
---------------------------------------

One other difference between the config in the fair scheduler and the capacity scheduler is that the fair scheduler uses a file different than the Hadoop config file (hadoop-site.xml/hadoop-default.xml), which actually has its own XML format. This means that admins don't have to worry about which parts of hadoop-site.xml they've changed and which they haven't if they want to change the scheduling config. In my opinion it's far easier to understand the system if hadoop-site.xml is read only on startup, rather than having to learn and worry about which parameters get reloaded and which ones don't. Admins may also edit a config file while the system is running in order to plan for the next restart, and reloading some parameters may be confusing. When job persistence is in, there won't even be too much of a cost to restarting the JobTracker - jobs won't be lost, and the restart is very fast anyway.

By the way a note on what to do if the JT machine fails: why not use DNS and assign the same name to a different machine? This seems a lot less complex.



[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642850#action_12642850 ] 

Vivek Ratan commented on HADOOP-4522:
-------------------------------------

There seem to be a number of ways to do this. External applications may need to interact with the Schedulers in a generic way. Re-reading of the config file is the only use case for now, but it's not unlikely that there will be other similar needs. It's also likely that different Schedulers will expose different functionality. The ability to re-read configuration, for example, is only applicable to the Capacity Scheduler for now.
# We can extend the JT RPC interface to allow JT clients to invoke functionality on the Schedulers. Modifying the JT RPC interface each time to support new Scheduler functionality is not good, so we can perhaps have a single generic method that contains an action parameter and other optional parameters. We're basically letting clients invoke functionality on the Schedulers through the JT. 
# Ideally, clients should talk to Schedulers directly. Perhaps the Schedulers can optionally expose their own RPC interface for their own clients. More work to set this up, but it does let each Scheduler interface evolve on its own. 
# Since we don't have other use cases yet, maybe we just add a _RereadSchedulerConfig()_ method to the JT's RPC interface for now. 

I instinctively prefer the second option: letting Schedulers decide how they want external clients to communicate with them (through RPC or some other mechanism). This is good long-term too, since Schedulers may eventually move out of the JT to run as a separate process. But, since a Scheduler is currently part of the JobTracker application, this will cause the latter to support more than one RPC interface and may not be very clean.
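
As a rough illustration of the first option in the list above (a single generic method that takes an action parameter plus optional parameters), the dispatch could look something like the sketch below. Everything here is hypothetical: `SchedulerActions`, `Action`, `register`, `invoke`, and the action name `reread-config` are invented for the example and are not part of the JT RPC interface.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one generic entry point that routes named actions to a scheduler. */
class SchedulerActions {

  /** A scheduler-side handler for one named action (hypothetical). */
  public interface Action {
    String run(String... args) throws IOException;
  }

  private final Map<String, Action> actions = new HashMap<>();

  /** Each scheduler registers only the actions it supports. */
  public void register(String name, Action action) {
    actions.put(name, action);
  }

  /** The single generic method a client would call through the JT. */
  public String invoke(String name, String... args) throws IOException {
    Action action = actions.get(name);
    if (action == null) {
      throw new IOException("Unsupported scheduler action: " + name);
    }
    return action.run(args);
  }

  public static void main(String[] argv) throws IOException {
    SchedulerActions scheduler = new SchedulerActions();
    // "reread-config" is an invented action name for this sketch.
    scheduler.register("reread-config", args -> "config re-read");
    System.out.println(scheduler.invoke("reread-config"));
  }
}
```

The RPC signature stays fixed while schedulers add actions, at the cost of losing compile-time checking of action names and arguments.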



[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643407#action_12643407 ] 

Vivek Ratan commented on HADOOP-4522:
-------------------------------------

bq. In the current code, the Fair Share scheduler periodically checks the modification time of its configuration file to detect any changes.

We'd thought of the same approach for the Capacity Scheduler, but one big problem is that the Scheduler may read the config file while it is in the middle of being changed. Changes to the config file need to be atomic for this approach to work correctly, which seemed too severe a restriction to place. 
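
For reference, this is roughly what an atomic update would demand of the external tool: write the new contents to a temporary file in the same directory, then rename it over the live file. It is a sketch only, assuming a local filesystem on which a rename over an existing file is atomic (typical on POSIX systems); `AtomicConfigWriter` and `replace` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of a writer that never exposes a half-written config file. */
class AtomicConfigWriter {

  public static void replace(Path configFile, String newContents) throws IOException {
    // The temp file must be in the same directory so the rename stays on one filesystem.
    Path dir = configFile.toAbsolutePath().getParent();
    Path tmp = Files.createTempFile(dir, "conf", ".tmp");
    try {
      Files.write(tmp, newContents.getBytes(StandardCharsets.UTF_8));
      // Assumption: the platform replaces an existing target atomically on rename.
      Files.move(tmp, configFile, StandardCopyOption.ATOMIC_MOVE);
    } finally {
      // No-op after a successful move; cleans up if the write or move failed.
      Files.deleteIfExists(tmp);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("sched-conf");
    Path conf = dir.resolve("scheduler-conf.xml"); // invented file name
    replace(conf, "<configuration/>");
    System.out.println(new String(Files.readAllBytes(conf), StandardCharsets.UTF_8));
  }
}
```

This only helps if every tool that edits the file follows the same discipline, which is exactly the restriction described above.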



[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643564#action_12643564 ] 

dhruba borthakur commented on HADOOP-4522:
------------------------------------------

> Changes to the config file need to be atomic for this approach to work correctly,

I agree. The FairShare scheduler re-reads the config and accepts the new configuration if and only if it can parse the entire file. It works for us, but if a set of new parameters needs to take effect atomically, then there could be a problem.
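
The all-or-nothing behavior described above can be sketched as follows. This is not the FairShare scheduler's actual code: the `queue=capacity` line format, the class name, and the method names are invented purely for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: accept a new configuration only if every line parses. */
class AllOrNothingReload {

  // Readers see either the old map or the new one, never a mixture.
  private volatile Map<String, Integer> capacities = Collections.emptyMap();

  /** Returns true and swaps in the new values only if the whole input parses. */
  public boolean reload(List<String> lines) {
    Map<String, Integer> parsed = new HashMap<>();
    try {
      for (String line : lines) {
        String[] parts = line.split("=", 2);
        if (parts.length != 2) {
          throw new IllegalArgumentException("Malformed line: " + line);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        parsed.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
      }
    } catch (IllegalArgumentException e) {
      return false; // keep the previous configuration untouched
    }
    capacities = Collections.unmodifiableMap(parsed);
    return true;
  }

  public Map<String, Integer> capacities() {
    return capacities;
  }

  public static void main(String[] args) {
    AllOrNothingReload conf = new AllOrNothingReload();
    System.out.println(conf.reload(Arrays.asList("research=60", "production=40")));
    System.out.println(conf.reload(Arrays.asList("research=70", "production=oops")));
    System.out.println(conf.capacities());
  }
}
```

Swapping a single reference means readers see all new values together, but it does not protect against a file that parses cleanly while only half of an intended edit has been written.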



[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648075#action_12648075 ] 

Vivek Ratan commented on HADOOP-4522:
-------------------------------------

bq. One other difference between the config in the fair scheduler and the capacity scheduler is that the fair scheduler uses a file different than the Hadoop config file
The Capacity Scheduler also has its own config file. hadoop-site.xml, however, contains information about the queues in the system, which is, or can be, used by all schedulers. 

It's certainly clear that schedulers need to re-read their own config files. It's less clear whether we want the core Hadoop system to re-read its configuration. Even if we decide against the latter, we still need a way for external clients to tell the scheduler to re-read its config. 

Maybe the best thing to do is to add the admin command for reconfiguration, and for now, just have the JT ask the scheduler to re-read/reconfigure (via a new method in {{TaskScheduler}}). We could, in separate Jiras, decide whether other config values need to be re-read by the JT, and add them accordingly. 



[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647512#action_12647512 ] 

Vivek Ratan commented on HADOOP-4522:
-------------------------------------

Adding to Dhruba's comments. 

I suppose it's up to the component to decide what to re-read and what not to? For example, do we want the QueueManager to re-read the list of all queues in the system so that it can dynamically add new queues or delete existing queues? Probably not, as it will affect jobs in current queues (you can't 'delete' a queue if it has jobs running or waiting). No matter what we do, components that depend on the QueueManager need to know that. The CapacityScheduler, for example, needs to know whether new queues are accepted, so that it may, in turn, read (or not read) scheduling information related to the new queues. 

I think that having the system re-read its configuration through a single call is OK, but we may need to make it explicit, in documentation, which config values are re-read and which are ignored. We'd also want admins to be explicitly aware that ALL config files are re-read. OTOH, you may want to break down config information into sub-sections (config for scheduling, for JT, etc.), and provide one or more sub-sections as parameters in the call, but this might get messy. 

Beyond scheduling, has there been a need to have the system re-read its configuration? 



[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648087#action_12648087 ] 

Matei Zaharia commented on HADOOP-4522:
---------------------------------------

That makes sense. Out of curiosity, is there some part of the system other than the scheduler that needs to look at queues? If not, then maybe those should also be put into the per-scheduler file. I'm assuming that people won't switch between schedulers very often and so incompatibilities between these won't be a problem.



[jira] Commented: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643363#action_12643363 ] 

dhruba borthakur commented on HADOOP-4522:
------------------------------------------

This command could be used by the FairShare scheduler too. (src/contrib/fairshare)

In the current code, the Fair Share scheduler periodically checks the modification time of its configuration file to detect any changes.
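
Modification-time polling of this kind can be sketched as below. The class is hypothetical and is not taken from the fair scheduler source; a real scheduler would call `hasChanged()` from a timer or its update thread and reload the file when it returns true.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical sketch: detect config changes by polling the file's modification time. */
class ConfigFileWatcher {

  private final File file;
  private long lastSeenModified;

  public ConfigFileWatcher(File file) {
    this.file = file;
    this.lastSeenModified = file.lastModified();
  }

  /** Returns true once for each observed change in modification time. */
  public synchronized boolean hasChanged() {
    long modified = file.lastModified();
    if (modified != lastSeenModified) {
      lastSeenModified = modified;
      return true;
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    File conf = File.createTempFile("scheduler", ".xml");
    conf.deleteOnExit();
    ConfigFileWatcher watcher = new ConfigFileWatcher(conf);
    System.out.println("changed: " + watcher.hasChanged());
  }
}
```

Polling tells the scheduler that the file changed, not that the writer has finished with it.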



[jira] Updated: (HADOOP-4522) Capacity Scheduler needs to re-read its configuration

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-4522:
--------------------------------

    Attachment: 4522.1.patch

Attaching patch (4522.1.patch) which does the following: 
* Adds a new method to JobSubmissionProtocol.java: 
{code}
  /**
   * Makes the JT re-read (some of) its configuration. 
   * @throws IOException
   */
  public void reconfigure() throws IOException;
{code}
* JT's implementation currently asks the schedulers to re-read their config. 
* Added a corresponding new method to TaskScheduler.java that does nothing by default: 
{code}
  /**
   * Re-read configuration 
   * @throws IOException
   */
  public void reconfigure() throws IOException {
    // do nothing
  }
{code}
* Implemented re-reading of config file for Capacity Scheduler. The following config params can be updated for each queue: 
** guaranteed capacity
** user limit
** reclaim time limit
* In addition, the following config params can be updated for the scheduler: 
** reclaim capacity interval
* As discussed earlier, added a command to mapred-admin to force the JT to reconfigure: 
{code}
bin/hadoop mapred-admin -reconfigure
{code}
* Removed code from CapacitySchedulerConf.java that was added earlier to support re-reading
* Added test cases

While the changes are not very substantial in terms of lines of code, they do change the JobSubmission protocol and the TaskScheduler class. 
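
To make the shape of the change concrete, here is a simplified, self-contained sketch of how a scheduler might override the no-op `reconfigure()` hook and how the JT might call down to it. `SketchTaskScheduler`, `SketchCapacityScheduler`, `SketchJobTracker`, and the `Supplier`-based config source are stand-ins invented for this example; they are not the classes in the patch, and only guaranteed capacity is modeled.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of the reconfigure() call chain; not the code in the patch. */
class ReconfigureSketch {

  /** Stand-in for TaskScheduler: reconfigure() does nothing by default. */
  public abstract static class SketchTaskScheduler {
    public void reconfigure() throws IOException {
      // do nothing
    }
  }

  /** Stand-in scheduler that re-reads per-queue guaranteed capacity. */
  public static class SketchCapacityScheduler extends SketchTaskScheduler {
    private final Supplier<Map<String, Float>> configSource; // stands in for the config file
    private volatile Map<String, Float> guaranteedCapacity;

    public SketchCapacityScheduler(Supplier<Map<String, Float>> configSource) {
      this.configSource = configSource;
      this.guaranteedCapacity = new HashMap<>(configSource.get());
    }

    @Override
    public void reconfigure() throws IOException {
      Map<String, Float> fresh = configSource.get();
      if (fresh == null) {
        throw new IOException("Scheduler config unavailable");
      }
      // Copy first, then swap, so readers never see a partially updated map.
      guaranteedCapacity = new HashMap<>(fresh);
    }

    public Float guaranteedCapacity(String queue) {
      return guaranteedCapacity.get(queue);
    }
  }

  /** Stand-in for the JT: its reconfigure() simply asks the scheduler to re-read. */
  public static class SketchJobTracker {
    private final SketchTaskScheduler scheduler;

    public SketchJobTracker(SketchTaskScheduler scheduler) {
      this.scheduler = scheduler;
    }

    public void reconfigure() throws IOException {
      scheduler.reconfigure();
    }
  }

  public static void main(String[] args) throws IOException {
    Map<String, Float> file = new HashMap<>();
    file.put("default", 50f);
    SketchCapacityScheduler scheduler = new SketchCapacityScheduler(() -> file);
    SketchJobTracker jt = new SketchJobTracker(scheduler);

    file.put("default", 70f); // an external tool edits the config
    jt.reconfigure();         // what the admin command would trigger
    System.out.println(scheduler.guaranteedCapacity("default"));
  }
}
```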

