Posted to yarn-issues@hadoop.apache.org by "Jonathan Hung (JIRA)" <ji...@apache.org> on 2017/02/04 01:53:51 UTC

[jira] [Commented] (YARN-5946) Create YarnConfigurationStore interface and InMemoryConfigurationStore class

    [ https://issues.apache.org/jira/browse/YARN-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852478#comment-15852478 ] 

Jonathan Hung commented on YARN-5946:
-------------------------------------

Discussed offline with [~xgong] and [~leftnoteasy]; there are some concerns about how this will be handled in the RM HA case. For the Derby implementation, we can have a table1 containing the "last good" configuration, a table2 containing the mutations (one per row, basically an audit log), and a table3 containing the "last good" transaction id (initialized to 0). Each mutation row in the audit log table will have a distinct, monotonically increasing id. For a mutation, the workflow will be:

# Client issues a configuration mutation.
# The mutation is passed to the capacity scheduler, which then passes it to the MCM.
# The MCM logs the change in table2 (atomically, using Derby's commit) via the YarnConfigurationStore interface. This change will have a txn id one greater than whatever is in table3.
# The MCM calls cs.reinitialize with the in-memory CS configuration, with the mutation applied.
# On success, the MCM stores the mutation in table1 and increments the txn id in table3 (both done together atomically). On failure, the MCM increments the txn id in table3 but does not change table1.

Steps 3-5 should be synchronized to prevent mutations from being logged in a different order than they are applied.
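
To make the ordering in steps 3-5 concrete, here is a minimal Java sketch of how the mutation path could be wired. The store methods (logMutation, confirmMutation) and the Reinitializer stand-in for cs.reinitialize are illustrative assumptions, not the actual interface in the patch:

{code:java}
import java.io.IOException;
import java.util.Map;

// Hypothetical store-facing view of steps 3-5; method names are illustrative,
// not the actual patch API.
interface ConfStore {
  // Step 3: atomically append the mutation to the audit log (table2) and
  // return its txn id (one greater than the id currently in table3).
  long logMutation(Map<String, String> updates) throws IOException;

  // Step 5: if valid, persist the new "last good" configuration (table1) and
  // bump the txn id (table3) together; if invalid, bump table3 only.
  void confirmMutation(long txnId, boolean valid) throws IOException;
}

// Stand-in for the capacity scheduler's reinitialize call (step 4).
interface Reinitializer {
  void reinitialize(Map<String, String> updates) throws IOException;
}

class MutationManager {
  private final ConfStore store;

  MutationManager(ConfStore store) {
    this.store = store;
  }

  // Steps 3-5 run under a single lock, so mutations are logged in the same
  // order in which they are applied to the scheduler.
  synchronized void mutate(Map<String, String> updates, Reinitializer cs)
      throws IOException {
    long txnId = store.logMutation(updates);   // step 3
    boolean valid = true;
    try {
      cs.reinitialize(updates);                // step 4
    } catch (IOException e) {
      valid = false;                           // mutation rejected by the scheduler
    }
    store.confirmMutation(txnId, valid);       // step 5
  }
}
{code}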

For the failover case, we can first NFS-mount the Derby DB on both RMs. Then, if failover happens any time after step 3, the new RM will read table3, check (via the YarnConfigurationStore interface) whether there are any logs in table2 with a txn id greater than this value, and apply those mutations: for each one, it calls cs.reinitialize on the current configuration with the mutation applied, and stores it if valid.

Note that since steps 3-5 are synchronized, there should be at most one log line from table2 to replay onto table1 when recovering.
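
Along the same lines, a rough sketch of the recovery/replay path on the new RM, again with illustrative names (getConfirmedTxnId, getPendingMutations, and the Reinitializer stand-in are assumptions):

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.Map;

// Illustrative recovery on RM failover: replay anything logged in table2 after
// the txn id recorded in table3. Names are assumptions, not the actual patch API.
class ConfRecovery {

  // Hypothetical view of the NFS-mounted, Derby-backed store during recovery.
  interface RecoverableConfStore {
    long getConfirmedTxnId() throws IOException;               // table3
    List<PendingMutation> getPendingMutations(long afterTxnId)
        throws IOException;                                    // table2 rows with id > afterTxnId
    void confirmMutation(long txnId, boolean valid) throws IOException; // same as step 5
  }

  // One row of the audit log (table2).
  static class PendingMutation {
    final long txnId;
    final Map<String, String> updates;

    PendingMutation(long txnId, Map<String, String> updates) {
      this.txnId = txnId;
      this.updates = updates;
    }
  }

  // Stand-in for cs.reinitialize, as in the previous sketch.
  interface Reinitializer {
    void reinitialize(Map<String, String> updates) throws IOException;
  }

  // Because steps 3-5 are synchronized, this loop should see at most one entry.
  static void recover(RecoverableConfStore store, Reinitializer cs)
      throws IOException {
    long confirmed = store.getConfirmedTxnId();
    for (PendingMutation m : store.getPendingMutations(confirmed)) {
      boolean valid = true;
      try {
        cs.reinitialize(m.updates);   // re-apply the logged mutation
      } catch (IOException e) {
        valid = false;                // invalid mutation: bump table3 only
      }
      store.confirmMutation(m.txnId, valid);
    }
  }
}
{code}

Since table3 is bumped even for rejected mutations, replaying a rejected mutation during recovery is harmless: reinitialize fails again and only table3 advances.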

Let me know if there are any concerns with this.

> Create YarnConfigurationStore interface and InMemoryConfigurationStore class
> ----------------------------------------------------------------------------
>
>                 Key: YARN-5946
>                 URL: https://issues.apache.org/jira/browse/YARN-5946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>         Attachments: YARN-5946.001.patch, YARN-5946-YARN-5734.002.patch
>
>
> This class provides the interface to persist YARN configurations in a backing store.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
