You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Varun Saxena (JIRA)" <ji...@apache.org> on 2014/12/30 20:25:14 UTC
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily
and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261385#comment-14261385 ]
Varun Saxena commented on YARN-2958:
------------------------------------
[~jianhe] / [~zjshen], kindly review
> RMStateStore seems to unnecessarily and wrongly store sequence number separately
> --------------------------------------------------------------------------------
>
> Key: YARN-2958
> URL: https://issues.apache.org/jira/browse/YARN-2958
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Zhijie Shen
> Assignee: Varun Saxena
> Priority: Blocker
> Attachments: YARN-2958.001.patch, YARN-2958.002.patch, YARN-2958.003.patch
>
>
> It seems that RMStateStore updates last sequence number when storing or updating each individual DT, to recover the latest sequence number when RM restarting.
> First, the current logic seems to be problematic:
> {code}
> public synchronized void updateRMDelegationTokenAndSequenceNumber(
> RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
> int latestSequenceNumber) {
> if(isFencedState()) {
> LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
> return;
> }
> try {
> updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
> latestSequenceNumber);
> } catch (Exception e) {
> notifyStoreOperationFailed(e);
> }
> }
> {code}
> {code}
> @Override
> protected void updateStoredToken(RMDelegationTokenIdentifier id,
> long renewDate) {
> try {
> LOG.info("updating RMDelegation token with sequence number: "
> + id.getSequenceNumber());
> rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
> renewDate, id.getSequenceNumber());
> } catch (Exception e) {
> LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
> + id.getSequenceNumber());
> ExitUtil.terminate(1, e);
> }
> }
> {code}
> According to code above, even when renewing a DT, the last sequence number is updated in the store, which is wrong. For example, we have the following sequence:
> 1. Get DT 1 (seq = 1)
> 2. Get DT 2( seq = 2)
> 3. Renew DT 1 (seq = 1)
> 4. Restart RM
> The stored and then recovered last sequence number is 1. It makes the next created DT after RM restarting will conflict with DT 2 on sequence num.
> Second, the aforementioned bug doesn't happen actually, because the recovered last sequence num has been overwritten at by the correctly one.
> {code}
> public void recover(RMState rmState) throws Exception {
> LOG.info("recovering RMDelegationTokenSecretManager.");
> // recover RMDTMasterKeys
> for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
> .getMasterKeyState()) {
> addKey(dtKey);
> }
> // recover RMDelegationTokens
> Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
> rmState.getRMDTSecretManagerState().getTokenState();
> this.delegationTokenSequenceNumber =
> rmState.getRMDTSecretManagerState().getDTSequenceNumber();
> for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
> .entrySet()) {
> addPersistedDelegationToken(entry.getKey(), entry.getValue());
> }
> }
> {code}
> The code above recovers delegationTokenSequenceNumber by reading the last sequence number in the store. It could be wrong. Fortunately, delegationTokenSequenceNumber updates it to the right number.
> {code}
> if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
> setDelegationTokenSeqNum(identifier.getSequenceNumber());
> }
> {code}
> All the stored identifiers will be gone through, and delegationTokenSequenceNumber will be set to the largest sequence number among these identifiers. Therefore, new DT will be assigned a sequence number which is always larger than that of all the recovered DT.
> To sum up, two negatives make a positive, but it's good to fix the issue. Please let me know if I've missed something here.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)