Posted to common-issues@hadoop.apache.org by "Xiaoyu Yao (Jira)" <ji...@apache.org> on 2020/03/02 19:30:00 UTC

[jira] [Commented] (HADOOP-16828) Zookeeper Delegation Token Manager fetch sequence number by batch

    [ https://issues.apache.org/jira/browse/HADOOP-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049589#comment-17049589 ] 

Xiaoyu Yao commented on HADOOP-16828:
-------------------------------------

Thanks [~fengnanli] for reporting the issue and providing the patch. The patch LGTM overall, and the performance improvement is impressive. Here are a few minor comments.

ZKDelegationTokenSecretManager.java

Line 100: NIT: can we add "token" as part of the prefix for the new key,
i.e. "token.seqnum.batch.size"?

Line 559: getDelegationTokenSeqNum() needs to change, because
delTokenSeqCounter.getCount() will now be updated in batches. We should return currentSeqNum here instead.
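
To illustrate the Line 559 comment, here is a minimal sketch of batched allocation. It is not the code from the patch: SharedCounter, Allocator, reserve() and currentMaxSeqNum are hypothetical names, and an in-memory AtomicInteger stands in for the ZooKeeper-backed counter. It only shows why the shared count runs ahead of the last sequence number actually issued.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; class and method names are illustrative, not from the patch.
class BatchedSeqNumSketch {

    /** In-memory stand-in for the shared ZooKeeper counter (an assumption). */
    static final class SharedCounter {
        private final AtomicInteger value = new AtomicInteger();

        /** Atomically reserves `size` numbers and returns the new upper bound. */
        int reserve(int size) {
            return value.addAndGet(size);
        }

        int getCount() {
            return value.get();
        }
    }

    /** One secret manager's local view of the sequence numbers. */
    static final class Allocator {
        private final SharedCounter shared;
        private final int batchSize;
        private int currentSeqNum;     // last number handed out locally
        private int currentMaxSeqNum;  // upper bound of the reserved range (assumed name)

        Allocator(SharedCounter shared, int batchSize) {
            this.shared = shared;
            this.batchSize = batchSize;
        }

        /** Hands out the next number, touching the shared counter once per batch. */
        synchronized int incrementSeqNum() {
            if (currentSeqNum >= currentMaxSeqNum) {
                currentMaxSeqNum = shared.reserve(batchSize);
                currentSeqNum = currentMaxSeqNum - batchSize;
            }
            return ++currentSeqNum;
        }

        /** Returns the last number issued locally, not the shared count. */
        synchronized int getSeqNum() {
            return currentSeqNum;
        }
    }

    public static void main(String[] args) {
        SharedCounter shared = new SharedCounter();
        Allocator tm = new Allocator(shared, 10);
        tm.incrementSeqNum();
        tm.incrementSeqNum();
        System.out.println("local currentSeqNum = " + tm.getSeqNum());   // prints 2
        System.out.println("shared counter      = " + shared.getCount()); // prints 10
    }
}
```

After two tokens with a batch size of 10, the shared count is already 10 while only 2 has been issued, which is why returning the shared count from the getter would be misleading.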

TestZKDelegationTokenSecretManager.java
As shown in the test, if the batch size is large, say 1000, this might leave holes in the sequence numbers
when KMS fails over. It might be an acceptable tradeoff.
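
The size of such a hole can be worked out with a small sketch. This is only an illustration under assumptions: an AtomicInteger stands in for the shared ZooKeeper counter, firstSeqNumOfNewBatch() is a hypothetical helper, and tm1 is assumed to issue three tokens before going down.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the failover gap; not taken from the patch or its test.
class FailoverGapSketch {

    /** Reserves a whole batch on the shared counter and returns its first number. */
    static int firstSeqNumOfNewBatch(AtomicInteger shared, int batchSize) {
        return shared.addAndGet(batchSize) - batchSize + 1;
    }

    public static void main(String[] args) {
        AtomicInteger shared = new AtomicInteger(); // stand-in for the ZK counter
        int batchSize = 1000;

        // tm1 reserves 1..1000, issues three tokens (1, 2, 3), then goes down.
        int tm1First = firstSeqNumOfNewBatch(shared, batchSize);
        int tm1Last = tm1First + 2;

        // tm2 takes over and reserves the next batch, 1001..2000.
        int tm2First = firstSeqNumOfNewBatch(shared, batchSize);

        // Numbers 4..1000 were reserved by tm1 but never issued.
        int unused = tm2First - tm1Last - 1;

        System.out.println("tm1 last issued  = " + tm1Last);  // prints 3
        System.out.println("tm2 first issued = " + tm2First); // prints 1001
        System.out.println("unused numbers   = " + unused);   // prints 997
    }
}
```

In this sketch the gap is bounded by the batch size per failed instance, so a smaller batch size trades some ZooKeeper traffic for smaller holes.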

Please ensure the DTSM instances (tm1, tm2) are properly destroyed after the test by calling verifyDestroy(). 


> Zookeeper Delegation Token Manager fetch sequence number by batch
> -----------------------------------------------------------------
>
>                 Key: HADOOP-16828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16828
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Major
>         Attachments: HADOOP-16828.001.patch, Screen Shot 2020-01-25 at 2.25.06 PM.png, Screen Shot 2020-01-25 at 2.25.16 PM.png, Screen Shot 2020-01-25 at 2.25.24 PM.png
>
>
> Currently, ZKDelegationTokenSecretManager.java increments the sequence number by 1 each time a new token is requested, and every increment sends traffic to the ZooKeeper server. With multiple managers running, this causes data contention. Because the current increment logic uses tryAndSet, which is optimistic concurrency control without locking, this contention degrades performance when the secret managers are under heavy traffic.
> The change here is to fetch the sequence number in batches instead of one at a time, which reduces the traffic sent to ZK and lets many operations complete in the ZK secret manager's memory.
> After putting this into production, we saw a huge improvement in the RPC processing latency of get-delegation-token calls. Also, since ZK takes less traffic this way, other write calls, such as renewing and cancelling delegation tokens, benefit from this change.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
