Posted to hdfs-issues@hadoop.apache.org by "ZanderXu (Jira)" <ji...@apache.org> on 2022/07/20 12:32:00 UTC

[jira] [Created] (HDFS-16671) RBF: RouterRpcFairnessPolicyController supports configurable permit acquire timeout

ZanderXu created HDFS-16671:
-------------------------------

             Summary: RBF: RouterRpcFairnessPolicyController supports configurable permit acquire timeout
                 Key: HDFS-16671
                 URL: https://issues.apache.org/jira/browse/HDFS-16671
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: ZanderXu
            Assignee: ZanderXu


RouterRpcFairnessPolicyController should support a configurable permit acquire timeout. The hardcoded 1s is very long, and it has caused an incident in our prod environment when one nameservice was busy.

The optimal timeout should probably be less than p50(avgTime).

All handlers in RBF were waiting to acquire the permit of the busy ns.

{code:java}
"IPC Server handler 12 on default port 8888" #2370 daemon prio=5 os_prio=0 tid=? nid=?  waiting on condition [?]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <?> (a java.util.concurrent.Semaphore$NonfairSync)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
	at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
	at org.apache.hadoop.hdfs.server.federation.fairness.AbstractRouterRpcFairnessPolicyController.acquirePermit(AbstractRouterRpcFairnessPolicyController.java:56)
	at org.apache.hadoop.hdfs.server.federation.fairness.DynamicRouterRpcFairnessPolicyController.acquirePermit(DynamicRouterRpcFairnessPolicyController.java:123)
	at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.acquirePermit(RouterRpcClient.java:1500)
{code}
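
A minimal sketch of the proposed behaviour, using only java.util.concurrent rather than the real Hadoop classes. The class name PermitControllerSketch, the register method and the acquireTimeoutMs parameter are hypothetical names chosen for illustration. In an actual patch the timeout would presumably be read from the Router configuration instead of being passed to a constructor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a fairness controller; not the actual Hadoop class. */
class PermitControllerSketch {
  // One semaphore per nameservice, keyed by nameservice id.
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
  // Assumed to come from configuration instead of a hardcoded 1s.
  private final long acquireTimeoutMs;

  PermitControllerSketch(long acquireTimeoutMs) {
    this.acquireTimeoutMs = acquireTimeoutMs;
  }

  /** Registers a nameservice with a fixed number of permits. */
  void register(String nsId, int permitCount) {
    permits.put(nsId, new Semaphore(permitCount));
  }

  /** Waits at most acquireTimeoutMs for a permit; returns false on timeout. */
  boolean acquirePermit(String nsId) {
    Semaphore semaphore = permits.get(nsId);
    if (semaphore == null) {
      return false;
    }
    try {
      return semaphore.tryAcquire(acquireTimeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      // Restore the interrupt flag so the caller can still observe it.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Returns a permit previously obtained through acquirePermit. */
  void releasePermit(String nsId) {
    Semaphore semaphore = permits.get(nsId);
    if (semaphore != null) {
      semaphore.release();
    }
  }
}
```

With a short timeout, a handler that cannot get a permit for the busy ns gives up quickly and is free to serve requests for other nameservices, instead of parking for the full second as in the stack trace above.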




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
