Posted to common-issues@hadoop.apache.org by "Jonathan Hung (JIRA)" <ji...@apache.org> on 2017/09/01 20:45:00 UTC
[jira] [Created] (HADOOP-14828) RetryUpToMaximumTimeWithFixedSleep is not bounded by maximum time
Jonathan Hung created HADOOP-14828:
--------------------------------------
Summary: RetryUpToMaximumTimeWithFixedSleep is not bounded by maximum time
Key: HADOOP-14828
URL: https://issues.apache.org/jira/browse/HADOOP-14828
Project: Hadoop Common
Issue Type: Bug
Reporter: Jonathan Hung
In RetryPolicies.java, RetryUpToMaximumTimeWithFixedSleep is converted to a RetryUpToMaximumCountWithFixedSleep whose count is maxTime / sleepTime:
{noformat}
public RetryUpToMaximumTimeWithFixedSleep(long maxTime, long sleepTime,
    TimeUnit timeUnit) {
  super((int) (maxTime / sleepTime), sleepTime, timeUnit);
  this.maxTime = maxTime;
  this.timeUnit = timeUnit;
}
{noformat}
But only the sleep intervals count toward that bound. If each retry attempt itself takes a long time, the total elapsed time exceeds the maxTime passed to RetryUpToMaximumTimeWithFixedSleep.
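To illustrate the difference, here is a minimal sketch of a retry decision bounded by elapsed wall-clock time rather than by attempt count. It is not the Hadoop implementation or a proposed patch; the class DeadlineRetryBudget and its method shouldRetry are hypothetical names, and the caller is assumed to pass in timestamps (for example from System.nanoTime()).

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a retry budget bounded by elapsed time, not attempt count. */
final class DeadlineRetryBudget {
    private final long maxTimeNanos;
    private final long sleepTimeNanos;
    private final long startNanos;

    /** startNanos is the timestamp of the first attempt, supplied by the caller. */
    DeadlineRetryBudget(long maxTime, long sleepTime, TimeUnit unit, long startNanos) {
        this.maxTimeNanos = unit.toNanos(maxTime);
        this.sleepTimeNanos = unit.toNanos(sleepTime);
        this.startNanos = startNanos;
    }

    /**
     * Returns true only if sleeping one more interval still ends within maxTime.
     * Time spent inside failed attempts is included, because the check uses
     * the caller's current timestamp instead of a retry counter.
     */
    boolean shouldRetry(long nowNanos) {
        long elapsed = nowNanos - startNanos;
        return elapsed + sleepTimeNanos <= maxTimeNanos;
    }
}
```

With a 15 min budget and a 10 sec sleep, such a check stops retrying once 15 minutes have elapsed, however long the individual attempts took.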
As an example, while doing NM restarts, we saw an issue where the NMProxy creates a retry policy which specifies a maximum wait time of 15 minutes and a 10 sec interval (which is converted to a MaximumCount policy with 15 min / 10 sec = 90 tries). But each NMProxy retry attempt in turn goes through o.a.h.ipc.Client's own connection retry policy:
{noformat}
if (connectionRetryPolicy == null) {
  final int max = conf.getInt(
      CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY,
      CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_DEFAULT);
  final int retryInterval = conf.getInt(
      CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_RETRY_INTERVAL_KEY,
      CommonConfigurationKeysPublic
          .IPC_CLIENT_CONNECT_RETRY_INTERVAL_DEFAULT);
  connectionRetryPolicy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
      max, retryInterval, TimeUnit.MILLISECONDS);
}
{noformat}
So the time it takes the NMProxy to fail is actually (90 retries) * (10 sec NMProxy interval + o.a.h.ipc.Client retry time). In the default case, the ipc client retries 10 times with a 1 sec interval, which adds about 10 sec to each NMProxy attempt. The NMProxy therefore takes (90)(10 sec + 10 sec) = 30 min to fail, instead of the 15 min specified by the NMProxy configuration.
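The arithmetic above can be checked with a small sketch. It uses the same simplification as the estimate in this report (it counts only the sleep intervals of both policies and ignores connect timeouts); the class NmProxyWorstCase and its parameters are illustrative names, not Hadoop APIs.

```java
import java.time.Duration;

/** Hypothetical helper that reproduces the estimate from this report. */
final class NmProxyWorstCase {
    /**
     * outerMax / outerSleep gives the outer retry count (as in the constructor
     * quoted above); each outer try costs one outer sleep plus the inner
     * client's retries (innerRetries * innerInterval).
     */
    static Duration worstCase(Duration outerMax, Duration outerSleep,
                              int innerRetries, Duration innerInterval) {
        long outerTries = outerMax.toMillis() / outerSleep.toMillis();
        Duration perTry = outerSleep.plus(innerInterval.multipliedBy(innerRetries));
        return perTry.multipliedBy(outerTries);
    }
}
```

Calling worstCase(Duration.ofMinutes(15), Duration.ofSeconds(10), 10, Duration.ofSeconds(1)) yields 90 * (10 sec + 10 sec) = 1800 sec, i.e. 30 minutes, twice the configured bound.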