You are viewing a plain text version of this content. The canonical link for it is here.
Posted to codereview@trafodion.apache.org by mashengchen <gi...@git.apache.org> on 2017/12/14 10:40:34 UTC
[GitHub] incubator-trafodion pull request #1344: TRAFODION-2844 add strategy to dcsse...
GitHub user mashengchen opened a pull request:
https://github.com/apache/incubator-trafodion/pull/1344
TRAFODION-2844 add strategy to dcsserver for restart moxsrvr
when mxosrvr down ,dcsserver will restart it , and if mxosrvr down a lot of times in a period of time , there should reject the restart. if time between 6 times age(default setting) and this time are a very long time, dcsserver should allow the restart
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mashengchen/incubator-trafodion mxosrvrRestart
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/1344.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1344
----
commit d41654fe70fd67740651da503e3c73e1b64e4168
Author: aven <sh...@esgyn.cn>
Date: 2017-12-14T10:35:54Z
TRAFODION-2844 add strategy to dcsserver for restart moxsrvr
----
---
[GitHub] incubator-trafodion pull request #1344: TRAFODION-2844 add strategy to dcsse...
Posted by kevinxu021 <gi...@git.apache.org>.
Github user kevinxu021 commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/1344#discussion_r157166199
--- Diff: dcs/src/main/java/org/trafodion/dcs/util/RetryCounter.java ---
@@ -20,54 +20,79 @@
*/
package org.trafodion.dcs.util;
+import java.util.LinkedList;
+import java.util.Queue;
import java.util.concurrent.TimeUnit;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
public class RetryCounter {
- private static final Log LOG = LogFactory.getLog(RetryCounter.class);
- private final int maxRetries;
- private int retriesRemaining;
- private final int retryIntervalMillis;
- private final TimeUnit timeUnit;
+ private static final Log LOG = LogFactory.getLog(RetryCounter.class);
+ private final int maxRetries;
+ private int retriesRemaining;
+ private int retryInterval;
+ private Queue<Long> queue;
- public RetryCounter(int maxRetries,
- int retryIntervalMillis, TimeUnit timeUnit) {
- this.maxRetries = maxRetries;
- this.retriesRemaining = maxRetries;
- this.retryIntervalMillis = retryIntervalMillis;
- this.timeUnit = timeUnit;
- }
+ public RetryCounter(int maxRetries, int retryInterval) {
--- End diff --
Not very reasonable to remove last parameter.
---
[GitHub] incubator-trafodion pull request #1344: TRAFODION-2844 add strategy to dcsse...
Posted by kevinxu021 <gi...@git.apache.org>.
Github user kevinxu021 commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/1344#discussion_r157153197
--- Diff: dcs/src/main/java/org/trafodion/dcs/Constants.java ---
@@ -201,6 +201,12 @@
/** Default value for user program restart handler retry interval millis */
public static final int DEFAULT_DCS_SERVER_USER_PROGRAM_RESTART_HANDLER_RETRY_INTERVAL_MILLIS = 5000;
+ /** Configuration key for user program restart handler retry interval millis */
+ public static final String DCS_SERVER_USER_PROGRAM_RESTART_HANDLER_RETRY_INTERVAL_MINUTES = "dcs.server.user.program.restart.handler.retry.interval.minutes";
--- End diff --
I would suggest to be "dcs.server.user.program.restart.handler.retry.timeout.minutes"
Add this new setting into dcs-default.xml
---
[GitHub] incubator-trafodion pull request #1344: TRAFODION-2844 add strategy to dcsse...
Posted by hegdean <gi...@git.apache.org>.
Github user hegdean commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/1344#discussion_r157074271
--- Diff: dcs/src/main/java/org/trafodion/dcs/server/ServerManager.java ---
@@ -255,33 +259,17 @@ private void cleanupZk() {
CountDownLatch startSignal = new CountDownLatch(1);
RetryCounter retryCounter;
- public void reset() {
- startSignal.countDown();
- startSignal = new CountDownLatch(1);
- boolean isRunning = this.serverMonitor.monitor();
- String nid = this.serverMonitor.nid;
- String pid = this.serverMonitor.pid;
-
- if (isRunning) {
- LOG.info("mxosrvr " + nid + "," + pid + " still running");
- this.retryCounter.resetAttemptTimes();
- } else {
- LOG.info("mxosrvr " + nid + "," + pid + " exited, restarting, restart attempt time : "
- + this.retryCounter.getAttemptTimes());
- }
- }
-
public ServerHandler(Configuration conf ,int childInstance) {
int maxRestartAttempts = conf.getInt(Constants.DCS_SERVER_USER_PROGRAM_RESTART_HANDLER_ATTEMPTS,
Constants.DEFAULT_DCS_SERVER_USER_PROGRAM_RESTART_HANDLER_ATTEMPTS);
- int retryIntervalMillis = conf.getInt(
- Constants.DCS_SERVER_USER_PROGRAM_RESTART_HANDLER_RETRY_INTERVAL_MILLIS,
- Constants.DEFAULT_DCS_SERVER_USER_PROGRAM_RESTART_HANDLER_RETRY_INTERVAL_MILLIS);
+ int retryIntervalMinutes = conf.getInt(
--- End diff --
Why did we change from millis to minutes. Millis is more granular and you can achieve mins from millis
---
[GitHub] incubator-trafodion pull request #1344: TRAFODION-2844 add strategy to dcsse...
Posted by kevinxu021 <gi...@git.apache.org>.
Github user kevinxu021 commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/1344#discussion_r157156557
--- Diff: dcs/src/main/java/org/trafodion/dcs/server/ServerManager.java ---
@@ -242,7 +246,7 @@ private void cleanupZk() {
zkc.delete(registeredPath, -1);
} catch (Exception e) {
--- End diff --
Make the changes for the exception type which should be real exception instead of a general exception.
---
[GitHub] incubator-trafodion pull request #1344: TRAFODION-2844 add strategy to dcsse...
Posted by kevinxu021 <gi...@git.apache.org>.
Github user kevinxu021 commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/1344#discussion_r157155029
--- Diff: dcs/src/main/java/org/trafodion/dcs/server/ServerManager.java ---
@@ -242,7 +246,7 @@ private void cleanupZk() {
zkc.delete(registeredPath, -1);
} catch (Exception e) {
e.printStackTrace();
- LOG.debug(e);
+ LOG.error(e);
--- End diff --
LOG.error(e.getMessage(), e);
---