Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2021/12/01 03:08:49 UTC

[GitHub] [dolphinscheduler] JunjianS opened a new issue #7094: [Bug] [Module Name] when the first RM configured down,ds can't get the active RM correctly

JunjianS opened a new issue #7094:
URL: https://github.com/apache/dolphinscheduler/issues/7094


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
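   When RM HA is enabled and yarnHaIps="hadoop01,hadoop02" is set, DS can't get the active RM correctly if hadoop01 is down. I think this is a bug in both 2.0.0 and 1.3.9.
   
   Method getAcitveRMName in 1.3.9 (note that the catch branch returns rmIdArr[0], the RM that just failed, even when rmIdArr[1] is the active one):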
   
   ```
           public static String getAcitveRMName(String rmIds) {
   
               String[] rmIdArr = rmIds.split(Constants.COMMA);
   
               int activeResourceManagerPort = PropertyUtils.getInt(Constants.HADOOP_RESOURCE_MANAGER_HTTPADDRESS_PORT, 8088);
   
               String yarnUrl = "http://%s:" + activeResourceManagerPort + "/ws/v1/cluster/info";
   
               String state = null;
               try {
                   /**
                    * send http get request to rm1
                    */
                   state = getRMState(String.format(yarnUrl, rmIdArr[0]));
   
                   if (Constants.HADOOP_RM_STATE_ACTIVE.equals(state)) {
                       return rmIdArr[0];
                   } else if (Constants.HADOOP_RM_STATE_STANDBY.equals(state)) {
                       state = getRMState(String.format(yarnUrl, rmIdArr[1]));
                       if (Constants.HADOOP_RM_STATE_ACTIVE.equals(state)) {
                           return rmIdArr[1];
                       }
                   } else {
                       return null;
                   }
               } catch (Exception e) {
                   state = getRMState(String.format(yarnUrl, rmIdArr[1]));
                   if (Constants.HADOOP_RM_STATE_ACTIVE.equals(state)) {
                       return rmIdArr[0];
                   }
               }
               return null;
           }
   ```
   
   
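   Method getAcitveRMName in 2.0.0 (the try block wraps the whole loop, so if getRMState throws for the first RM, the remaining RMs are never checked):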
   
   ```
          public static String getAcitveRMName(String rmIds) {
   
               String[] rmIdArr = rmIds.split(Constants.COMMA);
   
               int activeResourceManagerPort = PropertyUtils.getInt(Constants.HADOOP_RESOURCE_MANAGER_HTTPADDRESS_PORT, 8088);
   
               String yarnUrl = "http://%s:" + activeResourceManagerPort + "/ws/v1/cluster/info";
   
               try {
   
                   /**
                    * send http get request to rm
                    */
   
                   for (String rmId : rmIdArr) {
                       String state = getRMState(String.format(yarnUrl, rmId));
                       if (Constants.HADOOP_RM_STATE_ACTIVE.equals(state)) {
                           return rmId;
                       }
                   }
   
               } catch (Exception e) {
                   logger.error("yarn ha application url generation failed, message:{}", e.getMessage());
               }
               return null;
           }
   ```
   
   ### What you expected to happen
   
   When RM HA is enabled, I think DS should get the active RM correctly as long as one RM works, even when all the others are down.
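   
   A minimal sketch of that behavior, not the actual fix: each RM is probed inside its own try/catch, so a failure on one RM does not stop the others from being checked. `ActiveRmFinder`, `findActiveRm` and the injected `stateFetcher` are hypothetical names used only for illustration; in `HadoopUtils` the fetcher would roughly correspond to the `getRMState` call shown above.
   
   ```java
   import java.util.function.Function;

   // Hypothetical helper for illustration; not part of DolphinScheduler.
   class ActiveRmFinder {

       static final String ACTIVE = "ACTIVE";

       /**
        * Returns the first RM id whose reported state is ACTIVE, or null if none is.
        * stateFetcher is an assumed stand-in for the HTTP call to /ws/v1/cluster/info.
        */
       static String findActiveRm(String rmIds, Function<String, String> stateFetcher) {
           for (String rmId : rmIds.split(",")) {
               String id = rmId.trim();
               try {
                   if (ACTIVE.equals(stateFetcher.apply(id))) {
                       return id;
                   }
               } catch (RuntimeException e) {
                   // This RM is unreachable or answered badly; keep going with the next one.
                   System.err.println("probe failed for " + id + ": " + e.getMessage());
               }
           }
           return null;
       }

       public static void main(String[] args) {
           // Stub fetcher: hadoop01 is down, hadoop02 is active.
           Function<String, String> fetcher = host -> {
               if (host.equals("hadoop01")) {
                   throw new IllegalStateException("connection refused");
               }
               return "ACTIVE";
           };
           System.out.println(findActiveRm("hadoop01,hadoop02", fetcher)); // prints "hadoop02"
       }
   }
   ```
   
   With the stub in `main`, hadoop01 fails and hadoop02 reports ACTIVE, so the call returns hadoop02. If every RM fails or is in standby, the result is null, which the caller still has to handle.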
   
   
   
   ### How to reproduce
   
   When RM HA is enabled, set yarnHaIps="hadoop01,hadoop02" and take hadoop01 down. DS then can't get the active RM correctly.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   2.0.0
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [dolphinscheduler] KingSpring commented on issue #7094: [Bug] [dolphinscheduler-common] when the first RM configured down,ds can't get the active RM correctly

Posted by GitBox <gi...@apache.org>.
KingSpring commented on issue #7094:
URL: https://github.com/apache/dolphinscheduler/issues/7094#issuecomment-986577429


   You may refer to this:
   1. Environment:
   YARN HA: hadoop47,hadoop48
   DS: 1.3.9
   
   cat ./conf/common.properties 
   # resourcemanager port, the default value is 8088 if not specified
   resource.manager.httpaddress.port=8088
   
   # if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
   yarn.resourcemanager.ha.rm.ids=hadoop47,hadoop48
   
   # if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
   yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
   
   2. Problem
   [ERROR] 2021-12-06 10:07:14.343  - [taskAppId=TASK-12-56-262]:[418] - yarn applications: application_1638416574447_0083 , query status failed, exception:{}
   java.lang.NullPointerException: null
       at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:423)
       at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
       at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
       at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
       at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   
   
   3. Proposed solution
   
   org.apache.dolphinscheduler.common.utils.HadoopUtils
   public static String getAcitveRMName(String rmIds) {
   
               String[] rmIdArr = rmIds.split(Constants.COMMA);
   
               int activeResourceManagerPort = PropertyUtils.getInt(Constants.HADOOP_RESOURCE_MANAGER_HTTPADDRESS_PORT, 8088);
   
               String yarnUrl = "http://%s:" + activeResourceManagerPort + "/ws/v1/cluster/info";
   
               String state = null;
               try {
                   /**
                    * send http get request to rm1
                    */
                   state = getRMState(String.format(yarnUrl, rmIdArr[0]));
   
                   if (Constants.HADOOP_RM_STATE_ACTIVE.equals(state)) {
                       return rmIdArr[0];
                   } else  {
                       state = getRMState(String.format(yarnUrl, rmIdArr[1]));
                       if (Constants.HADOOP_RM_STATE_ACTIVE.equals(state)) {
                           return rmIdArr[1];
                       }
                   }
                   return null;
   
               } catch (Exception e) {
                   state = getRMState(String.format(yarnUrl, rmIdArr[1]));
                   if (Constants.HADOOP_RM_STATE_ACTIVE.equals(state)) {
                       return rmIdArr[1];
                   }
               }
               return null;
           }





[GitHub] [dolphinscheduler] JunjianS closed issue #7094: [Bug] [dolphinscheduler-common] when the first RM configured down,ds can't get the active RM correctly

Posted by GitBox <gi...@apache.org>.
JunjianS closed issue #7094:
URL: https://github.com/apache/dolphinscheduler/issues/7094


   





[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #7094: [Bug] [Module Name] when the first RM configured down,ds can't get the active RM correctly

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #7094:
URL: https://github.com/apache/dolphinscheduler/issues/7094#issuecomment-983243536


   Hi:
   * Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * So that we can understand your request as soon as possible, please provide detailed information, the version, or pictures.
   * If you haven't received a reply for a long time, you can subscribe to the developer mailing list (subscription steps: https://dolphinscheduler.apache.org/en-us/community/development/subscribe.html), then write the issue URL in the email body and send your question to dev@dolphinscheduler.apache.org.

