Posted to issues@celeborn.apache.org by "cfmcgrady (via GitHub)" <gi...@apache.org> on 2023/04/18 10:42:38 UTC

[GitHub] [incubator-celeborn] cfmcgrady opened a new pull request, #1436: [CELEBORN-534] Respect the user's configured master host settings

cfmcgrady opened a new pull request, #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     - Make sure the PR title start w/ a JIRA ticket, e.g. '[CELEBORN-XXXX] Your PR title ...'.
     - Be sure to keep the PR description updated to reflect all changes.
     - Please write your PR title to summarize what this PR proposes.
     - If possible, provide a concise example to reproduce the issue for a faster review.
   -->
   
   ### What changes were proposed in this pull request?
   
   I want to deploy Apache Celeborn on Kubernetes in host network mode so that Spark and Celeborn communicate over physical IP addresses. Currently, the master node uses the container hostname by default and ignores the address settings configured by the user. This PR makes the master respect those settings.
   
   
   Here is my `celeborn-defaults.conf`:
   
   ```
   celeborn.master.endpoints=192.168.1.2,192.168.1.3,192.168.1.4
   celeborn.ha.master.node.0.host=192.168.1.2
   celeborn.ha.master.node.1.host=192.168.1.3
   celeborn.ha.master.node.2.host=192.168.1.4
   celeborn.ha.master.node.0.ratis.host=192.168.1.2
   celeborn.ha.master.node.1.ratis.host=192.168.1.3
   celeborn.ha.master.node.2.ratis.host=192.168.1.4
   ```
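
The behaviour this PR asks for (prefer the host the user configured, and fall back to the local hostname only when nothing is set) can be sketched as below. This is a minimal illustration, not Celeborn's implementation: the class `MasterHostResolver` and the method `resolveHost` are hypothetical, the fallback to `localhost` is an assumption, and only the `celeborn.ha.master.node.<id>.host` key pattern is taken from the configuration above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Properties;

// Hypothetical helper, for illustration only.
class MasterHostResolver {

  // Returns the user-configured host for the given master node id.
  // Only when no value is configured does it fall back to the local hostname.
  static String resolveHost(Properties conf, int nodeId) {
    String key = "celeborn.ha.master.node." + nodeId + ".host";
    String configured = conf.getProperty(key);
    if (configured != null && !configured.trim().isEmpty()) {
      return configured.trim();
    }
    try {
      return InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      // Assumption: a last-resort default when the local hostname cannot be determined.
      return "localhost";
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("celeborn.ha.master.node.0.host", "192.168.1.2");
    conf.setProperty("celeborn.ha.master.node.1.host", "192.168.1.3");
    conf.setProperty("celeborn.ha.master.node.2.host", "192.168.1.4");
    System.out.println(resolveHost(conf, 1)); // prints 192.168.1.3
  }
}
```

In host network mode the container hostname may not be reachable from Spark executors, which is why the configured IP is checked first in this sketch.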
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] RexXiong commented on pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "RexXiong (via GitHub)" <gi...@apache.org>.
RexXiong commented on PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436#issuecomment-1514104688

   > I have verified this, @RexXiong:
   > 
   > ```
   > 23/04/19 11:51:57,095 INFO [main] RssHARetryClient: connect to master 192.168.1.2:9097.
   > 23/04/19 11:51:57,095 INFO [app-heartbeat] RssHARetryClient: connect to master 192.168.1.2:9097.
   > 23/04/19 11:51:57,202 INFO [app-heartbeat] RssHARetryClient: Fail over to master 192.168.1.4:9097.
   > 23/04/19 11:51:57,205 INFO [main] RssHARetryClient: Fail over to master 192.168.1.4:9097.
   > ```
   
   Thanks. LGTM! 




[GitHub] [incubator-celeborn] codecov[bot] commented on pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436#issuecomment-1512867811

   ## [Codecov](https://codecov.io/gh/apache/incubator-celeborn/pull/1436?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1436](https://codecov.io/gh/apache/incubator-celeborn/pull/1436?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5121bb5) into [main](https://codecov.io/gh/apache/incubator-celeborn/commit/8be82548e1eb502423d305a63021400a8b21c547?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (8be8254) will **increase** coverage by `0.19%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@            Coverage Diff             @@
   ##             main    #1436      +/-   ##
   ==========================================
   + Coverage   45.54%   45.73%   +0.19%     
   ==========================================
     Files         167      167              
     Lines       10531    10531              
     Branches     1044     1044              
   ==========================================
   + Hits         4795     4815      +20     
   + Misses       5399     5382      -17     
   + Partials      337      334       -3     
   ```
   
   
   [see 4 files with indirect coverage changes](https://codecov.io/gh/apache/incubator-celeborn/pull/1436/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   




[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436#issuecomment-1512860923

   ping @pan3793 




[GitHub] [incubator-celeborn] cfmcgrady commented on pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "cfmcgrady (via GitHub)" <gi...@apache.org>.
cfmcgrady commented on PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436#issuecomment-1514104603

   FYI. cc @RexXiong 
   
   ```diff
   --- a/common/src/main/java/org/apache/celeborn/common/haclient/RssHARetryClient.java
   +++ b/common/src/main/java/org/apache/celeborn/common/haclient/RssHARetryClient.java
   @@ -167,6 +167,7 @@ public class RssHARetryClient {
      private boolean shouldRetry(@Nullable RpcEndpointRef oldRef, Throwable e) {
        // It will always throw rss exception , so we need to get the cause
        // 'RssException: Exception thrown in awaitResult'
   +    e.printStackTrace();
        if (e.getCause() instanceof MasterNotLeaderException) {
          MasterNotLeaderException exception = (MasterNotLeaderException) e.getCause();
          String leaderAddr = exception.getSuggestedLeaderAddress();
   ```
   
   ```
   Caused by: org.apache.celeborn.common.haclient.MasterNotLeaderException: Master:192.168.1.2:9097 is not the leader. Suggested leader is Master:192.168.1.4:9097.
   	at org.apache.celeborn.service.deploy.master.clustermeta.ha.HAHelper.checkShouldProcess(HAHelper.java:47)
   	at org.apache.celeborn.service.deploy.master.Master.executeWithLeaderChecker(Master.scala:203)
   	at org.apache.celeborn.service.deploy.master.Master$$anonfun$receiveAndReply$1.applyOrElse(Master.scala:228)
   	at org.apache.celeborn.common.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
   	at org.apache.celeborn.common.rpc.netty.Inbox.safelyCall(Inbox.scala:222)
   	at org.apache.celeborn.common.rpc.netty.Inbox.process(Inbox.scala:110)
   	at org.apache.celeborn.common.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:229)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:750)
   ```




[GitHub] [incubator-celeborn] RexXiong commented on pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "RexXiong (via GitHub)" <gi...@apache.org>.
RexXiong commented on PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436#issuecomment-1514081761

   What will cachedLeaderPeerRpcEndpoint return when a client connects to a follower? (Perhaps we had better keep them in the same style.) @cfmcgrady 




[GitHub] [incubator-celeborn] cfmcgrady commented on a diff in pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "cfmcgrady (via GitHub)" <gi...@apache.org>.
cfmcgrady commented on code in PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436#discussion_r1169995176


##########
master/src/main/scala/org/apache/celeborn/service/deploy/master/clustermeta/ha/MasterNode.scala:
##########
@@ -79,9 +86,22 @@ object MasterNode {
       this
     }
 
-    def build: MasterNode = MasterNode(
-      nodeId,
-      new InetSocketAddress(ratisHost, ratisPort),
-      new InetSocketAddress(rpcHost, rpcPort))
+    def build: MasterNode = MasterNode(nodeId, ratisHost, ratisPort, rpcHost, rpcPort)
+  }
+
+  private def createSocketAddr(host: String, port: Int): InetSocketAddress = {
+    val socketAddr: InetSocketAddress =
+      Try(NetUtils.createSocketAddr(host, port)) match {

Review Comment:
   updated.
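
For readers following the hunk above, here is a small Java sketch of the `try`/`catch` shape around socket address creation. It is not the merged code: the hunk is truncated, so what the real `createSocketAddr` does on failure is not shown, and the class `SocketAddrSketch` and the error message are assumptions. The only facts relied on are standard `java.net.InetSocketAddress` behaviour: the constructor throws `IllegalArgumentException` for an out-of-range port, and `getHostString()` returns an IP literal as given without a reverse lookup.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, for illustration only.
class SocketAddrSketch {

  // Builds a socket address from the configured host and port.
  // A plain try/catch adds context to the failure instead of matching on a Try value.
  static InetSocketAddress createSocketAddr(String host, int port) {
    try {
      return new InetSocketAddress(host, port);
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Invalid master address " + host + ":" + port, e);
    }
  }

  // Formats the address using the host exactly as configured.
  // getHostString() does not trigger a reverse DNS lookup, unlike getHostName().
  static String hostPort(InetSocketAddress addr) {
    return addr.getHostString() + ":" + addr.getPort();
  }

  public static void main(String[] args) {
    InetSocketAddress addr = createSocketAddr("192.168.1.2", 9097);
    System.out.println(hostPort(addr)); // prints 192.168.1.2:9097
  }
}
```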





[GitHub] [incubator-celeborn] RexXiong merged pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "RexXiong (via GitHub)" <gi...@apache.org>.
RexXiong merged PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436




[GitHub] [incubator-celeborn] pan3793 commented on a diff in pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436#discussion_r1169974103


##########
master/src/main/scala/org/apache/celeborn/service/deploy/master/clustermeta/ha/MasterNode.scala:
##########
@@ -79,9 +86,22 @@ object MasterNode {
       this
     }
 
-    def build: MasterNode = MasterNode(
-      nodeId,
-      new InetSocketAddress(ratisHost, ratisPort),
-      new InetSocketAddress(rpcHost, rpcPort))
+    def build: MasterNode = MasterNode(nodeId, ratisHost, ratisPort, rpcHost, rpcPort)
+  }
+
+  private def createSocketAddr(host: String, port: Int): InetSocketAddress = {
+    val socketAddr: InetSocketAddress =
+      Try(NetUtils.createSocketAddr(host, port)) match {

Review Comment:
   nit: use `try catch` instead





[GitHub] [incubator-celeborn] cfmcgrady commented on pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "cfmcgrady (via GitHub)" <gi...@apache.org>.
cfmcgrady commented on PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436#issuecomment-1514089386

   > What will cachedLeaderPeerRpcEndpoint return when a client connects to a follower? (Perhaps we had better keep them in the same style.) @cfmcgrady
   
   Agreed. btw, do we have any quick way to check this value?




[GitHub] [incubator-celeborn] RexXiong commented on pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "RexXiong (via GitHub)" <gi...@apache.org>.
RexXiong commented on PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436#issuecomment-1514093454

   > > What will cachedLeaderPeerRpcEndpoint return when a client connects to a follower? (Perhaps we had better keep them in the same style.) @cfmcgrady
   > 
   > Agreed. btw, do we have any quick way to check this value?
   
   Connect to a follower directly; you will then see the suggested leader address that is returned to the client:
   
     private boolean shouldRetry(@Nullable RpcEndpointRef oldRef, Throwable e) {
       // It will always throw rss exception , so we need to get the cause
       // 'RssException: Exception thrown in awaitResult'
       if (e.getCause() instanceof MasterNotLeaderException) {
         MasterNotLeaderException exception = (MasterNotLeaderException) e.getCause();
         String leaderAddr = exception.getSuggestedLeaderAddress();
         if (!leaderAddr.equals(MasterNotLeaderException.LEADER_NOT_PRESENTED)) {
           setRpcEndpointRef(leaderAddr);
         } else {
           LOG.warn("Master leader is not present currently, please check masters' status!");
         }
         return true;
       } else if (e.getCause() instanceof IOException) {
         resetRpcEndpointRef(oldRef);
         return true;
       }
       return false;
     }
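
The retry logic quoted above can be exercised end to end with a small self-contained sketch. Everything here is hypothetical and simplified: `FailoverSketch`, `NotLeaderException`, and `Transport` stand in for `RssHARetryClient`, `MasterNotLeaderException`, and the RPC layer, and rotating to the next configured master when no leader is suggested is an assumption of this sketch rather than what the quoted method does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical client, for illustration only.
class FailoverSketch {

  // Stand-in for MasterNotLeaderException; suggestedLeader is null when no leader is known.
  static class NotLeaderException extends RuntimeException {
    final String suggestedLeader;

    NotLeaderException(String suggestedLeader) {
      super("Not the leader. Suggested leader is " + suggestedLeader);
      this.suggestedLeader = suggestedLeader;
    }
  }

  // Stand-in for the RPC layer: sends one request to one master address.
  interface Transport {
    String call(String address, String request);
  }

  private final List<String> masters;
  private final Transport transport;
  private int index = 0;
  private String current;

  FailoverSketch(List<String> masters, Transport transport) {
    this.masters = masters;
    this.transport = transport;
    this.current = masters.get(0);
  }

  String current() {
    return current;
  }

  // Sends the request, switching masters when a follower reports it is not the leader.
  String send(String request, int maxRetries) {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return transport.call(current, request);
      } catch (NotLeaderException e) {
        if (e.suggestedLeader != null) {
          // Fail over to the address the follower suggested.
          current = e.suggestedLeader;
        } else {
          // Assumption: with no suggestion, try the next configured master.
          index = (index + 1) % masters.size();
          current = masters.get(index);
        }
      }
    }
    throw new IllegalStateException("No leader reached after " + (maxRetries + 1) + " attempts");
  }

  public static void main(String[] args) {
    final String leader = "192.168.1.4:9097";
    FailoverSketch client = new FailoverSketch(
        Arrays.asList("192.168.1.2:9097", "192.168.1.3:9097", leader),
        (address, request) -> {
          if (!address.equals(leader)) {
            throw new NotLeaderException(leader);
          }
          return "ok from " + address;
        });
    System.out.println(client.send("heartbeat", 3)); // prints "ok from 192.168.1.4:9097"
  }
}
```

The failover only works if the suggested address is one the client can reach, which is why the follower should report the configured IP rather than a container hostname.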
   




[GitHub] [incubator-celeborn] cfmcgrady commented on pull request #1436: [CELEBORN-534] Respect the user's configured master host settings

Posted by "cfmcgrady (via GitHub)" <gi...@apache.org>.
cfmcgrady commented on PR #1436:
URL: https://github.com/apache/incubator-celeborn/pull/1436#issuecomment-1514096649

   I have verified this, @RexXiong:
   
   ```
   23/04/19 11:51:57,095 INFO [main] RssHARetryClient: connect to master 192.168.1.2:9097.
   23/04/19 11:51:57,095 INFO [app-heartbeat] RssHARetryClient: connect to master 192.168.1.2:9097.
   23/04/19 11:51:57,202 INFO [app-heartbeat] RssHARetryClient: Fail over to master 192.168.1.4:9097.
   23/04/19 11:51:57,205 INFO [main] RssHARetryClient: Fail over to master 192.168.1.4:9097.
   ```

