Posted to dev@submarine.apache.org by GitBox <gi...@apache.org> on 2020/03/12 12:27:07 UTC

[GitHub] [submarine] pingsutw opened a new pull request #226: SUBMARINE-84. The environment variable TF_CONFIG hard coded port number "8000" will cause distributed training job's worker fail to start

pingsutw opened a new pull request #226: SUBMARINE-84. The environment variable TF_CONFIG hard coded port number "8000" will cause distributed training job's worker fail to start
URL: https://github.com/apache/submarine/pull/226
 
 
   ### What is this PR for?
   When no network virtualization solution such as Calico is deployed, the hard-coded port 8000 can cause a conflict: if a worker and a ps container are allocated to the same host and both use the host network, the second container's gRPC server fails to start.
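A minimal, single-machine sketch of the failure described above, assuming two tasks that share one host's network namespace and both listen on the same fixed port. The class and method names are hypothetical; the sketch only mimics the bind step of each task's gRPC server with plain `ServerSocket`s.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class HardCodedPortConflict {

  /**
   * Simulates a worker and a ps container on one host (host networking, no
   * Calico) that both listen on the same fixed port.
   * Returns true if the second server could not bind.
   */
  static boolean secondServerFailsToBind(int port) throws IOException {
    try (ServerSocket worker = new ServerSocket()) {
      worker.setReuseAddress(false);
      worker.bind(new InetSocketAddress(port));   // first task: bind + listen succeeds
      try (ServerSocket ps = new ServerSocket()) {
        ps.setReuseAddress(false);
        ps.bind(new InetSocketAddress(port));     // second task on the same host
        return false;                             // unexpected: both bound
      } catch (BindException e) {
        return true;                              // "Address already in use"
      }
    }
  }

  public static void main(String[] args) throws IOException {
    int port = 8000; // the value hard-coded in TF_CONFIG before this PR
    try {
      System.out.println("second task failed to bind: " + secondServerFailsToBind(port));
    } catch (BindException e) {
      // Something on this machine already owns the port: the same failure mode.
      System.out.println("port " + port + " is already in use on this host");
    }
  }
}
```

With per-container network namespaces each task gets its own port space, so the fixed port only collides when tasks share the host network.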
   
   ### What type of PR is it?
   [Bug Fix]
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/SUBMARINE-84
   
   ### How should this be tested?
   https://travis-ci.org/github/pingsutw/hadoop-submarine/builds/661489560
   https://github.com/pingsutw/hadoop-submarine/actions/runs/54296809
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * Do the license files need an update? No
   * Are there breaking changes for older versions? No
   * Does this need documentation? No
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@submarine.apache.org
For additional commands, e-mail: dev-help@submarine.apache.org


[GitHub] [submarine] pingsutw commented on issue #226: SUBMARINE-84. Set random port in TF_CONFIG

URL: https://github.com/apache/submarine/pull/226#issuecomment-598194732
 
 
   @liuxunorg Thanks for the reminder. Updated.

[GitHub] [submarine] tangzhankun commented on a change in pull request #226: SUBMARINE-84. Set random port in TF_CONFIG

URL: https://github.com/apache/submarine/pull/226#discussion_r393466840
 
 

 ##########
 File path: submarine-server/server-submitter/submitter-yarnservice/src/main/java/org/apache/submarine/server/submitter/yarnservice/tensorflow/TensorFlowConfigEnvGenerator.java
 ##########
 @@ -19,15 +19,18 @@
 
 package org.apache.submarine.server.submitter.yarnservice.tensorflow;
 
+import java.io.IOException;
+import org.apache.hadoop.net.ServerSocketUtil;
 import org.apache.submarine.commons.runtime.conf.Envs;
 import org.apache.submarine.server.submitter.yarnservice.YarnServiceUtils;
 
 public class TensorFlowConfigEnvGenerator {
 
   public static String getTFConfigEnv(String componentName, int nWorkers,
-      int nPs, String serviceName, String userName, String domain) {
+      int nPs, String serviceName, String userName, String domain) throws IOException {
     String commonEndpointSuffix = YarnServiceUtils
-        .getDNSNameCommonSuffix(serviceName, userName, domain, 8000);
+        .getDNSNameCommonSuffix(serviceName, userName, domain,
 
 Review comment:
   Yeah, I agree: let's leave it and come back to this when needed.
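For context on the hunk discussed in this thread: `getTFConfigEnv` builds the `TF_CONFIG` value from per-task DNS names that all share one `:port` suffix. The following is a minimal, hypothetical re-creation, not Submarine's code. The `<component>-<index>.<service>.<user>.<domain>:<port>` name format and the `dnsSuffix`/`endpoints`/`tfConfig` helpers are assumptions, and only the `cluster` and `task` keys of TensorFlow's `TF_CONFIG` are shown.

```java
import java.util.ArrayList;
import java.util.List;

public class TfConfigSketch {

  /** Assumed shape of the common suffix: ".<service>.<user>.<domain>:<port>". */
  static String dnsSuffix(String service, String user, String domain, int port) {
    return "." + service + "." + user + "." + domain + ":" + port;
  }

  /** Builds a JSON array of quoted "<component>-<i><suffix>" endpoints. */
  static String endpoints(String component, int count, String suffix) {
    List<String> hosts = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      hosts.add("\"" + component + "-" + i + suffix + "\"");
    }
    return "[" + String.join(",", hosts) + "]";
  }

  /** Builds a TF_CONFIG-style JSON string in which every endpoint uses the same port. */
  static String tfConfig(String taskType, int taskIndex, int nWorkers, int nPs,
                         String service, String user, String domain, int port) {
    String suffix = dnsSuffix(service, user, domain, port);
    return "{\"cluster\":{"
        + "\"worker\":" + endpoints("worker", nWorkers, suffix) + ","
        + "\"ps\":" + endpoints("ps", nPs, suffix) + "},"
        + "\"task\":{\"type\":\"" + taskType + "\",\"index\":" + taskIndex + "}}";
  }

  public static void main(String[] args) {
    // Placeholder service/user/domain values. Every endpoint printed ends in ":8000",
    // e.g. "worker-0.tf-job.alice.example.com:8000".
    System.out.println(tfConfig("worker", 0, 2, 1, "tf-job", "alice", "example.com", 8000));
  }
}
```

Distinct DNS names do not help when two of them resolve to the same host running with host networking: both tasks still try to listen on the one shared port.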


[GitHub] [submarine] pingsutw commented on a change in pull request #226: SUBMARINE-84. Set random port in TF_CONFIG

URL: https://github.com/apache/submarine/pull/226#discussion_r392250288
 
 

 Review comment:
   https://github.com/apache/submarine/blob/ebad135c1e44bc75f76c41fbaaeece87ed65d144/submarine-server/server-submitter/submitter-yarnservice/src/main/java/org/apache/submarine/server/submitter/yarnservice/YarnServiceJobSubmitter.java#L103-L114
   @yuanzac, @liuxunorg 
   
   Since we use YARN Service to submit the job, we can't control the NodeManager's (NM) behavior,
   so we can't generate the port on the NM node.
   Please correct me if I'm wrong.
   
   Maybe we could just close this PR, since the YARN Service runtime is deprecated.

[GitHub] [submarine] liuxunorg commented on issue #226: SUBMARINE-84. The environment variable TF_CONFIG hard coded port number "8000" will cause distributed training job's worker fail to start

URL: https://github.com/apache/submarine/pull/226#issuecomment-598171034
 
 
   @pingsutw The PR title is too long.

[GitHub] [submarine] liuxunorg commented on a change in pull request #226: SUBMARINE-84. Set random port in TF_CONFIG

URL: https://github.com/apache/submarine/pull/226#discussion_r392074939
 
 

 Review comment:
   Can we limit the port range to avoid this problem?
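A minimal sketch of what limiting the port range could look like, assuming a hypothetical `findFreePortInRange` helper and an arbitrary example range; neither exists in Submarine. It probes each port in the range by trying to bind it on the local machine and returns the first one that succeeds.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.OptionalInt;

public class PortRangeProbe {

  /**
   * Hypothetical helper: returns the first port in [low, high] that can be
   * bound on THIS machine right now, or empty if none is free.
   */
  static OptionalInt findFreePortInRange(int low, int high) {
    if (low < 1 || high > 65535 || low > high) {
      throw new IllegalArgumentException("invalid range " + low + "-" + high);
    }
    for (int port = low; port <= high; port++) {
      try (ServerSocket probe = new ServerSocket()) {
        probe.setReuseAddress(false);
        probe.bind(new InetSocketAddress(port)); // wildcard address, like a server would use
        return OptionalInt.of(port);             // bind worked; the socket closes on return
      } catch (IOException e) {
        // In use or not permitted: try the next port.
      }
    }
    return OptionalInt.empty();
  }

  public static void main(String[] args) {
    // 20000-20100 is an arbitrary example range, not a Submarine setting.
    OptionalInt port = findFreePortInRange(20000, 20100);
    System.out.println(port.isPresent() ? "picked port " + port.getAsInt() : "no free port in range");
  }
}
```

The probe only describes the machine that runs it, at the moment it runs; the socket is released before any task uses the port.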

[GitHub] [submarine] pingsutw commented on a change in pull request #226: SUBMARINE-84. Set random port in TF_CONFIG

URL: https://github.com/apache/submarine/pull/226#discussion_r392085834
 
 

 Review comment:
   Thanks @yuanzac and @LinhaoZhu for the review.
   Makes sense. I'm going to update it.

[GitHub] [submarine] pingsutw commented on a change in pull request #226: SUBMARINE-84. Set random port in TF_CONFIG

URL: https://github.com/apache/submarine/pull/226#discussion_r393471189
 
 

 Review comment:
   @tangzhankun Thanks for the reply.
   Let me close it.

[GitHub] [submarine] pingsutw closed pull request #226: SUBMARINE-84. Set random port in TF_CONFIG

URL: https://github.com/apache/submarine/pull/226
 
 
   

[GitHub] [submarine] yuanzac commented on a change in pull request #226: SUBMARINE-84. Set random port in TF_CONFIG

URL: https://github.com/apache/submarine/pull/226#discussion_r392007909
 
 

 Review comment:
   As I understand it, the TensorFlow config is generated before the NM starts the TF job's worker containers, so a randomly chosen port may still conflict with ports already in use on the NM node. If the port were generated by the NM, the port conflicts would be resolved.
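The concern above is a time-of-check/time-of-use gap: a port that is free when `TF_CONFIG` is generated is not reserved for the container that binds it later. A minimal, single-machine sketch of that gap, with hypothetical class and method names; in the real setup the check and the bind happen on different hosts (submitter vs. NM), which widens the gap further.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortPickRace {

  /** "Submit time": ask the OS for a currently free port, then release it. */
  static int pickFreePortNow() throws IOException {
    try (ServerSocket probe = new ServerSocket(0)) { // port 0: the OS picks a free port
      return probe.getLocalPort();
    } // the socket is closed here, so nothing reserves the port any more
  }

  /**
   * Simulates the race: the port is chosen, another process on the node takes
   * it before the container starts, and the container's bind then fails.
   * Returns true if the late bind failed.
   */
  static boolean lateBindFails() throws IOException {
    int port = pickFreePortNow();                   // value that would go into TF_CONFIG
    try (ServerSocket squatter = new ServerSocket()) {
      squatter.setReuseAddress(false);
      squatter.bind(new InetSocketAddress(port));   // some other service grabs it first
      try (ServerSocket worker = new ServerSocket()) {
        worker.setReuseAddress(false);
        worker.bind(new InetSocketAddress(port));   // the TF worker starts later
        return false;
      } catch (BindException e) {
        return true;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("worker failed to bind: " + lateBindFails());
  }
}
```

As the comment suggests, choosing the port on the NM side, when the container actually starts, is what closes this window; a check performed earlier on another machine cannot.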
