Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2021/05/03 23:48:30 UTC

[GitHub] [samza] cameronlee314 opened a new pull request #1499: SAMZA-2654: Allow coordinator url port to be configurable

cameronlee314 opened a new pull request #1499:
URL: https://github.com/apache/samza/pull/1499


   Issues: Currently, the port for the job coordinator url (for accessing job model) is always dynamically allocated. In some cases, it is helpful to be able to hardcode a port.
   
   Changes: When `JobModelManager` is setting up the `HttpServer` which serves the coordinator url, read a config to set the port.
   
   Tests:
   1. `./bin/integration-tests.sh /tmp/samza-tests yarn-integration-tests --nopassword`
   2. Ran a job with `ProcessJobFactory` and verified the job started up.
   
   API changes:
   Set the config `cluster-manager.jobcoordinator.url.port` to the port number to use for the coordinator url. This is backwards compatible because the default value is 0 (dynamically allocate an unused port), which is the value that was always used before this change.
   
   More context about how this can be used: When using Kubernetes, ports only need to be unique within a pod, so port conflict issues are less likely when hardcoding ports. Being able to hardcode a port makes it easier to build the coordinator url since Kubernetes can also set up a "service name" for the coordinator instead of using the physical host name.
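
To illustrate the config described under "API changes", here is a minimal sketch that assumes a plain `Map<String, String>` in place of Samza's config classes. The class and method names are hypothetical and are not the code in this PR; only the config key and the default of 0 come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class CoordinatorPortConfigSketch {
  // Config key named in the PR description.
  static final String COORDINATOR_URL_PORT = "cluster-manager.jobcoordinator.url.port";
  // 0 asks the OS for any unused port, which matches the behavior before this change.
  static final int DEFAULT_COORDINATOR_URL_PORT = 0;

  // Hypothetical stand-in for ClusterManagerConfig#getCoordinatorUrlPort.
  static int getCoordinatorUrlPort(Map<String, String> config) {
    String value = config.get(COORDINATOR_URL_PORT);
    return value == null ? DEFAULT_COORDINATOR_URL_PORT : Integer.parseInt(value.trim());
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    // Unset: falls back to dynamic allocation.
    System.out.println("unset -> " + getCoordinatorUrlPort(config));
    // Set: the coordinator url would be served on this fixed port.
    config.put(COORDINATOR_URL_PORT, "8080");
    System.out.println("set   -> " + getCoordinatorUrlPort(config));
  }
}
```

Jobs that never set the key keep the old dynamic behavior, which is why the change is backwards compatible.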


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] kw2542 edited a comment on pull request #1499: SAMZA-2654: Allow coordinator url port to be configurable

Posted by GitBox <gi...@apache.org>.
kw2542 edited a comment on pull request #1499:
URL: https://github.com/apache/samza/pull/1499#issuecomment-831638558


   > More context about how this can be used:
   > When using Kubernetes, we can create a "Service" which provides a consistent coordinator url, even when the job coordinator switches pods. The port is one piece of information needed to create this "Service". If we want to use a Kubernetes controller to create the "Service" (per the Kubernetes operator pattern), then we don't need to retrieve the port from the job coordinator in order to create the "Service".
   
   This may not bring a lot of benefit, especially for portable Beam jobs, as there are multiple port numbers that need to be communicated between the JC Pod and the Worker Pod, and preconfiguring all of them may not be practical.
   
   Is this just to make creating the "Service" easier, or can we not create the "Service" at all without a predefined port number?
   





[GitHub] [samza] cameronlee314 commented on a change in pull request #1499: SAMZA-2654: Allow coordinator url port to be configurable

Posted by GitBox <gi...@apache.org>.
cameronlee314 commented on a change in pull request #1499:
URL: https://github.com/apache/samza/pull/1499#discussion_r626168702



##########
File path: samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java
##########
@@ -280,4 +289,8 @@ public boolean getJmxEnabledOnJobCoordinator() {
       return true;
     }
   }
+
+  public int getCoordinatorUrlPort() {

Review comment:
       Updated the javadoc to reflect this. Thanks for the suggestion.







[GitHub] [samza] cameronlee314 commented on pull request #1499: SAMZA-2654: Allow coordinator url port to be configurable

Posted by GitBox <gi...@apache.org>.
cameronlee314 commented on pull request #1499:
URL: https://github.com/apache/samza/pull/1499#issuecomment-831634062


   > What happens if the port requested is not available on the host? I'd assume the server creation fails which forces us to use any available port on the machine or bail out rendering this configuration a best-effort.
   
   This is a "best-effort" configuration. The startup will fail if the port is already in use. The port uniqueness should be ensured before configuring the port. Many use cases will want to continue to use dynamic port allocation to prevent port conflicts.
   
   I updated the PR description to include more details about the port conflicts and usage in Kubernetes.
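
The port-conflict failure mentioned above can be reproduced with plain JDK sockets. This is only a sketch of the underlying OS behavior, using `java.net.ServerSocket` as a stand-in for the coordinator's `HttpServer`; the class and method names are hypothetical and this is not Samza code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

public class PortConflictSketch {
  // Returns true if a bind to the given port is rejected because the port is taken.
  static boolean bindFails(int port) {
    try (ServerSocket socket = new ServerSocket(port)) {
      return false;
    } catch (BindException e) {
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Occupies an OS-chosen port (port 0), then tries to bind that same port again.
  static boolean secondBindFailsOnBusyPort() {
    try (ServerSocket first = new ServerSocket(0)) {
      return bindFails(first.getLocalPort());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("second bind rejected: " + secondBindFailsOnBusyPort());
  }
}
```

Binding with port 0 never hits this case because the OS picks a free port, which is why dynamic allocation remains the safer default when port uniqueness cannot be guaranteed up front.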





[GitHub] [samza] kw2542 commented on a change in pull request #1499: SAMZA-2654: Allow coordinator url port to be configurable

Posted by GitBox <gi...@apache.org>.
kw2542 commented on a change in pull request #1499:
URL: https://github.com/apache/samza/pull/1499#discussion_r626144722



##########
File path: samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java
##########
@@ -280,4 +289,8 @@ public boolean getJmxEnabledOnJobCoordinator() {
       return true;
     }
   }
+
+  public int getCoordinatorUrlPort() {

Review comment:
       Can we document that this is experimental and still evolving, so backward compatibility is not guaranteed and it may be removed at any time?







[GitHub] [samza] kw2542 commented on pull request #1499: SAMZA-2654: Allow coordinator url port to be configurable

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #1499:
URL: https://github.com/apache/samza/pull/1499#issuecomment-831638558


   > More context about how this can be used:
   > When using Kubernetes, we can create a "Service" which provides a consistent coordinator url, even when the job coordinator switches pods. The port is one piece of information needed to create this "Service". If we want to use a Kubernetes controller to create the "Service" (per the Kubernetes operator pattern), then we don't need to retrieve the port from the job coordinator in order to create the "Service".
   
   This may not bring a lot of benefit, especially for portable Beam jobs, as there are multiple port numbers that need to be communicated between the JC Pod and the Worker Pod, and preconfiguring all of them may not be practical.
   
   





[GitHub] [samza] cameronlee314 commented on pull request #1499: SAMZA-2654: Allow coordinator url port to be configurable

Posted by GitBox <gi...@apache.org>.
cameronlee314 commented on pull request #1499:
URL: https://github.com/apache/samza/pull/1499#issuecomment-832298098


   > > More context about how this can be used:
   > > When using Kubernetes, we can create a "Service" which provides a consistent coordinator url, even when the job coordinator switches pods. The port is one piece of information needed to create this "Service". If we want to use a Kubernetes controller to create the "Service" (per the Kubernetes operator pattern), then we don't need to retrieve the port from the job coordinator in order to create the "Service".
   > 
   > This may not bring a lot of benefit, especially for portable Beam jobs, as there are multiple port numbers that need to be communicated between the JC Pod and the Worker Pod, and preconfiguring all of them may not be practical.
   > 
   > Is this just to make creating the "Service" easier, or can we not create the "Service" at all without a predefined port number?
   
   It might be hard to statically specify many ports, but this does give a bit more flexibility in some cases. If we only have the option to dynamically allocate ports, then there is an ordering dependency for resource management between the job coordinator and the worker containers, because the worker containers need to know the port that the job coordinator is using. This configuration does need to be used with care, because it does introduce another failure case (i.e. port conflict) if it is not used properly.
   
   If we dynamically allocate the port, then the job coordinator needs to expose the port first before the "Service" can be created. This creates more of an ordering dependency. The job coordinator would need to trigger the creation of the "Service" or the job coordinator needs to pass the port to somewhere (e.g. coordinator stream, config map) before something else can create the "Service".
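
The ordering dependency described above can be sketched as follows. This uses the JDK's built-in `com.sun.net.httpserver.HttpServer` purely as a stand-in for the coordinator's server; the service name `samza-job-coordinator`, the port 8080, and all class and method names are assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;

public class CoordinatorUrlOrderingSketch {
  // Hypothetical helper: the url that workers would use to reach the coordinator.
  static String coordinatorUrl(String host, int port) {
    return "http://" + host + ":" + port + "/";
  }

  // Starts a server on the requested port and returns the port it actually bound.
  // With requestedPort == 0 the result is only known after the server exists.
  static int startAndGetBoundPort(int requestedPort) {
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress(requestedPort), 0);
      server.start();
      try {
        return server.getAddress().getPort();
      } finally {
        server.stop(0);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Fixed port: the url can be written down before any coordinator is running.
    System.out.println(coordinatorUrl("samza-job-coordinator", 8080));
    // Dynamic port: the coordinator must start first, then publish its port somewhere.
    int boundPort = startAndGetBoundPort(0);
    System.out.println(coordinatorUrl("samza-job-coordinator", boundPort));
  }
}
```

In the fixed-port case nothing depends on the coordinator having started, so whatever creates the "Service" can run independently of it.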





[GitHub] [samza] cameronlee314 merged pull request #1499: SAMZA-2654: Allow coordinator url port to be configurable

Posted by GitBox <gi...@apache.org>.
cameronlee314 merged pull request #1499:
URL: https://github.com/apache/samza/pull/1499


   

