You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2020/03/31 22:25:46 UTC

[GitHub] [samza] rmatharu opened a new pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

rmatharu opened a new pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335#discussion_r401848987
 
 

 ##########
 File path: samza-core/src/main/java/org/apache/samza/startpoint/StartpointManager.java
 ##########
 @@ -158,8 +158,25 @@ public void writeStartpoint(SystemStreamPartition ssp, TaskName taskName, Startp
    * @return {@link Optional} of {@link Startpoint} for the {@link SystemStreamPartition}.
    *         It is empty if it does not exist or if it is too stale.
    */
+  @VisibleForTesting
   public Optional<Startpoint> readStartpoint(SystemStreamPartition ssp) {
-    return readStartpoint(ssp, null);
+    Map<String, byte[]> startpointBytes = readWriteStore.all();
+    // there is no task-name to use as key for the startpoint in this case (only the ssp), so we use a null task-name
+    return readStartpoint(startpointBytes, ssp, null);
+  }
+
+  /**
+   * Returns the last {@link Startpoint} that defines the start position for a {@link SystemStreamPartition} and {@link TaskName}.
+   * @param ssp The {@link SystemStreamPartition} to fetch the {@link Startpoint} for.
+   * @param taskName the {@link TaskName} to fetch the {@link Startpoint} for.
+   * @return {@link Optional} of {@link Startpoint} for the {@link SystemStreamPartition}.
+   *         It is empty if it does not exist or if it is too stale.
+   */
+  @VisibleForTesting
+  public Optional<Startpoint> readStartpoint(SystemStreamPartition ssp, TaskName taskName) {
+    Map<String, byte[]> startpointBytes = readWriteStore.all();
+    // there is no task-name to use as key for the startpoint in this case (only the ssp), so we use a null task-name
 
 Review comment:
   Please correct the comment, task-name can be non-null

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335#discussion_r401856681
 
 

 ##########
 File path: samza-core/src/main/java/org/apache/samza/startpoint/StartpointManager.java
 ##########
 @@ -158,8 +158,25 @@ public void writeStartpoint(SystemStreamPartition ssp, TaskName taskName, Startp
    * @return {@link Optional} of {@link Startpoint} for the {@link SystemStreamPartition}.
    *         It is empty if it does not exist or if it is too stale.
    */
+  @VisibleForTesting
   public Optional<Startpoint> readStartpoint(SystemStreamPartition ssp) {
-    return readStartpoint(ssp, null);
 
 Review comment:
   You are not changing the function signature, so the test should still work when you invoke `readStartpoint(ssp, null)` from here what am I missing?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] rmatharu commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
rmatharu commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335#discussion_r401860849
 
 

 ##########
 File path: samza-core/src/main/java/org/apache/samza/startpoint/StartpointManager.java
 ##########
 @@ -158,8 +158,25 @@ public void writeStartpoint(SystemStreamPartition ssp, TaskName taskName, Startp
    * @return {@link Optional} of {@link Startpoint} for the {@link SystemStreamPartition}.
    *         It is empty if it does not exist or if it is too stale.
    */
+  @VisibleForTesting
   public Optional<Startpoint> readStartpoint(SystemStreamPartition ssp) {
-    return readStartpoint(ssp, null);
 
 Review comment:
   The fact that null as task-name maps to "only-ssp" is an internal of the class, i.e., the unit. 
   Unit-tests test the unit not internal nitty gritties of the unit.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] rmatharu commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
rmatharu commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335#discussion_r401853294
 
 

 ##########
 File path: samza-core/src/main/java/org/apache/samza/startpoint/StartpointManager.java
 ##########
 @@ -158,8 +158,25 @@ public void writeStartpoint(SystemStreamPartition ssp, TaskName taskName, Startp
    * @return {@link Optional} of {@link Startpoint} for the {@link SystemStreamPartition}.
    *         It is empty if it does not exist or if it is too stale.
    */
+  @VisibleForTesting
   public Optional<Startpoint> readStartpoint(SystemStreamPartition ssp) {
-    return readStartpoint(ssp, null);
 
 Review comment:
   Is used in numerous tests, so need it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335#discussion_r401851188
 
 

 ##########
 File path: samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java
 ##########
 @@ -127,6 +127,7 @@
    */
   private static final String AM_JMX_ENABLED = "yarn.am.jmx.enabled";
   private static final String CLUSTER_MANAGER_JMX_ENABLED = "cluster-manager.jobcoordinator.jmx.enabled";
+  private static final String CLUSTER_MANAGER_STARTPOINT_FANOUT_ENABLED = "job.startpoint.fanout.enabled";
 
 Review comment:
   If this config is called `job.startpoint.fanout.enabled`, isn't it better to be a part of JobConfig than ClusterManagerConfig?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] rmatharu commented on issue #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
rmatharu commented on issue #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335#issuecomment-607440268
 
 
   +1, simplified and addressed both comments.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335#discussion_r401854098
 
 

 ##########
 File path: samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java
 ##########
 @@ -127,6 +127,7 @@
    */
   private static final String AM_JMX_ENABLED = "yarn.am.jmx.enabled";
   private static final String CLUSTER_MANAGER_JMX_ENABLED = "cluster-manager.jobcoordinator.jmx.enabled";
+  private static final String CLUSTER_MANAGER_STARTPOINT_FANOUT_ENABLED = "job.startpoint.fanout.enabled";
 
 Review comment:
   Once you decide where it fits, since this is user-facing, please add docs to doc markdown. Thanks!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335#discussion_r401848987
 
 

 ##########
 File path: samza-core/src/main/java/org/apache/samza/startpoint/StartpointManager.java
 ##########
 @@ -158,8 +158,25 @@ public void writeStartpoint(SystemStreamPartition ssp, TaskName taskName, Startp
    * @return {@link Optional} of {@link Startpoint} for the {@link SystemStreamPartition}.
    *         It is empty if it does not exist or if it is too stale.
    */
+  @VisibleForTesting
   public Optional<Startpoint> readStartpoint(SystemStreamPartition ssp) {
-    return readStartpoint(ssp, null);
+    Map<String, byte[]> startpointBytes = readWriteStore.all();
+    // there is no task-name to use as key for the startpoint in this case (only the ssp), so we use a null task-name
+    return readStartpoint(startpointBytes, ssp, null);
+  }
+
+  /**
+   * Returns the last {@link Startpoint} that defines the start position for a {@link SystemStreamPartition} and {@link TaskName}.
+   * @param ssp The {@link SystemStreamPartition} to fetch the {@link Startpoint} for.
+   * @param taskName the {@link TaskName} to fetch the {@link Startpoint} for.
+   * @return {@link Optional} of {@link Startpoint} for the {@link SystemStreamPartition}.
+   *         It is empty if it does not exist or if it is too stale.
+   */
+  @VisibleForTesting
+  public Optional<Startpoint> readStartpoint(SystemStreamPartition ssp, TaskName taskName) {
+    Map<String, byte[]> startpointBytes = readWriteStore.all();
+    // there is no task-name to use as key for the startpoint in this case (only the ssp), so we use a null task-name
 
 Review comment:
   Please correct the comment, task-name can non-null

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
Sanil15 commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335#discussion_r401848479
 
 

 ##########
 File path: samza-core/src/main/java/org/apache/samza/startpoint/StartpointManager.java
 ##########
 @@ -158,8 +158,25 @@ public void writeStartpoint(SystemStreamPartition ssp, TaskName taskName, Startp
    * @return {@link Optional} of {@link Startpoint} for the {@link SystemStreamPartition}.
    *         It is empty if it does not exist or if it is too stale.
    */
+  @VisibleForTesting
   public Optional<Startpoint> readStartpoint(SystemStreamPartition ssp) {
-    return readStartpoint(ssp, null);
 
 Review comment:
   Revert the change on line line 162 calling `readStartpoint(ssp, null);` will still do the intended, do not need line 163-165

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] mynameborat commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
mynameborat commented on a change in pull request #1335: Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335#discussion_r401354195
 
 

 ##########
 File path: samza-core/src/main/java/org/apache/samza/startpoint/StartpointManager.java
 ##########
 @@ -159,38 +160,63 @@ public void writeStartpoint(SystemStreamPartition ssp, TaskName taskName, Startp
    *         It is empty if it does not exist or if it is too stale.
    */
   public Optional<Startpoint> readStartpoint(SystemStreamPartition ssp) {
-    return readStartpoint(ssp, null);
+    return readStartpointMap(Collections.singletonMap(null, Collections.singleton(ssp)), false).get(null)
 
 Review comment:
   `null` as a key and `.get(null)` looks atypical and not readable on the go. It does require some context for someone to get the background on this.
    

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] rmatharu merged pull request #1335: SAMZA-2501 : Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store

Posted by GitBox <gi...@apache.org>.
rmatharu merged pull request #1335: SAMZA-2501 : Optimizing startpoint manager to not make successive bootstrapMessage calls to coordinator-store
URL: https://github.com/apache/samza/pull/1335
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services