You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/07 23:14:20 UTC

[GitHub] [beam] thiagotnunes commented on a diff in pull request #23997: Add GetSize implementation for DetectNewPartitions SDF

thiagotnunes commented on code in PR #23997:
URL: https://github.com/apache/beam/pull/23997#discussion_r1015989214


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dofn/DetectNewPartitionsDoFn.java:
##########
@@ -109,9 +118,15 @@ public TimestampRange initialRestriction(@Element PartitionMetadata partition) {
         TimestampUtils.previous(createdAt), com.google.cloud.Timestamp.MAX_VALUE);
   }
 
+  @GetSize
+  public double getSize() {
+    double throughput = throughputEstimator.get();
+    LOG.debug("Reported getSize() - throughput: " + throughput);
+    return throughput;

Review Comment:
   I initially had the `timeGap * throughput`, but I don't think it adds any value here. In fact, the backlog bytes increases as time passes, because the `timeGap` becomes bigger until there is a partition to schedule. This makes it very weird in terms of monitoring as it gives the false sense that the backlog is increasing were it could be constant.
   
   I am a bit worried about always returning a `0` here, as it could have unintended consequences (not sure which). I will consult with other team members and see if it makes sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org