You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/07 08:01:00 UTC

[GitHub] [beam] hengfengli commented on a diff in pull request #23997: Add GetSize implementation for DetectNewPartitions SDF

hengfengli commented on code in PR #23997:
URL: https://github.com/apache/beam/pull/23997#discussion_r1015102441


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dofn/DetectNewPartitionsDoFn.java:
##########
@@ -109,9 +118,15 @@ public TimestampRange initialRestriction(@Element PartitionMetadata partition) {
         TimestampUtils.previous(createdAt), com.google.cloud.Timestamp.MAX_VALUE);
   }
 
+  @GetSize
+  public double getSize() {
+    double throughput = throughputEstimator.get();
+    LOG.debug("Reported getSize() - throughput: " + throughput);
+    return throughput;

Review Comment:
   This might be confusing as we report `timeGap * throughput` in ReadChangeStreamPartitionDoFn: 
   
   https://github.com/apache/beam/blob/e0e10b9e5432643b884c381d145e5924cc4ef193/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dofn/ReadChangeStreamPartitionDoFn.java#L149-L166
   
   But here, we only report the throughput. 
   
   Just a bold idea. Can we totally rely on ReadChangeStreamPartitionDoFn for autoscaling? In other words, we always return 0 for getSize() in DetectNewPartitionsDoFn. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org