You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/27 19:24:30 UTC

[GitHub] [beam] johnjcasey commented on a diff in pull request #22347: [22188] Set allowed timestamp skew in storage-api transform

johnjcasey commented on code in PR #22347:
URL: https://github.com/apache/beam/pull/22347#discussion_r931504562


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java:
##########
@@ -553,6 +553,11 @@ public void onTimer(
         OutputReceiver<KV<String, Operation>> o,
         BoundedWindow window) {
       // Stream is idle - clear it.
+      // Note: this is best effort. We are explicitly emiting a timestamp that is before
+      // the default output timestamp, which means that in some cases (usually when draining
+      // a pipeline) this finalize element will be dropped as late. This is usually ok as
+      // BigQuery will eventually garbage collect the stream. We attempt to finalize idle streams
+      // merely to remove the pressure of large numbers of orphaned streams from BigQuery.

Review Comment:
   what is the consequence of orphaned streams?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org