Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/28 18:56:55 UTC

[GitHub] [beam] reuvenlax commented on a diff in pull request #22347: [22188] Don't output from a timer for which noOutputTimestamp is set

reuvenlax commented on code in PR #22347:
URL: https://github.com/apache/beam/pull/22347#discussion_r932574031


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java:
##########
@@ -553,6 +553,11 @@ public void onTimer(
         OutputReceiver<KV<String, Operation>> o,
         BoundedWindow window) {
       // Stream is idle - clear it.
+      // Note: this is best effort. We are explicitly emitting a timestamp that is before
+      // the default output timestamp, which means that in some cases (usually when draining
+      // a pipeline) this finalize element will be dropped as late. This is usually ok as
+      // BigQuery will eventually garbage collect the stream. We attempt to finalize idle streams
+      // merely to remove the pressure of large numbers of orphaned streams from BigQuery.

Review Comment:
   If too many orphaned streams accumulate, BigQuery might start rejecting future stream creation requests.
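
   To illustrate why the finalize element can be dropped as late during a drain, here is a minimal sketch. It assumes the usual Beam rule that an element is dropped once the watermark has passed the end of its window plus the allowed lateness. `LateDropSketch` and `isDroppedAsLate` are hypothetical names for illustration only; they are not part of Beam or of this PR.

```java
import java.time.Duration;
import java.time.Instant;

public class LateDropSketch {
    /**
     * Hypothetical helper, not a Beam API. Returns true when the watermark has
     * already passed the end of the window plus the allowed lateness, which is
     * the condition under which an element is assumed to be dropped as late.
     */
    public static boolean isDroppedAsLate(
            Instant windowEnd, Duration allowedLateness, Instant watermark) {
        return watermark.isAfter(windowEnd.plus(allowedLateness));
    }

    public static void main(String[] args) {
        Instant windowEnd = Instant.parse("2022-07-28T18:00:00Z");
        Duration allowedLateness = Duration.ZERO;

        // Steady state: the watermark is still behind the window end, so the
        // finalize element would be kept.
        Instant steadyWatermark = Instant.parse("2022-07-28T17:59:00Z");
        System.out.println(isDroppedAsLate(windowEnd, allowedLateness, steadyWatermark)); // false

        // Drain: the watermark jumps far ahead, so the same element would be
        // treated as late and dropped.
        Instant drainedWatermark = Instant.parse("2023-01-01T00:00:00Z");
        System.out.println(isDroppedAsLate(windowEnd, allowedLateness, drainedWatermark)); // true
    }
}
```

   In the dropped case the stream is simply not finalized by the pipeline, which is why the code comment calls this best effort.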


