You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/09/01 20:32:44 UTC

[GitHub] [beam] lukecwik commented on a diff in pull request #22953: Fix for #22951

lukecwik commented on code in PR #22953:
URL: https://github.com/apache/beam/pull/22953#discussion_r961075566


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java:
##########
@@ -117,6 +117,11 @@
    */
   @AutoValue
   public abstract static class BatchingParams<InputT> implements Serializable {

Review Comment:
   It looks like your adding support for GroupIntoBatches to limit on count and byte size at the same time.
   
   Can you add tests that cover this new scenario to:
   * GroupIntoBatchesTest
   * GroupIntoBatchesTranslationTest
   



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java:
##########
@@ -113,6 +113,9 @@
   // If user triggering is supplied, we will trigger the file write after this many records are
   // written.
   static final int FILE_TRIGGERING_RECORD_COUNT = 500000;
+  // If user triggering is supplied, we will trigger the file write after this many bytes are
+  // written.
+  static final long FILE_TRIGGERING_BYTE_COUNT = 100 * (1L << 20); // 100MiB

Review Comment:
   It looks like we already have a memory limit for writing of 20 parallel writers with 64mb buffers. Should we limit this triggering to be 64mbs as well so that it fits in one chunk?
   
   CC: @reuvenlax Any suggestions here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org