Posted to commits@beam.apache.org by pa...@apache.org on 2020/01/10 23:23:57 UTC

[beam] branch master updated: BEAM-8745 More fine-grained controls for the size of a BigQuery Load job

This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 2cd6265  BEAM-8745 More fine-grained controls for the size of a BigQuery Load job
     new a20bd7e  Merge pull request #10500 from [BEAM-8745] More fine-grained controls for the size of a BigQuery Load job
2cd6265 is described below

commit 2cd62653d892ad7826c197fb043169b4a04ce8df
Author: Jeff Klukas <je...@klukas.net>
AuthorDate: Fri Jan 3 15:18:46 2020 -0500

    BEAM-8745 More fine-grained controls for the size of a BigQuery Load job
    
    Users have hit problems where load jobs into very wide tables (hundreds of columns)
    are very slow and sometimes fail. Feedback from BigQuery is that for very wide
    tables, smaller load jobs can avoid both failures and slowdowns.
    
    `BigQueryIO` already has the plumbing to support `maxBytesPerPartition`, but
    there is no public interface to change that parameter from the default.
    This PR simply promotes the parameter to public visibility and adds
    documentation for it.
---
 .../org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
index 3bd9d8c..b6e061b 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
@@ -2302,11 +2302,22 @@ public class BigQueryIO {
       return toBuilder().setMaxFilesPerPartition(maxFilesPerPartition).build();
     }
 
-    @VisibleForTesting
-    Write<T> withMaxBytesPerPartition(long maxBytesPerPartition) {
+    /**
+     * Control how much data will be assigned to a single BigQuery load job. If the amount of data
+     * flowing into one {@code BatchLoads} partition exceeds this value, that partition will be
+     * handled via multiple load jobs.
+     *
+     * <p>The default value (11 TiB) respects BigQuery's maximum size per load job limit and is
+     * appropriate for most use cases. Reducing the value of this parameter can improve stability
+     * when loading to tables with complex schemas containing thousands of fields.
+     *
+     * @see <a href="https://cloud.google.com/bigquery/quotas#load_jobs">BigQuery Load Job
+     *     Limits</a>
+     */
+    public Write<T> withMaxBytesPerPartition(long maxBytesPerPartition) {
       checkArgument(
           maxBytesPerPartition > 0,
-          "maxFilesPerPartition must be > 0, but was: %s",
+          "maxBytesPerPartition must be > 0, but was: %s",
           maxBytesPerPartition);
       return toBuilder().setMaxBytesPerPartition(maxBytesPerPartition).build();
     }
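
The javadoc above describes the mechanism: files destined for one table are grouped into partitions, and each partition that would exceed `maxBytesPerPartition` is split across multiple load jobs. The sketch below illustrates that grouping idea in plain Java. It is not Beam's actual `BatchLoads` code; the class name, method name, and greedy strategy are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadJobPartitioner {
  // Illustrative sketch only, not Beam's BatchLoads implementation.
  // Greedily groups file sizes into partitions, starting a new partition
  // whenever adding the next file would push the running total past
  // maxBytesPerPartition. Each resulting partition maps to one load job.
  static List<List<Long>> partition(List<Long> fileSizes, long maxBytesPerPartition) {
    List<List<Long>> partitions = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentBytes = 0;
    for (long size : fileSizes) {
      if (!current.isEmpty() && currentBytes + size > maxBytesPerPartition) {
        partitions.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(size);
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      partitions.add(current);
    }
    return partitions;
  }

  public static void main(String[] args) {
    // Five 4 GiB files with a 10 GiB cap yield three load jobs:
    // [4, 4], [4, 4], [4]. Lowering the cap produces more, smaller jobs,
    // which is the stability lever the commit message describes for wide tables.
    long gib = 1L << 30;
    List<Long> sizes = List.of(4 * gib, 4 * gib, 4 * gib, 4 * gib, 4 * gib);
    System.out.println(partition(sizes, 10 * gib).size()); // prints 3
  }
}
```

With the default 11 TiB cap most pipelines never split a partition; calling the now-public `withMaxBytesPerPartition` with a smaller value simply forces more of these smaller groups.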