Posted to commits@beam.apache.org by pa...@apache.org on 2020/01/10 23:23:57 UTC
[beam] branch master updated: BEAM-8745 More fine-grained controls for the size of a BigQuery Load job
This is an automated email from the ASF dual-hosted git repository.
pabloem pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new 2cd6265 BEAM-8745 More fine-grained controls for the size of a BigQuery Load job
new a20bd7e Merge pull request #10500 from [BEAM-8745] More fine-grained controls for the size of a BigQuery Load job
2cd6265 is described below
commit 2cd62653d892ad7826c197fb043169b4a04ce8df
Author: Jeff Klukas <je...@klukas.net>
AuthorDate: Fri Jan 3 15:18:46 2020 -0500
BEAM-8745 More fine-grained controls for the size of a BigQuery Load job
Users have hit problems where load jobs into very wide tables (hundreds of columns)
are very slow and sometimes fail. Feedback from BigQuery is that for very wide
tables, smaller load jobs can avoid both the failures and the slowdowns.
`BigQueryIO` already has the plumbing to support `maxBytesPerPartition`, though
there has been no public interface for changing that parameter from its default.
This PR simply promotes the setter to public and adds documentation for it.
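To illustrate the effect of the new knob, here is a rough, self-contained sketch (hypothetical numbers, not Beam's actual `BatchLoads` partitioning code) of how the number of load jobs scales with `maxBytesPerPartition`:

```java
// Sketch: estimate how many BigQuery load jobs result when the data flowing
// into a partition exceeds maxBytesPerPartition, per the commit message above.
public class LoadJobEstimate {
    // Default used by BigQueryIO: 11 TiB, which respects BigQuery's
    // per-load-job size limit.
    static final long DEFAULT_MAX_BYTES_PER_PARTITION = 11L * (1L << 40);

    // Ceiling division: number of load jobs needed to cover totalBytes.
    static long loadJobs(long totalBytes, long maxBytesPerPartition) {
        if (maxBytesPerPartition <= 0) {
            throw new IllegalArgumentException(
                "maxBytesPerPartition must be > 0, but was: " + maxBytesPerPartition);
        }
        return (totalBytes + maxBytesPerPartition - 1) / maxBytesPerPartition;
    }

    public static void main(String[] args) {
        long thirtyTiB = 30L * (1L << 40);
        // With the 11 TiB default, 30 TiB of input splits into 3 load jobs.
        System.out.println(loadJobs(thirtyTiB, DEFAULT_MAX_BYTES_PER_PARTITION));
        // Lowering the cap to 1 TiB for a very wide table yields 30 smaller jobs.
        System.out.println(loadJobs(thirtyTiB, 1L << 40));
    }
}
```

In an actual pipeline, lowering the cap would look roughly like `BigQueryIO.write().withMaxBytesPerPartition(256L * (1L << 30))` (an illustrative 256 GiB value, not a recommendation from this commit).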
---
.../org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
index 3bd9d8c..b6e061b 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
@@ -2302,11 +2302,22 @@ public class BigQueryIO {
return toBuilder().setMaxFilesPerPartition(maxFilesPerPartition).build();
}
- @VisibleForTesting
- Write<T> withMaxBytesPerPartition(long maxBytesPerPartition) {
+ /**
+ * Control how much data will be assigned to a single BigQuery load job. If the amount of data
+ * flowing into one {@code BatchLoads} partition exceeds this value, that partition will be
+ * handled via multiple load jobs.
+ *
+ * <p>The default value (11 TiB) respects BigQuery's maximum size per load job limit and is
+ * appropriate for most use cases. Reducing the value of this parameter can improve stability
+ * when loading to tables with complex schemas containing thousands of fields.
+ *
+ * @see <a href="https://cloud.google.com/bigquery/quotas#load_jobs">BigQuery Load Job
+ * Limits</a>
+ */
+ public Write<T> withMaxBytesPerPartition(long maxBytesPerPartition) {
checkArgument(
maxBytesPerPartition > 0,
- "maxFilesPerPartition must be > 0, but was: %s",
+ "maxBytesPerPartition must be > 0, but was: %s",
maxBytesPerPartition);
return toBuilder().setMaxBytesPerPartition(maxBytesPerPartition).build();
}