Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/08/11 01:48:15 UTC

[GitHub] [beam] chamikaramj commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance

chamikaramj commented on a change in pull request #12485:
URL: https://github.com/apache/beam/pull/12485#discussion_r468279172



##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -304,6 +308,8 @@ def compute_table_name(row):
 NOTE: This job name template does not have backwards compatibility guarantees.
 """
 BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}"
+"""The number of shards per destination when writing via streaming inserts."""
+DEFAULT_SHARDS_PER_DESTINATION = 500

Review comment:
       I believe Java uses 50 shards. Do we need a larger default for Python?
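
For context on the constant being reviewed: it controls how widely rows bound
for a single table are fanned out before the actual insert calls. Below is a
minimal sketch of the general idea, assuming a hypothetical AddShardKey DoFn
(illustrative names and pipeline shape, not the SDK's actual internals):

    import random

    import apache_beam as beam

    # Mirrors the DEFAULT_SHARDS_PER_DESTINATION constant under discussion.
    NUM_SHARDS = 500

    class AddShardKey(beam.DoFn):
      """Keys each (destination, row) pair by (destination, random shard).

      Grouping by this compound key spreads one hot table's rows across
      many workers instead of funneling them through a single key.
      """
      def __init__(self, num_shards=NUM_SHARDS):
        self._num_shards = num_shards

      def process(self, element):
        destination, row = element
        yield ((destination, random.randint(0, self._num_shards - 1)), row)

    # Assumed pipeline shape:
    #   keyed   = rows | beam.ParDo(AddShardKey())
    #   grouped = keyed | beam.GroupByKey()  # each (table, shard) group is
    #                                        # batched and inserted separately

Under that shape, the trade-off behind the question is parallelism versus
batch size: more shards spread load across more workers, but each shard sees
smaller batches per streaming-insert request.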

##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -1419,7 +1448,18 @@ def __init__(
         Default is to retry always. This means that whenever there are rows
         that fail to be inserted to BigQuery, they will be retried indefinitely.
         Other retry strategy settings will produce a deadletter PCollection
-        as output.
+        as output. Appropriate values are:
+
+        * `RetryStrategy.RETRY_ALWAYS`: retry all rows if
+          there is any kind of error. Note that this will hold your pipeline
+          back if there are errors until you cancel or update it.

Review comment:
       This is just a documentation update for an already available (and verified) feature?
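
For readers of the docstring above, a minimal usage sketch of the retry
strategies and the resulting deadletter PCollection. It assumes the
FailedRows output tag the Python SDK documents for streaming inserts; the
project, dataset, and rows are hypothetical:

    import apache_beam as beam
    from apache_beam.io.gcp.bigquery_tools import RetryStrategy

    with beam.Pipeline() as p:
      rows = p | beam.Create([{'user': 'alice'}, {'user': 'bob'}])

      # With a strategy other than RETRY_ALWAYS, rows that permanently fail
      # are emitted on a deadletter output instead of stalling the pipeline.
      result = rows | beam.io.WriteToBigQuery(
          'my-project:my_dataset.my_table',  # hypothetical, assumed to exist
          method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
          insert_retry_strategy=RetryStrategy.RETRY_ON_TRANSIENT_ERROR)

      # The deadletter PCollection of rows that could not be inserted.
      failed_rows = result['FailedRows']
      failed_rows | 'LogFailures' >> beam.Map(print)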




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org