Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/08/11 01:57:26 UTC

[GitHub] [beam] chamikaramj commented on a change in pull request #12489: [BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions

chamikaramj commented on a change in pull request #12489:
URL: https://github.com/apache/beam/pull/12489#discussion_r468279982



##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -304,6 +308,8 @@ def compute_table_name(row):
 NOTE: This job name template does not have backwards compatibility guarantees.
 """
 BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}"
+"""The number of shards per destination when writing via streaming inserts."""
+DEFAULT_SHARDS_PER_DESTINATION = 500

Review comment:
       Seems like this will conflict with https://github.com/apache/beam/pull/12485?

##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -1048,6 +1055,11 @@ def __init__(
         to be passed when creating a BigQuery table. These are passed when
         triggering a load job for FILE_LOADS, and when creating a new table for
         STREAMING_INSERTS.
+      with_insert_ids: When using the STREAMING_INSERTS method to write data to

Review comment:
       Let's refer to https://cloud.google.com/bigquery/streaming-data-into-bigquery#disabling_best_effort_de-duplication here, similar to the Java SDK.
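
       For context, the trade-off behind this option: BigQuery uses the per-row `insertId` to perform best-effort de-duplication when a streaming insert is retried, so omitting insert IDs buys throughput at the cost of possible duplicate rows on retry. The following is a toy, self-contained sketch of that behavior (plain Python, not the Beam or BigQuery API; all names here are hypothetical):

       ```python
       def stream_insert(table, rows, seen_ids, use_insert_ids):
           """Append rows to `table`, mimicking BigQuery's best-effort dedup:
           rows whose insert_id was already seen are dropped."""
           for insert_id, row in rows:
               if use_insert_ids:
                   if insert_id in seen_ids:
                       continue  # duplicate retry of a row already ingested
                   seen_ids.add(insert_id)
               table.append(row)

       # A bundle that gets re-sent after a transient failure.
       bundle = [("id-1", {"v": 1}), ("id-2", {"v": 2})]

       with_ids, seen = [], set()
       stream_insert(with_ids, bundle, seen, use_insert_ids=True)
       stream_insert(with_ids, bundle, seen, use_insert_ids=True)   # retry deduped

       without_ids = []
       stream_insert(without_ids, bundle, set(), use_insert_ids=False)
       stream_insert(without_ids, bundle, set(), use_insert_ids=False)  # duplicates

       print(len(with_ids))     # 2
       print(len(without_ids))  # 4
       ```

       With insert IDs the retried bundle is absorbed; without them the retry doubles the row count, which is why the docstring should point users at the de-duplication caveats in the linked page.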




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org