Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 21:55:19 UTC

[GitHub] [beam] kennknowles opened a new issue, #18992: BigQueryIO multi-partitioned write doesn't work for streaming writes

kennknowles opened a new issue, #18992:
URL: https://github.com/apache/beam/issues/18992

   BigQueryIO performs a multi-partitioned write (the MultiPartitionsWriteTables step) when there is more data to be written to a single BQ table than BigQuery's per-load-job quota allows (10k files or 11TB of data).
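   
   For reference, a streaming pipeline reaches this path when using load jobs with a triggering frequency, roughly as in the sketch below (the destination, schema, frequency, and shard count are placeholders, not values from the original report):
   
       BigQueryIO.writeTableRows()
           .to("my-project:my_dataset.my_table")                   // placeholder destination
           .withSchema(tableSchema)                                 // placeholder TableSchema
           .withMethod(BigQueryIO.Write.Method.FILE_LOADS)          // load jobs, not streaming inserts
           .withTriggeringFrequency(Duration.standardMinutes(5))    // enables FILE_LOADS in streaming mode
           .withNumFileShards(1000)                                 // required when a triggering frequency is set
           .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
           .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND);
   
   If a single pane produces more data than the per-load-job quota, the write switches to the multi-partition path, loading each partition into a temporary table before copying the results into the destination.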
   
    
   
   When writing using load jobs in streaming mode (with a triggering frequency), we hit the following location, where CREATE_DISPOSITION is set to CREATE_NEVER for all panes other than the first one. This is fine when we are writing a single partition (all panes of a window should write to the same table), but when there are multiple partitions it is incorrect, since we need to create temp tables for all panes.
   
   [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L165](https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L165)
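   
   A paraphrased sketch of the disposition choice at that location (variable names are illustrative, not the exact source):
   
       // The first pane of the window may create the destination table;
       // later panes may not, since the table is assumed to exist afterwards.
       CreateDisposition createDisposition =
           (context.pane().getIndex() == 0)
               ? firstPaneCreateDisposition
               : CreateDisposition.CREATE_NEVER;
   
       // Problem: on the multi-partition path, later panes load into per-pane temp tables
       // that do not exist yet, so CREATE_NEVER causes those load jobs to fail.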
   
    
   
   Imported from Jira [BEAM-5216](https://issues.apache.org/jira/browse/BEAM-5216). Original Jira may contain additional context.
   Reported by: chamikara.

