Posted to issues@beam.apache.org by "Pablo Estrada (Jira)" <ji...@apache.org> on 2021/11/12 19:31:00 UTC

[jira] [Commented] (BEAM-13088) Load BigQuery temp tables into different dataset

    [ https://issues.apache.org/jira/browse/BEAM-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442914#comment-17442914 ] 

Pablo Estrada commented on BEAM-13088:
--------------------------------------

Large BQ loads (>10k files, >15TB) are done in two steps: first the data is loaded into temporary tables, and then the temp tables are copied into the final destination table.

The temp tables are in the same dataset as the final table, and this causes difficulties for some users. The goal is to add a feature to support a temporary dataset for these tables.

Here's where the copy jobs are issued: https://github.com/apache/beam/blob/735db247f3e03d9fddb9f6d7281c986b60ac683d/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteRename.java#L175-L215

The idea would be to add a new configuration parameter and plumb it from the public interface in BigQueryIO.Write down to the workflow that performs this two-step load.
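
As a rough illustration of that plumbing, here is a minimal, dependency-free Java sketch. It is not Beam code: TableRef, WriteConfig, withTempDataset, tempTableFor, and the temp table naming scheme are all hypothetical names invented for this example. It only shows the intended behavior, assuming the option is optional and falls back to the destination dataset when unset.

```java
import java.util.Optional;

/** Hypothetical sketch of the proposed option; none of these types exist in Beam. */
public class TempDatasetSketch {

  /** Minimal stand-in for a BigQuery table reference. */
  static final class TableRef {
    final String project;
    final String dataset;
    final String table;

    TableRef(String project, String dataset, String table) {
      this.project = project;
      this.dataset = dataset;
      this.table = table;
    }

    @Override
    public String toString() {
      return project + ":" + dataset + "." + table;
    }
  }

  /** Hypothetical user-facing configuration carrying an optional temp dataset. */
  static final class WriteConfig {
    final Optional<String> tempDataset;

    private WriteConfig(Optional<String> tempDataset) {
      this.tempDataset = tempDataset;
    }

    static WriteConfig defaults() {
      return new WriteConfig(Optional.empty());
    }

    WriteConfig withTempDataset(String dataset) {
      return new WriteConfig(Optional.of(dataset));
    }
  }

  /**
   * Picks where a temp table for one load partition should live. If no temp
   * dataset is configured, the destination dataset is used (current behavior).
   * The "temp_<jobId>_<partition>" naming is an assumption for illustration.
   */
  static TableRef tempTableFor(
      TableRef destination, WriteConfig config, String jobId, int partition) {
    String dataset = config.tempDataset.orElse(destination.dataset);
    String name = "temp_" + jobId + "_" + partition;
    return new TableRef(destination.project, dataset, name);
  }

  public static void main(String[] args) {
    TableRef destination = new TableRef("my-project", "analytics", "events");

    // Without the option, temp tables share the destination dataset.
    System.out.println(tempTableFor(destination, WriteConfig.defaults(), "job1", 0));

    // With the option, temp tables land in the separate dataset.
    WriteConfig config = WriteConfig.defaults().withTempDataset("beam_temp");
    System.out.println(tempTableFor(destination, config, "job1", 0));
  }
}
```

In the real change, the copy step would still need the destination table as the copy target; only the location of the intermediate tables would differ.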

> Load BigQuery temp tables into different dataset
> ------------------------------------------------
>
>                 Key: BEAM-13088
>                 URL: https://issues.apache.org/jira/browse/BEAM-13088
>             Project: Beam
>          Issue Type: Task
>          Components: io-java-gcp
>            Reporter: Kiley Sok
>            Priority: P2
>
> When Beam loads data into BigQuery, it sometimes creates temporary tables and then copies them (bq copy) into the destination table.
> These temporary tables are deleted afterwards, but while they exist, wildcard queries that match them fail.
> Either create the temporary tables in a different dataset, or hide them altogether.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)