Posted to issues@beam.apache.org by "Innocent (Jira)" <ji...@apache.org> on 2019/09/25 02:06:00 UTC

[jira] [Commented] (BEAM-6684) BigQueryIO: Unable to create dataset "Location unknown is not yet publicly available

    [ https://issues.apache.org/jira/browse/BEAM-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937337#comment-16937337 ] 

Innocent commented on BEAM-6684:
--------------------------------

[~pabloem] are there any steps to reproduce this issue?

> BigQueryIO: Unable to create dataset "Location unknown is not yet publicly available
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-6684
>                 URL: https://issues.apache.org/jira/browse/BEAM-6684
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>    Affects Versions: 2.10.0
>            Reporter: Pablo Estrada
>            Assignee: Pablo Estrada
>            Priority: Major
>
> My understanding is that BigQueryIO runs the query, writes the output to a temp dataset, and then extracts the temp dataset to GCS. This means the location of the temp dataset (if not manually set) is determined by the tables referenced in the query. This is confirmed in the source code for BigQueryIO: https://github.com/apache/beam/blob/v2.6.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L111
> So I would expect the temp dataset to be created in the US location as well, or at least to default to US. Instead, it appears to default to "unknown" (at least some of the time), which causes the whole Dataflow job to fail.
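
For anyone trying to reproduce this, the fallback the description expects can be sketched roughly as below. This is only an illustration of the expected behaviour, not Beam's actual code: the class `TempDatasetLocation`, the method `resolve`, and the idea of passing table locations as plain strings are all hypothetical names and assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT the BigQueryIO implementation: picks a location
// for the temp dataset in the order the issue description expects.
public final class TempDatasetLocation {

    // Assumption: US is the fallback when nothing better is known.
    static final String DEFAULT_LOCATION = "US";

    private TempDatasetLocation() {}

    /**
     * @param explicitLocation location set manually by the user, or null
     * @param referencedTableLocations locations of the tables the query reads;
     *        entries may be null, empty, or "unknown" when lookup failed
     */
    static String resolve(String explicitLocation, List<String> referencedTableLocations) {
        // 1. A manually set location always wins.
        if (isKnown(explicitLocation)) {
            return explicitLocation;
        }
        // 2. Otherwise use the first referenced table with a usable location.
        for (String location : referencedTableLocations) {
            if (isKnown(location)) {
                return location;
            }
        }
        // 3. Never hand "unknown" to the dataset-creation call; fall back.
        return DEFAULT_LOCATION;
    }

    private static boolean isKnown(String location) {
        return location != null
            && !location.isEmpty()
            && !location.equalsIgnoreCase("unknown");
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, Arrays.asList("EU")));
        System.out.println(resolve(null, Arrays.asList("unknown")));
        System.out.println(resolve("asia-northeast1", Arrays.asList("EU")));
    }
}
```

The point of step 3 is that an unresolved location is replaced by a default before the temp dataset is created, which is the behaviour the reporter expected instead of the "Location unknown" failure.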


