Posted to commits@beam.apache.org by "Eugene Kirpichov (JIRA)" <ji...@apache.org> on 2017/08/15 18:36:00 UTC

[jira] [Commented] (BEAM-2768) Fix bigquery.WriteTables generating non-unique job identifiers

    [ https://issues.apache.org/jira/browse/BEAM-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127687#comment-16127687 ] 

Eugene Kirpichov commented on BEAM-2768:
----------------------------------------

Could you tell us more about how you're using BigQueryIO.Write? It has many modes, so it would be best if you could show a code snippet of where you apply BigQueryIO.write() in your pipeline (with personal data removed, but showing exactly which BigQueryIO API methods you're using) and tell us the exact version of the Beam SDK you're on. Your links point to the master branch, but the bug description says 2.0.0, and these versions have very different implementations of BigQueryIO.Write.

Looking at the current code, the job id *does* contain a random UUID that comes from https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L348.
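For reference, the uniqueness guarantee works along these lines. This is a minimal sketch with hypothetical names (buildJobId, the prefix string), not Beam's actual job-id scheme: appending a random UUID to the job-id prefix means two submissions for the same table partition never share a job id.

```java
import java.util.UUID;

public class JobIdExample {
    // Hypothetical helper, not Beam's actual code: builds a BigQuery
    // load-job id by appending a random UUID and a partition index to a
    // fixed prefix. Because the UUID differs on every call, a retry or a
    // second load for the same partition gets a distinct job id, so the
    // BigQuery API does not reject it with 409 Conflict.
    static String buildJobId(String jobPrefix, int partition) {
        String uniquifier = UUID.randomUUID().toString().replace("-", "");
        return jobPrefix + "_" + uniquifier + "_" + partition;
    }

    public static void main(String[] args) {
        String first = buildJobId("beam_load_mypipeline", 0);
        String second = buildJobId("beam_load_mypipeline", 0);
        // Same logical partition, two calls: the ids differ.
        System.out.println(first.equals(second)); // prints "false"
    }
}
```

Without the UUID component, a deterministic id (prefix plus partition alone) would collide whenever the same partition is loaded twice, which is exactly the 409 reported in the bug.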

> Fix bigquery.WriteTables generating non-unique job identifiers
> --------------------------------------------------------------
>
>                 Key: BEAM-2768
>                 URL: https://issues.apache.org/jira/browse/BEAM-2768
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>    Affects Versions: 2.0.0
>            Reporter: Matti Remes
>            Assignee: Reuven Lax
>
> This is a result of BigQueryIO not creating unique job ids for batch inserts, so the BigQuery API responds with a 409 Conflict error:
> {code:java}
> Request failed with code 409, will NOT retry: https://www.googleapis.com/bigquery/v2/projects/<project_id>/jobs
> {code}
> The jobs are initiated in a step BatchLoads/SinglePartitionWriteTables, called by step's WriteTables ParDo:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L511-L521
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L148
> It would probably be a good idea to append a UUID as part of the job id.
> Edit: This is a major bug that blocks using BigQuery as a sink for bounded input.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)