Posted to issues@beam.apache.org by "Jose Puertos (Jira)" <ji...@apache.org> on 2019/09/17 19:22:00 UTC

[jira] [Comment Edited] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

    [ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931600#comment-16931600 ] 

Jose Puertos edited comment on BEAM-3772 at 9/17/19 7:21 PM:
-------------------------------------------------------------

I'm seeing the same issue with 2.12.0 and 2.15.0. Looking at the BigQuery jobs, it seems that the load jobs that try to upload partitions after the first day use CREATE_NEVER, even though the code has WRITE_APPEND and CREATE_IF_NEEDED:

 

.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)

 

!image-2019-09-17-12-01-42-764.png|width=487,height=116!

 

Something worth mentioning is that I'm using dynamic partitions. Checking the code of BatchLoads.java, it seems that expandTriggered uses a different constructor for WritePartitions that doesn't pass the CreateDisposition.

 



> BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS
> --------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-3772
>                 URL: https://issues.apache.org/jira/browse/BEAM-3772
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: Dataflow streaming pipeline
>            Reporter: Benjamin BENOIST
>            Assignee: Reuven Lax
>            Priority: Major
>         Attachments: bigquery-fail.png, bigquery-success.png, image-2019-09-17-12-01-42-764.png
>
>
> My workflow: Kafka -> Dataflow streaming -> BigQuery
> Given that low latency isn't important in my case, I use FILE_LOADS to reduce costs. I'm using _BigQueryIO.Write_ with a _DynamicDestination_, which is a table with the current hour as a suffix.
> This _BigQueryIO.Write_ is configured like this:
> {code:java}
> .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
> .withMethod(Method.FILE_LOADS)
> .withTriggeringFrequency(triggeringFrequency)
> .withNumFileShards(100)
> {code}
> The first table is successfully created and is written to. But then the following tables are never created and I get these exceptions:
> {code:java}
> (99e5cd8c66414e7a): java.lang.RuntimeException: Failed to create load job with id prefix 5047f71312a94bf3a42ee5d67feede75_5295fbf25e1a7534f85e25dcaa9f4986_00001_00023, reached max retries: 3, last failed load job: {
>   "configuration" : {
>     "load" : {
>       "createDisposition" : "CREATE_NEVER",
>       "destinationTable" : {
>         "datasetId" : "dev_mydataset",
>         "projectId" : "myproject-id",
>         "tableId" : "mytable_20180302_16"
>       },
> {code}
> The _CreateDisposition_ used is _CREATE_NEVER_, contrary to the _CREATE_IF_NEEDED_ that was specified.
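
For readers trying to reproduce this, the hourly-suffixed destination described in the report could be derived roughly as follows. This is only a minimal sketch using plain java.time, not the reporter's actual code: the class name HourlyTableName, the method tableFor, and the use of UTC are assumptions, and the yyyyMMdd_HH pattern is inferred from the tableId mytable_20180302_16 in the failed load job. Each new hour yields a table name that does not exist yet, which is why the create disposition of the later load jobs matters.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Beam: builds a table name with an hourly suffix.
final class HourlyTableName {
    // Pattern inferred from the tableId in the failed load job (mytable_20180302_16).
    // UTC is an assumption; the report does not say which time zone is used.
    private static final DateTimeFormatter HOUR_SUFFIX =
        DateTimeFormatter.ofPattern("yyyyMMdd_HH").withZone(ZoneOffset.UTC);

    // Returns baseName followed by "_" and the date and hour of the given timestamp.
    static String tableFor(String baseName, Instant timestamp) {
        return baseName + "_" + HOUR_SUFFIX.format(timestamp);
    }

    public static void main(String[] args) {
        // Two elements an hour apart map to two different destination tables.
        System.out.println(tableFor("mytable", Instant.parse("2018-03-02T16:05:00Z")));
        System.out.println(tableFor("mytable", Instant.parse("2018-03-02T17:05:00Z")));
    }
}
```

In a pipeline, a function like this would typically be called from the dynamic destination logic to pick the table for each element or window.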



--
This message was sent by Atlassian Jira
(v8.3.2#803003)