Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/09 13:43:18 UTC

[GitHub] [beam] vbshnsk opened a new issue, #22642: [Bug]: Dataflow fails to drain a job when using BigQuery (java sdk v.2.38)

vbshnsk opened a new issue, #22642:
URL: https://github.com/apache/beam/issues/22642

   ### What happened?
   
   We are experiencing a weird BigQuery error with no trace of what exactly the problem is. It only occurs when we start draining the job; I don't think we experience any problems when actually processing the collections.
   
   The JSON payload for the error message is as follows (some values are redacted):
   ```
   exception: "java.lang.NullPointerException: Both parameters are null
   	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects.firstNonNull(MoreObjects.java:61)
   	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$WriteRecordsDoFn.finalizeStream(StorageApiWritesShardedRecords.java:516)
   	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$WriteRecordsDoFn.onWindowExpiration(StorageApiWritesShardedRecords.java:546)
   "
   job: "2022-07-25_07_47_46-13052079299893341614"
   logger: "org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker"
   message: "Execution of work for computation 'P33' on key '<tenant_id_here>' failed with uncaught exception. Work will be retried locally."
   stage: "P33"
   thread: "519795"
   work: "22dd1724b5cfa5bb-1085e5c9eb8a571d"
   worker: "<worker_here>"
   ```
   
   I am pretty sure that we are handling errors during processing, so I am lost as to what might be happening during the drain :(
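   
   For readers unfamiliar with the helper at the top of the trace: the `Both parameters are null` message is what Guava's `MoreObjects.firstNonNull` reports when neither argument is set. The following is a minimal Java sketch of that contract, not Beam or Guava source. The class name `FirstNonNullSketch` is hypothetical and exists only for illustration, and the trace alone does not show which two values were null inside `finalizeStream`.
   
   ```java
   // Illustrative stand-in only; not Beam or Guava source code.
   final class FirstNonNullSketch {
   
       // Mirrors the documented contract of Guava's MoreObjects.firstNonNull:
       // return the first argument if it is non-null, otherwise the second,
       // and throw if both are null.
       static <T> T firstNonNull(T first, T second) {
           if (first != null) {
               return first;
           }
           if (second != null) {
               return second;
           }
           throw new NullPointerException("Both parameters are null");
       }
   
       public static void main(String[] args) {
           // The fallback value is used when the first argument is null.
           System.out.println(firstNonNull(null, "fallback"));
   
           // With no usable value on either side, the call throws.
           try {
               firstNonNull(null, null);
           } catch (NullPointerException e) {
               System.out.println(e.getMessage());
           }
       }
   }
   ```
   
   If this reading is right, the exception means that both the value and its fallback were missing at the point of the call during window expiration, which would explain why the error carries no BigQuery-specific detail.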
   
   ### Issue Priority
   
   Priority: 1
   
   ### Issue Component
   
   Component: io-java-gcp


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] vbshnsk commented on issue #22642: [Bug]: Dataflow fails to drain a job when using BigQuery (java sdk v.2.38)

Posted by GitBox <gi...@apache.org>.
vbshnsk commented on issue #22642:
URL: https://github.com/apache/beam/issues/22642#issuecomment-1214720271

   Bump.
   
   Also, this is our BigQueryIO config:
   ```
   BigQueryIO
               .write<T>()
               .to(dynamicDestination)
               .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
               .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
               .withoutValidation()
               .withFormatFunction(formatFunction)
               .optimizedWrites()
               .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
               .withSchemaUpdateOptions(
                   setOf(
                       BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
                       BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_RELAXATION
                   )
               )
               .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
   ```


[GitHub] [beam] johnjcasey closed issue #22642: [Bug]: Dataflow fails to drain a job when using BigQuery (java sdk v.2.38)

Posted by GitBox <gi...@apache.org>.
johnjcasey closed issue #22642: [Bug]: Dataflow fails to drain a job when using BigQuery (java sdk v.2.38)
URL: https://github.com/apache/beam/issues/22642


[GitHub] [beam] johnjcasey commented on issue #22642: [Bug]: Dataflow fails to drain a job when using BigQuery (java sdk v.2.38)

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on issue #22642:
URL: https://github.com/apache/beam/issues/22642#issuecomment-1226001769

   This has been updated in Beam 2.39. Please upgrade to that version or later.

