Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2021/09/02 22:59:00 UTC

[jira] [Comment Edited] (BEAM-12472) BigQuery streaming writes can be batched beyond request limit with BatchAndInsertElements

    [ https://issues.apache.org/jira/browse/BEAM-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409147#comment-17409147 ] 

Luke Cwik edited comment on BEAM-12472 at 9/2/21, 10:58 PM:
------------------------------------------------------------

This is also an issue for StreamingInserts for Apache Beam 2.25 (b/198464217). I checked how the code has changed and it seems like it is still impacting our latest 2.32.0 release.

{noformat}
2021-08-28 14:51:46.259 EDT Error message from worker: java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload size exceeds the limit: 10485760 bytes.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload size exceeds the limit: 10485760 bytes.",
  "status" : "INVALID_ARGUMENT"
}
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:908)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:948)
	at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:186)
{noformat}



was (Author: lcwik):
This is also an issue for StreamingInserts for Apache Beam 2.25. I checked how the code has changed and it seems like it is still impacting our latest 2.32.0 release.

{noformat}
2021-08-28 14:51:46.259 EDT Error message from worker: java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload size exceeds the limit: 10485760 bytes.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload size exceeds the limit: 10485760 bytes.",
  "status" : "INVALID_ARGUMENT"
}
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:908)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:948)
	at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:186)
{noformat}


> BigQuery streaming writes can be batched beyond request limit with BatchAndInsertElements
> -----------------------------------------------------------------------------------------
>
>                 Key: BEAM-12472
>                 URL: https://issues.apache.org/jira/browse/BEAM-12472
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Sam Whittle
>            Priority: P2
>
> BatchAndInsertElements accumulates all the input elements and flushes them in finishBundle.
> However, if there is enough data, the BigQuery request limit can be exceeded, causing an exception like the following. It seems that finishBundle should limit the number of rows and bytes, and possibly flush multiple times per destination.
> A workaround would be to use autosharding, which uses state with batching limits, or to increase the number of streaming keys to decrease the likelihood of hitting this.
> {code}
> Error while processing a work item: UNKNOWN: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
> POST https://bigquery.googleapis.com/bigquery/v2/projects/google.com:clouddfe/datasets/nexmark_06090820455271/tables/nexmark_simple/insertAll?prettyPrint=false
> {
>   "code" : 400,
>   "errors" : [ {
>     "domain" : "global",
>     "message" : "Request payload size exceeds the limit: 10485760 bytes.",
>     "reason" : "badRequest"
>   } ],
>   "message" : "Request payload size exceeds the limit: 10485760 bytes.",
>   "status" : "INVALID_ARGUMENT"
> }
> 	at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
> 	at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements$DoFnInvoker.invokeFinishBundle(Unknown Source)
> 	at org.apache.beam.fn.harness.FnApiDoFnRunner.finishBundle(FnApiDoFnRunner.java:1661)
> {code}
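For illustration, the size-limited flushing described above could look like the following sketch. This is not Beam code; the class and method names are hypothetical, and it only shows the batching idea: instead of issuing one insertAll call for everything accumulated in the bundle, split the rows into batches that stay under BigQuery's documented streaming-insert limits (10,485,760 bytes per request; 50,000 rows per request).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits accumulated rows into batches, each of which
// stays under both the per-request byte limit and the per-request row limit,
// so finishBundle can flush each batch with a separate insertAll call.
public class SizeLimitedBatcher {
    static final long MAX_BATCH_BYTES = 10_485_760L; // BigQuery request payload limit
    static final int MAX_BATCH_ROWS = 50_000;        // BigQuery rows-per-request limit

    static List<List<String>> toBatches(List<String> rows) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String row : rows) {
            long rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            // Close the current batch if adding this row would exceed either limit.
            if (!current.isEmpty()
                    && (currentBytes + rowBytes > MAX_BATCH_BYTES
                        || current.size() >= MAX_BATCH_ROWS)) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += rowBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

A real fix would also need to account for the per-row serialization overhead of the insertAll request body, and a single row larger than the limit would still fail; the sketch only shows the batch-splitting step.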



--
This message was sent by Atlassian Jira
(v8.3.4#803005)