You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 18:46:04 UTC
[GitHub] [beam] kennknowles opened a new issue, #18531: BigQueryIO withFailedInsertRetryPolicy is endlessly retrying "invalid" rows
kennknowles opened a new issue, #18531:
URL: https://github.com/apache/beam/issues/18531
Using the InsertRetryPolicy.retryTransientErrors() on streaming data into a BigQuery table is endlessly retrying "invalid" rows.
To quote Eugene Kirpichov [~kirpichov]
bq. Upon talking to the BigQuery team, it became clear that this is indeed a bug in BigQueryIO. This error is not reported via InsertErrors because the InsertAll request specifies the table once rather than per row, and the table is invalid, so all rows in the batch are invalid. Beam should handle this.
```
p.apply(BigQueryIO.writeTableRows()
.to(new DatePartitionedTableSpecifier(tableReference,
"tracking data"))
.withSchema(schema)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
// write all failed inserts
to a DMQ
.getFailedInserts().apply(MapElements.via(new SimpleFunction<TableRow, PubsubMessage>()
{
public PubsubMessage apply(final TableRow _row) {
try {
return new PubsubMessage(JacksonFactory.getDefaultInstance().toByteArray(_row),
Collections.<String, String>emptyMap());
} catch (IOException e)
{
throw new RuntimeException("failed to write to DMQ", e);
}
}
})).apply(PubsubIO.writeMessages().to("projects/gameduell-bits-bigquery-poc/topics/dmq"));
```
```
(1a04bdb0d43aca9c): java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException:
400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "The
destination table's partition rum$20170925 is outside the allowed bounds. You can only stream to partitions
within 31 days in the past and 16 days in the future relative to the current date.",
"reason" :
"invalid"
} ],
"message" : "The destination table's partition rum$20170925 is outside the allowed
bounds. You can only stream to partitions within 31 days in the past and 16 days in the future relative
to the current date.",
"status" : "INVALID_ARGUMENT"
}
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:774)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:809)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:126)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:96)
Caused
by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code"
: 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition
rum$20170925 is outside the allowed bounds. You can only stream to partitions within 31 days in the
past and 16 days in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The destination table's partition rum$20170925 is outside the allowed bounds. You can only
stream to partitions within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1065)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:720)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:712)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition rum$20170925
is outside the allowed bounds. You can only stream to partitions within 31 days in the past and 16 days
in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The
destination table's partition rum$20170925 is outside the allowed bounds. You can only stream to partitions
within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:774)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:809)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:126)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:96)
Caused
by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code"
: 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition
rum$20170925 is outside the allowed bounds. You can only stream to partitions within 31 days in the
past and 16 days in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The destination table's partition rum$20170925 is outside the allowed bounds. You can only
stream to partitions within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1065)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:720)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:712)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition rum$20170925
is outside the allowed bounds. You can only stream to partitions within 31 days in the past and 16 days
in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The
destination table's partition rum$20170925 is outside the allowed bounds. You can only stream to partitions
within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:774)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:809)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:126)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:96)
Caused
by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code"
: 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition
rum$20170925 is outside the allowed bounds. You can only stream to partitions within 31 days in the
past and 16 days in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The destination table's partition rum$20170925 is outside the allowed bounds. You can only
stream to partitions within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1065)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:720)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:712)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition rum$20170925
is outside the allowed bounds. You can only stream to partitions within 31 days in the past and 16 days
in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The
destination table's partition rum$20170925 is outside the allowed bounds. You can only stream to partitions
within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:774)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:809)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:126)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:96)
Caused
by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code"
: 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition
rum$20170925 is outside the allowed bounds. You can only stream to partitions within 31 days in the
past and 16 days in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The destination table's partition rum$20170925 is outside the allowed bounds. You can only
stream to partitions within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1065)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:720)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:712)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition rum$20170925
is outside the allowed bounds. You can only stream to partitions within 31 days in the past and 16 days
in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The
destination table's partition rum$20170925 is outside the allowed bounds. You can only stream to partitions
within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:774)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:809)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:126)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:96)
Caused
by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code"
: 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition
rum$20170925 is outside the allowed bounds. You can only stream to partitions within 31 days in the
past and 16 days in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The destination table's partition rum$20170925 is outside the allowed bounds. You can only
stream to partitions within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1065)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:720)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:712)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition rum$20170925
is outside the allowed bounds. You can only stream to partitions within 31 days in the past and 16 days
in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The
destination table's partition rum$20170925 is outside the allowed bounds. You can only stream to partitions
within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:774)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:809)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:126)
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:96)
Caused
by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code"
: 400,
"errors" : [ {
"domain" : "global",
"message" : "The destination table's partition
rum$20170925 is outside the allowed bounds. You can only stream to partitions within 31 days in the
past and 16 days in the future relative to the current date.",
"reason" : "invalid"
} ],
"message" : "The destination table's partition rum$20170925 is outside the allowed bounds. You can only
stream to partitions within 31 days in the past and 16 days in the future relative to the current date.",
"status" : "INVALID_ARGUMENT"
}
com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1065)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:720)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.call(BigQueryServicesImpl.java:712)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
```
Imported from Jira [BEAM-3271](https://issues.apache.org/jira/browse/BEAM-3271). Original Jira may contain additional context.
Reported by: cw_krebs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org