Posted to github@beam.apache.org by "Abacn (via GitHub)" <gi...@apache.org> on 2023/02/07 04:25:54 UTC

[GitHub] [beam] Abacn opened a new issue, #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Abacn opened a new issue, #25355:
URL: https://github.com/apache/beam/issues/25355

   ### What happened?
   
   This bug is triggered when all of these conditions are met:
   
   1. Dynamic destinations are set.
   2. The number of GCS files written is greater than 10,000, so that MultiPartitionsWriteTables is invoked.
   3. The final destination table already exists. The reported case uses CREATE_NEVER.
   
   The temp table and the final table may then end up with incompatible schemas, regardless of whether the schema is explicitly set.
   
   error message:
   ```
   Error message from worker: java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_COPY_***_00000, reached max retries: 3, last failed job: { "configuration" : { "copy" : { "createDisposition" : "CREATE_NEVER", "destinationTable" : { "datasetId" : "***", "projectId" : "***", "tableId" : "***" }, ... "reason" : "invalid" } ], "state" : "DONE" },
   
   org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:200) org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:153) org.apache.beam.sdk.io.gcp.bigquery.WriteRename.finishBundle(WriteRename.java:171)
   ```
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn closed issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn closed issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error
URL: https://github.com/apache/beam/issues/25355




[GitHub] [beam] Abacn commented on issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25355:
URL: https://github.com/apache/beam/issues/25355#issuecomment-1423371153

   I think I found the cause of the original issue (the one in the issue description):
   
   https://github.com/apache/beam/blob/9fcb3a5b48a6db0dcf57f454d1d1eca10cf1c41b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/UpdateSchemaDestination.java#L125
   
   The processElement here does not consider the case of dynamic destinations. It simply takes the first destination in the incoming list of elements to set up zeroJob, and the outputs all have the same destination.
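
To make the described pitfall concrete, here is a minimal plain-Java sketch. It is not the Beam implementation: `DestinationGroupingSketch`, `WriteResult`, `destinationsFromFirstElement`, and `groupByDestination` are hypothetical names, and the table names are made up. It only contrasts consulting the first element's destination with grouping temp tables by their own destination.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration only; none of these names are Beam APIs. */
class DestinationGroupingSketch {

  /** Hypothetical stand-in for one temp-table write and the final table it targets. */
  static final class WriteResult {
    final String destination;
    final String tempTable;

    WriteResult(String destination, String tempTable) {
      this.destination = destination;
      this.tempTable = tempTable;
    }
  }

  /** Mirrors the pitfall: only the first element's destination is ever consulted. */
  static List<String> destinationsFromFirstElement(List<WriteResult> results) {
    List<String> destinations = new ArrayList<>();
    if (!results.isEmpty()) {
      destinations.add(results.get(0).destination);
    }
    return destinations;
  }

  /** Groups temp tables by their own destination so each final table is handled separately. */
  static Map<String, List<String>> groupByDestination(List<WriteResult> results) {
    Map<String, List<String>> grouped = new LinkedHashMap<>();
    for (WriteResult r : results) {
      grouped.computeIfAbsent(r.destination, k -> new ArrayList<>()).add(r.tempTable);
    }
    return grouped;
  }

  public static void main(String[] args) {
    List<WriteResult> results = List.of(
        new WriteResult("dataset.table_a", "temp_0"),
        new WriteResult("dataset.table_b", "temp_1"),
        new WriteResult("dataset.table_a", "temp_2"));
    // Only table_a is seen when the first element decides the destination.
    System.out.println(destinationsFromFirstElement(results));
    // Both table_a and table_b appear once results are grouped per destination.
    System.out.println(groupByDestination(results));
  }
}
```

With mixed destinations in one input list, the first-element approach never sees `dataset.table_b`, while the grouped form keeps one entry per final table.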




[GitHub] [beam] ahmedabu98 commented on issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on issue #25355:
URL: https://github.com/apache/beam/issues/25355#issuecomment-1423106192

   .take-issue




[GitHub] [beam] ahmedabu98 commented on issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on issue #25355:
URL: https://github.com/apache/beam/issues/25355#issuecomment-1426482816

   Regarding the above [comment](https://github.com/apache/beam/issues/25355#issuecomment-1423190926) about the following line: I just tested it, and the equality check works fine even though the two schemas have different String representations:
   https://github.com/apache/beam/blob/69ddf44e67db279ab49cc88a18dbf48c965bc669/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/UpdateSchemaDestination.java#L263
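
The general point, that two values can compare equal while printing differently, can be illustrated with a plain-Java sketch. It uses only `java.util` maps as stand-ins (an assumption for illustration, with the hypothetical class name `EqualsVsToStringSketch`) and does not reproduce the schema comparison on the linked line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration only; these maps are stand-ins, not BigQuery schema objects. */
class EqualsVsToStringSketch {

  /** Field description that keeps keys in insertion order. */
  static Map<String, String> insertionOrderedField() {
    Map<String, String> field = new LinkedHashMap<>();
    field.put("type", "STRING");
    field.put("name", "id_even");
    return field;
  }

  /** Same entries, but keys are kept in sorted order. */
  static Map<String, String> sortedField() {
    Map<String, String> field = new TreeMap<>();
    field.put("type", "STRING");
    field.put("name", "id_even");
    return field;
  }

  public static void main(String[] args) {
    Map<String, String> a = insertionOrderedField();
    Map<String, String> b = sortedField();
    // Map equality compares entries, not their printed order.
    System.out.println(a.equals(b));
    // The printed forms differ because the key order differs.
    System.out.println(a.toString().equals(b.toString()));
  }
}
```

Comparing printed output is therefore not a reliable way to decide whether an `equals` check will pass.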




[GitHub] [beam] Abacn commented on issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25355:
URL: https://github.com/apache/beam/issues/25355#issuecomment-1423372093

   The input of UpdateSchemaDestination should be KV<DestinationT, Iterable<WriteTables.Result>>




[GitHub] [beam] ahmedabu98 commented on issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on issue #25355:
URL: https://github.com/apache/beam/issues/25355#issuecomment-1428028689

   > so passing either a wrapped or an unwrapped dynamic destination to UpdateSchemaDestination is fine.
   
   We would still want to wrap with the match table dynamic destinations, because that's what we do when creating temp tables. For a given temp table, we want to pull the same schema consistently for both operations.




[GitHub] [beam] Abacn commented on issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25355:
URL: https://github.com/apache/beam/issues/25355#issuecomment-1420977817

   This has the same root cause as #22372, and I confirmed that the issue did not occur in Beam 2.39.0. While most of the use cases have been fixed, this bug remains as of 2.45.0.




[GitHub] [beam] Abacn commented on issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25355:
URL: https://github.com/apache/beam/issues/25355#issuecomment-1423190926

   As @ahmedabu98 pointed out, the original working example has a typo. I initiated another job, 2023-02-08_11_47_55-389031392081500435, on branch: https://github.com/apache/beam/commit/f446e5c13667ad29544562817279b6650796b970
   
   The problem is that the condition
   https://github.com/apache/beam/blob/69ddf44e67db279ab49cc88a18dbf48c965bc669/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/UpdateSchemaDestination.java#L263
   is never true. The schema returned by the DynamicDestinations object is:
   ```
   GenericData{classInfo=[fields], {fields=[GenericData{classInfo=[categories, collation, defaultValueExpression, description,
   fields, maxLength, mode, name, policyTags, precision, scale, type], {name=id_even, type=STRING}}, GenericData{classInfo=
   [categories, collation, defaultValueExpression, description, fields, maxLength, mode, name, policyTags, precision, scale,
   type], {name=ev_time, type=DATETIME}}]}}
   ```
   The schema returned by `destinationTable.getSchema()` is:
   ```
   {"fields":[{"mode":"REQUIRED","name":"id_even","type":"STRING"},
   {"mode":"REQUIRED","name":"ev_time","type":"DATETIME"}]}
   ```
   Though they are effectively equivalent, and the generated temp table shows the same schema in the BigQuery UI, their Gson representations are not the same.




[GitHub] [beam] Abacn commented on issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25355:
URL: https://github.com/apache/beam/issues/25355#issuecomment-1426511546

   > Regarding the above [comment](https://github.com/apache/beam/issues/25355#issuecomment-1423190926) about the following line: I just tested it, and the equality check works fine even though the two schemas have different String representations:
   > 
   > https://github.com/apache/beam/blob/69ddf44e67db279ab49cc88a18dbf48c965bc669/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/UpdateSchemaDestination.java#L263
   
   Ah, I see, thanks for the clarification. So passing either a wrapped or an unwrapped dynamic destination to UpdateSchemaDestination is fine.




[GitHub] [beam] Abacn commented on issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25355:
URL: https://github.com/apache/beam/issues/25355#issuecomment-1421386891

   I think I have reproduced the error: https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV2_PR/151/
   
   Run on branch: https://github.com/apache/beam/pull/23785/commits/250488211e5b60ae0f8221b1b3dc6288531f12ee
   
   Example jobId: `2023-02-07_11_45_58-12188672903944898798` in the apache-beam-testing GCP project




[GitHub] [beam] Abacn commented on issue #25355: [Bug]: BigQuery BatchLoad incompatible table schema error

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25355:
URL: https://github.com/apache/beam/issues/25355#issuecomment-1421398222

   UpdateSchemaDestination, created by #17365, has neither comments nor a doc string. This task should also add the necessary comments to that class.

