Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/20 20:02:37 UTC

[GitHub] [beam] steveniemitz opened a new issue, #22372: [Bug]: PR 17365 breaks BQ loads in some cases

steveniemitz opened a new issue, #22372:
URL: https://github.com/apache/beam/issues/22372

   ### What happened?
   
   In Beam 2.39, this used to work:
   ```
   BigQueryIO
      .write()
      .optimizedWrites()
      .to(OutputTable)
      .withAvroFormatFunction(abc)
      .withAvroSchemaFactory(xyz)
      .withCreateDisposition(CreateDisposition.CREATE_NEVER)
      .withWriteDisposition(WriteDisposition.WRITE_TRUNCATE)
   ```
   
   but now throws an exception if you end up in MultiPartitionsWriteTables:
   
   ```
   Caused by: java.lang.IllegalArgumentException: Unless create disposition is CREATE_NEVER, a schema must be specified, i.e. DynamicDestinations.getSchema() may not return null. However, create disposition is CREATE_IF_NEEDED, and org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinationsHelpers$ConstantTableDestinations@52e95e1 returned null for destination tableSpec: <table>
   	at org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull(Preconditions.java:436)
   	at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.processElement(WriteTables.java:207)
   ```
   
   The cause seems to be that the create disposition is now set to `CreateDisposition.CREATE_IF_NEEDED` in WriteTables, regardless of what the user sets it to.
   
   cc @pabloem @MarcoRob 
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: io-java-gcp


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] MarcoRob commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
MarcoRob commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1227483672

   > Agreed, it looks like the original PR changed how we load dynamic destinations for MultiPartitionsWrite tables. @MarcoRob can you take a look at this?
   
   Sure, let me check, thanks.


[GitHub] [beam] pabloem commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1192748306

   @johnjcasey @chamikaramj could you help look into this? I will be on vacation for a couple weeks from today.


[GitHub] [beam] ahmedabu98 commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1295077816

   Looks like the problem is in this [deleted code block](https://github.com/apache/beam/pull/17365/files#diff-3edb3abd8909e7075e679e8b93420d4119858d6a9418b6ec10a204a592f062abL682-L692). The create disposition in WriteTables has always been `CreateDisposition.CREATE_IF_NEEDED`, because that's necessary when creating temp tables. However, in the case the schema isn't provided, this would cause the problem we're seeing here.
   
   That code block used to handle this case by fetching the schema of the final destination table and using it as the schema for the temp tables. I think bringing that code back would solve this issue, unless there's a particular reason it was deleted @MarcoRob?
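
A minimal sketch of the fallback described above, written in plain Java with no Beam dependency. It assumes the behavior as stated in this comment: when no schema is provided and the user asked for `CREATE_NEVER`, the schema of the existing final destination table is reused for the temp tables. The class `TempTableSchemaFallback`, the method `resolveTempTableSchema`, the string-encoded schemas, and the `Map` standing in for a BigQuery table lookup are all hypothetical names for illustration, not Beam's actual code.

```java
import java.util.Map;

/** Illustrative only: the names and types here are hypothetical, not Beam's API. */
final class TempTableSchemaFallback {

  enum CreateDisposition { CREATE_NEVER, CREATE_IF_NEEDED }

  /**
   * Picks the schema used to create temp tables.
   *
   * @param providedSchema schema supplied by the user, or null if none was given
   * @param userDisposition create disposition the user set for the final table
   * @param destinationTable table spec of the final destination
   * @param existingTableSchemas stand-in for a lookup of existing tables (assumption)
   */
  static String resolveTempTableSchema(
      String providedSchema,
      CreateDisposition userDisposition,
      String destinationTable,
      Map<String, String> existingTableSchemas) {
    if (providedSchema != null) {
      // A user-supplied schema always wins.
      return providedSchema;
    }
    if (userDisposition == CreateDisposition.CREATE_NEVER) {
      // The final table must already exist, so borrow its schema for the temp tables.
      String existing = existingTableSchemas.get(destinationTable);
      if (existing == null) {
        throw new IllegalStateException("Destination table not found: " + destinationTable);
      }
      return existing;
    }
    // No schema and the table may need creating: nothing to fall back on.
    throw new IllegalArgumentException(
        "Unless create disposition is CREATE_NEVER, a schema must be specified");
  }

  public static void main(String[] args) {
    Map<String, String> tables = Map.of("project:dataset.table", "id:INTEGER,name:STRING");
    System.out.println(
        resolveTempTableSchema(
            null, CreateDisposition.CREATE_NEVER, "project:dataset.table", tables));
  }
}
```

The point of the sketch is that the temp tables can still be created with `CREATE_IF_NEEDED` while the decision about which schema to use depends on the disposition the user originally requested.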


[GitHub] [beam] MarcoRob commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
MarcoRob commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1312215843

   > Looks like the problem is in this [deleted code block](https://github.com/apache/beam/pull/17365/files#diff-3edb3abd8909e7075e679e8b93420d4119858d6a9418b6ec10a204a592f062abL682-L692). The create disposition in WriteTables has always been `CreateDisposition.CREATE_IF_NEEDED`, because that's necessary when creating temp tables. However, in the case the schema isn't provided, this would cause the problem we're seeing here.
   > 
   > That code block used to handle this case by fetching the schema of the final destination table and using it as the schema for the temp tables. I think bringing that code back would solve this issue, unless there's a particular reason it was deleted @MarcoRob?
   
   The change was part of a fix made for the following [ticket](https://issues.apache.org/jira/browse/BEAM-12482).
   I am taking a look at this now to validate the issue and fix the bug. I will update this issue with more info.


[GitHub] [beam] Abacn commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1423311832

   Decided to reopen this issue because more than one breakage remains:
   
   #25355
   - incompatible schema error in copy when dynamicDestination
   - check equality between provided schema and final destination schema does not work [(#issuecomment-1423190926)](https://github.com/apache/beam/issues/25355#issuecomment-1423190926)
   
   Some fixes are already done:
   #22624 (fixes #22543) #24471 #24700
   


[GitHub] [beam] Abacn commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1332455242

   Hi @MarcoRob, what is the status of this issue? Is it safe to restore the deleted block?


[GitHub] [beam] Abacn commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1454183421

   #25355 is fixed, and the fix will be available in the upcoming 2.46.0 release. Closing this for now.


[GitHub] [beam] johnjcasey commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1192780007

   ack


[GitHub] [beam] Abacn commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1279282501

   Hi, what is the current status of this bug? Is the working example fixed by #22390? @steveniemitz @MarcoRob


[GitHub] [beam] steveniemitz commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
steveniemitz commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1191484093

   Additionally, the new `UpdateSchemaDestination` breaks if the source format is set to Avro, because it tries to load an empty file (a zero-length file is not a valid Avro file).
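
To illustrate why a zero-length file is rejected: per the Avro specification, an object container file begins with the four magic bytes `O`, `b`, `j`, `0x01`, so an empty file cannot even carry the header. The sketch below only checks those leading bytes; it is not a full Avro validator, and the class name `AvroHeaderCheck` and its method are hypothetical helpers, not part of Beam or the Avro library.

```java
import java.util.Arrays;

/** Illustrative only: checks the leading magic bytes, not the full file structure. */
final class AvroHeaderCheck {

  // Magic bytes that open every Avro object container file: 'O', 'b', 'j', 0x01.
  private static final byte[] MAGIC = {'O', 'b', 'j', 1};

  /** Returns true if the content is long enough and starts with the Avro magic bytes. */
  static boolean startsWithAvroMagic(byte[] fileBytes) {
    return fileBytes.length >= MAGIC.length
        && Arrays.equals(Arrays.copyOf(fileBytes, MAGIC.length), MAGIC);
  }

  public static void main(String[] args) {
    // A zero-length file has no room for the header.
    System.out.println(startsWithAvroMagic(new byte[0])); // prints false
    // Content that begins with the magic bytes passes this header-only check.
    System.out.println(startsWithAvroMagic(new byte[] {'O', 'b', 'j', 1, 0})); // prints true
  }
}
```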


[GitHub] [beam] Abacn closed issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
Abacn closed issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases
URL: https://github.com/apache/beam/issues/22372


[GitHub] [beam] Abacn closed issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn closed issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases
URL: https://github.com/apache/beam/issues/22372


[GitHub] [beam] ahmedabu98 commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1295047595

   Looks like #22390 fixed the Avro source problem referred to in the second comment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on issue #22372: [Bug]: PR 17365 breaks BQ loads in some cases

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on issue #22372:
URL: https://github.com/apache/beam/issues/22372#issuecomment-1226251319

   Agreed, it looks like the original PR changed how we load dynamic destinations for MultiPartitionsWrite tables. @MarcoRob can you take a look at this?

