Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/20 03:46:49 UTC

[GitHub] [beam] robertwb opened a new issue, #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.

robertwb opened a new issue, #24535:
URL: https://github.com/apache/beam/issues/24535

   ### What happened?
   
   `TriggerCopyJobs` was written to assume that all jobs belonging to the same table are processed in the same bundle, but later changes to the code appear to have invalidated this precondition. It still seems to hold most of the time, but it may be violated for large writes, and this seems to happen more often on runner v2 than on runner v1.
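
   To illustrate the failure mode, here is a minimal toy model, not the actual Beam code. It assumes the "truncate only once" decision is tracked per bundle: the first copy job a bundle issues for a table uses WRITE_TRUNCATE and the rest use WRITE_APPEND. The function name `run_copy_jobs` and the data layout are hypothetical.

```python
def run_copy_jobs(bundles):
    """Toy model of copy jobs into a single destination table.

    `bundles` is a list of bundles; each bundle is a list of temp tables,
    and each temp table is a list of rows. This is an illustrative
    assumption, not Beam's real data model.
    """
    destination = []
    for bundle in bundles:
        first_job_in_bundle = True
        for temp_rows in bundle:
            if first_job_in_bundle:
                # WRITE_TRUNCATE: replace whatever the destination holds.
                destination = list(temp_rows)
                first_job_in_bundle = False
            else:
                # WRITE_APPEND: keep existing rows and add the new ones.
                destination.extend(temp_rows)
    return destination


# All copy jobs for the table land in one bundle: every row survives.
one_bundle = run_copy_jobs([[["a", "b"], ["c"], ["d"]]])

# The same jobs split across two bundles: the second bundle truncates
# the rows that the first bundle already copied.
split_bundles = run_copy_jobs([[["a", "b"], ["c"]], [["d"]]])

print(one_bundle)
print(split_bundles)
```

   Under this assumption, the single-bundle case keeps all four rows, while the split case keeps only the rows copied by the last bundle.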
   
   ### Issue Priority
   
   Priority: 1
   
   ### Issue Component
   
   Component: io-py-gcp


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #24535:
URL: https://github.com/apache/beam/issues/24535#issuecomment-1398783325

   Reopening for cherry-pick.




[GitHub] [beam] kennknowles commented on issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #24535:
URL: https://github.com/apache/beam/issues/24535#issuecomment-1339548164

   Is this a regression? It sounds severe and worth a cherry pick most likely.




[GitHub] [beam] Abacn commented on issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #24535:
URL: https://github.com/apache/beam/issues/24535#issuecomment-1397889109

   The bug can still occur when the number of files is greater than 10,000, because the BigQuery load jobs are then run in multiple partitions: https://github.com/apache/beam/blob/428ec97e30cc6587c2ae0f81d3ba44a8a9b34f93/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L70
   
   I ran a test that decreased this number to force the write to happen in multiple BigQuery load jobs, and still saw elements being dropped.
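
   As a rough sketch of why the file count matters: if files are grouped into partitions of at most N files and each partition becomes its own load job, any write with more than N files needs more than one job. The helper `partition_files` and its parameter name are hypothetical, and the real partitioning logic may also consider other limits such as total size.

```python
def partition_files(files, max_files_per_partition):
    """Split `files` into consecutive chunks of at most the given size.

    Hypothetical helper for illustration; each chunk stands in for one
    BigQuery load job.
    """
    if max_files_per_partition <= 0:
        raise ValueError("max_files_per_partition must be positive")
    return [
        files[i:i + max_files_per_partition]
        for i in range(0, len(files), max_files_per_partition)
    ]


# With the limit of 10,000 mentioned above, 25,000 files need three jobs.
files = ["file-%d" % i for i in range(25000)]
partitions = partition_files(files, 10000)
print(len(partitions))

# Lowering the limit, as in the test described above, forces multiple
# load jobs even for a small write.
small_partitions = partition_files(files[:10], 3)
print([len(p) for p in small_partitions])
```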




[GitHub] [beam] Abacn closed issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn closed issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.
URL: https://github.com/apache/beam/issues/24535




[GitHub] [beam] robertwb commented on issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.

Posted by GitBox <gi...@apache.org>.
robertwb commented on issue #24535:
URL: https://github.com/apache/beam/issues/24535#issuecomment-1339594975

   I don't know at what point this regression was introduced, but it is pretty
   severe and I think worth a cherry-pick.
   
   On Tue, Dec 6, 2022 at 7:24 AM Kenn Knowles ***@***.***>
   wrote:
   
   > Is this a regression? It sounds severe and worth a cherry pick most likely.
   




[GitHub] [beam] kennknowles commented on issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #24535:
URL: https://github.com/apache/beam/issues/24535#issuecomment-1343494974

   I see it was first reported by a user in #23306. I think keeping the earlier bug is worthwhile, so I will mark this one as a duplicate.




[GitHub] [beam] johnjcasey closed issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.

Posted by "johnjcasey (via GitHub)" <gi...@apache.org>.
johnjcasey closed issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.
URL: https://github.com/apache/beam/issues/24535




[GitHub] [beam] kennknowles closed issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.

Posted by GitBox <gi...@apache.org>.
kennknowles closed issue #24535: [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.
URL: https://github.com/apache/beam/issues/24535

