Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/15 22:49:55 UTC

[GitHub] [beam] kileys opened a new issue, #22733: [Feature Request]: Use unique id for Python BigQueryIO

kileys opened a new issue, #22733:
URL: https://github.com/apache/beam/issues/22733

   ### What would you like to happen?
   
   A collision can occur when two pipelines launched from templates load into BigQuery at the same time using the same temp_location.
   
   @baeminbo found that there is code to use a unique_id [1], but it seems that templates can reuse the same UUID. This was fixed for Java [2] by moving the UUID generation into a DoFn of a ParDo.
   
   [1] https://github.com/apache/beam/blob/v2.34.0/sdks/python/apache_beam/io/gcp/bigquery.py#L2399
   [2] https://github.com/apache/beam/blob/v2.34.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1094-L1105
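
The difference between the two approaches can be illustrated with a minimal plain-Python sketch. It does not use Beam and is not the actual BigQueryIO code: the class names, `setup`/`process` methods, and the `launch` helper are hypothetical stand-ins, and `copy.deepcopy` is assumed here to play the role of each job receiving its own copy of a graph that was built once at template creation.

```python
import copy
import uuid


class ConstructionTimeId:
    """Hypothetical transform that picks its ID while the graph is built."""

    def __init__(self):
        # Runs once, at "template creation"; the value is frozen into the graph.
        self.load_id = uuid.uuid4().hex

    def process(self, element):
        return (self.load_id, element)


class ExecutionTimeId:
    """Hypothetical DoFn-like object that picks its ID when the job runs."""

    def __init__(self):
        self.load_id = None

    def setup(self):
        # Runs inside each launched job, so every launch draws a fresh value.
        self.load_id = uuid.uuid4().hex

    def process(self, element):
        return (self.load_id, element)


def launch(staged_graph):
    """Mimic one job launch: take a private copy of the staged graph and run it."""
    fn = copy.deepcopy(staged_graph)  # assumption: stands in for loading the template
    if hasattr(fn, "setup"):
        fn.setup()
    load_id, _ = fn.process("row")
    return load_id


def ids_for_two_launches(staged_graph):
    """Launch the same staged graph twice and return both IDs."""
    return launch(staged_graph), launch(staged_graph)


if __name__ == "__main__":
    print(ids_for_two_launches(ConstructionTimeId()))
    print(ids_for_two_launches(ExecutionTimeId()))
```

In this sketch the first pair of IDs is identical, because both launches inherit the value chosen when the object was constructed, while the second pair differs, because the ID is drawn at run time. That mirrors the idea of moving UUID generation into a DoFn, under the assumptions stated above.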
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: io-py-gcp


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] svetakvsundhar commented on issue #22733: [Feature Request]: Use unique id for Python BigQueryIO

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on issue #22733:
URL: https://github.com/apache/beam/issues/22733#issuecomment-1488797655

   cc: @baeminbo, @tvalentyn for more context on the issue




[GitHub] [beam] svetakvsundhar commented on issue #22733: [Feature Request]: Use unique id for Python BigQueryIO

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on issue #22733:
URL: https://github.com/apache/beam/issues/22733#issuecomment-1490451133

   I think this is a non-issue, given this is fixed in Java. IIUC, because templates use only Java IOs under the hood (and the direction of templates seems to be to migrate to flex), a fix here should be unnecessary. 




[GitHub] [beam] tvalentyn commented on issue #22733: [Feature Request]: Use unique id for Python BigQueryIO

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #22733:
URL: https://github.com/apache/beam/issues/22733#issuecomment-1505870945

   >  templates use only Java IOs under the hood 
   
   You refer to Google-provided classic templates; however, Python users can and do create templates, and yes, classic templates are still supported. Also, this should be easy to fix.




[GitHub] [beam] svetakvsundhar commented on issue #22733: [Feature Request]: Use unique id for Python BigQueryIO

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on issue #22733:
URL: https://github.com/apache/beam/issues/22733#issuecomment-1488728394

   Given that this collision can occur only when using templates, is this an issue in Python's BQ IO? 




[GitHub] [beam] tvalentyn commented on issue #22733: [Feature Request]: Use unique id for Python BigQueryIO

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #22733:
URL: https://github.com/apache/beam/issues/22733#issuecomment-1505871145

   @svetakvsundhar do you still plan to look into this?




[GitHub] [beam] tvalentyn commented on issue #22733: [Feature Request]: Use unique id for Python BigQueryIO

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #22733:
URL: https://github.com/apache/beam/issues/22733#issuecomment-1489039533

   I suspect this issue was opened with regard to Python classic templates, which build the pipeline creation request at template creation time.
   
   cc: @harrisonlimh who I think also looked at or is looking at this issue.




[GitHub] [beam] github-actions[bot] closed issue #22733: [Feature Request]: Use unique id for Python BigQueryIO

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #22733: [Feature Request]: Use unique id for Python BigQueryIO
URL: https://github.com/apache/beam/issues/22733




[GitHub] [beam] svetakvsundhar commented on issue #22733: [Feature Request]: Use unique id for Python BigQueryIO

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on issue #22733:
URL: https://github.com/apache/beam/issues/22733#issuecomment-1489124476

   .take-issue




[GitHub] [beam] svetakvsundhar commented on issue #22733: [Feature Request]: Use unique id for Python BigQueryIO

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on issue #22733:
URL: https://github.com/apache/beam/issues/22733#issuecomment-1490660189

   .close-issue




[GitHub] [beam] DerRidda commented on issue #22733: [Feature Request]: Use unique id for Python BigQueryIO

Posted by "DerRidda (via GitHub)" <gi...@apache.org>.
DerRidda commented on issue #22733:
URL: https://github.com/apache/beam/issues/22733#issuecomment-1493863894

   > I think this is a non-issue, given this is fixed in Java. IIUC, because templates use only Java IOs under the hood (and the direction of templates seems to be to migrate to flex), a fix here should be unnecessary.
   
   I think this is a pronounced misinterpretation of the issue. The existing BigQuery IO transforms in the Beam SDK do not use Java at all; only the new Storage API based writer does, so all unchanged existing code is expected to rely on code that has this issue.
   
   Furthermore, while Flex templates exist and the Dataflow maintainers seem to want users to migrate to them, classic templates are still fully supported and still being developed, and no deprecation notice has been issued. At my job we currently have no desire to migrate to Flex templates, as they solve 0 issues for us and would only introduce unwanted latency in our job orchestration.

