You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "svetakvsundhar (via GitHub)" <gi...@apache.org> on 2023/03/27 22:17:18 UTC

[GitHub] [beam] svetakvsundhar opened a new pull request, #26002: UUID now set in PTransform.

svetakvsundhar opened a new pull request, #26002:
URL: https://github.com/apache/beam/pull/26002

   `fixes #22733 `
   
   The UUID generation is now done upon pipeline creation time in Python BigQuery IO. This should prevent collisions from occuring. 
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] svetakvsundhar commented on pull request #26002: Moving UUID definition in Python BQ IO

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on PR #26002:
URL: https://github.com/apache/beam/pull/26002#issuecomment-1486014219

   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] svetakvsundhar commented on pull request #26002: Moving UUID definition in Python BQ IO

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on PR #26002:
URL: https://github.com/apache/beam/pull/26002#issuecomment-1486999231

   R: @ahmedabu98 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] svetakvsundhar closed pull request #26002: Moving UUID definition in Python BQ IO

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar closed pull request #26002: Moving UUID definition in Python BQ IO
URL: https://github.com/apache/beam/pull/26002


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26002: Moving UUID definition in Python BQ IO

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26002:
URL: https://github.com/apache/beam/pull/26002#issuecomment-1487001461

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26002: Moving UUID definition in Python BQ IO

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26002:
URL: https://github.com/apache/beam/pull/26002#issuecomment-1485997601

   Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #26002: Moving UUID definition in Python BQ IO

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #26002:
URL: https://github.com/apache/beam/pull/26002#discussion_r1150741857


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -2512,7 +2511,7 @@ def file_path_to_remove(unused_elm):
                 method=self.method,
                 job_name=job_name,
                 step_name=step_name,
-                unique_id=unique_id,
+                unique_id=str(uuid.uuid4())[0:10],

Review Comment:
   It's critical that the `unique_id` here is equal to `unique_id` on line 2492. BigQuery Export read generally works in three steps: 
   1. [output to files in GCS](https://github.com/apache/beam/blob/91bfbac7a90af3b2de3f432f5f0589c527025270/sdks/python/apache_beam/io/gcp/bigquery.py#L851)
   2. read from those files
   3. [delete GCS files](https://github.com/apache/beam/blob/91bfbac7a90af3b2de3f432f5f0589c527025270/sdks/python/apache_beam/io/gcp/bigquery.py#L2671)
   
   The unique id is used when creating those file names, and the same unique id is used to generate the name of the directory to delete after the read operation. Unique ID should be the same for both so that we can clean up properly.



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -2674,7 +2672,7 @@ def expand(self, pcoll):
             bigquery_job_labels=self.bigquery_job_labels,
             job_name=job_name,
             step_name=step_name,
-            unique_id=unique_id,
+            unique_id=str(uuid.uuid4())[0:10],

Review Comment:
   Does doing it this way make a difference? I'm not so familiar, would refer to a Python SDK expert for this



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -2512,7 +2511,7 @@ def file_path_to_remove(unused_elm):
                 method=self.method,
                 job_name=job_name,
                 step_name=step_name,
-                unique_id=unique_id,
+                unique_id=str(uuid.uuid4())[0:10],

Review Comment:
   What you could do here is create the `unique_id` beforehand in a DoFn then wrap it in a `pvalue.AsSingleton` and pass it to both the Read and Cleanup operations as a side-input. See an example [here](https://github.com/apache/beam/blob/a4f55afa052b83237387698649b127570e1e5a63/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L1180) that does this.
   
   In the relevant issue, it mentions this is the way Java does it. It [creates the value in a DoFn](https://github.com/apache/beam/blob/b3b184361100492f92fcc51ff82a0fcd962d5ee0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1094-L1106) then uses it as a sideinput for [read](https://github.com/apache/beam/blob/b3b184361100492f92fcc51ff82a0fcd962d5ee0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1150) and [cleanup](https://github.com/apache/beam/blob/b3b184361100492f92fcc51ff82a0fcd962d5ee0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1206).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #26002: Moving UUID definition in Python BQ IO

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #26002:
URL: https://github.com/apache/beam/pull/26002#issuecomment-1485949433

   ## [Codecov](https://codecov.io/gh/apache/beam/pull/26002?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#26002](https://codecov.io/gh/apache/beam/pull/26002?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f79cda8) into [master](https://codecov.io/gh/apache/beam/commit/fe791f61e5c84aad048d69693d788ec4b9bc7647?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fe791f6) will **decrease** coverage by `2.59%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #26002      +/-   ##
   ==========================================
   - Coverage   73.96%   71.37%   -2.59%     
   ==========================================
     Files         706      739      +33     
     Lines       95473    97928    +2455     
   ==========================================
   - Hits        70616    69897     -719     
   - Misses      23541    26715    +3174     
     Partials     1316     1316              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `79.96% <100.00%> (-4.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/26002?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/26002?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `71.46% <100.00%> (+0.11%)` | :arrow_up: |
   
   ... and [155 files with indirect coverage changes](https://codecov.io/gh/apache/beam/pull/26002/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] svetakvsundhar commented on pull request #26002: Moving UUID definition in Python BQ IO

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on PR #26002:
URL: https://github.com/apache/beam/pull/26002#issuecomment-1486022036

   Run Python Examples_Direct


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] svetakvsundhar commented on pull request #26002: Moving UUID definition in Python BQ IO

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on PR #26002:
URL: https://github.com/apache/beam/pull/26002#issuecomment-1486014878

   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org