Posted to github@beam.apache.org by "robertwb (via GitHub)" <gi...@apache.org> on 2024/03/19 21:31:43 UTC

[PR] Deduplicate common environments. [beam]

robertwb opened a new pull request, #30681:
URL: https://github.com/apache/beam/pull/30681

   This can be especially useful for those pipelines with many cross-language transforms.
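As a rough illustration of what deduplicating environments involves, the sketch below groups environment definitions that compare equal and maps every duplicate id onto one canonical id. It is not the code from this PR: `canonical_environment_ids`, the tuple-based environment definitions, and the choice of the smallest id as the canonical one are all assumptions made for the example.

```python
from collections import defaultdict


def canonical_environment_ids(environments):
    """Map each environment id to one canonical id per distinct definition.

    `environments` is a plain dict of id -> hashable definition, a
    hypothetical stand-in for serialized Environment protos.
    """
    ids_by_definition = defaultdict(list)
    for env_id, definition in environments.items():
        ids_by_definition[definition].append(env_id)

    remap = {}
    for ids in ids_by_definition.values():
        # Assumption: pick the smallest id so the result is deterministic.
        canonical = min(ids)
        for env_id in ids:
            remap[env_id] = canonical
    return remap


# Two identical "A" environments, and two "B" environments that differ
# in payload and therefore must stay separate.
environments = {
    'a1': ('A', b''),
    'a2': ('A', b''),
    'b1': ('B', b'x'),
    'b2': ('B', b'y'),
}
remap = canonical_environment_ids(environments)
```

With a mapping like this in hand, every reference to an environment id can be rewritten to the canonical id and the now-unused duplicates dropped.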
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Deduplicate common environments. [beam]

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #30681:
URL: https://github.com/apache/beam/pull/30681#issuecomment-2015602340

   ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/30681?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   Attention: Patch coverage is `86.66667%`, with `6 lines` in your changes missing coverage. Please review.
   > Project coverage is 71.46%. Comparing base [(`a3e5ac8`)](https://app.codecov.io/gh/apache/beam/commit/a3e5ac86eeade9fbef391a2c19d67825335938e6?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) to head [(`c99f7b9`)](https://app.codecov.io/gh/apache/beam/pull/30681?dropdown=coverage&src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   > Report is 20 commits behind head on master.
   
   | [Files](https://app.codecov.io/gh/apache/beam/pull/30681?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
   |---|---|---|
   | [sdks/python/apache\_beam/runners/common.py](https://app.codecov.io/gh/apache/beam/pull/30681?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=) | 84.61% | [6 Missing :warning: ](https://app.codecov.io/gh/apache/beam/pull/30681?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #30681       +/-   ##
   ===========================================
   + Coverage   38.53%   71.46%   +32.93%     
   ===========================================
     Files         698      710       +12     
     Lines      102360   104798     +2438     
   ===========================================
   + Hits        39441    74893    +35452     
   + Misses      61286    28272    -33014     
     Partials     1633     1633               
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/beam/pull/30681/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [python](https://app.codecov.io/gh/apache/beam/pull/30681/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `81.26% <86.66%> (+52.10%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   




Re: [PR] Deduplicate common environments. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb merged PR #30681:
URL: https://github.com/apache/beam/pull/30681




Re: [PR] Deduplicate common environments. [beam]

Posted by "aaltay (via GitHub)" <gi...@apache.org>.
aaltay commented on PR #30681:
URL: https://github.com/apache/beam/pull/30681#issuecomment-2020981060

   Is this ready to be merged?




Re: [PR] Deduplicate common environments. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on code in PR #30681:
URL: https://github.com/apache/beam/pull/30681#discussion_r1534772378


##########
sdks/python/apache_beam/runners/common.py:
##########
@@ -1941,3 +1945,64 @@ def validate_transform(transform_id):
 
   for t in pipeline_proto.root_transform_ids:
     validate_transform(t)
+
+
+def merge_common_environments(pipeline_proto):

Review Comment:
   Added a couple of tests and verified the change manually on more complex pipelines.





Re: [PR] Deduplicate common environments. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on code in PR #30681:
URL: https://github.com/apache/beam/pull/30681#discussion_r1534772640


##########
sdks/python/apache_beam/runners/common.py:
##########
@@ -1941,3 +1945,64 @@ def validate_transform(transform_id):
 
   for t in pipeline_proto.root_transform_ids:
     validate_transform(t)
+
+
+def merge_common_environments(pipeline_proto):

Review Comment:
   Consolidated. It's still good to have it in both places. 





Re: [PR] Deduplicate common environments. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #30681:
URL: https://github.com/apache/beam/pull/30681#issuecomment-2021072479

   Just trying to ensure everything is green. Looks like WordCountIT.test_wordcount_it failed with
   
   `QUOTA_EXCEEDED: Instance 'beamapp-runner-0325174918-03251049-a16z-harness-8857' creation failed: Quota 'IN_USE_ADDRESSES' exceeded.  Limit: 1200.0 in region us-central1.`
   
   which isn't relevant to this change, but I'm re-running to be sure it's all OK.




Re: [PR] Deduplicate common environments. [beam]

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on code in PR #30681:
URL: https://github.com/apache/beam/pull/30681#discussion_r1531155798


##########
sdks/python/apache_beam/runners/common.py:
##########
@@ -1941,3 +1945,64 @@ def validate_transform(transform_id):
 
   for t in pipeline_proto.root_transform_ids:
     validate_transform(t)
+
+
+def merge_common_environments(pipeline_proto):

Review Comment:
   Does this make the merge logic at the following location obsolete?
   
   https://github.com/apache/beam/blob/fb7ba65e2236f3dd871b6e492afc07249a4a5c49/sdks/python/apache_beam/pipeline.py#L964





Re: [PR] Deduplicate common environments. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on code in PR #30681:
URL: https://github.com/apache/beam/pull/30681#discussion_r1531179623


##########
sdks/python/apache_beam/runners/common.py:
##########
@@ -1941,3 +1945,64 @@ def validate_transform(transform_id):
 
   for t in pipeline_proto.root_transform_ids:
     validate_transform(t)
+
+
+def merge_common_environments(pipeline_proto):

Review Comment:
   Ah, yes, it looks like it does. (That code didn't seem to be working, as I was definitely seeing environments that needed deduplication, but perhaps I should merge the two.)





Re: [PR] Deduplicate common environments. [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #30681:
URL: https://github.com/apache/beam/pull/30681#issuecomment-2008172678

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control




Re: [PR] Deduplicate common environments. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #30681:
URL: https://github.com/apache/beam/pull/30681#issuecomment-2008171114

   R: @chamikaramj 




Re: [PR] Deduplicate common environments. [beam]

Posted by "scwhittle (via GitHub)" <gi...@apache.org>.
scwhittle commented on code in PR #30681:
URL: https://github.com/apache/beam/pull/30681#discussion_r1531597620


##########
sdks/python/apache_beam/runners/common.py:
##########
@@ -1941,3 +1945,64 @@ def validate_transform(transform_id):
 
   for t in pipeline_proto.root_transform_ids:
     validate_transform(t)
+
+
+def merge_common_environments(pipeline_proto):

Review Comment:
   Thanks! It would be good to have a unit test to verify that it works as expected.





Re: [PR] Deduplicate common environments. [beam]

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on code in PR #30681:
URL: https://github.com/apache/beam/pull/30681#discussion_r1534859758


##########
sdks/python/apache_beam/runners/common_test.py:
##########
@@ -584,5 +587,37 @@ def test_window_observing_split_on_window_boundary_round_down_on_last_window(
     self.assertEqual(stop_index, 2)
 
 
+class UtilitiesTest(unittest.TestCase):
+  def test_equal_environments_merged(self):
+    pipeline_proto = merge_common_environments(
+        beam_runner_api_pb2.Pipeline(
+            components=beam_runner_api_pb2.Components(
+                environments={
+                    'a1': beam_runner_api_pb2.Environment(urn='A'),

Review Comment:
   Could you also confirm that the env_id in transforms and WindowingStrategies gets updated?
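
One way to picture the kind of check being asked for, using plain-Python stand-ins rather than the real `beam_runner_api_pb2` messages: after merging, every transform and windowing strategy that pointed at a duplicate environment should point at the surviving one. `FakePipeline`, `apply_environment_remap`, and the field names below are hypothetical and exist only for this sketch. The actual test would build a `beam_runner_api_pb2.Pipeline` and call `merge_common_environments`.

```python
from dataclasses import dataclass, field


@dataclass
class FakePipeline:
    """Hypothetical stand-in for the parts of a pipeline proto that matter here."""
    environments: dict = field(default_factory=dict)    # env id -> definition
    transform_envs: dict = field(default_factory=dict)  # transform id -> env id
    windowing_envs: dict = field(default_factory=dict)  # strategy id -> env id


def apply_environment_remap(pipeline, remap):
    """Rewrite environment references, then drop the replaced environments."""
    for refs in (pipeline.transform_envs, pipeline.windowing_envs):
        for key, env_id in refs.items():
            # Only values change here, so mutating during iteration is safe.
            refs[key] = remap.get(env_id, env_id)
    for env_id in list(pipeline.environments):
        if remap.get(env_id, env_id) != env_id:
            del pipeline.environments[env_id]
    return pipeline


pipeline = apply_environment_remap(
    FakePipeline(
        environments={'a1': 'A', 'a2': 'A', 'b1': 'B'},
        transform_envs={'t1': 'a1', 't2': 'a2', 't3': 'b1'},
        windowing_envs={'w1': 'a2'}),
    remap={'a2': 'a1'})
```

A test along these lines would assert on all three maps, not just on the set of remaining environments, since a stale reference to a removed id would leave the pipeline invalid.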


