You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/03/10 20:32:15 UTC

[GitHub] [beam] chamikaramj opened a new pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

chamikaramj opened a new pull request #14189:
URL: https://github.com/apache/beam/pull/14189


   **Please** add a meaningful description for your change here
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam
 .apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.a
 pache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam
 .apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website | Whitespace | Typescript
   --- | --- | --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) <br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/be
 am_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | --- | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r600931278



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -740,6 +720,28 @@ def _apply_sdk_environment_overrides(
       new_payload.container_image = new_container_image
       environment.payload = new_payload.SerializeToString()
 
+    # De-dup environments that use Java SDK by Docker container image since
+    # currently running multiple Java SDK Harnesses with Dataflow could result
+    # in dependency conflicts.
+    # TODDO(BEAM-9455): remove following restriction when Dataflow supports
+    # environment specific dependency provisioning.
+    container_url_to_env_map = dict()
+    container_url_to_env_id_map = dict()
+    for transform in proto_pipeline.components.transforms.values():
+      environment_id = transform.environment_id
+      if not environment_id:
+        continue
+      environment = proto_pipeline.components.environments[environment_id]
+      docker_payload = proto_utils.parse_Bytes(
+          environment.payload, beam_runner_api_pb2.DockerPayload)
+      image = docker_payload.container_image
+      if is_beam_java_container_name(image):

Review comment:
       Is it possible to do this rewrite later (before submitting the v1beta3 job)? I am seeing that this code is will be called before we translate beam pipeline into v1beta3 Steps. It will be difficult to support resource hints for xlang with this rewrite. You also need to do the same in portable job submission codepath in DF service. I wonder if doing this rewrite on cloud workflow proto level (service change) would be a better path.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #14189:
URL: https://github.com/apache/beam/pull/14189#issuecomment-806011302


   # [Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=h1) Report
   > Merging [#14189](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=desc) (d9f5567) into [master](https://codecov.io/gh/apache/beam/commit/38c9caa0dcf8cec58dfaee18547d9151fd500f44?el=desc) (38c9caa) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/14189/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #14189   +/-   ##
   =======================================
     Coverage   83.39%   83.40%           
   =======================================
     Files         469      469           
     Lines       58721    58738   +17     
   =======================================
   + Hits        48970    48988   +18     
   + Misses       9751     9750    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ache\_beam/coders/proto2\_coder\_test\_messages\_pb2.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Byb3RvMl9jb2Rlcl90ZXN0X21lc3NhZ2VzX3BiMi5weQ==) | | |
   | [...ers/interactive/display/pipeline\_graph\_renderer.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9kaXNwbGF5L3BpcGVsaW5lX2dyYXBoX3JlbmRlcmVyLnB5) | | |
   | [...ython/apache\_beam/typehints/decorators\_test\_py3.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL2RlY29yYXRvcnNfdGVzdF9weTMucHk=) | | |
   | [...ild/srcs/sdks/python/apache\_beam/io/filesystems.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZmlsZXN5c3RlbXMucHk=) | | |
   | [...am/testing/benchmarks/chicago\_taxi/process\_tfma.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL2NoaWNhZ29fdGF4aS9wcm9jZXNzX3RmbWEucHk=) | | |
   | [...sdks/python/apache\_beam/internal/metrics/metric.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvbWV0cmljcy9tZXRyaWMucHk=) | | |
   | [.../python/apache\_beam/io/gcp/datastore/v1new/util.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2RhdGFzdG9yZS92MW5ldy91dGlsLnB5) | | |
   | [...build/srcs/sdks/python/apache\_beam/io/mongodbio.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbW9uZ29kYmlvLnB5) | | |
   | [...les/complete/juliaset/juliaset/juliaset\_test\_it.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvY29tcGxldGUvanVsaWFzZXQvanVsaWFzZXQvanVsaWFzZXRfdGVzdF9pdC5weQ==) | | |
   | [...sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=) | | |
   | ... and [928 more](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=footer). Last update [38c9caa...d9f5567](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #14189:
URL: https://github.com/apache/beam/pull/14189#issuecomment-806011302


   # [Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=h1) Report
   > Merging [#14189](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=desc) (d9f5567) into [master](https://codecov.io/gh/apache/beam/commit/38c9caa0dcf8cec58dfaee18547d9151fd500f44?el=desc) (38c9caa) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/14189/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #14189   +/-   ##
   =======================================
     Coverage   83.39%   83.40%           
   =======================================
     Files         469      469           
     Lines       58721    58738   +17     
   =======================================
   + Hits        48970    48988   +18     
   + Misses       9751     9750    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [.../srcs/sdks/python/apache\_beam/io/range\_trackers.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vcmFuZ2VfdHJhY2tlcnMucHk=) | | |
   | [...s/python/apache\_beam/utils/thread\_pool\_executor.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvdGhyZWFkX3Bvb2xfZXhlY3V0b3IucHk=) | | |
   | [...e\_beam/examples/complete/top\_wikipedia\_sessions.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvY29tcGxldGUvdG9wX3dpa2lwZWRpYV9zZXNzaW9ucy5weQ==) | | |
   | [...ld/srcs/sdks/python/apache\_beam/io/filesystemio.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZmlsZXN5c3RlbWlvLnB5) | | |
   | [.../srcs/sdks/python/apache\_beam/utils/annotations.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvYW5ub3RhdGlvbnMucHk=) | | |
   | [...s/python/apache\_beam/testing/datatype\_inference.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9kYXRhdHlwZV9pbmZlcmVuY2UucHk=) | | |
   | [.../interactive/display/interactive\_pipeline\_graph.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9kaXNwbGF5L2ludGVyYWN0aXZlX3BpcGVsaW5lX2dyYXBoLnB5) | | |
   | [...d/srcs/sdks/python/apache\_beam/internal/pickler.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvcGlja2xlci5weQ==) | | |
   | [...cs/sdks/python/apache\_beam/transforms/userstate.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy91c2Vyc3RhdGUucHk=) | | |
   | [...rcs/sdks/python/apache\_beam/typehints/typecheck.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3R5cGVjaGVjay5weQ==) | | |
   | ... and [928 more](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=footer). Last update [38c9caa...d9f5567](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r600959224



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -740,6 +720,28 @@ def _apply_sdk_environment_overrides(
       new_payload.container_image = new_container_image
       environment.payload = new_payload.SerializeToString()
 
+    # De-dup environments that use Java SDK by Docker container image since
+    # currently running multiple Java SDK Harnesses with Dataflow could result
+    # in dependency conflicts.
+    # TODDO(BEAM-9455): remove following restriction when Dataflow supports
+    # environment specific dependency provisioning.
+    container_url_to_env_map = dict()
+    container_url_to_env_id_map = dict()
+    for transform in proto_pipeline.components.transforms.values():
+      environment_id = transform.environment_id
+      if not environment_id:
+        continue
+      environment = proto_pipeline.components.environments[environment_id]
+      docker_payload = proto_utils.parse_Bytes(
+          environment.payload, beam_runner_api_pb2.DockerPayload)
+      image = docker_payload.container_image
+      if is_beam_java_container_name(image):

Review comment:
       Discussed this offline, I am ok to proceed as is with understanding that resource hint support will not work for xlang until we revert this workaround and fix the rootcause.
   cc: @robertwb 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #14189:
URL: https://github.com/apache/beam/pull/14189#issuecomment-804453851


   cc: @robertwb


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r599687746



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -740,6 +719,25 @@ def _apply_sdk_environment_overrides(
       new_payload.container_image = new_container_image
       environment.payload = new_payload.SerializeToString()
 
+    # De-dup environments by Docker container image since currently Dataflow

Review comment:
       To clarify, the restriction is already there for Dataflow. We currently start an SDK Harness per container image and de-dup here: https://github.com/apache/beam/blob/83bd5485047373ae0e380c54063e3769874a8b09/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L304
   
   This just moves the de-duping logic from container images started to Dataflow to environments in the proto since I'm trying to update Dataflow to map work items to environments based on the environment ID (not container image).
   
   I can try to reduce de-duping to multi-language Java environments in muti-language pipelines since multiple Python environments do not seem to be running into issues currently. Multiple Java environments in multi-language pipelines run into dependency conflicts. Does that help ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r600849871



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -282,45 +283,26 @@ def __init__(
     # Dataflow workers.
     environments_to_use = self._get_environments_from_tranforms()
     if _use_unified_worker(options):
-      # Adding a SDK container image for the pipeline SDKs
-      container_image = dataflow.SdkHarnessContainerImage()
-      pipeline_sdk_container_image = get_container_image_from_options(options)
-      container_image.containerImage = pipeline_sdk_container_image
-      container_image.useSingleCorePerContainer = True  # True for Python SDK.
-      pool.sdkHarnessContainerImages.append(container_image)
-
-      already_added_containers = [pipeline_sdk_container_image]
+      python_sdk_container_image = get_container_image_from_options(options)
 
       # Adding container images for other SDKs that may be needed for
       # cross-language pipelines.
-      for environment in environments_to_use:
+      for id, environment in environments_to_use:
         if environment.urn != common_urns.environments.DOCKER.urn:
           raise Exception(
               'Dataflow can only execute pipeline steps in Docker environments.'
               ' Received %r.' % environment)
         environment_payload = proto_utils.parse_Bytes(
             environment.payload, beam_runner_api_pb2.DockerPayload)
         container_image_url = environment_payload.container_image
-        if container_image_url in already_added_containers:
-          # Do not add the pipeline environment again.
-
-          # Currently, Dataflow uses Docker container images to uniquely
-          # identify execution environments. Hence Dataflow executes all
-          # transforms that specify the the same Docker container image in a
-          # single container instance. Dependencies of all environments that
-          # specify a given container image will be staged in the container
-          # instance for that particular container image.
-          # TODO(BEAM-9455): loosen this restriction to support multiple
-          # environments with the same container image when Dataflow supports
-          # environment specific artifact provisioning.
-          continue
-        already_added_containers.append(container_image_url)
 
         container_image = dataflow.SdkHarnessContainerImage()
         container_image.containerImage = container_image_url
         # Currently we only set following to True for Python SDK.
         # TODO: set this correctly for remote environments that might be Python.
-        container_image.useSingleCorePerContainer = False
+        container_image.useSingleCorePerContainer = (
+            container_image_url == python_sdk_container_image)

Review comment:
       I think this might not work as intended for prebuilding workflow and for custom images. Is there a better way to detect Python? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r599687746



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -740,6 +719,25 @@ def _apply_sdk_environment_overrides(
       new_payload.container_image = new_container_image
       environment.payload = new_payload.SerializeToString()
 
+    # De-dup environments by Docker container image since currently Dataflow

Review comment:
       To clarify, the restriction is already there for Dataflow. We currently start an SDK Harness per container image an de-dup here: https://github.com/apache/beam/blob/83bd5485047373ae0e380c54063e3769874a8b09/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L304
   
   This just moves the de-duping from container images started to Dataflow to environments in the proto since I'm trying to update Dataflow to map work items to environments based on the environment ID (not container image).
   
   I can try to reduce de-duping to multi-language Java environments in muti-language pipelines since multiple Python environments do not seem to be running into issues currently. Multiple Java environments in multi-language pipelines run into dependency conflicts. Does that help ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] commented on pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #14189:
URL: https://github.com/apache/beam/pull/14189#issuecomment-806011302


   # [Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=h1) Report
   > Merging [#14189](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=desc) (d9f5567) into [master](https://codecov.io/gh/apache/beam/commit/38c9caa0dcf8cec58dfaee18547d9151fd500f44?el=desc) (38c9caa) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/14189/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #14189   +/-   ##
   =======================================
     Coverage   83.39%   83.40%           
   =======================================
     Files         469      469           
     Lines       58721    58738   +17     
   =======================================
   + Hits        48970    48988   +18     
   + Misses       9751     9750    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...snippets/transforms/aggregation/combineglobally.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9hZ2dyZWdhdGlvbi9jb21iaW5lZ2xvYmFsbHkucHk=) | | |
   | [...hon/apache\_beam/examples/cookbook/mergecontacts.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvY29va2Jvb2svbWVyZ2Vjb250YWN0cy5weQ==) | | |
   | [...\_beam/io/gcp/internal/clients/bigquery/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2ludGVybmFsL2NsaWVudHMvYmlncXVlcnkvX19pbml0X18ucHk=) | | |
   | [...thon/apache\_beam/io/azure/blobstoragefilesystem.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vYXp1cmUvYmxvYnN0b3JhZ2VmaWxlc3lzdGVtLnB5) | | |
   | [...examples/snippets/transforms/elementwise/kvswap.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9lbGVtZW50d2lzZS9rdnN3YXAucHk=) | | |
   | [.../srcs/sdks/python/apache\_beam/utils/annotations.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvYW5ub3RhdGlvbnMucHk=) | | |
   | [...am/examples/snippets/transforms/elementwise/map.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9lbGVtZW50d2lzZS9tYXAucHk=) | | |
   | [...dks/python/apache\_beam/transforms/create\_source.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jcmVhdGVfc291cmNlLnB5) | | |
   | [...hon/apache\_beam/portability/api/beam\_fn\_api\_pb2.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL2JlYW1fZm5fYXBpX3BiMi5weQ==) | | |
   | [.../srcs/sdks/python/apache\_beam/internal/gcp/auth.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvZ2NwL2F1dGgucHk=) | | |
   | ... and [928 more](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=footer). Last update [38c9caa...d9f5567](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r600854824



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -282,45 +283,26 @@ def __init__(
     # Dataflow workers.
     environments_to_use = self._get_environments_from_tranforms()
     if _use_unified_worker(options):
-      # Adding a SDK container image for the pipeline SDKs
-      container_image = dataflow.SdkHarnessContainerImage()
-      pipeline_sdk_container_image = get_container_image_from_options(options)
-      container_image.containerImage = pipeline_sdk_container_image
-      container_image.useSingleCorePerContainer = True  # True for Python SDK.
-      pool.sdkHarnessContainerImages.append(container_image)
-
-      already_added_containers = [pipeline_sdk_container_image]
+      python_sdk_container_image = get_container_image_from_options(options)
 
       # Adding container images for other SDKs that may be needed for
       # cross-language pipelines.
-      for environment in environments_to_use:
+      for id, environment in environments_to_use:
         if environment.urn != common_urns.environments.DOCKER.urn:
           raise Exception(
               'Dataflow can only execute pipeline steps in Docker environments.'
               ' Received %r.' % environment)
         environment_payload = proto_utils.parse_Bytes(
             environment.payload, beam_runner_api_pb2.DockerPayload)
         container_image_url = environment_payload.container_image
-        if container_image_url in already_added_containers:
-          # Do not add the pipeline environment again.
-
-          # Currently, Dataflow uses Docker container images to uniquely
-          # identify execution environments. Hence Dataflow executes all
-          # transforms that specify the the same Docker container image in a
-          # single container instance. Dependencies of all environments that
-          # specify a given container image will be staged in the container
-          # instance for that particular container image.
-          # TODO(BEAM-9455): loosen this restriction to support multiple
-          # environments with the same container image when Dataflow supports
-          # environment specific artifact provisioning.
-          continue
-        already_added_containers.append(container_image_url)
 
         container_image = dataflow.SdkHarnessContainerImage()
         container_image.containerImage = container_image_url
         # Currently we only set following to True for Python SDK.
         # TODO: set this correctly for remote environments that might be Python.
-        container_image.useSingleCorePerContainer = False
+        container_image.useSingleCorePerContainer = (
+            container_image_url == python_sdk_container_image)

Review comment:
       Possibly, but this just preserves the current behavior. Currently we only set "useSingleCorePerContainer" for pipeline SDK (Python). get_container_image_from_options(options) should provide the Python SDK container image after considering pipeline options for custom containers etc. Agree that this should be extended for all possible Python SDK containers but I think that's out of scope for this PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r600960548



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -282,45 +283,26 @@ def __init__(
     # Dataflow workers.
     environments_to_use = self._get_environments_from_tranforms()
     if _use_unified_worker(options):
-      # Adding a SDK container image for the pipeline SDKs
-      container_image = dataflow.SdkHarnessContainerImage()
-      pipeline_sdk_container_image = get_container_image_from_options(options)
-      container_image.containerImage = pipeline_sdk_container_image
-      container_image.useSingleCorePerContainer = True  # True for Python SDK.
-      pool.sdkHarnessContainerImages.append(container_image)
-
-      already_added_containers = [pipeline_sdk_container_image]
+      python_sdk_container_image = get_container_image_from_options(options)
 
       # Adding container images for other SDKs that may be needed for
       # cross-language pipelines.
-      for environment in environments_to_use:
+      for id, environment in environments_to_use:
         if environment.urn != common_urns.environments.DOCKER.urn:
           raise Exception(
               'Dataflow can only execute pipeline steps in Docker environments.'
               ' Received %r.' % environment)
         environment_payload = proto_utils.parse_Bytes(
             environment.payload, beam_runner_api_pb2.DockerPayload)
         container_image_url = environment_payload.container_image
-        if container_image_url in already_added_containers:
-          # Do not add the pipeline environment again.
-
-          # Currently, Dataflow uses Docker container images to uniquely
-          # identify execution environments. Hence Dataflow executes all
-          # transforms that specify the the same Docker container image in a
-          # single container instance. Dependencies of all environments that
-          # specify a given container image will be staged in the container
-          # instance for that particular container image.
-          # TODO(BEAM-9455): loosen this restriction to support multiple
-          # environments with the same container image when Dataflow supports
-          # environment specific artifact provisioning.
-          continue
-        already_added_containers.append(container_image_url)
 
         container_image = dataflow.SdkHarnessContainerImage()
         container_image.containerImage = container_image_url
         # Currently we only set following to True for Python SDK.
         # TODO: set this correctly for remote environments that might be Python.
-        container_image.useSingleCorePerContainer = False
+        container_image.useSingleCorePerContainer = (
+            container_image_url == python_sdk_container_image)

Review comment:
       Thanks.

##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -740,6 +720,28 @@ def _apply_sdk_environment_overrides(
       new_payload.container_image = new_container_image
       environment.payload = new_payload.SerializeToString()
 
+    # De-dup environments that use Java SDK by Docker container image since
+    # currently running multiple Java SDK Harnesses with Dataflow could result
+    # in dependency conflicts.
+    # TODDO(BEAM-9455): remove following restriction when Dataflow supports
+    # environment specific dependency provisioning.
+    container_url_to_env_map = dict()
+    container_url_to_env_id_map = dict()
+    for transform in proto_pipeline.components.transforms.values():
+      environment_id = transform.environment_id
+      if not environment_id:
+        continue
+      environment = proto_pipeline.components.environments[environment_id]
+      docker_payload = proto_utils.parse_Bytes(
+          environment.payload, beam_runner_api_pb2.DockerPayload)
+      image = docker_payload.container_image
+      if is_beam_java_container_name(image):

Review comment:
       Thanks, we had an offline discussion and seems like this PR does not add any additional restrictions related to resource hints implementation given that Dataflow cannot support multiple Java Environments (with multiple Java SDK Harnesses) till we fix Dataflow Dependencies issue anyways (https://issues.apache.org/jira/browse/BEAM-9455).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #14189:
URL: https://github.com/apache/beam/pull/14189#issuecomment-806011302


   # [Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=h1) Report
   > Merging [#14189](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=desc) (7ffa532) into [master](https://codecov.io/gh/apache/beam/commit/38c9caa0dcf8cec58dfaee18547d9151fd500f44?el=desc) (38c9caa) will **decrease** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head 7ffa532 differs from pull request most recent head d9f5567. Consider uploading reports for the commit d9f5567 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/14189/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #14189      +/-   ##
   ==========================================
   - Coverage   83.39%   83.38%   -0.01%     
   ==========================================
     Files         469      469              
     Lines       58721    58727       +6     
   ==========================================
   + Hits        48970    48972       +2     
   - Misses       9751     9755       +4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...sdks/python/apache\_beam/utils/interactive\_utils.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaW50ZXJhY3RpdmVfdXRpbHMucHk=) | `88.09% <0.00%> (-4.77%)` | :arrow_down: |
   | [...pache\_beam/runners/interactive/interactive\_beam.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9iZWFtLnB5) | `73.56% <0.00%> (-1.15%)` | :arrow_down: |
   | [...runners/interactive/display/pcoll\_visualization.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9kaXNwbGF5L3Bjb2xsX3Zpc3VhbGl6YXRpb24ucHk=) | `85.34% <0.00%> (-0.53%)` | :arrow_down: |
   | [...eam/runners/interactive/interactive\_environment.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9lbnZpcm9ubWVudC5weQ==) | `90.37% <0.00%> (-0.38%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.86% <0.00%> (-0.13%)` | :arrow_down: |
   | [.../py38/build/srcs/sdks/python/apache\_beam/pvalue.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcHZhbHVlLnB5) | `92.74% <0.00%> (ø)` | |
   | [...sdks/python/apache\_beam/utils/subprocess\_server.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvc3VicHJvY2Vzc19zZXJ2ZXIucHk=) | `53.14% <0.00%> (ø)` | |
   | [...y38/build/srcs/sdks/python/apache\_beam/pipeline.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcGlwZWxpbmUucHk=) | `91.31% <0.00%> (+0.08%)` | :arrow_up: |
   | [...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==) | `89.69% <0.00%> (+0.15%)` | :arrow_up: |
   | [...rcs/sdks/python/apache\_beam/transforms/external.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9leHRlcm5hbC5weQ==) | `72.48% <0.00%> (+0.38%)` | :arrow_up: |
   | ... and [1 more](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=footer). Last update [38c9caa...d9f5567](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #14189:
URL: https://github.com/apache/beam/pull/14189#issuecomment-806011302


   # [Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=h1) Report
   > Merging [#14189](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=desc) (d9f5567) into [master](https://codecov.io/gh/apache/beam/commit/38c9caa0dcf8cec58dfaee18547d9151fd500f44?el=desc) (38c9caa) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/14189/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #14189   +/-   ##
   =======================================
     Coverage   83.39%   83.40%           
   =======================================
     Files         469      469           
     Lines       58721    58738   +17     
   =======================================
   + Hits        48970    48988   +18     
   + Misses       9751     9750    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...s/python/apache\_beam/io/gcp/bigquery\_avro\_tools.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2F2cm9fdG9vbHMucHk=) | | |
   | [...sdks/python/apache\_beam/portability/common\_urns.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvY29tbW9uX3VybnMucHk=) | | |
   | [...am/portability/api/external\_transforms\_pb2\_grpc.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL2V4dGVybmFsX3RyYW5zZm9ybXNfcGIyX2dycGMucHk=) | | |
   | [...snippets/transforms/aggregation/combineglobally.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9hZ2dyZWdhdGlvbi9jb21iaW5lZ2xvYmFsbHkucHk=) | | |
   | [...he\_beam/portability/api/external\_transforms\_pb2.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL2V4dGVybmFsX3RyYW5zZm9ybXNfcGIyLnB5) | | |
   | [...38/build/srcs/sdks/python/apache\_beam/io/textio.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vdGV4dGlvLnB5) | | |
   | [...\_beam/testing/benchmarks/nexmark/queries/query4.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvcXVlcmllcy9xdWVyeTQucHk=) | | |
   | [...build/srcs/sdks/python/apache\_beam/io/parquetio.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vcGFycXVldGlvLnB5) | | |
   | [...ild/srcs/sdks/python/apache\_beam/io/gcp/dicomio.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2RpY29taW8ucHk=) | | |
   | [...d/srcs/sdks/python/apache\_beam/transforms/stats.py](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree#diff-YmVhbV9QcmVDb21taXRfUHl0aG9uX0Nyb24vc3JjL3Nka3MvcHl0aG9uL3Rlc3Qtc3VpdGVzL3RveC9weTM4L2J1aWxkL3NyY3Mvc2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9zdGF0cy5weQ==) | | |
   | ... and [928 more](https://codecov.io/gh/apache/beam/pull/14189/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=footer). Last update [38c9caa...d9f5567](https://codecov.io/gh/apache/beam/pull/14189?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj merged pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
chamikaramj merged pull request #14189:
URL: https://github.com/apache/beam/pull/14189


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r600961166



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -740,6 +720,28 @@ def _apply_sdk_environment_overrides(
       new_payload.container_image = new_container_image
       environment.payload = new_payload.SerializeToString()
 
+    # De-dup environments that use Java SDK by Docker container image since
+    # currently running multiple Java SDK Harnesses with Dataflow could result
+    # in dependency conflicts.
+    # TODDO(BEAM-9455): remove following restriction when Dataflow supports
+    # environment specific dependency provisioning.
+    container_url_to_env_map = dict()
+    container_url_to_env_id_map = dict()
+    for transform in proto_pipeline.components.transforms.values():
+      environment_id = transform.environment_id
+      if not environment_id:
+        continue
+      environment = proto_pipeline.components.environments[environment_id]
+      docker_payload = proto_utils.parse_Bytes(
+          environment.payload, beam_runner_api_pb2.DockerPayload)
+      image = docker_payload.container_image
+      if is_beam_java_container_name(image):

Review comment:
       Ah, sorry didn't notice your comment above. Thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r600933018



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -282,45 +283,26 @@ def __init__(
     # Dataflow workers.
     environments_to_use = self._get_environments_from_tranforms()
     if _use_unified_worker(options):
-      # Adding a SDK container image for the pipeline SDKs
-      container_image = dataflow.SdkHarnessContainerImage()
-      pipeline_sdk_container_image = get_container_image_from_options(options)
-      container_image.containerImage = pipeline_sdk_container_image
-      container_image.useSingleCorePerContainer = True  # True for Python SDK.
-      pool.sdkHarnessContainerImages.append(container_image)
-
-      already_added_containers = [pipeline_sdk_container_image]
+      python_sdk_container_image = get_container_image_from_options(options)
 
       # Adding container images for other SDKs that may be needed for
       # cross-language pipelines.
-      for environment in environments_to_use:
+      for id, environment in environments_to_use:
         if environment.urn != common_urns.environments.DOCKER.urn:
           raise Exception(
               'Dataflow can only execute pipeline steps in Docker environments.'
               ' Received %r.' % environment)
         environment_payload = proto_utils.parse_Bytes(
             environment.payload, beam_runner_api_pb2.DockerPayload)
         container_image_url = environment_payload.container_image
-        if container_image_url in already_added_containers:
-          # Do not add the pipeline environment again.
-
-          # Currently, Dataflow uses Docker container images to uniquely
-          # identify execution environments. Hence Dataflow executes all
-          # transforms that specify the the same Docker container image in a
-          # single container instance. Dependencies of all environments that
-          # specify a given container image will be staged in the container
-          # instance for that particular container image.
-          # TODO(BEAM-9455): loosen this restriction to support multiple
-          # environments with the same container image when Dataflow supports
-          # environment specific artifact provisioning.
-          continue
-        already_added_containers.append(container_image_url)
 
         container_image = dataflow.SdkHarnessContainerImage()
         container_image.containerImage = container_image_url
         # Currently we only set following to True for Python SDK.
         # TODO: set this correctly for remote environments that might be Python.
-        container_image.useSingleCorePerContainer = False
+        container_image.useSingleCorePerContainer = (
+            container_image_url == python_sdk_container_image)

Review comment:
       Can you please file a bug to follow up on this? it sounds like it may be a blocker prebuilding workflow with portable job submission. cc: @y1chi - do you know if we verified this scenario? We need to make sure that prebuilding flow might does not interfere with launch 1-container-per-core Python behavior.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r600958482



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -282,45 +283,26 @@ def __init__(
     # Dataflow workers.
     environments_to_use = self._get_environments_from_tranforms()
     if _use_unified_worker(options):
-      # Adding a SDK container image for the pipeline SDKs
-      container_image = dataflow.SdkHarnessContainerImage()
-      pipeline_sdk_container_image = get_container_image_from_options(options)
-      container_image.containerImage = pipeline_sdk_container_image
-      container_image.useSingleCorePerContainer = True  # True for Python SDK.
-      pool.sdkHarnessContainerImages.append(container_image)
-
-      already_added_containers = [pipeline_sdk_container_image]
+      python_sdk_container_image = get_container_image_from_options(options)
 
       # Adding container images for other SDKs that may be needed for
       # cross-language pipelines.
-      for environment in environments_to_use:
+      for id, environment in environments_to_use:
         if environment.urn != common_urns.environments.DOCKER.urn:
           raise Exception(
               'Dataflow can only execute pipeline steps in Docker environments.'
               ' Received %r.' % environment)
         environment_payload = proto_utils.parse_Bytes(
             environment.payload, beam_runner_api_pb2.DockerPayload)
         container_image_url = environment_payload.container_image
-        if container_image_url in already_added_containers:
-          # Do not add the pipeline environment again.
-
-          # Currently, Dataflow uses Docker container images to uniquely
-          # identify execution environments. Hence Dataflow executes all
-          # transforms that specify the the same Docker container image in a
-          # single container instance. Dependencies of all environments that
-          # specify a given container image will be staged in the container
-          # instance for that particular container image.
-          # TODO(BEAM-9455): loosen this restriction to support multiple
-          # environments with the same container image when Dataflow supports
-          # environment specific artifact provisioning.
-          continue
-        already_added_containers.append(container_image_url)
 
         container_image = dataflow.SdkHarnessContainerImage()
         container_image.containerImage = container_image_url
         # Currently we only set following to True for Python SDK.
         # TODO: set this correctly for remote environments that might be Python.
-        container_image.useSingleCorePerContainer = False
+        container_image.useSingleCorePerContainer = (
+            container_image_url == python_sdk_container_image)

Review comment:
       Ok, thanks, then it should not be an issue 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r600110189



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -740,6 +719,25 @@ def _apply_sdk_environment_overrides(
       new_payload.container_image = new_container_image
       environment.payload = new_payload.SerializeToString()
 
+    # De-dup environments by Docker container image since currently Dataflow

Review comment:
       Restricted de-duping to multi-language Java environments.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r599651690



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -740,6 +719,25 @@ def _apply_sdk_environment_overrides(
       new_payload.container_image = new_container_image
       environment.payload = new_payload.SerializeToString()
 
+    # De-dup environments by Docker container image since currently Dataflow

Review comment:
       Thanks for tagging me on this change. I am working on a related change to reflect pipeline resource hints in portable pipeline representation. Hints are defined in `Environment.resource_hints`. Transforms are mapped to environments, and different transforms can have different hints. Therefore, we can have multiple environments with different hints, but the same container image.
   My change is not yet ready to review, but the replication logic looks like this:
   https://github.com/apache/beam/pull/14082/files#diff-252b68d1b24f6f7cdd8c5e54163d4856afad59fd385f5f6a91bf0fe66f09e67dR243 
   
   I think deduplicating logic as proposed in this change will be difficult to reconcile with resource hints representation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #14189:
URL: https://github.com/apache/beam/pull/14189#issuecomment-806011890


   Run Portable_Python PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] y1chi commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
y1chi commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r600953336



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -282,45 +283,26 @@ def __init__(
     # Dataflow workers.
     environments_to_use = self._get_environments_from_tranforms()
     if _use_unified_worker(options):
-      # Adding a SDK container image for the pipeline SDKs
-      container_image = dataflow.SdkHarnessContainerImage()
-      pipeline_sdk_container_image = get_container_image_from_options(options)
-      container_image.containerImage = pipeline_sdk_container_image
-      container_image.useSingleCorePerContainer = True  # True for Python SDK.
-      pool.sdkHarnessContainerImages.append(container_image)
-
-      already_added_containers = [pipeline_sdk_container_image]
+      python_sdk_container_image = get_container_image_from_options(options)
 
       # Adding container images for other SDKs that may be needed for
       # cross-language pipelines.
-      for environment in environments_to_use:
+      for id, environment in environments_to_use:
         if environment.urn != common_urns.environments.DOCKER.urn:
           raise Exception(
               'Dataflow can only execute pipeline steps in Docker environments.'
               ' Received %r.' % environment)
         environment_payload = proto_utils.parse_Bytes(
             environment.payload, beam_runner_api_pb2.DockerPayload)
         container_image_url = environment_payload.container_image
-        if container_image_url in already_added_containers:
-          # Do not add the pipeline environment again.
-
-          # Currently, Dataflow uses Docker container images to uniquely
-          # identify execution environments. Hence Dataflow executes all
-          # transforms that specify the the same Docker container image in a
-          # single container instance. Dependencies of all environments that
-          # specify a given container image will be staged in the container
-          # instance for that particular container image.
-          # TODO(BEAM-9455): loosen this restriction to support multiple
-          # environments with the same container image when Dataflow supports
-          # environment specific artifact provisioning.
-          continue
-        already_added_containers.append(container_image_url)
 
         container_image = dataflow.SdkHarnessContainerImage()
         container_image.containerImage = container_image_url
         # Currently we only set following to True for Python SDK.
         # TODO: set this correctly for remote environments that might be Python.
-        container_image.useSingleCorePerContainer = False
+        container_image.useSingleCorePerContainer = (
+            container_image_url == python_sdk_container_image)

Review comment:
       For the prebuilding workflow I think the python_sdk_container_image will be set to the prebuilt image id https://github.com/apache/beam/blob/7ffa53264dbeea76c8d274d066078d09dabce6a3/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L453. So I guess it won't be a special case as long as we made sure the environment has image set correctly https://github.com/apache/beam/blob/fdb0fd7aacf13836a4335db68fe67953c45b5025/sdks/python/apache_beam/transforms/environments.py#L296.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #14189: [BEAM-11935] Updates Dataflow SDK Harness map to set Environment ID

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r599651690



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -740,6 +719,25 @@ def _apply_sdk_environment_overrides(
       new_payload.container_image = new_container_image
       environment.payload = new_payload.SerializeToString()
 
+    # De-dup environments by Docker container image since currently Dataflow

Review comment:
       Thanks for tagging me on this change. I am working on a related change to reflect pipeline resource hints in portable proto. Hints are defined in `Environment.resource_hints`. Transforms are mapped to environments, and different transforms can have different hints. Therefore, we can have multiple environments with different hints, but the same container image.
   My change is not yet ready to review, but the replication logic looks like this:
   https://github.com/apache/beam/pull/14082/files#diff-252b68d1b24f6f7cdd8c5e54163d4856afad59fd385f5f6a91bf0fe66f09e67dR243 
   
   I think deduplicating logic as proposed in this change will be difficult to reconcile with resource hints representation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org