You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "robertwb (via GitHub)" <gi...@apache.org> on 2024/03/28 19:38:46 UTC

[PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

robertwb opened a new pull request, #30793:
URL: https://github.com/apache/beam/pull/30793

   We use the schema transform discovery service to determine all the transforms that a given provider vends which may be larger than those that were explicitly declared (e.g. the basic mapping transforms are part of java core) and use this to expand the possible set of transforms this provider can be used to instanciate (attaching the appropreate renaming, etc. as needed). This can greatly increase the chances that we can use the same provider, and hence same SDK and environment, for adjacent steps.
   
   This is done lazily as transforms are used to avoid instanciating all possible providers for all pipelines.
   
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #30793:
URL: https://github.com/apache/beam/pull/30793#issuecomment-2025976248

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on code in PR #30793:
URL: https://github.com/apache/beam/pull/30793#discussion_r1589346078


##########
sdks/python/apache_beam/yaml/yaml_provider.py:
##########
@@ -152,6 +169,17 @@ def __init__(self, urns, service):
   def provided_transforms(self):
     return self._urns.keys()
 
+  def with_underlying_provider(self, other_provider):
+    if not isinstance(other_provider, ExternalProvider):
+      return
+
+    all_urns = set(other_provider.schema_transforms().keys()).union(

Review Comment:
   > Hmm... the only thing red was coverage in the presubmits.
   
   Yeah, I noticed this - I'm not sure why... It looks like the failures are around 2.57.0.dev, so it could be some versioning issue?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #30793:
URL: https://github.com/apache/beam/pull/30793#issuecomment-2025974733

   R: @Polber 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on code in PR #30793:
URL: https://github.com/apache/beam/pull/30793#discussion_r1589335802


##########
sdks/python/apache_beam/yaml/yaml_provider.py:
##########
@@ -152,6 +169,17 @@ def __init__(self, urns, service):
   def provided_transforms(self):
     return self._urns.keys()
 
+  def with_underlying_provider(self, other_provider):
+    if not isinstance(other_provider, ExternalProvider):
+      return
+
+    all_urns = set(other_provider.schema_transforms().keys()).union(

Review Comment:
   This line is part of the stack trace. I'm very confident this PR is the culprit, creating a revert for now since I'm not sure what the immediate solution is



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on code in PR #30793:
URL: https://github.com/apache/beam/pull/30793#discussion_r1589343690


##########
sdks/python/apache_beam/yaml/yaml_provider.py:
##########
@@ -152,6 +169,17 @@ def __init__(self, urns, service):
   def provided_transforms(self):
     return self._urns.keys()
 
+  def with_underlying_provider(self, other_provider):
+    if not isinstance(other_provider, ExternalProvider):
+      return
+
+    all_urns = set(other_provider.schema_transforms().keys()).union(

Review Comment:
   Hmm... the only thing red was coverage in the presubmits. 
   
   Do you have a rollback already or should I create one? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb merged PR #30793:
URL: https://github.com/apache/beam/pull/30793


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #30793:
URL: https://github.com/apache/beam/pull/30793#issuecomment-2093190723

   Looks like this likely broke our python tests -  https://github.com/apache/beam/actions/workflows/python_tests.yml?query=branch%3Amaster+event%3Apush - since this was merged, they've been permared/failing on yaml tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on code in PR #30793:
URL: https://github.com/apache/beam/pull/30793#discussion_r1589345057


##########
sdks/python/apache_beam/yaml/yaml_provider.py:
##########
@@ -152,6 +169,17 @@ def __init__(self, urns, service):
   def provided_transforms(self):
     return self._urns.keys()
 
+  def with_underlying_provider(self, other_provider):
+    if not isinstance(other_provider, ExternalProvider):
+      return
+
+    all_urns = set(other_provider.schema_transforms().keys()).union(

Review Comment:
   https://github.com/apache/beam/pull/31169 - still waiting on presubmits to go green



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #30793:
URL: https://github.com/apache/beam/pull/30793#issuecomment-2050038374

   Any update on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #30793:
URL: https://github.com/apache/beam/pull/30793#issuecomment-2051924037

   > Took a bit for me to wrap my head around the logic, but this makes sense to me.
   > 
   > So now with this logic, if we were to package all turnkey external transforms in a common gradle target, a single provider and expansion service could be used, correct?
   
   Yes. (Though the downside of a huge common target is that it may be much more heavyweight than what the user needs. There can also be dependency conflicts. But it's something we could experiment with.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [YAML] Increase re-use of providers with implicitly overlapping transforms. [beam]

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on code in PR #30793:
URL: https://github.com/apache/beam/pull/30793#discussion_r1589346429


##########
sdks/python/apache_beam/yaml/yaml_provider.py:
##########
@@ -152,6 +169,17 @@ def __init__(self, urns, service):
   def provided_transforms(self):
     return self._urns.keys()
 
+  def with_underlying_provider(self, other_provider):
+    if not isinstance(other_provider, ExternalProvider):
+      return
+
+    all_urns = set(other_provider.schema_transforms().keys()).union(

Review Comment:
   I checked that we've published snapshots and everything though



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org