You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/02 22:06:35 UTC

[GitHub] [beam] robertwb opened a new pull request, #17822: Allow creation of dynamically defined transforms in the Python expansion service.

robertwb opened a new pull request, #17822:
URL: https://github.com/apache/beam/pull/17822

   This frees one from being restricted to invoking already existing,
   named transform.
   
   There are two variants:
   
     The first, easiest to use, allows one to specify a Python callable
     that is then invoked on with the input as the first parameter.
   
     The second allows one to define and return an entire PTransform, which
     can be useful if other properties need to be set (such as input and
     output types).
   
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Add a link to the appropriate issue in your description, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on a diff in pull request #17822: Allow creation of dynamically defined transforms in the Python expansion service.

Posted by GitBox <gi...@apache.org>.
robertwb commented on code in PR #17822:
URL: https://github.com/apache/beam/pull/17822#discussion_r893724451


##########
sdks/python/apache_beam/transforms/fully_qualified_named_transform.py:
##########
@@ -52,16 +52,37 @@ def __init__(self, constructor, args, kwargs):
     self._kwargs = kwargs
 
   def expand(self, pinput):
-    return pinput | self._resolve(self._constructor)(
-        *self._args, **self._kwargs)
+    if self._constructor in ('__callable__', '__constructor__'):
+      self._check_allowed(self._constructor)
+      if self._args:
+        source, *args = tuple(self._args)
+        kwargs = self._kwargs
+      else:
+        args = self._args
+        kwargs = dict(self._kwargs)
+        source = kwargs.pop('source')
+
+      if self._constructor == '__constructor__':

Review Comment:
   Fair point. Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on a diff in pull request #17822: Allow creation of dynamically defined transforms in the Python expansion service.

Posted by GitBox <gi...@apache.org>.
robertwb commented on code in PR #17822:
URL: https://github.com/apache/beam/pull/17822#discussion_r893725504


##########
sdks/python/apache_beam/utils/python_callable.py:
##########
@@ -88,6 +88,9 @@ def load_from_script(source):
       if line[0] != ' ':
         if line.startswith('def '):
           name = line[4:line.index('(')].strip()
+        elif line.startswith('class '):
+          name = line[5:line.index('(') if '(' in

Review Comment:
   Despite the syntax, in all cases it's "the last thing that is defined." Hopefully the docs help. (And the Python tests should serve as good documentation too.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on pull request #17822: Allow creation of dynamically defined transforms in the Python expansion service.

Posted by GitBox <gi...@apache.org>.
robertwb commented on PR #17822:
URL: https://github.com/apache/beam/pull/17822#issuecomment-1151351527

   PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb merged pull request #17822: Allow creation of dynamically defined transforms in the Python expansion service.

Posted by GitBox <gi...@apache.org>.
robertwb merged PR #17822:
URL: https://github.com/apache/beam/pull/17822


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on a diff in pull request #17822: Allow creation of dynamically defined transforms in the Python expansion service.

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on code in PR #17822:
URL: https://github.com/apache/beam/pull/17822#discussion_r893939123


##########
sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java:
##########
@@ -62,7 +62,44 @@
     extends PTransform<InputT, OutputT> {
 
   private static final SchemaRegistry SCHEMA_REGISTRY = SchemaRegistry.createDefault();
+
+  /**
+   * The fully qualified name of the a Python callable that will be use to instantiate the
+   * transform. Often this is the fully qualified name of a PTransform class, in which case the
+   * arguments will be passed to its constructor, but any callable will do.
+   *
+   * <p>Two special names, {@code __callable__} and {@code __constructor__} can be used to define a
+   * suitable transform inline if none exists.
+   *
+   * <p>When {@code __callable__} is provided, the first argument (or {@code source} keyword
+   * argument) should be a {@link PythonCallableSource} which represents the expand method of the
+   * PTransform accepting and returning a PValue (and may also take additional arguments and keyword

Review Comment:
   {@link PTransform} and {@link PValue} (or @code if you don't want to import).



##########
sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java:
##########
@@ -62,7 +62,44 @@
     extends PTransform<InputT, OutputT> {
 
   private static final SchemaRegistry SCHEMA_REGISTRY = SchemaRegistry.createDefault();
+
+  /**
+   * The fully qualified name of the a Python callable that will be use to instantiate the
+   * transform. Often this is the fully qualified name of a PTransform class, in which case the
+   * arguments will be passed to its constructor, but any callable will do.
+   *
+   * <p>Two special names, {@code __callable__} and {@code __constructor__} can be used to define a
+   * suitable transform inline if none exists.
+   *
+   * <p>When {@code __callable__} is provided, the first argument (or {@code source} keyword
+   * argument) should be a {@link PythonCallableSource} which represents the expand method of the
+   * PTransform accepting and returning a PValue (and may also take additional arguments and keyword
+   * arguments). For example, one might write
+   *
+   * <pre>
+   * PythonExternalTransform
+   *     .from("__callable__")
+   *     .withArgs(
+   *         PythonCallable.of("def expand(pcoll, x, y): return pcoll | ..."),
+   *         valueForX,
+   *         valueForY);
+   * </pre>
+   *
+   * <p>When {@code __constructor__} is provided, the first argument (or {@code source} keyword

Review Comment:
   Can you please move this to the public "from(String tranformName)" method definition and refer to that from the public "from(String tranformName, String expansionService)" definition. Commends attached to private fields will not show up in Java docs I believe.



##########
sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java:
##########
@@ -62,7 +62,44 @@
     extends PTransform<InputT, OutputT> {
 
   private static final SchemaRegistry SCHEMA_REGISTRY = SchemaRegistry.createDefault();
+
+  /**
+   * The fully qualified name of the a Python callable that will be use to instantiate the
+   * transform. Often this is the fully qualified name of a PTransform class, in which case the

Review Comment:
   {@link PTransform}



##########
sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java:
##########
@@ -62,7 +62,44 @@
     extends PTransform<InputT, OutputT> {
 
   private static final SchemaRegistry SCHEMA_REGISTRY = SchemaRegistry.createDefault();
+
+  /**
+   * The fully qualified name of the a Python callable that will be use to instantiate the
+   * transform. Often this is the fully qualified name of a PTransform class, in which case the
+   * arguments will be passed to its constructor, but any callable will do.
+   *
+   * <p>Two special names, {@code __callable__} and {@code __constructor__} can be used to define a
+   * suitable transform inline if none exists.
+   *
+   * <p>When {@code __callable__} is provided, the first argument (or {@code source} keyword
+   * argument) should be a {@link PythonCallableSource} which represents the expand method of the
+   * PTransform accepting and returning a PValue (and may also take additional arguments and keyword
+   * arguments). For example, one might write
+   *
+   * <pre>
+   * PythonExternalTransform
+   *     .from("__callable__")
+   *     .withArgs(
+   *         PythonCallable.of("def expand(pcoll, x, y): return pcoll | ..."),
+   *         valueForX,
+   *         valueForY);
+   * </pre>
+   *
+   * <p>When {@code __constructor__} is provided, the first argument (or {@code source} keyword
+   * argument) should be a {@link PythonCallableSource} which will return the desired PTransform
+   * when called with the remaining arguments and keyword arguments. Often this will be a
+   * PythonCallable representing a PTransform class, for example

Review Comment:
   {@code PythonCallable}



##########
sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java:
##########
@@ -62,7 +62,44 @@
     extends PTransform<InputT, OutputT> {
 
   private static final SchemaRegistry SCHEMA_REGISTRY = SchemaRegistry.createDefault();
+
+  /**
+   * The fully qualified name of the a Python callable that will be use to instantiate the

Review Comment:
   s/use/used



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #17822: Allow creation of dynamically defined transforms in the Python expansion service.

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #17822:
URL: https://github.com/apache/beam/pull/17822#issuecomment-1145467089

   # [Codecov](https://codecov.io/gh/apache/beam/pull/17822?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#17822](https://codecov.io/gh/apache/beam/pull/17822?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a11c8d6) into [master](https://codecov.io/gh/apache/beam/commit/f24cedf4edb1312f2d07df00d0f29569dfcb4b39?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f24cedf) will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #17822      +/-   ##
   ==========================================
   + Coverage   74.07%   74.08%   +0.01%     
   ==========================================
     Files         697      697              
     Lines       91927    91942      +15     
   ==========================================
   + Hits        68092    68116      +24     
   + Misses      22590    22581       -9     
     Partials     1245     1245              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `83.75% <100.00%> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/17822?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...beam/transforms/fully\_qualified\_named\_transform.py](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9mdWxseV9xdWFsaWZpZWRfbmFtZWRfdHJhbnNmb3JtLnB5) | `100.00% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/internal/gcp/auth.py](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvZ2NwL2F1dGgucHk=) | `73.33% <0.00%> (-5.34%)` | :arrow_down: |
   | [...eam/runners/portability/fn\_api\_runner/execution.py](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL2V4ZWN1dGlvbi5weQ==) | `92.44% <0.00%> (-0.65%)` | :arrow_down: |
   | [sdks/python/apache\_beam/ml/inference/pytorch.py](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL3B5dG9yY2gucHk=) | | |
   | [...thon/apache\_beam/ml/inference/pytorch\_inference.py](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL3B5dG9yY2hfaW5mZXJlbmNlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...eam/runners/interactive/interactive\_environment.py](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9lbnZpcm9ubWVudC5weQ==) | `92.02% <0.00%> (+0.30%)` | :arrow_up: |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.67% <0.00%> (+0.37%)` | :arrow_up: |
   | [sdks/python/apache\_beam/typehints/schemas.py](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3NjaGVtYXMucHk=) | `94.55% <0.00%> (+0.38%)` | :arrow_up: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `94.02% <0.00%> (+0.74%)` | :arrow_up: |
   | [...che\_beam/runners/interactive/interactive\_runner.py](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9ydW5uZXIucHk=) | `91.39% <0.00%> (+1.32%)` | :arrow_up: |
   | ... and [3 more](https://codecov.io/gh/apache/beam/pull/17822/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/17822?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/17822?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [f24cedf...a11c8d6](https://codecov.io/gh/apache/beam/pull/17822?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on a diff in pull request #17822: Allow creation of dynamically defined transforms in the Python expansion service.

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on code in PR #17822:
URL: https://github.com/apache/beam/pull/17822#discussion_r892949378


##########
sdks/python/apache_beam/transforms/fully_qualified_named_transform.py:
##########
@@ -52,16 +52,37 @@ def __init__(self, constructor, args, kwargs):
     self._kwargs = kwargs
 
   def expand(self, pinput):
-    return pinput | self._resolve(self._constructor)(
-        *self._args, **self._kwargs)
+    if self._constructor in ('__callable__', '__constructor__'):
+      self._check_allowed(self._constructor)
+      if self._args:
+        source, *args = tuple(self._args)
+        kwargs = self._kwargs
+      else:
+        args = self._args
+        kwargs = dict(self._kwargs)
+        source = kwargs.pop('source')
+
+      if self._constructor == '__constructor__':

Review Comment:
   So in the first case the function returns the PTransform. In the second case the function defines the "expand" method of a PTransform. Is that correct ?
   
   We should document all this properly in JavaExternalTransform since this usage is not clear without going through the details of the code.



##########
sdks/python/apache_beam/transforms/fully_qualified_named_transform.py:
##########
@@ -52,16 +52,37 @@ def __init__(self, constructor, args, kwargs):
     self._kwargs = kwargs
 
   def expand(self, pinput):
-    return pinput | self._resolve(self._constructor)(
-        *self._args, **self._kwargs)
+    if self._constructor in ('__callable__', '__constructor__'):

Review Comment:
   So the API is that user has to specific one of these strings as the method name and also make sure that the filter allows the same string ?



##########
sdks/python/apache_beam/utils/python_callable.py:
##########
@@ -88,6 +88,9 @@ def load_from_script(source):
       if line[0] != ' ':
         if line.startswith('def '):
           name = line[4:line.index('(')].strip()
+        elif line.startswith('class '):
+          name = line[5:line.index('(') if '(' in

Review Comment:
   Noting that we are extending the definition of the PythonCallable here, from a "function followed by some statements" to a "class followed by some statements that acts as a callable". This can be quite difficult to follow so please add appropriate documentation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on pull request #17822: Allow creation of dynamically defined transforms in the Python expansion service.

Posted by GitBox <gi...@apache.org>.
robertwb commented on PR #17822:
URL: https://github.com/apache/beam/pull/17822#issuecomment-1145391376

   R: @chamikaramj @ihji 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org