You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/22 21:16:44 UTC

[GitHub] [beam] ihji opened a new pull request, #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

ihji opened a new pull request, #22004:
URL: https://github.com/apache/beam/pull/22004

   **Please** add a meaningful description for your change here
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ihji commented on pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by GitBox <gi...@apache.org>.
ihji commented on PR #22004:
URL: https://github.com/apache/beam/pull/22004#issuecomment-1181200727

   > I don't think we want the user to manually create an External transform that is a union of External transforms--the user might not even know which transforms are external and which ones are not. Instead, we should allow these transforms to be composed arbitrarily.
   > 
   > This may require creating a new type/coder in Java that represents the encoded form of the external type.
   
   I also thought the similar approach but wasn't sure if it's good idea to add the dummy (virtually do nothing but just placeholder) coder for Python objects to the Java SDK. So instead I implemented smaller, more limited scoped feature only allowing to manually combine consecutive external transforms.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on PR #22004:
URL: https://github.com/apache/beam/pull/22004#issuecomment-1183682339

   Fixes #22003 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on a diff in pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on code in PR #22004:
URL: https://github.com/apache/beam/pull/22004#discussion_r920523534


##########
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java:
##########
@@ -76,25 +81,36 @@
   "nullness" // TODO(https://github.com/apache/beam/issues/20497)
 })
 public class External {
-  private static final String EXPANDED_TRANSFORM_BASE_NAME = "external";
+  protected static final String EXPANDED_TRANSFORM_BASE_NAME = "external";
   private static final String IMPULSE_PREFIX = "IMPULSE";
   private static AtomicInteger namespaceCounter = new AtomicInteger(0);
 
   private static final ExpansionServiceClientFactory DEFAULT =
       DefaultExpansionServiceClientFactory.create(
           endPoint -> ManagedChannelBuilder.forTarget(endPoint.getUrl()).usePlaintext().build());
 
-  private static int getFreshNamespaceIndex() {
+  public static int getFreshNamespaceIndex() {
     return namespaceCounter.getAndIncrement();
   }
 
   public static <InputT extends PInput, OutputT>
       SingleOutputExpandableTransform<InputT, OutputT> of(
           String urn, byte[] payload, String endpoint) {
-    Endpoints.ApiServiceDescriptor apiDesc =
-        Endpoints.ApiServiceDescriptor.newBuilder().setUrl(endpoint).build();
     return new SingleOutputExpandableTransform<>(
-        urn, payload, apiDesc, DEFAULT, getFreshNamespaceIndex(), ImmutableMap.of());
+        ImmutableList.of(ExpansionInfo.create(urn, payload, endpoint, getFreshNamespaceIndex())),
+        DEFAULT,
+        ImmutableMap.of());
+  }
+
+  public static <InputT extends PInput, OutputT>
+      SingleOutputExpandableTransform<InputT, OutputT> of(

Review Comment:
   Why do we need this syntax ? Beam transform composing is done through building composite and "consecutive" external transforms would imply consecutive "apply" methods to me. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] closed pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK
URL: https://github.com/apache/beam/pull/22004


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on pull request #22004: [WIP/BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by GitBox <gi...@apache.org>.
robertwb commented on PR #22004:
URL: https://github.com/apache/beam/pull/22004#issuecomment-1163627193

   Could you give an example of what was broken/impossible before and is fixed by this change? (Ideally as a test :-). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #22004:
URL: https://github.com/apache/beam/pull/22004#issuecomment-1453481838

   This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by GitBox <gi...@apache.org>.
robertwb commented on PR #22004:
URL: https://github.com/apache/beam/pull/22004#issuecomment-1182110472

   I think it makes sense to look at what the ideal API would be for the end-user, and then whether that's feasible to implement, rather than let the API reflect underlying implementation constraints (when possible). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on a diff in pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by GitBox <gi...@apache.org>.
robertwb commented on code in PR #22004:
URL: https://github.com/apache/beam/pull/22004#discussion_r917163121


##########
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java:
##########
@@ -76,25 +81,36 @@
   "nullness" // TODO(https://github.com/apache/beam/issues/20497)
 })
 public class External {
-  private static final String EXPANDED_TRANSFORM_BASE_NAME = "external";
+  protected static final String EXPANDED_TRANSFORM_BASE_NAME = "external";
   private static final String IMPULSE_PREFIX = "IMPULSE";
   private static AtomicInteger namespaceCounter = new AtomicInteger(0);
 
   private static final ExpansionServiceClientFactory DEFAULT =
       DefaultExpansionServiceClientFactory.create(
           endPoint -> ManagedChannelBuilder.forTarget(endPoint.getUrl()).usePlaintext().build());
 
-  private static int getFreshNamespaceIndex() {
+  public static int getFreshNamespaceIndex() {

Review Comment:
   I don't think this (or EXPANDED_TRANSFORM_BASE_NAME) should be exposed. If we have to expose something, we could expose getFreshNamespace() that returns the concatenation, but best to not expose these internal implementation details at all. 



##########
sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java:
##########
@@ -435,6 +437,15 @@ public OutputT expand(InputT input) {
     }
   }
 
+  public List<External.ExpansionInfo> getExpansionInfoList() {

Review Comment:
   This shouldn't be needed (it doesn't overload anything, right?)



##########
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java:
##########
@@ -221,6 +221,17 @@ protected void pythonDependenciesTest(Pipeline pipeline) {
               .apply(External.of(TEST_PYTHON_BS4_URN, new byte[] {}, expansionAddr));
       PAssert.that(col).containsInAnyOrder("The Dormouse's story");
     }
+
+    protected void combineMultipleTransformTest(Pipeline pipeline) throws IOException {
+      PCollection<String> col =
+          pipeline
+              .apply(Create.of(1L, 2L, 3L))
+              .apply(
+                  External.of(
+                      External.of("map_to_union_types", new byte[] {}, expansionAddr),

Review Comment:
   Reading this test, I wouldn't know where that is defined. Similar for TEST_PREFIX_URN. Let's make this self-contained, e.g. 
   
   ```
   pipeline
   .apply(Create.of(-1, 0, 1))
   .apply(PythonMap.viaMapFn("lambda x: 'negative' if x < 0 else x"))
   .apply(PythonMap.viaMapFn("type"))
   .apply(PythonMap.viaMapFn("str"));
   ```
   
   which will result in "str", "int", "int".



##########
sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java:
##########
@@ -435,6 +437,15 @@ public OutputT expand(InputT input) {
     }
   }
 
+  public List<External.ExpansionInfo> getExpansionInfoList() {
+    return ImmutableList.of(
+        External.ExpansionInfo.create(
+            "beam:transforms:python:fully_qualified_named",
+            generatePayload().toByteArray(),
+            expansionService,
+            External.getFreshNamespaceIndex()));

Review Comment:
   This has side effects, shouldn't be called multiple times (especially from a getter) as it returns a different value each time. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ihji commented on a diff in pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by GitBox <gi...@apache.org>.
ihji commented on code in PR #22004:
URL: https://github.com/apache/beam/pull/22004#discussion_r906359505


##########
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java:
##########
@@ -221,6 +221,17 @@ protected void pythonDependenciesTest(Pipeline pipeline) {
               .apply(External.of(TEST_PYTHON_BS4_URN, new byte[] {}, expansionAddr));
       PAssert.that(col).containsInAnyOrder("The Dormouse's story");
     }
+
+    protected void combineMultipleTransformTest(Pipeline pipeline) throws IOException {
+      PCollection<String> col =
+          pipeline
+              .apply(Create.of(1L, 2L, 3L))
+              .apply(
+                  External.of(
+                      External.of("map_to_union_types", new byte[] {}, expansionAddr),

Review Comment:
   This test will never succeed without the ability to merge consecutive external transforms since `map_to_union_types` return a union of int, string and float types.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ihji commented on pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by GitBox <gi...@apache.org>.
ihji commented on PR #22004:
URL: https://github.com/apache/beam/pull/22004#issuecomment-1165880963

   R: @robertwb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ihji commented on pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by GitBox <gi...@apache.org>.
ihji commented on PR #22004:
URL: https://github.com/apache/beam/pull/22004#issuecomment-1166196637

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #22004: [BEAM-22003] Allow merging consecutive external transforms in Java SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #22004:
URL: https://github.com/apache/beam/pull/22004#issuecomment-1463756466

   This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org