Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/04 01:04:32 UTC

[GitHub] [beam] robertwb commented on a diff in pull request #23413: Updates ExpansionService to support dynamically discovering and expanding SchemaTransforms

robertwb commented on code in PR #23413:
URL: https://github.com/apache/beam/pull/23413#discussion_r1013517512


##########
sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java:
##########
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.expansion.service;
+
+import static org.apache.beam.runners.core.construction.BeamUrns.getUrn;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.ServiceLoader;
+import org.apache.beam.model.pipeline.v1.ExternalTransforms.ExpansionMethods;
+import org.apache.beam.model.pipeline.v1.ExternalTransforms.SchemaTransformPayload;
+import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec;
+import org.apache.beam.sdk.coders.RowCoder;
+import org.apache.beam.sdk.expansion.service.ExpansionService.TransformProvider;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.SchemaTranslation;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.PInput;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.sdk.values.TupleTag;
+import org.apache.beam.vendor.grpc.v1p48p1.com.google.protobuf.InvalidProtocolBufferException;
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+@SuppressWarnings({"rawtypes"})
+public class ExpansionServiceSchemaTransformProvider implements TransformProvider {
+
+  static final String DEFAULT_INPUT_TAG = "INPUT";
+
+  private Map<String, org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider>
+      schemaTransformProviders = new HashMap<>();
+  private static @Nullable ExpansionServiceSchemaTransformProvider transformProvider = null;
+
+  private ExpansionServiceSchemaTransformProvider() {
+    try {
+      for (org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider schemaTransformProvider :
+          ServiceLoader.load(
+              org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider.class)) {
+        if (schemaTransformProviders.containsKey(schemaTransformProvider.identifier())) {
+          throw new IllegalArgumentException(
+              "Found multiple SchemaTransformProvider implementations with the same identifier "
+                  + schemaTransformProvider.identifier());
+        }
+        schemaTransformProviders.put(schemaTransformProvider.identifier(), schemaTransformProvider);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e.getMessage());
+    }
+  }
+
+  public static ExpansionServiceSchemaTransformProvider of() {
+    if (transformProvider == null) {
+      transformProvider = new ExpansionServiceSchemaTransformProvider();
+    }
+
+    return transformProvider;
+  }
+
+  static class RowTransform extends PTransform {

Review Comment:
   So this class basically exists to undo the logic at https://github.com/apache/beam/blob/release-2.42.0/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L348 . 
   
   I think we should instead override the createInput (and possibly extractOutputs) methods in ExpansionServiceSchemaTransformProvider.
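
   A minimal sketch of that idea, written as a language-neutral model in Python rather than against the real Java interface. The class and method names below (TransformProvider, RowTupleProvider, create_input, extract_outputs) are hypothetical stand-ins that only mirror the createInput/extractOutputs hooks mentioned above, and plain dicts of lists stand in for tagged PCollections. The point it illustrates: if the provider overrides the two hooks so the transform always sees a name -> rows mapping, no adapter class is needed to undo the default collapsing.

```python
class TransformProvider:
    """Hypothetical, simplified provider interface with overridable hooks."""

    def create_input(self, inputs):
        # Default behaviour: collapse the named inputs into the simplest shape.
        if not inputs:
            return None  # stands in for an empty (begin) input
        if len(inputs) == 1:
            return next(iter(inputs.values()))  # a single collection
        return dict(inputs)  # several tagged collections

    def extract_outputs(self, output):
        # Default behaviour: re-wrap whatever shape the transform returned.
        if output is None:
            return {}
        if isinstance(output, dict):
            return dict(output)
        return {"output": output}

    def get_transform(self, spec):
        raise NotImplementedError

    def apply(self, spec, inputs):
        transform = self.get_transform(spec)
        return self.extract_outputs(transform(self.create_input(inputs)))


class RowTupleProvider(TransformProvider):
    """Always hands the transform a name -> rows mapping, so no adapter is needed."""

    def create_input(self, inputs):
        return dict(inputs)

    def extract_outputs(self, output):
        return dict(output)

    def get_transform(self, spec):
        # Toy transform: upper-case every row of every tagged input.
        def transform(row_tuple):
            return {
                tag: [row.upper() for row in rows]
                for tag, rows in row_tuple.items()
            }

        return transform


provider = RowTupleProvider()
result = provider.apply("any-spec", {"INPUT": ["a", "b"]})
```

   With zero, one, or many inputs the toy transform receives the same mapping shape, which is the property the wrapper class was recreating by hand.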



##########
sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java:
##########
@@ -0,0 +1,174 @@
+@SuppressWarnings({"rawtypes"})
+public class ExpansionServiceSchemaTransformProvider implements TransformProvider {
+
+  static final String DEFAULT_INPUT_TAG = "INPUT";
+
+  private Map<String, org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider>
+      schemaTransformProviders = new HashMap<>();
+  private static @Nullable ExpansionServiceSchemaTransformProvider transformProvider = null;
+
+  private ExpansionServiceSchemaTransformProvider() {
+    try {
+      for (org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider schemaTransformProvider :
+          ServiceLoader.load(
+              org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider.class)) {
+        if (schemaTransformProviders.containsKey(schemaTransformProvider.identifier())) {
+          throw new IllegalArgumentException(
+              "Found multiple SchemaTransformProvider implementations with the same identifier "
+                  + schemaTransformProvider.identifier());
+        }
+        schemaTransformProviders.put(schemaTransformProvider.identifier(), schemaTransformProvider);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e.getMessage());
+    }
+  }
+
+  public static ExpansionServiceSchemaTransformProvider of() {
+    if (transformProvider == null) {
+      transformProvider = new ExpansionServiceSchemaTransformProvider();
+    }
+
+    return transformProvider;
+  }
+
+  static class RowTransform extends PTransform {
+
+    private PTransform<PCollectionRowTuple, PCollectionRowTuple> rowTuplePTransform;
+
+    public RowTransform(PTransform<PCollectionRowTuple, PCollectionRowTuple> rowTuplePTransform) {
+      this.rowTuplePTransform = rowTuplePTransform;
+    }
+
+    @Override
+    public POutput expand(PInput input) {
+      PCollectionRowTuple inputRowTuple;
+
+      if (input instanceof PCollectionRowTuple) {
+        inputRowTuple = (PCollectionRowTuple) input;
+      } else if (input instanceof PCollection) {
+        inputRowTuple = PCollectionRowTuple.of(DEFAULT_INPUT_TAG, (PCollection) input);
+      } else if (input instanceof PBegin) {
+        inputRowTuple = PCollectionRowTuple.empty(input.getPipeline());
+      } else if (input instanceof PCollectionTuple) {
+        inputRowTuple = PCollectionRowTuple.empty(input.getPipeline());
+        PCollectionTuple inputTuple = (PCollectionTuple) input;
+        for (TupleTag<?> tag : inputTuple.getAll().keySet()) {
+          inputRowTuple = inputRowTuple.and(tag.getId(), (PCollection<Row>) inputTuple.get(tag));
+        }
+      } else {
+        throw new RuntimeException(String.format("Unsupported input type: %s", input));
+      }
+      PCollectionRowTuple output = inputRowTuple.apply(this.rowTuplePTransform);
+
+      if (output.getAll().size() > 1) {

Review Comment:
   I meant: why can't we just return the PCollectionRowTuple itself? (We may need to update the expansion service code to handle this type, but that should be done anyway.)



##########
model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/external_transforms.proto:
##########
@@ -51,6 +51,11 @@ message ExpansionMethods {
     // Transform payload will be of type JavaClassLookupPayload.
     JAVA_CLASS_LOOKUP = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
       "beam:expansion:payload:java_class_lookup:v1"];
+
+    // Expanding a SchemaTransform identified by the expansion service.
+    // Transform payload will be of type SchemaTransformPayload.
+    SCHEMATRANSFORM = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =

Review Comment:
   Another ping on this.



##########
sdks/python/apache_beam/transforms/external_test.py:
##########
@@ -445,6 +447,35 @@ class DataclassTransform(beam.ExternalTransform):
     return get_payload(DataclassTransform(**values))
 
 
+class SchemaTransformPayloadBuilderTest(unittest.TestCase):
+  def test_build_payload(self):
+    ComplexType = typing.NamedTuple(
+        "ComplexType", [
+            ("str_sub_field", str),
+            ("int_sub_field", np.int32),

Review Comment:
   Can't we just use int here? (Does that give us an int64?)
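
   To make the question concrete: my understanding is that Beam's Python schema inference treats a plain int as a 64-bit integer and np.int32 as a 32-bit one. The sketch below only models that with the standard library. ATOMIC_TYPE_FOR, field_types, and the Int32 stand-in class are hypothetical names for illustration, not Beam or NumPy APIs, and the table itself is an assumption about the mapping rather than a copy of it.

```python
import typing


class Int32(int):
    """Hypothetical stand-in for numpy.int32 so the sketch needs only the stdlib."""


# Illustrative lookup table (an assumption, not Beam's actual data structure).
ATOMIC_TYPE_FOR = {
    str: "STRING",
    int: "INT64",
    Int32: "INT32",
    float: "DOUBLE",
    bool: "BOOLEAN",
    bytes: "BYTES",
}


def field_types(named_tuple_cls):
    """Map each NamedTuple field to the atomic type name it would be given."""
    hints = typing.get_type_hints(named_tuple_cls)
    return {name: ATOMIC_TYPE_FOR[hint] for name, hint in hints.items()}


WideType = typing.NamedTuple(
    "WideType", [("str_sub_field", str), ("int_sub_field", int)])
NarrowType = typing.NamedTuple(
    "NarrowType", [("str_sub_field", str), ("int_sub_field", Int32)])

wide = field_types(WideType)
narrow = field_types(NarrowType)
```

   Under that assumption the two declarations differ only in the width of int_sub_field, so the test would need np.int32 only if the Java side expects a 32-bit field.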



##########
sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java:
##########
@@ -118,6 +123,11 @@ public POutput expand(PInput input) {
         return PDone.in(input.getPipeline());
       }
     }
+
+    @Override
+    public String getName() {
+      return "RowTransform_of_" + this.rowTuplePTransform.getName();

Review Comment:
   Does the "RowTransform_of_" prefix add value? Maybe simply drop it, since this is the name the user sees before drilling down.



##########
sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java:
##########
@@ -0,0 +1,174 @@
+@SuppressWarnings({"rawtypes"})
+public class ExpansionServiceSchemaTransformProvider implements TransformProvider {
+
+  static final String DEFAULT_INPUT_TAG = "INPUT";
+
+  private Map<String, org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider>
+      schemaTransformProviders = new HashMap<>();
+  private static @Nullable ExpansionServiceSchemaTransformProvider transformProvider = null;
+
+  private ExpansionServiceSchemaTransformProvider() {
+    try {
+      for (org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider schemaTransformProvider :
+          ServiceLoader.load(
+              org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider.class)) {
+        if (schemaTransformProviders.containsKey(schemaTransformProvider.identifier())) {
+          throw new IllegalArgumentException(
+              "Found multiple SchemaTransformProvider implementations with the same identifier "
+                  + schemaTransformProvider.identifier());
+        }
+        schemaTransformProviders.put(schemaTransformProvider.identifier(), schemaTransformProvider);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e.getMessage());
+    }
+  }
+
+  public static ExpansionServiceSchemaTransformProvider of() {
+    if (transformProvider == null) {
+      transformProvider = new ExpansionServiceSchemaTransformProvider();
+    }
+
+    return transformProvider;
+  }
+
+  static class RowTransform extends PTransform {
+
+    private PTransform<PCollectionRowTuple, PCollectionRowTuple> rowTuplePTransform;
+
+    public RowTransform(PTransform<PCollectionRowTuple, PCollectionRowTuple> rowTuplePTransform) {
+      this.rowTuplePTransform = rowTuplePTransform;
+    }
+
+    @Override
+    public POutput expand(PInput input) {
+      PCollectionRowTuple inputRowTuple;
+
+      if (input instanceof PCollectionRowTuple) {
+        inputRowTuple = (PCollectionRowTuple) input;
+      } else if (input instanceof PCollection) {
+        inputRowTuple = PCollectionRowTuple.of(DEFAULT_INPUT_TAG, (PCollection) input);
+      } else if (input instanceof PBegin) {
+        inputRowTuple = PCollectionRowTuple.empty(input.getPipeline());
+      } else if (input instanceof PCollectionTuple) {
+        inputRowTuple = PCollectionRowTuple.empty(input.getPipeline());
+        PCollectionTuple inputTuple = (PCollectionTuple) input;
+        for (TupleTag<?> tag : inputTuple.getAll().keySet()) {
+          inputRowTuple = inputRowTuple.and(tag.getId(), (PCollection<Row>) inputTuple.get(tag));
+        }
+      } else {
+        throw new RuntimeException(String.format("Unsupported input type: %s", input));
+      }
+      PCollectionRowTuple output = inputRowTuple.apply(this.rowTuplePTransform);
+
+      if (output.getAll().size() > 1) {
+        PCollectionTuple pcTuple = PCollectionTuple.empty(input.getPipeline());
+        for (String key : output.getAll().keySet()) {
+          pcTuple = pcTuple.and(key, output.get(key));
+        }
+        return pcTuple;
+      } else if (output.getAll().size() == 1) {
+        return output.getAll().values().iterator().next();
+      } else {
+        return PDone.in(input.getPipeline());
+      }
+    }
+  }
+
+  @Override
+  public PTransform getTransform(FunctionSpec spec) {
+    SchemaTransformPayload payload;
+    try {
+      payload = SchemaTransformPayload.parseFrom(spec.getPayload());
+      String identifier = payload.getIdentifier();
+      if (!schemaTransformProviders.containsKey(identifier)) {
+        throw new RuntimeException(
+            "Did not find a SchemaTransformProvider with the identifier " + identifier);
+      }
+
+    } catch (InvalidProtocolBufferException e) {
+      throw new IllegalArgumentException(
+          "Invalid payload type for URN " + getUrn(ExpansionMethods.Enum.SCHEMATRANSFORM), e);
+    }
+
+    String identifier = payload.getIdentifier();
+    org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider provider =
+        schemaTransformProviders.get(identifier);
+    if (provider == null) {
+      throw new IllegalArgumentException(
+          "Could not find a SchemaTransform with identifier " + identifier);
+    }
+
+    Schema configSchemaFromRequest =
+        SchemaTranslation.schemaFromProto((payload.getConfigurationSchema()));
+    Schema configSchemaFromProvider = provider.configurationSchema();
+
+    if (!configSchemaFromRequest.assignableTo(configSchemaFromProvider)) {
+      throw new IllegalArgumentException(
+          String.format(
+              "Config schema provided with the expansion request %s is not compatible with the "
+                  + "config of the Schema transform %s.",
+              configSchemaFromRequest, configSchemaFromProvider));
+    }
+
+    Row configRow;
+    try {
+      configRow =
+          RowCoder.of(provider.configurationSchema())
+              .decode(payload.getConfigurationRow().newInput());
+    } catch (IOException e) {
+      throw new RuntimeException("Error decoding payload", e);
+    }
+
+    return new RowTransform(provider.from(configRow).buildTransform());
+  }
+
+  Iterable<org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider> getAllProviders() {
+    return schemaTransformProviders.values();

Review Comment:
   I would hope that we can eventually replace writing a Provider with a decorator on the (suitably configured) PTransform class itself, but that's future work.



##########
sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java:
##########
@@ -0,0 +1,174 @@
+@SuppressWarnings({"rawtypes"})
+public class ExpansionServiceSchemaTransformProvider implements TransformProvider {
+
+  static final String DEFAULT_INPUT_TAG = "INPUT";
+
+  private Map<String, org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider>
+      schemaTransformProviders = new HashMap<>();
+  private static @Nullable ExpansionServiceSchemaTransformProvider transformProvider = null;
+
+  private ExpansionServiceSchemaTransformProvider() {
+    try {
+      for (org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider schemaTransformProvider :
+          ServiceLoader.load(
+              org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider.class)) {
+        if (schemaTransformProviders.containsKey(schemaTransformProvider.identifier())) {
+          throw new IllegalArgumentException(
+              "Found multiple SchemaTransformProvider implementations with the same identifier "
+                  + schemaTransformProvider.identifier());
+        }
+        schemaTransformProviders.put(schemaTransformProvider.identifier(), schemaTransformProvider);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e.getMessage());
+    }
+  }
+
+  public static ExpansionServiceSchemaTransformProvider of() {
+    if (transformProvider == null) {
+      transformProvider = new ExpansionServiceSchemaTransformProvider();
+    }
+
+    return transformProvider;
+  }
+
+  static class RowTransform extends PTransform {
+
+    private PTransform<PCollectionRowTuple, PCollectionRowTuple> rowTuplePTransform;
+
+    public RowTransform(PTransform<PCollectionRowTuple, PCollectionRowTuple> rowTuplePTransform) {
+      this.rowTuplePTransform = rowTuplePTransform;
+    }
+
+    @Override
+    public POutput expand(PInput input) {

Review Comment:
   One can nest transforms of the same name. (In fact, nesting with distinct prefixes is how we get around users having to specify names *everywhere*.)
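
   A small model of why nesting under the same name is harmless: a transform's full name is the path of enclosing names, so an inner "ReadRows" inside an outer "ReadRows" becomes "ReadRows/ReadRows". NameScope below is a hypothetical class for illustration, not a Beam API, and its duplicate check is an assumption of the sketch about how clashes at the same level would be rejected.

```python
class NameScope:
    """Hypothetical model of how nested transform names form a full path."""

    def __init__(self):
        self._stack = []
        self.applied = []

    def apply(self, name, body=None):
        # Enter the scope, record the full path, then expand any nested body.
        self._stack.append(name)
        try:
            full_name = "/".join(self._stack)
            if full_name in self.applied:
                raise ValueError("duplicate transform name: " + full_name)
            self.applied.append(full_name)
            if body is not None:
                body(self)
            return full_name
        finally:
            self._stack.pop()


scope = NameScope()
# The outer and inner transforms share a name; their full paths still differ.
scope.apply("ReadRows", lambda inner: inner.apply("ReadRows"))
print(scope.applied)  # ['ReadRows', 'ReadRows/ReadRows']
```

   Only a second top-level "ReadRows" would clash in this model, which is why dropping a distinguishing prefix from the wrapper does not by itself create a conflict.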



##########
sdks/python/apache_beam/transforms/external.py:
##########
@@ -289,6 +310,70 @@ def _has_constructor(self):
         self._constructor_param_kwargs)
 
 
+# Information regarding a SchemaTransform available in an external SDK.
+SchemaTransformsConfig = namedtuple(
+    'SchemaTransformsConfig',
+    ['identifier', 'configuration_schema', 'inputs', 'outputs'])
+
+
+class SchemaAwareExternalTransform(ptransform.PTransform):
+  """A proxy transform for SchemaTransforms implemented in external SDKs.
+
+  This allows Python pipelines to directly use existing SchemaTransforms
+  available to the expansion service without adding additional code in external
+  SDKs.
+
+  :param identifier: unique identifier of the SchemaTransform.
+  :param expansion_service: (Optional) an expansion service to use.  If none is
+      provided, a default expansion service will be started.
+  :param classpath: (Optional) A list of paths to additional jars to place on the
+      expansion service classpath.
+  :kwargs: field name to value mapping for configuring the schema transform.
+      keys map to the field names of the schema of the SchemaTransform
+      (in-order).
+  """
+  def __init__(
+      self, identifier, expansion_service=None, classpath=None, **kwargs):
+    self._expansion_service = expansion_service
+    self._payload_builder = SchemaTransformPayloadBuilder(identifier, **kwargs)
+    self._classpath = classpath
+
+  def expand(self, pcolls):
+    # Register transform with the expansion service and the identifier.
+    # Expand the transform using the expansion service and the config_row.
+    if self._expansion_service is None:
+      self._expansion_service = BeamJarExpansionService(
+          ':sdks:java:expansion-service:app:shadowJar',

Review Comment:
   Do we expect many schema transforms to live in this jar, or should we make identifying the expansion service mandatory?


