Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/15 13:31:48 UTC

[GitHub] [beam] pcoet commented on a diff in pull request #22263: Rewrote Java multi-language pipeline quickstart

pcoet commented on code in PR #22263:
URL: https://github.com/apache/beam/pull/22263#discussion_r922168920


##########
website/www/site/content/en/documentation/sdks/java-multi-language-pipelines.md:
##########
@@ -19,162 +19,185 @@ limitations under the License.
 
 # Java multi-language pipelines quickstart
 
-> **Note:** This page is a work in progress. Please see [Multi-language pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines) for full documentation.
-
-This page demonstrates how to write a Java pipeline that uses a Python cross-language transform.
-
-The goal of a cross-language pipeline is to incorporate transforms from one SDK (e.g. the Python SDK) into a pipeline written using another SDK (e.g. the Java SDK). This enables having already developed transforms (e.g. ML transforms in Python) and libraries (e.g. the vast library of IOs in Java), and strengths of certain languages at your disposal in whichever language you are more comfortable authoring pipelines while vastly expanding your toolkit in given language.
-
-In this section we will cover a specific use-case: incorporating a Python transform that does inference on a model but is part of a larger Java pipeline. The section is broken down into 2 parts:
-
-1. How to author the cross-language pipeline?
-1. How to run the cross-language pipeline?
-
-{{< language-switcher java py >}}
-
-## How to author the cross-language pipeline?
-
-This section digs into what changes when authoring a cross-language pipeline:
-
-1. "Classic" pipeline in Java
-1. External transform in Python
-1. Expansion server
-
-### "Classic" pipeline
-
-We start by developing an Apache Beam pipeline like we would normally do if you were using only one SDK (e.g. the Java SDK):
-
-{{< highlight java >}}
-public class CrossLanguageTransform extends PTransform<PCollection<String>, PCollection<String>> {
-    private static final String URN = "beam:transforms:xlang:pythontransform";
-
-    private static String expansionAddress;
-
-    public CrossLanguageTransform(String expansionAddress) {
-        this.expansionAddress = expansionAddress;
-    }
-
-    @Override
-    public PCollection<String> expand(PCollection<String> input) {
-        PCollection<String> output =
-            input.apply(
-                "ExternalPythonTransform",
-                External.of(URN, new byte [] {}, this.expansionAddress)
-            );
-    }
-}
-
-public class CrossLanguagePipeline {
-    public static void main(String[] args) {
-        Pipeline p = Pipeline.create();
-
-        String expansionAddress = "localhost:9097"
-
-        PCollection<String> inputs = p.apply(Create.of("features { feature { key: 'country' value { bytes_list { value: 'Belgium' }}}}"));
-        input.apply(new CrossLanguageTransform(expansionAddress));
-
-        p.run().waitUntilFinish();
-    }
+This page provides a high-level overview of creating multi-language pipelines
+with the Apache Beam SDK for Java. For a more complete discussion of the topic,
+see
+[Multi-language pipelines](/documentation/programming-guide/#multi-language-pipelines).
+
+A *multi-language pipeline* is a pipeline that’s built in one Beam SDK language
+and uses one or more transforms from another Beam SDK language. These transforms
+from another SDK are called *cross-language transforms*. Multi-language support
+makes pipeline components easier to share across the Beam SDKs and grows the
+pool of available transforms for all the SDKs.
+
+In the examples below, the multi-language pipeline is built with the Beam Java
+SDK, and the cross-language transform is built with the Beam Python SDK.
+
+## Prerequisites
+
+This quickstart is based on a Java example pipeline,
+[PythonDataframeWordCount](https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/PythonDataframeWordCount.java),
+that counts words in a Shakespeare text. If you’d like to run the pipeline, you
+can clone or download the Beam repository and build the example from the source
+code.
+
+To build and run the example, you need a Java environment with the Beam Java SDK
+version 2.40.0 or later installed, and a Python environment. If you don’t
+already have these environments set up, first complete the
+[Apache Beam Java SDK Quickstart](/get-started/quickstart-java/) and the
+[Apache Beam Python SDK Quickstart](/get-started/quickstart-py/).
+
+## Specify a cross-language transform
+
+The Java example pipeline uses the Python
+[DataframeTransform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/dataframe/transforms.py)
+as a cross-language transform. The transform is part of the
+[Beam Dataframe API](/documentation/dsls/dataframes/overview/) for working with
+pandas-like
+[DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)
+objects.
+
+To apply a cross-language transform, your pipeline must specify it. Python
+transforms are identified by their fully qualified name. For example,
+`DataframeTransform` can be found in the `apache_beam.dataframe.transforms`
+package, so its fully qualified name is
+`apache_beam.dataframe.transforms.DataframeTransform`.
+The example pipeline,
+[PythonDataframeWordCount](https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/PythonDataframeWordCount.java),
+passes this fully qualified name to
+[PythonExternalTransform](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/extensions/python/PythonExternalTransform.html).
+There's also a higher-level cross-language [DataframeTransform](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/transforms/DataframeTransform.java)

Review Comment:
   That's great, thanks. Done.
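
As a side note on the "fully qualified name" wording in the diff above, here is a minimal Python sketch of how such a name splits into a module path and an attribute name. The helpers `split_fully_qualified_name` and `resolve_fully_qualified_name` are hypothetical and are not part of Beam. They only illustrate the naming convention, not how Beam's expansion service actually looks up transforms. The runnable call uses `collections.OrderedDict` from the standard library as a stand-in so the sketch works without Beam installed.

```python
import importlib


def split_fully_qualified_name(name):
    """Split 'pkg.module.Attr' into ('pkg.module', 'Attr').

    Hypothetical helper for illustration only; not a Beam API.
    """
    module_path, _, attr = name.rpartition(".")
    if not module_path or not attr:
        raise ValueError(f"not a fully qualified name: {name!r}")
    return module_path, attr


def resolve_fully_qualified_name(name):
    """Import the module part of the name and return the named attribute."""
    module_path, attr = split_fully_qualified_name(name)
    module = importlib.import_module(module_path)
    return getattr(module, attr)


# The name quoted in the diff splits into its package and class parts.
module_path, attr = split_fully_qualified_name(
    "apache_beam.dataframe.transforms.DataframeTransform"
)

# Standard-library stand-in, so no Beam installation is assumed here.
ordered_dict_cls = resolve_fully_qualified_name("collections.OrderedDict")
```

With Beam installed in the Python environment, the same resolve call on the name from the diff would be expected to return the `DataframeTransform` class, but that is an assumption about the local setup rather than something this sketch checks.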



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org