Posted to commits@beam.apache.org by he...@apache.org on 2022/11/14 19:16:07 UTC
[beam] branch master updated: Updates Multi-lang Java quickstart
This is an automated email from the ASF dual-hosted git repository.
heejong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new 5bd75c25de2 Updates Multi-lang Java quickstart
new 4a044999b8e Merge pull request #24076 from chamikaramj/multilang_java_updates
5bd75c25de2 is described below
commit 5bd75c25de291e517cc5c5799ae4adaaaaceacb7
Author: Chamikara Jayalath <ch...@gmail.com>
AuthorDate: Wed Nov 9 18:04:28 2022 -0800
Updates Multi-lang Java quickstart
---
.../en/documentation/sdks/java-multi-language-pipelines.md | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/website/www/site/content/en/documentation/sdks/java-multi-language-pipelines.md b/website/www/site/content/en/documentation/sdks/java-multi-language-pipelines.md
index fe1fba52d17..855bd421681 100644
--- a/website/www/site/content/en/documentation/sdks/java-multi-language-pipelines.md
+++ b/website/www/site/content/en/documentation/sdks/java-multi-language-pipelines.md
@@ -50,6 +50,9 @@ already have these environments set up, first complete the
For running with portable DirectRunner, you need to have Docker installed
locally and the Docker daemon should be running. This is not needed for Dataflow.
+For running on Dataflow, you need a Google Cloud project with billing enabled and a
+[Google Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets).
+
This example relies on Python pandas package 1.4.0 or later which is unavailable
for Python versions earlier than 3.8. Hence please make sure that the default Python
version installed in your system is 3.8 or later.
@@ -70,7 +73,7 @@ transforms are identified by their fully qualified name. For example,
package, so its fully qualified name is
`apache_beam.dataframe.transforms.DataframeTransform`.
The example pipeline,
-[PythonDataframeWordCount](https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/PythonDataframeWordCount.java),
+[PythonDataframeWordCount](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/multilanguage/PythonDataframeWordCount.java),
passes this fully qualified name to
[PythonExternalTransform](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/extensions/python/PythonExternalTransform.html).
@@ -138,6 +141,9 @@ default Beam SDK, you might need to run your own expansion service. In such
cases, [start the expansion service](#advanced-start-an-expansion-service)
before running your pipeline.
+Before running the pipeline, make sure to perform the
+[runner specific setup](https://beam.apache.org/get-started/quickstart-java/#run-a-pipeline) for your selected Beam runner.
+
### Run with Dataflow runner at HEAD (Beam 2.41.0 and later)
> **Note:** Due to [issue#23717](https://github.com/apache/beam/issues/23717),
@@ -238,7 +244,7 @@ follow these steps:
2. Install Apache Beam with `gcp` and `dataframe` packages.
```
-pip install apache-beam[gcp,dataframe]
+pip install 'apache-beam[gcp,dataframe]'
```
4. Run the following command
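The quoting change in the final hunk is worth spelling out: in shells such as zsh, an unquoted `[gcp,dataframe]` is treated as a glob character class, so the command can abort with `no matches found` before pip ever runs. A minimal sketch of why the quoted form is safe (using `echo` as a stand-in so nothing is actually installed; the package spec is taken from the patch):

```shell
# Single quotes keep the extras specifier literal; without them, zsh
# attempts filename expansion on the square brackets and may fail.
spec='apache-beam[gcp,dataframe]'
echo "pip install $spec"
```

In bash the unquoted form usually works only because no matching file exists in the current directory; quoting makes the command behave the same in every shell.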