You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by da...@apache.org on 2022/10/03 17:34:27 UTC

[beam] branch master updated: [Website] Add new Java quickstart (#22747)

This is an automated email from the ASF dual-hosted git repository.

damccorm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new d28a93e07f1 [Website] Add new Java quickstart (#22747)
d28a93e07f1 is described below

commit d28a93e07f1144387ab2cb075708b9f420805660
Author: Veronica Wasson <39...@users.noreply.github.com>
AuthorDate: Mon Oct 3 10:34:20 2022 -0700

    [Website] Add new Java quickstart (#22747)
    
    * [WebSite] Add new Java quickstart
    
    * Incorporate review feedback
    
    * Code indentation style
    
    * Fix trailing whitespace
    
    * Fix Java version in prereqs
    
    * Minor formatting and typo fix
    
    Co-authored-by: root <ro...@google.com>
    Co-authored-by: Veronica Wasson <Ve...@users.noreply.github.com>
---
 .../site/content/en/get-started/quickstart-java.md |   4 +-
 .../site/content/en/get-started/quickstart/java.md | 195 +++++++++++++++++++++
 .../partials/section-menu/en/get-started.html      |   4 +-
 3 files changed, 200 insertions(+), 3 deletions(-)

diff --git a/website/www/site/content/en/get-started/quickstart-java.md b/website/www/site/content/en/get-started/quickstart-java.md
index e85ef5ffb86..d0884385352 100644
--- a/website/www/site/content/en/get-started/quickstart-java.md
+++ b/website/www/site/content/en/get-started/quickstart-java.md
@@ -1,5 +1,5 @@
 ---
-title: "Beam Quickstart for Java"
+title: "WordCount quickstart for Java"
 aliases:
   - /get-started/quickstart/
   - /use/quickstart/
@@ -19,7 +19,7 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
 
-# Apache Beam Java SDK quickstart
+# WordCount quickstart for Java
 
 This quickstart shows you how to set up a Java development environment and run
 an [example pipeline](/get-started/wordcount-example) written with the
diff --git a/website/www/site/content/en/get-started/quickstart/java.md b/website/www/site/content/en/get-started/quickstart/java.md
new file mode 100644
index 00000000000..00a2d5b1624
--- /dev/null
+++ b/website/www/site/content/en/get-started/quickstart/java.md
@@ -0,0 +1,195 @@
+---
+title: "Beam Quickstart for Java"
+aliases:
+  - /get-started/quickstart/
+  - /use/quickstart/
+  - /getting-started/
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Apache Beam Java SDK quickstart
+
+This quickstart shows you how to run an
+[example pipeline](https://github.com/apache/beam-starter-java) written with
+the [Apache Beam Java SDK](/documentation/sdks/java), using the
+[Direct Runner](/documentation/runners/direct/). The Direct Runner executes
+pipelines locally on your machine.
+
+If you're interested in contributing to the Apache Beam Java codebase, see the
+[Contribution Guide](/contribute).
+
+On this page:
+
+{{< toc >}}
+
+## Set up your development environment
+
+Use [`sdkman`](https://sdkman.io/) to install the Java Development Kit (JDK).
+
+{{< highlight >}}
+# Install sdkman
+curl -s "https://get.sdkman.io" | bash
+
+# Install Java 11
+sdk install java 11.0.16-tem
+{{< /highlight >}}
+
+You can use either [Gradle](https://gradle.org/) or
+[Apache Maven](https://maven.apache.org/) to run this quickstart:
+
+{{< highlight >}}
+# Install Gradle
+sdk install gradle
+
+# Install Maven
+sdk install maven
+{{< /highlight >}}
+
+## Clone the GitHub repository
+
+Clone or download the
+[apache/beam-starter-java](https://github.com/apache/beam-starter-java) GitHub
+repository and change into the `beam-starter-java` directory.
+
+{{< highlight >}}
+git clone https://github.com/apache/beam-starter-java.git
+cd beam-starter-java
+{{< /highlight >}}
+
+## Run the quickstart
+
+**Gradle**: To run the quickstart with Gradle, run the following command:
+
+{{< highlight >}}
+gradle run --args='--inputText=Greetings'
+{{< /highlight >}}
+
+**Maven**: To run the quickstart with Maven, run the following command:
+
+{{< highlight >}}
+mvn compile exec:java -Dexec.args=--inputText='Greetings'
+{{< /highlight >}}
+
+The output is similar to the following:
+
+{{< highlight >}}
+Hello
+World!
+Greetings
+{{< /highlight >}}
+
+The lines might appear in a different order.
+
+## Explore the code
+
+The main code file for this quickstart is **App.java**
+([GitHub](https://github.com/apache/beam-starter-java/blob/main/src/main/java/com/example/App.java)).
+The code performs the following steps:
+
+1. Create a Beam pipeline.
+3. Create an initial `PCollection`.
+3. Apply a transform to the `PCollection`.
+4. Run the pipeline, using the Direct Runner.
+
+### Create a pipeline
+
+The code first creates a `Pipeline` object. The `Pipeline` object builds up the
+graph of transformations to be executed.
+
+```java
+var options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
+var pipeline = Pipeline.create(options);
+```
+
+The `PipelineOptions` object lets you set various options for the pipeline. The
+`fromArgs` method shown in this example parses command-line arguments, which
+lets you set pipeline options through the command line.
+
+### Create an initial PCollection
+
+The `PCollection` abstraction represents a potentially distributed,
+multi-element data set. A Beam pipeline needs a source of data to populate an
+initial `PCollection`. The source can be bounded (with a known, fixed size) or
+unbounded (with unlimited size).
+
+This example uses the
+[`Create.of`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Create.html)
+method to create a `PCollection` from an in-memory array of strings. The
+resulting `PCollection` contains the strings "Hello", "World!", and a
+user-provided input string.
+
+```java
+return pipeline
+  .apply("Create elements", Create.of(Arrays.asList("Hello", "World!", inputText)))
+```
+
+### Apply a transform to the PCollection
+
+Transforms can change, filter, group, analyze, or otherwise process the
+elements in a `PCollection`. This example uses the
+[`MapElements`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/MapElements.html)
+transform, which maps the elements of a collection into a new collection:
+
+```java
+.apply("Print elements",
+    MapElements.into(TypeDescriptors.strings()).via(x -> {
+      System.out.println(x);
+      return x;
+    }));
+```
+
+where
+
+* `into` specifies the data type for the elements in the output collection.
+* `via` defines a mapping function that is called on each element of the input
+  collection to create the output collection.
+
+In this example, the mapping function is a lambda that just returns the
+original value. It also prints the value to `System.out` as a side effect.
+
+### Run the pipeline
+
+The code shown in the previous sections defines a pipeline, but does not
+process any data yet. To process data, you run the pipeline:
+
+```java
+pipeline.run().waitUntilFinish();
+```
+
+A Beam [runner](https://beam.apache.org/documentation/basics/#runner) runs a
+Beam pipeline on a specific platform. This example uses the
+[Direct Runner](https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/runners/direct/DirectRunner.html),
+which is the default runner if you don't specify one. The Direct Runner runs
+the pipeline locally on your machine. It is meant for testing and development,
+rather than being optimized for efficiency. For more information, see
+[Using the Direct Runner](https://beam.apache.org/documentation/runners/direct/).
+
+For production workloads, you typically use a distributed runner that runs the
+pipeline on a big data processing system such as Apache Flink, Apache Spark, or
+Google Cloud Dataflow. These systems support massively parallel processing.
+
+## Next Steps
+
+* Learn more about the [Beam SDK for Java](/documentation/sdks/java/)
+  and look through the
+  [Java SDK API reference](https://beam.apache.org/releases/javadoc).
+* Take a self-paced tour through our
+  [Learning Resources](/documentation/resources/learning-resources).
+* Dive in to some of our favorite
+  [Videos and Podcasts](/documentation/resources/videos-and-podcasts).
+* Join the Beam [users@](/community/contact-us) mailing list.
+
+Please don't hesitate to [reach out](/community/contact-us) if you encounter any
+issues!
diff --git a/website/www/site/layouts/partials/section-menu/en/get-started.html b/website/www/site/layouts/partials/section-menu/en/get-started.html
index 65e8be1a27a..63507fda4f7 100644
--- a/website/www/site/layouts/partials/section-menu/en/get-started.html
+++ b/website/www/site/layouts/partials/section-menu/en/get-started.html
@@ -19,10 +19,12 @@
   <ul class="section-nav-list">
     <li><a href="/get-started/try-apache-beam/">Try Apache Beam</a></li>
     <li><a href="/get-started/try-beam-playground/">Try Beam Playground (Beta)</a></li>
-    <li><a href="/get-started/quickstart-java/">Java quickstart</a></li>
+    <li><a href="/get-started/quickstart/java/">Java quickstart</a></li>
     <li><a href="/get-started/quickstart-py/">Python quickstart</a></li>
     <li><a href="/get-started/quickstart-go/">Go quickstart</a></li>
     <li><a href="/get-started/from-spark/">Apache Spark</a></li>
+    <li><a href="/get-started/quickstart-java/">WordCount quickstart for
+        Java</a></li>
   </ul>
 </li>
 <li><a href="/get-started/downloads">Install the SDK</a></li>