Posted to commits@beam.apache.org by da...@apache.org on 2017/04/21 18:11:44 UTC

[1/4] beam-site git commit: [BEAM-825] Fill in the documentation/runners/apex portion of the website

Repository: beam-site
Updated Branches:
  refs/heads/asf-site dd3a16da7 -> 35d630627


[BEAM-825] Fill in the documentation/runners/apex portion of the website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/01641c2b
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/01641c2b
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/01641c2b

Branch: refs/heads/asf-site
Commit: 01641c2b76b1ad59a384e4f8f5456e3b80e5bed4
Parents: dd3a16d
Author: Sandeep Deshmukh <sa...@datatorrent.com>
Authored: Thu Dec 1 19:19:17 2016 +0530
Committer: Davor Bonaci <da...@google.com>
Committed: Fri Apr 21 11:10:45 2017 -0700

----------------------------------------------------------------------
 src/documentation/runners/apex.md | 43 +++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/01641c2b/src/documentation/runners/apex.md
----------------------------------------------------------------------
diff --git a/src/documentation/runners/apex.md b/src/documentation/runners/apex.md
index 050eb83..a74c310 100644
--- a/src/documentation/runners/apex.md
+++ b/src/documentation/runners/apex.md
@@ -5,5 +5,46 @@ permalink: /documentation/runners/apex/
 ---
 # Using the Apache Apex Runner
 
-This page is under construction ([BEAM-825](https://issues.apache.org/jira/browse/BEAM-825)).
+The Apex Runner executes Apache Beam pipelines using [Apache Apex](http://apex.apache.org/) as an underlying engine. The runner has broad support for the [Beam model and supports streaming and batch pipelines]({{ site.baseurl }}/documentation/runners/capability-matrix/).
+
+[Apache Apex](http://apex.apache.org/) is a stream processing platform and framework for low-latency, high-throughput and fault-tolerant analytics applications on Apache Hadoop. Apex has a unified streaming architecture and can be used for real-time and batch processing. With its stateful stream processing architecture, Apex can support all of the concepts in the Beam model (event time, triggers, watermarks etc.).
+
+## Apex-Runner prerequisites and setup
+
+You may set up your own Hadoop cluster,  and [setup Apache Apex on top of it](http://apex.apache.org/docs/apex/apex_development_setup/) or choose any vendor-specific distribution that includes Hadoop and Apex pre-installed. Please see the [distribution information on the Apache Apex website](http://apex.apache.org/downloads.html).
+
+## Running wordcount using Apex-Runner
+
+Download some data for processing and put it on HDFS
+```
+curl http://www.gutenberg.org/cache/epub/1128/pg1128.txt > /tmp/kinglear.txt
+hdfs dfs -mkdir -p /tmp/input/
+hdfs dfs -put /tmp/kinglear.txt /tmp/input/
+```
+
+The output directory should not exist on HDFS. Delete it if it exists.
+```
+hdfs dfs -rm -r -f /tmp/output/
+```
+
+Run the wordcount example
+```
+mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=/tmp/input/ --output=/tmp/output/ --runner=ApexRunner --embeddedExecution=false --configFile=beam-runners-apex.properties" -Papex-runner
+```
+
+This will launch an Apex application.
+
+## Checking output
+
+The sample program which is processing small amount of data would finish quickly. You can check contents on /tmp/output/ on HDFS
+```
+hdfs dfs -ls /tmp/output/
+```
+
+## Montoring progress of your job
+
+Depending on your installation, you may be able to monitor the progress of your job on the Hadoop cluster. Alternately, you have folloing optoins:
+
+* YARN : Using YARN web UI generally running on 8088 on the node running resource manager
+* Apex cli: [Using apex cli to get running application information](http://apex.apache.org/docs/apex/apex_cli/#apex-cli-commands)
 


[4/4] beam-site git commit: This closes #219

Posted by da...@apache.org.
This closes #219


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/35d63062
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/35d63062
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/35d63062

Branch: refs/heads/asf-site
Commit: 35d630627978c9fd7d340f31f859049f7bbcf8b6
Parents: dd3a16d 876a895
Author: Davor Bonaci <da...@google.com>
Authored: Fri Apr 21 11:11:22 2017 -0700
Committer: Davor Bonaci <da...@google.com>
Committed: Fri Apr 21 11:11:22 2017 -0700

----------------------------------------------------------------------
 content/documentation/runners/apex/index.html | 58 +++++++++++++++++++++-
 src/documentation/runners/apex.md             | 57 ++++++++++++++++++++-
 2 files changed, 113 insertions(+), 2 deletions(-)
----------------------------------------------------------------------



[2/4] beam-site git commit: Fix Apex runner instructions (pending review comments and other changes) closes #98

Posted by da...@apache.org.
Fix Apex runner instructions (pending review comments and other changes)
closes #98


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/0da810c9
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/0da810c9
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/0da810c9

Branch: refs/heads/asf-site
Commit: 0da810c9ca062bbe85fd17581aeacac04b5d2ffe
Parents: 01641c2
Author: Thomas Weise <th...@apache.org>
Authored: Fri Apr 21 00:55:52 2017 -0700
Committer: Davor Bonaci <da...@google.com>
Committed: Fri Apr 21 11:10:46 2017 -0700

----------------------------------------------------------------------
 src/documentation/runners/apex.md | 44 ++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 15 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/0da810c9/src/documentation/runners/apex.md
----------------------------------------------------------------------
diff --git a/src/documentation/runners/apex.md b/src/documentation/runners/apex.md
index a74c310..9e97d77 100644
--- a/src/documentation/runners/apex.md
+++ b/src/documentation/runners/apex.md
@@ -7,44 +7,58 @@ permalink: /documentation/runners/apex/
 
 The Apex Runner executes Apache Beam pipelines using [Apache Apex](http://apex.apache.org/) as an underlying engine. The runner has broad support for the [Beam model and supports streaming and batch pipelines]({{ site.baseurl }}/documentation/runners/capability-matrix/).
 
-[Apache Apex](http://apex.apache.org/) is a stream processing platform and framework for low-latency, high-throughput and fault-tolerant analytics applications on Apache Hadoop. Apex has a unified streaming architecture and can be used for real-time and batch processing. With its stateful stream processing architecture, Apex can support all of the concepts in the Beam model (event time, triggers, watermarks etc.).
+[Apache Apex](http://apex.apache.org/) is a stream processing platform and framework for low-latency, high-throughput and fault-tolerant analytics applications on Apache Hadoop. Apex has a unified streaming architecture and can be used for real-time and batch processing.
 
-## Apex-Runner prerequisites and setup
+## Apex Runner prerequisites
 
-You may set up your own Hadoop cluster,  and [setup Apache Apex on top of it](http://apex.apache.org/docs/apex/apex_development_setup/) or choose any vendor-specific distribution that includes Hadoop and Apex pre-installed. Please see the [distribution information on the Apache Apex website](http://apex.apache.org/downloads.html).
+You may set up your own Hadoop cluster. Beam does not require anything extra to launch the pipelines on YARN.
+An optional Apex installation may be useful for monitoring and troubleshooting.
+The Apex CLI can be [built](http://apex.apache.org/docs/apex/apex_development_setup/) or
+obtained as [binary build](http://www.atrato.io/blog/2017/04/08/apache-apex-cli/).
+For more download options see [distribution information on the Apache Apex website](http://apex.apache.org/downloads.html).
 
-## Running wordcount using Apex-Runner
+## Running wordcount using Apex Runner
 
-Download some data for processing and put it on HDFS
+Put data for processing into HDFS:
 ```
-curl http://www.gutenberg.org/cache/epub/1128/pg1128.txt > /tmp/kinglear.txt
 hdfs dfs -mkdir -p /tmp/input/
-hdfs dfs -put /tmp/kinglear.txt /tmp/input/
+hdfs dfs -put pom.xml /tmp/input/
 ```
 
-The output directory should not exist on HDFS. Delete it if it exists.
+The output directory should not exist on HDFS:
 ```
 hdfs dfs -rm -r -f /tmp/output/
 ```
 
-Run the wordcount example
+Run the wordcount example (*example project needs to be modified to include HDFS file provider*)
 ```
-mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=/tmp/input/ --output=/tmp/output/ --runner=ApexRunner --embeddedExecution=false --configFile=beam-runners-apex.properties" -Papex-runner
+mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=/tmp/input/pom.xml --output=/tmp/output/ --runner=ApexRunner --embeddedExecution=false --configFile=beam-runners-apex.properties" -Papex-runner
+```
+
+The application will run asynchronously. Check status with `yarn application -list -appStates ALL`
+
+The configuration file is optional, it can be used to influence how Apex operators are deployed into YARN containers.
+The following example will reduce the number of required containers by collocating the operators into the same container
+and lower the heap memory per operator - suitable for execution in a single node Hadoop sandbox.
+
+```
+dt.application.*.operator.*.attr.MEMORY_MB=64
+dt.stream.*.prop.locality=CONTAINER_LOCAL
+dt.application.*.operator.*.attr.TIMEOUT_WINDOW_COUNT=1200
 ```
 
-This will launch an Apex application.
 
 ## Checking output
 
-The sample program which is processing small amount of data would finish quickly. You can check contents on /tmp/output/ on HDFS
+Check the output of the pipeline in the HDFS location.
 ```
 hdfs dfs -ls /tmp/output/
 ```
 
 ## Montoring progress of your job
 
-Depending on your installation, you may be able to monitor the progress of your job on the Hadoop cluster. Alternately, you have folloing optoins:
+Depending on your installation, you may be able to monitor the progress of your job on the Hadoop cluster. Alternatively, you have following options:
 
-* YARN : Using YARN web UI generally running on 8088 on the node running resource manager
-* Apex cli: [Using apex cli to get running application information](http://apex.apache.org/docs/apex/apex_cli/#apex-cli-commands)
+* YARN : Using YARN web UI generally running on 8088 on the node running resource manager.
+* Apex command-line interface: [Using the Apex CLI to get running application information](http://apex.apache.org/docs/apex/apex_cli/#apex-cli-commands).
 

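The optional configuration file introduced in the diff above can be created before launching the pipeline. What follows is a minimal sketch, not part of the commit: it assumes the file name `beam-runners-apex.properties` taken from the `--configFile` argument, and it assumes the file is read relative to the directory from which `mvn` is started (the commit does not say where the file must live). The three property lines are copied unchanged from the diff.

```shell
#!/bin/sh
set -eu

# File name taken from the --configFile argument of the run command.
# Writing it to the current directory is an assumption.
CONFIG_FILE="beam-runners-apex.properties"

# Per the documentation text, these settings collocate operators in the
# same container and lower the heap memory per operator, which suits a
# single node Hadoop sandbox. Values are copied from the diff.
cat > "$CONFIG_FILE" <<'EOF'
dt.application.*.operator.*.attr.MEMORY_MB=64
dt.stream.*.prop.locality=CONTAINER_LOCAL
dt.application.*.operator.*.attr.TIMEOUT_WINDOW_COUNT=1200
EOF

echo "Wrote $CONFIG_FILE:"
cat "$CONFIG_FILE"
```

The heredoc delimiter is quoted (`'EOF'`) so that the shell does not try to expand anything inside the property keys.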

[3/4] beam-site git commit: Regenerate website

Posted by da...@apache.org.
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/876a895b
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/876a895b
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/876a895b

Branch: refs/heads/asf-site
Commit: 876a895b9d4a49afbfdfe47d459bf164c8156deb
Parents: 0da810c
Author: Davor Bonaci <da...@google.com>
Authored: Fri Apr 21 11:11:22 2017 -0700
Committer: Davor Bonaci <da...@google.com>
Committed: Fri Apr 21 11:11:22 2017 -0700

----------------------------------------------------------------------
 content/documentation/runners/apex/index.html | 58 +++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/876a895b/content/documentation/runners/apex/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/runners/apex/index.html b/content/documentation/runners/apex/index.html
index 875f3e0..c91f8d6 100644
--- a/content/documentation/runners/apex/index.html
+++ b/content/documentation/runners/apex/index.html
@@ -153,7 +153,63 @@
       <div class="row">
         <h1 id="using-the-apache-apex-runner">Using the Apache Apex Runner</h1>
 
-<p>This page is under construction (<a href="https://issues.apache.org/jira/browse/BEAM-825">BEAM-825</a>).</p>
+<p>The Apex Runner executes Apache Beam pipelines using <a href="http://apex.apache.org/">Apache Apex</a> as an underlying engine. The runner has broad support for the <a href="/documentation/runners/capability-matrix/">Beam model and supports streaming and batch pipelines</a>.</p>
+
+<p><a href="http://apex.apache.org/">Apache Apex</a> is a stream processing platform and framework for low-latency, high-throughput and fault-tolerant analytics applications on Apache Hadoop. Apex has a unified streaming architecture and can be used for real-time and batch processing.</p>
+
+<h2 id="apex-runner-prerequisites">Apex Runner prerequisites</h2>
+
+<p>You may set up your own Hadoop cluster. Beam does not require anything extra to launch the pipelines on YARN.
+An optional Apex installation may be useful for monitoring and troubleshooting.
+The Apex CLI can be <a href="http://apex.apache.org/docs/apex/apex_development_setup/">built</a> or
+obtained as <a href="http://www.atrato.io/blog/2017/04/08/apache-apex-cli/">binary build</a>.
+For more download options see <a href="http://apex.apache.org/downloads.html">distribution information on the Apache Apex website</a>.</p>
+
+<h2 id="running-wordcount-using-apex-runner">Running wordcount using Apex Runner</h2>
+
+<p>Put data for processing into HDFS:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>hdfs dfs -mkdir -p /tmp/input/
+hdfs dfs -put pom.xml /tmp/input/
+</code></pre>
+</div>
+
+<p>The output directory should not exist on HDFS:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>hdfs dfs -rm -r -f /tmp/output/
+</code></pre>
+</div>
+
+<p>Run the wordcount example (<em>example project needs to be modified to include HDFS file provider</em>)</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=/tmp/input/pom.xml --output=/tmp/output/ --runner=ApexRunner --embeddedExecution=false --configFile=beam-runners-apex.properties" -Papex-runner
+</code></pre>
+</div>
+
+<p>The application will run asynchronously. Check status with <code class="highlighter-rouge">yarn application -list -appStates ALL</code></p>
+
+<p>The configuration file is optional, it can be used to influence how Apex operators are deployed into YARN containers.
+The following example will reduce the number of required containers by collocating the operators into the same container
+and lower the heap memory per operator - suitable for execution in a single node Hadoop sandbox.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>dt.application.*.operator.*.attr.MEMORY_MB=64
+dt.stream.*.prop.locality=CONTAINER_LOCAL
+dt.application.*.operator.*.attr.TIMEOUT_WINDOW_COUNT=1200
+</code></pre>
+</div>
+
+<h2 id="checking-output">Checking output</h2>
+
+<p>Check the output of the pipeline in the HDFS location.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>hdfs dfs -ls /tmp/output/
+</code></pre>
+</div>
+
+<h2 id="montoring-progress-of-your-job">Montoring progress of your job</h2>
+
+<p>Depending on your installation, you may be able to monitor the progress of your job on the Hadoop cluster. Alternatively, you have following options:</p>
+
+<ul>
+  <li>YARN : Using YARN web UI generally running on 8088 on the node running resource manager.</li>
+  <li>Apex command-line interface: <a href="http://apex.apache.org/docs/apex/apex_cli/#apex-cli-commands">Using the Apex CLI to get running application information</a>.</li>
+</ul>
 
 
       </div>
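
The run instructions in the regenerated page repeat the same HDFS paths in several commands. As a rough sketch only, the commands can be assembled from a few variables and printed for review before anything is executed. The variable names and the output file name `wordcount-apex-commands.txt` are hypothetical; the paths, flags, and Maven profile are copied from the page. Nothing here contacts HDFS, YARN, or Maven.

```shell
#!/bin/sh
set -eu

# Paths and file name as used in the documentation example.
INPUT_DIR="/tmp/input/"
INPUT_FILE="/tmp/input/pom.xml"
OUTPUT_DIR="/tmp/output/"
CONFIG_FILE="beam-runners-apex.properties"

# Pipeline options exactly as they appear in the documented command.
EXEC_ARGS="--inputFile=$INPUT_FILE --output=$OUTPUT_DIR --runner=ApexRunner --embeddedExecution=false --configFile=$CONFIG_FILE"

# Dry run: print the commands in the documented order and keep a copy
# in a text file (hypothetical name) instead of running them.
cat <<EOF | tee wordcount-apex-commands.txt
hdfs dfs -mkdir -p $INPUT_DIR
hdfs dfs -put pom.xml $INPUT_DIR
hdfs dfs -rm -r -f $OUTPUT_DIR
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="$EXEC_ARGS" -Papex-runner
yarn application -list -appStates ALL
hdfs dfs -ls $OUTPUT_DIR
EOF
```

Once the printed commands look right for the target cluster, they can be run one at a time. Note that the page states the example project must first be modified to include the HDFS file provider.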