You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by gi...@apache.org on 2019/09/24 00:17:07 UTC
[beam] branch asf-site updated: Publishing website 2019/09/24
00:16:54 at commit 113461a
This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
new e18160c Publishing website 2019/09/24 00:16:54 at commit 113461a
e18160c is described below
commit e18160cfccadba882c97e3277d87f2547df4862a
Author: jenkins <bu...@apache.org>
AuthorDate: Tue Sep 24 00:16:54 2019 +0000
Publishing website 2019/09/24 00:16:54 at commit 113461a
---
.../documentation/runners/direct/index.html | 76 ++++++++++++++++++++++
.../documentation/sdks/python-streaming/index.html | 7 +-
2 files changed, 77 insertions(+), 6 deletions(-)
diff --git a/website/generated-content/documentation/runners/direct/index.html b/website/generated-content/documentation/runners/direct/index.html
index 994d5df..8187175 100644
--- a/website/generated-content/documentation/runners/direct/index.html
+++ b/website/generated-content/documentation/runners/direct/index.html
@@ -208,6 +208,7 @@
<ul>
<li><a href="#memory-considerations">Memory considerations</a></li>
<li><a href="#streaming-execution">Streaming execution</a></li>
+ <li><a href="#execution-mode">Execution Mode</a></li>
</ul>
</li>
</ul>
@@ -295,6 +296,81 @@ interface for defaults and additional pipeline configuration options.</p>
<p>If your pipeline uses an unbounded data source or sink, you must set the <code class="highlighter-rouge">streaming</code> option to <code class="highlighter-rouge">true</code>.</p>
+<h3 id="execution-mode">Execution Mode</h3>
+
+<p>Python <a href="https://beam.apache.org/contribute/runner-guide/#the-fn-api">FnApiRunner</a> supports multi-threading and multi-processing mode.</p>
+
+<h4 id="setting-parallelism">Setting parallelism</h4>
+
+<p>Number of threads or subprocesses is defined by setting the <code class="highlighter-rouge">direct_num_workers</code> option. There are several ways to set this option.</p>
+
+<ul>
+ <li>Passing through CLI when executing a pipeline.
+ <div class="highlighter-rouge"><pre class="highlight"><code>python wordcount.py --input xx --output xx --direct_num_workers 2
+</code></pre>
+ </div>
+ </li>
+ <li>Setting with <code class="highlighter-rouge">PipelineOptions</code>.
+ <div class="highlighter-rouge"><pre class="highlight"><code>from apache_beam.options.pipeline_options import PipelineOptions
+pipeline_options = PipelineOptions(['--direct_num_workers', '2'])
+</code></pre>
+ </div>
+ </li>
+ <li>Adding to existing <code class="highlighter-rouge">PipelineOptions</code>.
+ <div class="highlighter-rouge"><pre class="highlight"><code>from apache_beam.options.pipeline_options import DirectOptions
+pipeline_options = PipelineOptions(xxx)
+pipeline_options.view_as(DirectOptions).direct_num_workers = 2
+</code></pre>
+ </div>
+ </li>
+</ul>
+
+<h4 id="running-with-multi-threading-mode">Running with multi-threading mode</h4>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>import argparse
+
+import apache_beam as beam
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.runners.portability import fn_api_runner
+from apache_beam.portability.api import beam_runner_api_pb2
+from apache_beam.portability import python_urns
+
+parser = argparse.ArgumentParser()
+parser.add_argument(...)
+known_args, pipeline_args = parser.parse_known_args(argv)
+pipeline_options = PipelineOptions(pipeline_args)
+
+p = beam.Pipeline(options=pipeline_options,
+ runner=fn_api_runner.FnApiRunner(
+ default_environment=beam_runner_api_pb2.Environment(
+ urn=python_urns.EMBEDDED_PYTHON_GRPC)))
+</code></pre>
+</div>
+
+<h4 id="running-with-multi-processing-mode">Running with multi-processing mode</h4>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>import argparse
+import sys
+
+import apache_beam as beam
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.runners.portability import fn_api_runner
+from apache_beam.portability.api import beam_runner_api_pb2
+from apache_beam.portability import python_urns
+
+parser = argparse.ArgumentParser()
+parser.add_argument(...)
+known_args, pipeline_args = parser.parse_known_args(argv)
+pipeline_options = PipelineOptions(pipeline_args)
+
+p = beam.Pipeline(options=pipeline_options,
+ runner=fn_api_runner.FnApiRunner(
+ default_environment=beam_runner_api_pb2.Environment(
+ urn=python_urns.SUBPROCESS_SDK,
+ payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
+ % sys.executable.encode('ascii'))))
+</code></pre>
+</div>
</div>
</div>
diff --git a/website/generated-content/documentation/sdks/python-streaming/index.html b/website/generated-content/documentation/sdks/python-streaming/index.html
index 4531074..342cd83 100644
--- a/website/generated-content/documentation/sdks/python-streaming/index.html
+++ b/website/generated-content/documentation/sdks/python-streaming/index.html
@@ -451,7 +451,7 @@ about executing streaming pipelines:</p>
<li>Custom source API</li>
<li>Splittable <code class="highlighter-rouge">DoFn</code> API</li>
<li>Handling of late data</li>
- <li>User-defined custom <code class="highlighter-rouge">WindowFn</code></li>
+ <li>User-defined custom merging <code class="highlighter-rouge">WindowFn</code> (with fnapi)</li>
</ul>
<h3 id="dataflowrunner-specific-features">DataflowRunner specific features</h3>
@@ -460,12 +460,7 @@ about executing streaming pipelines:</p>
Dataflow specific features with Python streaming execution.</p>
<ul>
- <li>Streaming autoscaling</li>
- <li>Updating existing pipelines</li>
<li>Cloud Dataflow Templates</li>
- <li>Some monitoring features, such as msec counters, display data, metrics, and
-element counts for transforms. However, logging, watermarks, and element
-counts for sources are supported.</li>
</ul>