Posted to commits@beam.apache.org by gi...@apache.org on 2020/11/10 18:03:14 UTC

[beam] branch asf-site updated: Publishing website 2020/11/10 18:03:00 at commit 4a62f2b

This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 2f207cf  Publishing website 2020/11/10 18:03:00 at commit 4a62f2b
2f207cf is described below

commit 2f207cfe307dee193ee7e1f95d9d7d093fbf91cd
Author: jenkins <bu...@apache.org>
AuthorDate: Tue Nov 10 18:03:00 2020 +0000

    Publishing website 2020/11/10 18:03:00 at commit 4a62f2b
---
 .../documentation/runners/direct/index.html        | 90 ++++++++++++----------
 .../documentation/runners/flink/index.html         |  2 +-
 website/generated-content/sitemap.xml              |  2 +-
 3 files changed, 52 insertions(+), 42 deletions(-)
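
Note on the Direct Runner page touched by this commit: its Python text says that `direct_num_workers = 0` means one thread or subprocess per machine core, and that `direct_running_mode` accepts `in_memory` (the default), `multi_threading`, or `multi_processing`. Below is a minimal sketch of those two rules using only the standard library. The helper names `resolve_num_workers` and `direct_runner_args` are hypothetical and are not part of Beam; the core-count lookup is an assumption about how "number of cores" could be read, not Beam's actual implementation.

```python
import os

# Values listed for direct_running_mode on the Direct Runner page.
RUNNING_MODES = ("in_memory", "multi_threading", "multi_processing")


def resolve_num_workers(direct_num_workers):
    """Hypothetical helper: mirror the documented rule that 0 means one worker per core."""
    if direct_num_workers < 0:
        raise ValueError("direct_num_workers must be >= 0")
    if direct_num_workers == 0:
        # os.cpu_count() may return None; fall back to a single worker (assumption).
        return os.cpu_count() or 1
    return direct_num_workers


def direct_runner_args(direct_num_workers, running_mode="in_memory"):
    """Hypothetical helper: build the command-line flag list for the two options."""
    if running_mode not in RUNNING_MODES:
        raise ValueError("unknown direct_running_mode: %r" % (running_mode,))
    return [
        "--direct_running_mode", running_mode,
        "--direct_num_workers", str(direct_num_workers),
    ]


if __name__ == "__main__":
    print(direct_runner_args(0, "multi_threading"))
    print(resolve_num_workers(0))
```

The resulting list has the same shape as the `PipelineOptions(['--direct_num_workers', '2'])` call shown in the page, so it could presumably be passed to `PipelineOptions` in the same way.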

diff --git a/website/generated-content/documentation/runners/direct/index.html b/website/generated-content/documentation/runners/direct/index.html
index 952a053..3bb23d4 100644
--- a/website/generated-content/documentation/runners/direct/index.html
+++ b/website/generated-content/documentation/runners/direct/index.html
@@ -9,52 +9,62 @@
 <span class=o>&lt;/</span><span class=n>dependency</span><span class=o>&gt;</span></code></pre></div></div></p><p><span class=language-py>This section is not applicable to the Beam SDK for Python.</span></p><h2 id=pipeline-options-for-the-direct-runner>Pipeline options for the Direct Runner</h2><p>When executing your pipeline from the command-line, set <code>runner</code> to <code>direct</code> or <code>DirectRunner</code>. The default values for the other pipeline options are generally  [...]
 <span class=language-java><a href=https://beam.apache.org/releases/javadoc/2.25.0/index.html?org/apache/beam/runners/direct/DirectOptions.html><code>DirectOptions</code></a></span>
 <span class=language-py><a href=https://beam.apache.org/releases/pydoc/2.25.0/apache_beam.options.pipeline_options.html#apache_beam.options.pipeline_options.DirectOptions><code>DirectOptions</code></a></span>
-interface for defaults and additional pipeline configuration options.</p><h2 id=additional-information-and-caveats>Additional information and caveats</h2><h3 id=memory-considerations>Memory considerations</h3><p>Local execution is limited by the memory available in your local environment. It is highly recommended that you run your pipeline with data sets small enough to fit in local memory. You can create a small in-memory data set using a <span class=language-java><a href=https://beam.a [...]
-From 2.22.0, <code>direct_num_workers = 0</code> is supported. When <code>direct_num_workers</code> is set to 0, it will set the number of threads/subprocess to the number of cores of the machine where the pipelien is running.</p><p>There are several ways to set this option.</p><ul><li>Passing through CLI when executing a pipeline.</li></ul><pre><code>python wordcount.py --input xx --output xx --direct_num_workers 2
-</code></pre><ul><li>Setting with <code>PipelineOptions</code>.</li></ul><pre><code>from apache_beam.options.pipeline_options import PipelineOptions
-pipeline_options = PipelineOptions(['--direct_num_workers', '2'])
-</code></pre><ul><li>Adding to existing <code>PipelineOptions</code>.</li></ul><pre><code>from apache_beam.options.pipeline_options import DirectOptions
-pipeline_options = PipelineOptions(xxx)
-pipeline_options.view_as(DirectOptions).direct_num_workers = 2
-</code></pre><p><strong>Setting running mode</strong></p><p>From 2.19, a new option was added to set running mode. We can use <code>direct_running_mode</code> option to set the running mode.
-<code>direct_running_mode</code> can be one of [<code>'in_memory'</code>, <code>'multi_threading'</code>, <code>'multi_processing'</code>].</p><p><b>in_memory</b>: Runner and workers&rsquo; communication happens in memory (not through gRPC). This is a default mode.</p><p><b>multi_threading</b>: Runner and workers communicate through gRPC and each worker runs in a thread.</p><p><b>multi_processing</b>: Runner and workers communicate through gRPC and each worker runs in a subprocess.</p><p [...]
+interface for defaults and additional pipeline configuration options.</p><h2 id=additional-information-and-caveats>Additional information and caveats</h2><h3 id=memory-considerations>Memory considerations</h3><p>Local execution is limited by the memory available in your local environment. It is highly recommended that you run your pipeline with data sets small enough to fit in local memory. You can create a small in-memory data set using a <span class=language-java><a href=https://beam.a [...]
+Python <a href=https://beam.apache.org/contribute/runner-guide/#the-fn-api>FnApiRunner</a> supports multi-threading and multi-processing mode.</p><p>{:.language-py}
+<strong>Setting parallelism</strong></p><p>{:.language-py}
+Number of threads or subprocesses is defined by setting the <code>direct_num_workers</code> option.
+From 2.22.0, <code>direct_num_workers = 0</code> is supported. When <code>direct_num_workers</code> is set to 0, it will set the number of threads/subprocess to the number of cores of the machine where the pipeline is running.</p><p>{:.language-py}</p><ul><li>There are several ways to set this option.</li></ul><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>python</span> <span class=n>wordcount</span><span class=o>.</span><span class=n>py</span>  [...]
+</code></pre></div><p>{:.language-py}</p><ul><li>Setting with <code>PipelineOptions</code>.</li></ul><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>PipelineOptions</span>
+<span class=n>pipeline_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>([</span><span class=s1>&#39;--direct_num_workers&#39;</span><span class=p>,</span> <span class=s1>&#39;2&#39;</span><span class=p>])</span>
+</code></pre></div><p>{:.language-py}</p><ul><li>Adding to existing <code>PipelineOptions</code>.</li></ul><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>DirectOptions</span>
+<span class=n>pipeline_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>(</span><span class=n>xxx</span><span class=p>)</span>
+<span class=n>pipeline_options</span><span class=o>.</span><span class=n>view_as</span><span class=p>(</span><span class=n>DirectOptions</span><span class=p>)</span><span class=o>.</span><span class=n>direct_num_workers</span> <span class=o>=</span> <span class=mi>2</span>
+</code></pre></div><p>{:.language-py}
+<strong>Setting running mode</strong></p><p>{:.language-py}
+From 2.19, a new option was added to set running mode. We can use <code>direct_running_mode</code> option to set the running mode.
+<code>direct_running_mode</code> can be one of [<code>'in_memory'</code>, <code>'multi_threading'</code>, <code>'multi_processing'</code>].</p><p>{:.language-py}
+<b>in_memory</b>: Runner and workers&rsquo; communication happens in memory (not through gRPC). This is a default mode.</p><p>{:.language-py}
+<b>multi_threading</b>: Runner and workers communicate through gRPC and each worker runs in a thread.</p><p>{:.language-py}
+<b>multi_processing</b>: Runner and workers communicate through gRPC and each worker runs in a subprocess.</p><p>{:.language-py}
+Same as other options, <code>direct_running_mode</code> can be passed through CLI or set with <code>PipelineOptions</code>.</p><p>{:.language-py}
+For the versions before 2.19.0, the running mode should be set with <code>FnApiRunner()</code>. Please refer following examples.</p><p>{:.language-py}</p><h4 id=running-with-multi-threading-mode>Running with multi-threading mode</h4><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>argparse</span>
 
-import apache_beam as beam
-from apache_beam.options.pipeline_options import PipelineOptions
-from apache_beam.runners.portability import fn_api_runner
-from apache_beam.portability.api import beam_runner_api_pb2
-from apache_beam.portability import python_urns
+<span class=kn>import</span> <span class=nn>apache_beam</span> <span class=kn>as</span> <span class=nn>beam</span>
+<span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>PipelineOptions</span>
+<span class=kn>from</span> <span class=nn>apache_beam.runners.portability</span> <span class=kn>import</span> <span class=n>fn_api_runner</span>
+<span class=kn>from</span> <span class=nn>apache_beam.portability.api</span> <span class=kn>import</span> <span class=n>beam_runner_api_pb2</span>
+<span class=kn>from</span> <span class=nn>apache_beam.portability</span> <span class=kn>import</span> <span class=n>python_urns</span>
 
-parser = argparse.ArgumentParser()
-parser.add_argument(...)
-known_args, pipeline_args = parser.parse_known_args(argv)
-pipeline_options = PipelineOptions(pipeline_args)
+<span class=n>parser</span> <span class=o>=</span> <span class=n>argparse</span><span class=o>.</span><span class=n>ArgumentParser</span><span class=p>()</span>
+<span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span><span class=o>...</span><span class=p>)</span>
+<span class=n>known_args</span><span class=p>,</span> <span class=n>pipeline_args</span> <span class=o>=</span> <span class=n>parser</span><span class=o>.</span><span class=n>parse_known_args</span><span class=p>(</span><span class=n>argv</span><span class=p>)</span>
+<span class=n>pipeline_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>(</span><span class=n>pipeline_args</span><span class=p>)</span>
 
-p = beam.Pipeline(options=pipeline_options,
-      runner=fn_api_runner.FnApiRunner(
-          default_environment=beam_runner_api_pb2.Environment(
-          urn=python_urns.EMBEDDED_PYTHON_GRPC)))
-</code></pre><h4 id=running-with-multi-processing-mode>Running with multi-processing mode</h4><pre><code>import argparse
-import sys
+<span class=n>p</span> <span class=o>=</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>pipeline_options</span><span class=p>,</span>
+      <span class=n>runner</span><span class=o>=</span><span class=n>fn_api_runner</span><span class=o>.</span><span class=n>FnApiRunner</span><span class=p>(</span>
+          <span class=n>default_environment</span><span class=o>=</span><span class=n>beam_runner_api_pb2</span><span class=o>.</span><span class=n>Environment</span><span class=p>(</span>
+          <span class=n>urn</span><span class=o>=</span><span class=n>python_urns</span><span class=o>.</span><span class=n>EMBEDDED_PYTHON_GRPC</span><span class=p>)))</span>
+</code></pre></div><p>{:.language-py}</p><h4 id=running-with-multi-processing-mode>Running with multi-processing mode</h4><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>argparse</span>
+<span class=kn>import</span> <span class=nn>sys</span>
 
-import apache_beam as beam
-from apache_beam.options.pipeline_options import PipelineOptions
-from apache_beam.runners.portability import fn_api_runner
-from apache_beam.portability.api import beam_runner_api_pb2
-from apache_beam.portability import python_urns
+<span class=kn>import</span> <span class=nn>apache_beam</span> <span class=kn>as</span> <span class=nn>beam</span>
+<span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>PipelineOptions</span>
+<span class=kn>from</span> <span class=nn>apache_beam.runners.portability</span> <span class=kn>import</span> <span class=n>fn_api_runner</span>
+<span class=kn>from</span> <span class=nn>apache_beam.portability.api</span> <span class=kn>import</span> <span class=n>beam_runner_api_pb2</span>
+<span class=kn>from</span> <span class=nn>apache_beam.portability</span> <span class=kn>import</span> <span class=n>python_urns</span>
 
-parser = argparse.ArgumentParser()
-parser.add_argument(...)
-known_args, pipeline_args = parser.parse_known_args(argv)
-pipeline_options = PipelineOptions(pipeline_args)
+<span class=n>parser</span> <span class=o>=</span> <span class=n>argparse</span><span class=o>.</span><span class=n>ArgumentParser</span><span class=p>()</span>
+<span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span><span class=o>...</span><span class=p>)</span>
+<span class=n>known_args</span><span class=p>,</span> <span class=n>pipeline_args</span> <span class=o>=</span> <span class=n>parser</span><span class=o>.</span><span class=n>parse_known_args</span><span class=p>(</span><span class=n>argv</span><span class=p>)</span>
+<span class=n>pipeline_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>(</span><span class=n>pipeline_args</span><span class=p>)</span>
 
-p = beam.Pipeline(options=pipeline_options,
-      runner=fn_api_runner.FnApiRunner(
-          default_environment=beam_runner_api_pb2.Environment(
-              urn=python_urns.SUBPROCESS_SDK,
-              payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
-                        % sys.executable.encode('ascii'))))
-</code></pre></div></div><footer class=footer><div class=footer__contained><div class=footer__cols><div class=footer__cols__col><div class=footer__cols__col__logo><img src=/images/beam_logo_circle.svg class=footer__logo alt="Beam logo"></div><div class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg class=footer__logo alt="Apache logo"></div></div><div class="footer__cols__col footer__cols__col--md"><div class=footer__cols__col__title>Start</div><div class=footer__cols__c [...]
+<span class=n>p</span> <span class=o>=</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>pipeline_options</span><span class=p>,</span>
+      <span class=n>runner</span><span class=o>=</span><span class=n>fn_api_runner</span><span class=o>.</span><span class=n>FnApiRunner</span><span class=p>(</span>
+          <span class=n>default_environment</span><span class=o>=</span><span class=n>beam_runner_api_pb2</span><span class=o>.</span><span class=n>Environment</span><span class=p>(</span>
+              <span class=n>urn</span><span class=o>=</span><span class=n>python_urns</span><span class=o>.</span><span class=n>SUBPROCESS_SDK</span><span class=p>,</span>
+              <span class=n>payload</span><span class=o>=</span><span class=sa>b</span><span class=s1>&#39;</span><span class=si>%s</span><span class=s1> -m apache_beam.runners.worker.sdk_worker_main&#39;</span>
+                        <span class=o>%</span> <span class=n>sys</span><span class=o>.</span><span class=n>executable</span><span class=o>.</span><span class=n>encode</span><span class=p>(</span><span class=s1>&#39;ascii&#39;</span><span class=p>))))</span>
+</code></pre></div></div></div><footer class=footer><div class=footer__contained><div class=footer__cols><div class=footer__cols__col><div class=footer__cols__col__logo><img src=/images/beam_logo_circle.svg class=footer__logo alt="Beam logo"></div><div class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg class=footer__logo alt="Apache logo"></div></div><div class="footer__cols__col footer__cols__col--md"><div class=footer__cols__col__title>Start</div><div class=footer__c [...]
 <a href=http://www.apache.org>The Apache Software Foundation</a>
 | <a href=/privacy_policy>Privacy Policy</a>
 | <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. All other products or name brands are trademarks of their respective holders, including The Apache Software Foundation.</div></footer></body></html>
\ No newline at end of file
diff --git a/website/generated-content/documentation/runners/flink/index.html b/website/generated-content/documentation/runners/flink/index.html
index eb0f927..9d61584 100644
--- a/website/generated-content/documentation/runners/flink/index.html
+++ b/website/generated-content/documentation/runners/flink/index.html
@@ -85,7 +85,7 @@ and will not work on remote clusters.
 See <a href=/documentation/runtime/sdk-harness-config/>here</a> for details.</p><h2 id=additional-information-and-caveats>Additional information and caveats</h2><h3 id=monitoring-your-job>Monitoring your job</h3><p>You can monitor a running Flink job using the Flink JobManager Dashboard or its Rest interfaces. By default, this is available at port <code>8081</code> of the JobManager node. If you have a Flink installation on your local machine that would be <code>http://localhost:8081</co [...]
 Many sources like <code>PubSubIO</code> rely on their checkpoints to be acknowledged which can only be done when checkpointing is enabled for the <code>FlinkRunner</code>. To enable checkpointing, please set <span class=language-java><code>checkpointingInterval</code></span><span class=language-py><code>checkpointing_interval</code></span> to the desired checkpointing interval in milliseconds.</p><h2 id=pipeline-options-for-the-flink-runner>Pipeline options for the Flink Runner</h2><p>Wh [...]
 <a href=https://beam.apache.org/releases/javadoc/2.25.0/index.html?org/apache/beam/runners/flink/FlinkPipelineOptions.html>FlinkPipelineOptions</a>
-reference class:</p><div class=language-java><table class="table table-bordered"><tr><td><code>allowNonRestoredState</code></td><td>Flag indicating whether non restored state is allowed if the savepoint contains state for an operator that is no longer part of the pipeline.</td><td>Default: <code>false</code></td></tr><tr><td><code>autoBalanceWriteFilesShardingEnabled</code></td><td>Flag indicating whether auto-balance sharding for WriteFiles transform should be enabled. This might prove  [...]
+reference class:</p><div class=language-java><table class="table table-bordered"><tr><td><code>allowNonRestoredState</code></td><td>Flag indicating whether non restored state is allowed if the savepoint contains state for an operator that is no longer part of the pipeline.</td><td>Default: <code>false</code></td></tr><tr><td><code>autoBalanceWriteFilesShardingEnabled</code></td><td>Flag indicating whether auto-balance sharding for WriteFiles transform should be enabled. This might prove  [...]
 <a href=https://beam.apache.org/releases/javadoc/2.25.0/index.html?org/apache/beam/sdk/options/PipelineOptions.html>PipelineOptions</a>
 reference.</p><h2 id=flink-version-compatibility>Flink Version Compatibility</h2><p>The Flink cluster version has to match the minor version used by the FlinkRunner.
 The minor version is the first two numbers in the version string, e.g. in <code>1.8.0</code> the
diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml
index 51ab6a2..2c23908 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.25.0/</loc><lastmod>2020-10-29T14:08:19-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2020-10-29T14:08:19-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2020-10-29T14:08:19-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2020-10-29T14:08:19-07:00</lastmod></url><url><loc>/blog/b [...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.25.0/</loc><lastmod>2020-10-29T14:08:19-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2020-10-29T14:08:19-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2020-10-29T14:08:19-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2020-10-29T14:08:19-07:00</lastmod></url><url><loc>/blog/b [...]
\ No newline at end of file