Posted to commits@beam.apache.org by gi...@apache.org on 2022/07/01 10:15:42 UTC

[beam] branch asf-site updated: Publishing website 2022/07/01 10:15:32 at commit 680ed5b

This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 366f74fd4e9 Publishing website 2022/07/01 10:15:32 at commit 680ed5b
366f74fd4e9 is described below

commit 366f74fd4e9e3b5159683980d77a1cbf6ec0bb23
Author: jenkins <bu...@apache.org>
AuthorDate: Fri Jul 1 10:15:33 2022 +0000

    Publishing website 2022/07/01 10:15:32 at commit 680ed5b
---
 .../documentation/runners/spark/index.html               | 16 ++++++++--------
 website/generated-content/sitemap.xml                    |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/website/generated-content/documentation/runners/spark/index.html b/website/generated-content/documentation/runners/spark/index.html
index e4dcced00d2..54a5c7e20c0 100644
--- a/website/generated-content/documentation/runners/spark/index.html
+++ b/website/generated-content/documentation/runners/spark/index.html
@@ -29,19 +29,19 @@ architecture of the Runners had to be changed significantly to support executing
 pipelines written in other languages.</p><p>If your applications only use Java, then you should currently go with one of the java based runners.
 If you want to run Python or Go pipelines with Beam on Spark, you need to use
 the portable Runner. For more information on portability, please visit the
-<a href=/roadmap/portability/>Portability page</a>.</p><nav class=language-switcher><strong>Adapt for:</strong><ul><li data-type=language-java>Non portable (Java)</li><li data-type=language-py>Portable (Java/Python/Go)</li></ul></nav><h2 id=spark-runner-prerequisites-and-setup>Spark Runner prerequisites and setup</h2><p>The Spark runner currently supports Spark&rsquo;s 2.x branch, and more specifically any version greater than 2.4.0.</p><p class=language-java>You can add a dependency on  [...]
+<a href=/roadmap/portability/>Portability page</a>.</p><nav class=language-switcher><strong>Adapt for:</strong><ul><li data-type=language-java>Non portable (Java)</li><li data-type=language-py>Portable (Java/Python/Go)</li></ul></nav><h2 id=spark-runner-prerequisites-and-setup>Spark Runner prerequisites and setup</h2><p>The Spark runner currently supports Spark&rsquo;s 3.1.x branch.</p><blockquote><p><strong>Note:</strong> Support for Spark 2.4.x is deprecated and will be dropped with th [...]
   <span class=o>&lt;</span><span class=n>groupId</span><span class=o>&gt;</span><span class=n>org</span><span class=o>.</span><span class=na>apache</span><span class=o>.</span><span class=na>beam</span><span class=o>&lt;/</span><span class=n>groupId</span><span class=o>&gt;</span>
-  <span class=o>&lt;</span><span class=n>artifactId</span><span class=o>&gt;</span><span class=n>beam</span><span class=o>-</span><span class=n>runners</span><span class=o>-</span><span class=n>spark</span><span class=o>&lt;/</span><span class=n>artifactId</span><span class=o>&gt;</span>
+  <span class=o>&lt;</span><span class=n>artifactId</span><span class=o>&gt;</span><span class=n>beam</span><span class=o>-</span><span class=n>runners</span><span class=o>-</span><span class=n>spark</span><span class=o>-</span><span class=n>3</span><span class=o>&lt;/</span><span class=n>artifactId</span><span class=o>&gt;</span>
   <span class=o>&lt;</span><span class=n>version</span><span class=o>&gt;</span><span class=n>2</span><span class=o>.</span><span class=na>40</span><span class=o>.</span><span class=na>0</span><span class=o>&lt;/</span><span class=n>version</span><span class=o>&gt;</span>
 <span class=o>&lt;/</span><span class=n>dependency</span><span class=o>&gt;</span></code></pre></div></div></div><h3 id=deploying-spark-with-your-application>Deploying Spark with your application</h3><p class=language-java>In some cases, such as running in local mode/Standalone, your (self-contained) application would be required to pack Spark by explicitly adding the following dependencies in your pom.xml:</p><div class="language-java snippet"><div class="notebook-skip code-snippet"><a  [...]
   <span class=o>&lt;</span><span class=n>groupId</span><span class=o>&gt;</span><span class=n>org</span><span class=o>.</span><span class=na>apache</span><span class=o>.</span><span class=na>spark</span><span class=o>&lt;/</span><span class=n>groupId</span><span class=o>&gt;</span>
-  <span class=o>&lt;</span><span class=n>artifactId</span><span class=o>&gt;</span><span class=n>spark</span><span class=o>-</span><span class=n>core_2</span><span class=o>.</span><span class=na>11</span><span class=o>&lt;/</span><span class=n>artifactId</span><span class=o>&gt;</span>
+  <span class=o>&lt;</span><span class=n>artifactId</span><span class=o>&gt;</span><span class=n>spark</span><span class=o>-</span><span class=n>core_2</span><span class=o>.</span><span class=na>12</span><span class=o>&lt;/</span><span class=n>artifactId</span><span class=o>&gt;</span>
   <span class=o>&lt;</span><span class=n>version</span><span class=o>&gt;</span><span class=n>$</span><span class=o>{</span><span class=n>spark</span><span class=o>.</span><span class=na>version</span><span class=o>}&lt;/</span><span class=n>version</span><span class=o>&gt;</span>
 <span class=o>&lt;/</span><span class=n>dependency</span><span class=o>&gt;</span>
 
 <span class=o>&lt;</span><span class=n>dependency</span><span class=o>&gt;</span>
   <span class=o>&lt;</span><span class=n>groupId</span><span class=o>&gt;</span><span class=n>org</span><span class=o>.</span><span class=na>apache</span><span class=o>.</span><span class=na>spark</span><span class=o>&lt;/</span><span class=n>groupId</span><span class=o>&gt;</span>
-  <span class=o>&lt;</span><span class=n>artifactId</span><span class=o>&gt;</span><span class=n>spark</span><span class=o>-</span><span class=n>streaming_2</span><span class=o>.</span><span class=na>11</span><span class=o>&lt;/</span><span class=n>artifactId</span><span class=o>&gt;</span>
+  <span class=o>&lt;</span><span class=n>artifactId</span><span class=o>&gt;</span><span class=n>spark</span><span class=o>-</span><span class=n>streaming_2</span><span class=o>.</span><span class=na>12</span><span class=o>&lt;/</span><span class=n>artifactId</span><span class=o>&gt;</span>
   <span class=o>&lt;</span><span class=n>version</span><span class=o>&gt;</span><span class=n>$</span><span class=o>{</span><span class=n>spark</span><span class=o>.</span><span class=na>version</span><span class=o>}&lt;/</span><span class=n>version</span><span class=o>&gt;</span>
 <span class=o>&lt;/</span><span class=n>dependency</span><span class=o>&gt;</span></code></pre></div></div></div><p class=language-java>And shading the application jar using the maven shade plugin:</p><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java>< [...]
   <span class=o>&lt;</span><span class=n>groupId</span><span class=o>&gt;</span><span class=n>org</span><span class=o>.</span><span class=na>apache</span><span class=o>.</span><span class=na>maven</span><span class=o>.</span><span class=na>plugins</span><span class=o>&lt;/</span><span class=n>groupId</span><span class=o>&gt;</span>
@@ -79,7 +79,7 @@ the portable Runner. For more information on portability, please visit the
 Apache Beam with Python you have to install the Apache Beam Python SDK: <code>pip install apache_beam</code>. Please refer to the <a href=/documentation/sdks/python/>Python documentation</a>
 on how to create a Python pipeline.</p><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>pip</span> <span class=n>install</span> <span class=n>apache_beam</span></code></pre></div></div></div><p class=language-py>Starting from Beam 2.20.0, pre- [...]
 <a href=https://hub.docker.com/r/apache/beam_spark_job_server>Docker Hub</a>.</p><p class=language-py>For older Beam versions, you will need a copy of Apache Beam&rsquo;s source code. You can
-download it on the <a href=/get-started/downloads/>Downloads page</a>.</p><p class=language-py><ol><li>Start the JobService endpoint:<ul><li>with Docker (preferred): <code>docker run --net=host apache/beam_spark_job_server:latest</code></li><li>or from Beam source code: <code>./gradlew :runners:spark:2:job-server:runShadow</code></li></ul></li></ol></p><p class=language-py>The JobService is the central instance where you submit your Beam pipeline.
+download it on the <a href=/get-started/downloads/>Downloads page</a>.</p><p class=language-py><ol><li>Start the JobService endpoint:<ul><li>with Docker (preferred): <code>docker run --net=host apache/beam_spark_job_server:latest</code></li><li>or from Beam source code: <code>./gradlew :runners:spark:3:job-server:runShadow</code></li></ul></li></ol></p><p class=language-py>The JobService is the central instance where you submit your Beam pipeline.
 The JobService will create a Spark job for the pipeline and execute the
 job. To execute the job on a Spark cluster, the Beam JobService needs to be
 provided with the Spark master address.</p><p class=language-py><ol start=2><li>Submit the Python pipeline to the above endpoint by using the <code>PortableRunner</code>, <code>job_endpoint</code> set to <code>localhost:8099</code> (this is the default address of the JobService), and <code>environment_type</code> set to <code>LOOPBACK</code>. For example:</li></ol></p><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip [...]
@@ -92,10 +92,10 @@ provided with the Spark master address.</p><p class=language-py><ol start=2><li>
 <span class=p>])</span>
 <span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=p>)</span> <span class=k>as</span> <span class=n>p</span><span class=p>:</span>
     <span class=o>...</span></code></pre></div></div></div><h3 id=running-on-a-pre-deployed-spark-cluster>Running on a pre-deployed Spark cluster</h3><p>Deploying your Beam pipeline on a cluster that already has a Spark deployment (Spark classes are available in container classpath) does not require any additional dependencies.
-For more details on the different deployment modes see: <a href=https://spark.apache.org/docs/latest/spark-standalone.html>Standalone</a>, <a href=https://spark.apache.org/docs/latest/running-on-yarn.html>YARN</a>, or <a href=https://spark.apache.org/docs/latest/running-on-mesos.html>Mesos</a>.</p><p class=language-py><ol><li>Start a Spark cluster which exposes the master on port 7077 by default.</li></ol></p><p class=language-py><ol start=2><li>Start JobService that will connect with th [...]
+For more details on the different deployment modes see: <a href=https://spark.apache.org/docs/latest/spark-standalone.html>Standalone</a>, <a href=https://spark.apache.org/docs/latest/running-on-yarn.html>YARN</a>, or <a href=https://spark.apache.org/docs/latest/running-on-mesos.html>Mesos</a>.</p><p class=language-py><ol><li>Start a Spark cluster which exposes the master on port 7077 by default.</li></ol></p><p class=language-py><ol start=2><li>Start JobService that will connect with th [...]
 Note however that <code>environment_type=LOOPBACK</code> is only intended for local testing.
 See <a href=/roadmap/portability/#sdk-harness-config>here</a> for details.</li></ol></p><p class=language-py>(Note that, depending on your cluster setup, you may need to change the <code>environment_type</code> option.
-See <a href=/roadmap/portability/#sdk-harness-config>here</a> for details.)</p><h3 id=running-on-dataproc-cluster-yarn-backed>Running on Dataproc cluster (YARN backed)</h3><p>To run Beam jobs written in Python, Go, and other supported languages, you can use the <code>SparkRunner</code> and <code>PortableRunner</code> as described on the Beam&rsquo;s <a href=https://beam.apache.org/documentation/runners/spark/>Spark Runner</a> page (also see <a href=https://beam.apache.org/roadmap/portabi [...]
+See <a href=/roadmap/portability/#sdk-harness-config>here</a> for details.)</p><h3 id=running-on-dataproc-cluster-yarn-backed>Running on Dataproc cluster (YARN backed)</h3><p>To run Beam jobs written in Python, Go, and other supported languages, you can use the <code>SparkRunner</code> and <code>PortableRunner</code> as described on the Beam&rsquo;s <a href=https://beam.apache.org/documentation/runners/spark/>Spark Runner</a> page (also see <a href=https://beam.apache.org/roadmap/portabi [...]
 gcloud dataproc clusters create <b><i>CLUSTER_NAME</i></b> \
     --optional-components=DOCKER \
     --image-version=<b><i>DATAPROC_IMAGE_VERSION</i></b> \
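
[Editor's note: a minimal Python sketch of the submission flow the hunks above describe, assuming the JobService runs locally (e.g. docker run --net=host apache/beam_spark_job_server:latest). The option names and defaults (job_endpoint at localhost:8099, environment_type=LOOPBACK for local testing only) come from the page text; the pipeline body is illustrative, since the page's own snippet is truncated in this diff.]

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Option values follow the page text above.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=localhost:8099",   # default JobService address
        "--environment_type=LOOPBACK",     # local testing only
    ])

    with beam.Pipeline(options=options) as p:
        # Illustrative body; the documented snippet elides this part.
        p | beam.Create(["hello", "beam"]) | beam.Map(print)
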
@@ -127,7 +127,7 @@ Passing any of the above mentioned options could be done as one of the <code>app
 For more on how to generally use <code>spark-submit</code> checkout Spark <a href=https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit>documentation</a>.</p><h3 id=monitoring-your-job>Monitoring your job</h3><p>You can monitor a running Spark job using the Spark <a href=https://spark.apache.org/docs/latest/monitoring.html#web-interfaces>Web Interfaces</a>. By default, this is available at port <code>4040</code> on the driver node. If  [...]
 Spark also has a history server to <a href=https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact>view after the fact</a>.<p class=language-java>Metrics are also available via <a href=https://spark.apache.org/docs/latest/monitoring.html#rest-api>REST API</a>.
 Spark provides a <a href=https://spark.apache.org/docs/latest/monitoring.html#metrics>metrics system</a> that allows reporting Spark metrics to a variety of Sinks. The Spark runner reports user-defined Beam Aggregators using this same metrics system and currently supports <code>GraphiteSink</code> and <code>CSVSink</code>, and providing support for additional Sinks supported by Spark is easy and straight-forward.</p><p class=language-py>Spark metrics are not yet supported on the portable [...]
-Instead, you should use <code>SparkContextOptions</code> which can only be used programmatically and is not a common <code>PipelineOptions</code> implementation.<br><br><b>For Structured Streaming based runner:</b><br>Provided SparkSession and StreamingListeners are not supported on the Spark Structured Streaming runner</p><p class=language-py>Provided SparkContext and StreamingListeners are not supported on the Spark portable runner.</p><h3 id=kubernetes>Kubernetes</h3><p>An <a href=htt [...]
+Instead, you should use <code>SparkContextOptions</code> which can only be used programmatically and is not a common <code>PipelineOptions</code> implementation.<br><br><b>For Structured Streaming based runner:</b><br>Provided SparkSession and StreamingListeners are not supported on the Spark Structured Streaming runner</p><p class=language-py>Provided SparkContext and StreamingListeners are not supported on the Spark portable runner.</p><h3 id=kubernetes>Kubernetes</h3><p>An <a href=htt [...]
 <a href=http://www.apache.org>The Apache Software Foundation</a>
 | <a href=/privacy_policy>Privacy Policy</a>
 | <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. All other products or name brands are trademarks of their respective holders, including The Apache Software Foundation.</div></div><div class="footer__cols__col footer__cols__col__logos"><div class=footer__cols__col--group><div class=footer__cols__col__logo><a href=https://github.com/apache/beam><im [...]
\ No newline at end of file
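
[Editor's note: the metrics hunk above mentions user-defined Beam metrics being reported through Spark's metrics system (GraphiteSink, CSVSink) on the Java runner. As a hedged illustration of what a user-defined metric looks like at the SDK level, using the Python Metrics API, a counter can be declared inside a DoFn. Note the page's caveat that Spark metrics are not yet supported on the portable runner; the class and metric names below are illustrative, not from the page.]

    import apache_beam as beam
    from apache_beam.metrics import Metrics

    class TrackedFn(beam.DoFn):  # hypothetical DoFn for illustration
        def __init__(self):
            # Namespace and name are illustrative.
            self.seen = Metrics.counter(self.__class__, "elements_seen")

        def process(self, element):
            self.seen.inc()  # increments the user-defined counter
            yield element
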
diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml
index 1fdc3d3495c..78a4c80e77a 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.40.0/</loc><lastmod>2022-06-29T10:26:49-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2022-06-29T10:26:49-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2022-06-29T10:26:49-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2022-06-29T10:26:49-07:00</lastmod></url><url><loc>/case-s [...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.40.0/</loc><lastmod>2022-06-29T10:26:49-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2022-06-29T10:26:49-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2022-06-29T10:26:49-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2022-06-29T10:26:49-07:00</lastmod></url><url><loc>/case-s [...]
\ No newline at end of file