Posted to commits@beam.apache.org by ec...@apache.org on 2022/07/01 10:08:33 UTC
[beam] branch master updated: Deprecate runner support for Spark 2.4 (closes #22094)
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new cd6bb9569e5 Deprecate runner support for Spark 2.4 (closes #22094)
new 680ed5b3a49 Merge pull request #22097 from mosche/22094-DeprecateSpark2
cd6bb9569e5 is described below
commit cd6bb9569e5f8a0a9c6b55473c13a0b453ee6c8f
Author: Moritz Mack <mm...@talend.com>
AuthorDate: Wed Jun 29 14:51:50 2022 +0200
Deprecate runner support for Spark 2.4 (closes #22094)
---
CHANGES.md | 1 +
.../runners/spark/translation/SparkContextFactory.java | 8 +++++++-
.../www/site/content/en/documentation/runners/spark.md | 15 ++++++++-------
3 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/CHANGES.md b/CHANGES.md
index 9a5873ae940..53af5dd28b0 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -70,6 +70,7 @@
## Deprecations
+* Support for Spark 2.4.x is deprecated and will be dropped with the release of Beam 2.44.0 or soon after (Spark runner) ([#22094](https://github.com/apache/beam/issues/22094)).
* X behavior is deprecated and will be removed in X versions ([#X](https://github.com/apache/beam/issues/X)).
## Bugfixes
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java
index 9f9465ccde8..4b714b65581 100644
--- a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java
@@ -143,6 +143,12 @@ public final class SparkContextFactory {
conf.setAppName(options.getAppName());
// register immutable collections serializers because the SDK uses them.
conf.set("spark.kryo.registrator", SparkRunnerKryoRegistrator.class.getName());
- return new JavaSparkContext(conf);
+ JavaSparkContext jsc = new JavaSparkContext(conf);
+ if (jsc.sc().version().startsWith("2")) {
+ LOG.warn(
"Support for Spark 2 is deprecated; this runner will be removed in a few releases.\n"
+ "Spark 2 is reaching its EOL; consider migrating to Spark 3.");
+ }
+ return jsc;
}
}
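The guard added to SparkContextFactory above keys off the running Spark version string. A minimal standalone sketch of that check (hypothetical class name; the real code calls `jsc.sc().version()` and logs via `LOG.warn`):

```java
// Sketch of the deprecation check introduced by this commit: any Spark
// version string starting with "2" is treated as deprecated for the
// Beam Spark runner (see #22094).
public class SparkVersionCheck {

    /** Returns true when the given Spark version belongs to the deprecated 2.x line. */
    static boolean isDeprecatedSpark(String sparkVersion) {
        return sparkVersion.startsWith("2");
    }

    public static void main(String[] args) {
        // Spark 2.4.x triggers the warning; Spark 3.x does not.
        System.out.println(isDeprecatedSpark("2.4.8"));
        System.out.println(isDeprecatedSpark("3.1.2"));
    }
}
```

In the actual runner, the warning is emitted once when the `JavaSparkContext` is created, so users on Spark 2.4.x see the EOL notice in their driver logs before the pipeline runs.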
diff --git a/website/www/site/content/en/documentation/runners/spark.md b/website/www/site/content/en/documentation/runners/spark.md
index 91b72d542a7..abc1031840b 100644
--- a/website/www/site/content/en/documentation/runners/spark.md
+++ b/website/www/site/content/en/documentation/runners/spark.md
@@ -67,7 +67,8 @@ the portable Runner. For more information on portability, please visit the
## Spark Runner prerequisites and setup
-The Spark runner currently supports Spark's 2.x branch, and more specifically any version greater than 2.4.0.
+The Spark runner currently supports Spark's 3.1.x branch.
+> **Note:** Support for Spark 2.4.x is deprecated and will be dropped with the release of Beam 2.44.0 (or soon after).
{{< paragraph class="language-java" >}}
You can add a dependency on the latest version of the Spark runner by adding to your pom.xml the following:
@@ -76,7 +77,7 @@ You can add a dependency on the latest version of the Spark runner by adding to
{{< highlight java >}}
<dependency>
<groupId>org.apache.beam</groupId>
- <artifactId>beam-runners-spark</artifactId>
+ <artifactId>beam-runners-spark-3</artifactId>
<version>{{< param release_latest >}}</version>
</dependency>
{{< /highlight >}}
@@ -90,13 +91,13 @@ In some cases, such as running in local mode/Standalone, your (self-contained) a
{{< highlight java >}}
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-core_2.11</artifactId>
+ <artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-streaming_2.11</artifactId>
+ <artifactId>spark-streaming_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
{{< /highlight >}}
@@ -193,7 +194,7 @@ download it on the [Downloads page](/get-started/downloads/).
{{< paragraph class="language-py" >}}
1. Start the JobService endpoint:
* with Docker (preferred): `docker run --net=host apache/beam_spark_job_server:latest`
- * or from Beam source code: `./gradlew :runners:spark:2:job-server:runShadow`
+ * or from Beam source code: `./gradlew :runners:spark:3:job-server:runShadow`
{{< /paragraph >}}
{{< paragraph class="language-py" >}}
@@ -228,7 +229,7 @@ For more details on the different deployment modes see: [Standalone](https://spa
{{< paragraph class="language-py" >}}
2. Start JobService that will connect with the Spark master:
* with Docker (preferred): `docker run --net=host apache/beam_spark_job_server:latest --spark-master-url=spark://localhost:7077`
- * or from Beam source code: `./gradlew :runners:spark:2:job-server:runShadow -PsparkMasterUrl=spark://localhost:7077`
+ * or from Beam source code: `./gradlew :runners:spark:3:job-server:runShadow -PsparkMasterUrl=spark://localhost:7077`
{{< /paragraph >}}
{{< paragraph class="language-py" >}}3. Submit the pipeline as above.
@@ -246,7 +247,7 @@ To run Beam jobs written in Python, Go, and other supported languages, you can u
The following example runs a portable Beam job in Python from the Dataproc cluster's master node, backed by YARN.
-> Note: This example executes successfully with Dataproc 2.0, Spark 2.4.8 and 3.1.2 and Beam 2.37.0.
+> Note: This example executes successfully with Dataproc 2.0, Spark 3.1.2 and Beam 2.37.0.
1. Create a Dataproc cluster with [Docker](https://cloud.google.com/dataproc/docs/concepts/components/docker) component enabled.