Posted to github@beam.apache.org by "mosche (via GitHub)" <gi...@apache.org> on 2023/02/01 17:13:34 UTC

[GitHub] [beam] mosche opened a new pull request, #25263: [Spark runner] Removal of Spark 2 runner support

mosche opened a new pull request, #25263:
URL: https://github.com/apache/beam/pull/25263

   The Spark 2 runner was deprecated in August 2022 with the release of [Beam 2.41.0](https://github.com/apache/beam/blob/master/CHANGES.md#2410---2022-08-23).
   
   This PR finally removes support for Spark 2 (beam-runners-spark), so that only Spark 3 (beam-runners-spark-3) is maintained going forward.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1412442585

   R: @aromanenko-dev 




[GitHub] [beam] mosche commented on a diff in pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on code in PR #25263:
URL: https://github.com/apache/beam/pull/25263#discussion_r1097307309


##########
sdks/python/apache_beam/options/pipeline_options.py:
##########
@@ -1535,9 +1535,8 @@ def _add_argparse_args(cls, parser):
     parser.add_argument(
         '--spark_version',
         default='3',
-        choices=['3', '2'],
-        help='Spark major version to use. '
-        'Note, Spark 2 support is deprecated')
+        choices=['3'],

Review Comment:
   No, removing the pipeline option would be a breaking change for users who run Spark 3 and set this option explicitly.
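A minimal, standalone sketch of the point made in this comment, using plain `argparse` rather than Beam's `PipelineOptions` classes. The parser name, the `accepts` helper and the help text are assumptions for illustration; only the `--spark_version` flag, its default and its `choices` come from the diff above. Keeping the flag with a single allowed value means an explicit `--spark_version=3` still parses, while `2` is rejected at parse time.

```python
import argparse


def build_parser():
    """Builds a parser with the option shaped like the '+' side of the diff."""
    parser = argparse.ArgumentParser(prog='spark_options_sketch')
    parser.add_argument(
        '--spark_version',
        default='3',
        choices=['3'],
        # Help text is an assumption; the diff does not show the new wording.
        help='Spark major version to use.')
    return parser


def accepts(argv):
    """Returns True if argv parses, False if argparse rejects it."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        # argparse reports an invalid choice by printing usage and exiting.
        return False


if __name__ == '__main__':
    print(accepts(['--spark_version', '3']))  # explicit Spark 3 keeps working
    print(accepts(['--spark_version', '2']))  # Spark 2 is no longer a valid choice
    print(accepts([]))                        # omitting the flag uses the default
```

Had the option been removed entirely, the first call would also fail, which is the breaking change the comment refers to.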





[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1415495133

   Run Python_Runners PreCommit




[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1422451730

   Run Java_Kafka_IO_Direct PreCommit




[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1415816646

   Run Java_Pulsar_IO_Direct PreCommit




[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1418999986

   Run Java PostCommit




[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1422361161

   @aromanenko-dev That's what GitHub does when you resolve a conflict in the UI. It makes no difference if the commits are squashed before merging.




[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1418704456

   @aromanenko-dev Could you have a look, please? Any objections to moving ahead with the removal?




[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1415224181

   R: @je-ik 
   R: @JozoVilcek 




[GitHub] [beam] aromanenko-dev commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "aromanenko-dev (via GitHub)" <gi...@apache.org>.
aromanenko-dev commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1422354821

   @mosche Could you rebase the feature branch onto `master` instead of merging?




[GitHub] [beam] codecov[bot] commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1412488925

   # [Codecov](https://codecov.io/gh/apache/beam/pull/25263?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#25263](https://codecov.io/gh/apache/beam/pull/25263?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2ef241c) into [master](https://codecov.io/gh/apache/beam/commit/af416a4cc4f6300b08118fa4179fb3228792bf5f?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (af416a4) will **not change** coverage.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #25263   +/-   ##
   =======================================
     Coverage   72.96%   72.96%           
   =======================================
     Files         743      743           
     Lines       99037    99037           
   =======================================
     Hits        72264    72264           
     Misses      25407    25407           
     Partials     1366     1366           
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/25263?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://codecov.io/gh/apache/beam/pull/25263?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `93.97% <ø> (ø)` | |
   | [...on/apache\_beam/runners/portability/spark\_runner.py](https://codecov.io/gh/apache/beam/pull/25263?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9zcGFya19ydW5uZXIucHk=) | `67.34% <0.00%> (ø)` | |
   | [...m/runners/portability/spark\_uber\_jar\_job\_server.py](https://codecov.io/gh/apache/beam/pull/25263?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9zcGFya191YmVyX2phcl9qb2Jfc2VydmVyLnB5) | `84.32% <0.00%> (ø)` | |
   




[GitHub] [beam] aromanenko-dev commented on a diff in pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "aromanenko-dev (via GitHub)" <gi...@apache.org>.
aromanenko-dev commented on code in PR #25263:
URL: https://github.com/apache/beam/pull/25263#discussion_r1097253743


##########
sdks/python/apache_beam/options/pipeline_options.py:
##########
@@ -1535,9 +1535,8 @@ def _add_argparse_args(cls, parser):
     parser.add_argument(
         '--spark_version',
         default='3',
-        choices=['3', '2'],
-        help='Spark major version to use. '
-        'Note, Spark 2 support is deprecated')
+        choices=['3'],

Review Comment:
   Do you keep it for future Spark versions, like `Spark 4`?



##########
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy:
##########
@@ -730,8 +729,6 @@ class BeamModulePlugin implements Plugin<Project> {
         slf4j_jcl                                   : "org.slf4j:slf4j-jcl:$slf4j_version",
         snappy_java                                 : "org.xerial.snappy:snappy-java:1.1.8.4",
         spark_core                                  : "org.apache.spark:spark-core_2.11:$spark2_version",

Review Comment:
   Should this line and the one with `spark_streaming` below be removed as well?



##########
sdks/python/apache_beam/runners/portability/spark_runner.py:
##########
@@ -93,9 +93,7 @@ def path_to_jar(self):
       return self._jar
     else:
       if self._spark_version == '2':

Review Comment:
   Why do we still need this check?





[GitHub] [beam] mosche merged pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche merged PR #25263:
URL: https://github.com/apache/beam/pull/25263




[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1419187337

   > Before merging, please, run all related pre- and post- commit jobs to make sure that nothing is broken.
   
   @aromanenko-dev This triggered pretty much all `pre` jobs automatically, in addition to the `post` jobs run above. Do you have anything else in mind?




[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1419002934

   Run Go PostCommit




[GitHub] [beam] github-actions[bot] commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1412482115

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control




[GitHub] [beam] mosche commented on a diff in pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on code in PR #25263:
URL: https://github.com/apache/beam/pull/25263#discussion_r1097306036


##########
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy:
##########
@@ -730,8 +729,6 @@ class BeamModulePlugin implements Plugin<Project> {
         slf4j_jcl                                   : "org.slf4j:slf4j-jcl:$slf4j_version",
         snappy_java                                 : "org.xerial.snappy:snappy-java:1.1.8.4",
         spark_core                                  : "org.apache.spark:spark-core_2.11:$spark2_version",

Review Comment:
   These are still used by cdapio (Spark receiver).





[GitHub] [beam] mosche commented on a diff in pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on code in PR #25263:
URL: https://github.com/apache/beam/pull/25263#discussion_r1097308175


##########
sdks/python/apache_beam/runners/portability/spark_runner.py:
##########
@@ -93,9 +93,7 @@ def path_to_jar(self):
       return self._jar
     else:
       if self._spark_version == '2':

Review Comment:
   Yes, if somebody explicitly requests Spark 2, it should fail fast.
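A rough sketch of the fail-fast idea described here, written as a standalone function rather than the actual `path_to_jar` method. The function name, the error message and the returned jar name are hypothetical placeholders, not Beam's real values; the diff only shows that the `_spark_version == '2'` branch is kept.

```python
def resolve_job_server_jar(spark_version='3', explicit_jar=None):
    """Returns the job server jar to use, or raises for a dropped version.

    All names and strings here are illustrative assumptions.
    """
    if explicit_jar:
        # A user-supplied jar always wins, mirroring the `return self._jar` branch.
        return explicit_jar
    if spark_version == '2':
        # Fail immediately with a clear message instead of silently
        # falling back to Spark 3 or failing later on a missing artifact.
        raise ValueError('Spark 2 is no longer supported; use Spark 3.')
    # Hypothetical placeholder for the default Spark 3 job server jar.
    return 'spark-3-job-server.jar'


if __name__ == '__main__':
    print(resolve_job_server_jar())
    try:
        resolve_job_server_jar(spark_version='2')
    except ValueError as err:
        print('rejected:', err)
```

The check only matters if a `'2'` value can still reach the runner by some path; raising at that point keeps the error close to its cause.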





[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1418997890

   Run Python Spark ValidatesRunner




[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1419001723

   Run SQL PostCommit




[GitHub] [beam] aromanenko-dev commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "aromanenko-dev (via GitHub)" <gi...@apache.org>.
aromanenko-dev commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1419208957

   @mosche Nope, I think it should be enough




[GitHub] [beam] mosche commented on pull request #25263: [Spark runner] Removal of Spark 2 runner support

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on PR #25263:
URL: https://github.com/apache/beam/pull/25263#issuecomment-1415495454

   Run Python_Runners PreCommit

