Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/22 08:56:40 UTC

[GitHub] [beam] mosche opened a new pull request, #22408: 22407 separate spark ssrunner sources

mosche opened a new pull request, #22408:
URL: https://github.com/apache/beam/pull/22408

   Support for the Spark 2 runner was recently deprecated, as Spark 2 has reached its end of life.
   
   This PR separates the sources of the SparkStructuredStreamingRunner for Spark 2 & 3. This makes it easier to improve the new, experimental runner for Spark 3 without having to keep supporting Spark 2.
   
   This PR just copies everything from `runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming` to:
   
   - `runners/spark/2/src/main/java/org/apache/beam/runners/spark/structuredstreaming`
   - `runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming`
   
   and removes the original files.
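As a rough illustration of the file-level change described above, the Python sketch below copies the shared `structuredstreaming` package into per-version modules under `runners/spark/2` and `runners/spark/3` and then deletes the original directory. The helper `split_runner_sources` and its `source_set` parameter are hypothetical and only model the directory layout named in the description; they do not reflect how the PR was actually prepared and do not touch any build files.

```python
import shutil
from pathlib import Path
from typing import List

# Package directory of the SparkStructuredStreamingRunner, as listed in the PR description.
JAVA_PACKAGE = Path("java/org/apache/beam/runners/spark/structuredstreaming")


def split_runner_sources(repo_root: Path, source_set: str = "main") -> List[Path]:
    """Copy shared runner sources into the Spark 2 and Spark 3 modules, then delete them.

    Hypothetical helper: `source_set` is "main" or "test" (src/main vs. src/test).
    """
    shared = repo_root / "runners/spark/src" / source_set / JAVA_PACKAGE
    targets = [
        repo_root / "runners/spark" / version / "src" / source_set / JAVA_PACKAGE
        for version in ("2", "3")
    ]
    for target in targets:
        # copytree also creates any missing parent directories of `target`.
        shutil.copytree(shared, target, dirs_exist_ok=True)
    # Remove the original only after both copies have been made.
    shutil.rmtree(shared)
    return targets
```

Calling the helper a second time with `source_set="test"` would cover the test sources, which a later comment in this thread mentions still needed to be moved.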
   
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] echauchot commented on pull request #22408: #22407: Stop sharing sources for SparkStructuredStreamingRunner for Spark 2 & 3

Posted by GitBox <gi...@apache.org>.
echauchot commented on PR #22408:
URL: https://github.com/apache/beam/pull/22408#issuecomment-1192526539

   @mosche IIUC, instead of a common source base with overridden classes for the breaking changes between Spark 2 and 3, the common classes are now duplicated, right? If this does not last too long (it should only be for the next 2 versions) and we don't make changes to the Spark 2 source base (such as backports of features or bug fixes), I'm OK with it. I'll also tweet about the deprecation of Spark 2.



[GitHub] [beam] mosche commented on pull request #22408: #22407: Stop sharing sources for SparkStructuredStreamingRunner for Spark 2 & 3

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #22408:
URL: https://github.com/apache/beam/pull/22408#issuecomment-1192470744

   @echauchot I started rebasing my next PR on this, so please don't squash; use a merge commit instead. It would be a nightmare to resolve otherwise.



[GitHub] [beam] mosche commented on pull request #22408: #22407: Stop sharing sources for SparkStructuredStreamingRunner for Spark 2 & 3

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #22408:
URL: https://github.com/apache/beam/pull/22408#issuecomment-1192488391

   Fixed, but I had to rebase from scratch again :/



[GitHub] [beam] echauchot merged pull request #22408: #22407: Stop sharing sources for SparkStructuredStreamingRunner for Spark 2 & 3

Posted by GitBox <gi...@apache.org>.
echauchot merged PR #22408:
URL: https://github.com/apache/beam/pull/22408



[GitHub] [beam] mosche commented on pull request #22408: #22407: Stop sharing sources for SparkStructuredStreamingRunner for Spark 2 & 3

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #22408:
URL: https://github.com/apache/beam/pull/22408#issuecomment-1192480698

   @echauchot I forgot to move the test sources 🤦, copying them now.



[GitHub] [beam] echauchot commented on pull request #22408: #22407: Stop sharing sources for SparkStructuredStreamingRunner for Spark 2 & 3

Posted by GitBox <gi...@apache.org>.
echauchot commented on PR #22408:
URL: https://github.com/apache/beam/pull/22408#issuecomment-1192554126

   Run Spark StructuredStreaming ValidatesRunner



[GitHub] [beam] echauchot commented on pull request #22408: #22407: Stop sharing sources for SparkStructuredStreamingRunner for Spark 2 & 3

Posted by GitBox <gi...@apache.org>.
echauchot commented on PR #22408:
URL: https://github.com/apache/beam/pull/22408#issuecomment-1192545475

   I misread the description: I thought we were moving the sources of both runners (RDD and Structured Streaming). As the StructuredStreaming runner is still experimental, I guess we can focus on maintaining only the Spark 3 sources.
   


[GitHub] [beam] mosche commented on pull request #22408: #22407: Stop sharing sources for SparkStructuredStreamingRunner for Spark 2 & 3

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #22408:
URL: https://github.com/apache/beam/pull/22408#issuecomment-1192529967

   We had a look at the Maven stats: the runner for Spark 2 is still by far the most used. To be honest, I doubt we can remove it that quickly :/ But we certainly have no intention of touching the Spark 2 sources.
   


[GitHub] [beam] echauchot commented on pull request #22408: #22407: Stop sharing sources for SparkStructuredStreamingRunner for Spark 2 & 3

Posted by GitBox <gi...@apache.org>.
echauchot commented on PR #22408:
URL: https://github.com/apache/beam/pull/22408#issuecomment-1192528554

   > @echauchot I started rebasing my next PR on this, please don't squash but use a merge commit. it would be a nightmare to resolve otherwise
   
   ok
