Posted to user@beam.apache.org by Alexey Romanenko <ar...@gmail.com> on 2022/04/20 17:17:03 UTC

Re: [PROPOSAL] Stop Spark2 support in Spark Runner

Hi everyone,

A while ago, we already discussed on dev@ several reasons to stop providing Spark2 support in the Spark Runner (in all the variants that we have for now - RDD, Dataset, Portable) [1]. In short, it adds a maintenance burden to the Spark runner that we would like to avoid in the future.

From the devs' perspective, I don't see any objections to this. So, I'd like to know if there are users who still use Spark2 for their Beam pipelines and for whom it would be critical to keep using it.
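
For reference, here is a minimal sketch of what a Spark runner pipeline looks like from the user side; the artifact names mentioned in the comments (beam-runners-spark for Spark 2.x, beam-runners-spark-3 for Spark 3.x) are my assumption of the current module layout, so please double-check them against the Beam docs for your version:

// Minimal sketch: the pipeline code itself is identical for Spark 2 and Spark 3;
// which Spark version is used is decided by the runner artifact on the classpath
// (assumed here: beam-runners-spark for Spark 2.x, beam-runners-spark-3 for Spark 3.x).
import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class SparkRunnerExample {
  public static void main(String[] args) {
    // Select the Spark runner via pipeline options (e.g. --runner=SparkRunner on the command line).
    SparkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
    options.setRunner(SparkRunner.class);

    // A trivial pipeline, just to show the runner selection.
    Pipeline p = Pipeline.create(options);
    p.apply(Create.of("hello", "spark", "runner"));
    p.run().waitUntilFinish();
  }
}

If your pipelines still depend on the Spark 2.x runner artifact, that is exactly the case we'd like to hear about.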

Please share your opinion on this!

—
Alexey

[1] https://lists.apache.org/thread/opfhg3xjb9nptv878sygwj9gjx38rmco

> On 31 Mar 2022, at 17:51, Alexey Romanenko <ar...@gmail.com> wrote:
> 
> Hi everyone,
> 
> For the moment, the Beam Spark Runner supports two versions of Spark - 2.x and 3.x. 
> 
> Taking into account that:
> - almost all cloud providers have already moved to Spark 3.x as their main supported version;
> - the latest Spark 2.x release (Spark 2.4.8, a maintenance release) was almost a year ago;
> - Spark 3 is considered the mainstream Spark version for development and bug fixing;
> - it would be better to avoid the burden of maintaining two versions (there are some incompatibilities between Spark 2 and 3); 
> 
> I’d suggest stopping Spark 2 support in the Spark Runner in one of the next Beam releases. 
> 
> What are your thoughts on this? Are there any fundamental objections or reasons not to do this that I may have missed?
> 
> —
> Alexey 
> 
>