Posted to dev@sedona.apache.org by Jia Yu <ji...@apache.org> on 2022/06/19 06:10:31 UTC

[DISCUSS] Drop Spark 2.4 and Scala 2.11 support

Dear all,

I propose dropping support for Spark 2.4 and Scala 2.11 in the next Sedona
release. The version number will be 1.3.0 if we drop this support;
otherwise it will be 1.2.1.

Here is the status of Spark 2.4 and of Sedona for Spark 2.4:
1. The Spark community announced Spark 2.4 EOL on March 3, 2021:
https://www.mail-archive.com/dev@spark.apache.org/msg27476.html
2. Spark 3.0 was released on 06-16-2020.
3. Spark 3.3.0 was released a few days ago. Starting with Spark 3.2,
Spark releases binaries for both Scala 2.12 and 2.13.
4. Only a few Sedona users are on Spark 2.4. According to Maven Central
statistics (Scala/Java API only), only around 1K out of 100K downloads
are of Sedona for Spark 2.4 (core-2.4_2.11, core-2.4_2.12,
python-adapter-2.4_2.11, python-adapter-2.4_2.12).
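
To make the naming of the artifacts listed above concrete, here is a
minimal sketch that splits a name such as core-2.4_2.11 into module, Spark
version, and Scala version, and flags the ones this proposal would stop
publishing. The helper names parse_artifact and is_dropped are hypothetical
and are not part of Sedona or its build.

```python
import re

# Assumed pattern: <module>-<spark version>_<scala version>,
# matching names like "core-2.4_2.11" quoted in this thread.
ARTIFACT_RE = re.compile(
    r"^(?P<module>[a-z-]+)-(?P<spark>\d+\.\d+)_(?P<scala>\d+\.\d+)$"
)


def parse_artifact(name):
    """Return (module, spark_version, scala_version) for an artifact name."""
    match = ARTIFACT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized artifact name: {name!r}")
    return match.group("module"), match.group("spark"), match.group("scala")


def is_dropped(name):
    """True if the artifact targets Spark 2.4 or Scala 2.11."""
    _, spark, scala = parse_artifact(name)
    return spark == "2.4" or scala == "2.11"


if __name__ == "__main__":
    for artifact in ["core-2.4_2.11", "python-adapter-2.4_2.12"]:
        print(artifact, parse_artifact(artifact), is_dropped(artifact))
```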

Benefits of dropping the support:
1. Reduce the complexity of maintaining the source code for different Spark
versions. Currently, several files have two versions, one for Spark 2.4 and
one for 3.x, controlled by "anchor" keywords, and a Python script I wrote
has to pre-process the source code every time:
https://github.com/apache/incubator-sedona/blob/master/spark-version-converter.py
2. Reduce the overhead of releasing binary packages. Currently, the main
POM.xml is quite complex because it has to compile against different Spark
versions. As a result, we have not been able to release Sedona for Scala
2.13.
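
For readers unfamiliar with anchor-based pre-processing, the general idea
can be sketched as follows. This is not the actual
spark-version-converter.py: the "// anchor:" marker, the target names
spark-2.4 and spark-3.x, and the function select_version are all
assumptions made for illustration. Lines tagged for the chosen target are
left active, and lines tagged for any other target are commented out.

```python
ANCHOR = "// anchor:"  # hypothetical marker, not Sedona's real keyword


def select_version(source, target):
    """Activate lines anchored to `target`; comment out other anchored lines."""
    result = []
    for line in source.splitlines():
        code, sep, tag = line.partition(ANCHOR)
        if not sep:
            # No anchor on this line: it is shared by all versions.
            result.append(line)
            continue
        tag = tag.strip()
        body = code.strip()
        # Undo a comment prefix left by an earlier run for another target.
        if body.startswith("//"):
            body = body[2:].strip()
        indent = line[: len(line) - len(line.lstrip())]
        prefix = "" if tag == target else "// "
        result.append(f"{indent}{prefix}{body} {ANCHOR}{tag}")
    return "\n".join(result)


if __name__ == "__main__":
    scala_source = (
        "import a.OldApi // anchor:spark-2.4\n"
        "import a.NewApi // anchor:spark-3.x\n"
        "val x = 1"
    )
    print(select_version(scala_source, "spark-3.x"))
```

Because every source file has to pass through a step like this before each
build, removing the second target removes the step entirely.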

Plan for Sedona on Spark 3.x:
1. The Sedona source code already supports Scala 2.13, but there is no
binary release for it yet. We will release Sedona for both Scala 2.12 and
2.13, but not for Scala 2.11.
2. Sedona already releases binaries for Spark 3.0, 3.1, and 3.2.
3. The two latest Sedona PRs add support for Spark 3.3:
https://github.com/apache/incubator-sedona/pull/636
https://github.com/apache/incubator-sedona/pull/635

What do you think of this proposal? If you disagree, when would be the
best time to drop support for Spark 2.4 and Scala 2.11?

I will leave this discussion open for at least 3 days. If there are no
objections, I will remove Spark 2.4 from POM.xml and GitHub Actions, but
leave the Spark 2.4 support in the source code, so anyone who wants to use
Sedona on Spark 2.4 can still compile the source code themselves.

Thanks,
Jia

Re: [DISCUSS] Drop Spark 2.4 and Scala 2.11 support

Posted by Jia Yu <ji...@apache.org>.
My opinion is to keep the -3.0 in the artifact ID, just in case it is
needed in the future.

As for Flink, it is working toward being Scala-free:
https://flink.apache.org/2022/02/22/scala-free.html
Sedona Flink is written purely in Java, so I think it may be OK to stop
compiling Sedona against the Flink Scala 2.11 API.


On Sun, Jun 19, 2022 at 11:12 AM Adam Binford <ad...@gmail.com> wrote:

> I'll start with my support. I think it's fair to upgrade Spark versions to
> get new features at this point.
>
> Questions:
> Since all the supported Spark versions are supported by a single artifact,
> do you drop the -3.0 in the artifact ID? Or leave it in case it's needed in
> the future?
>
> Does Flink still need Scala 2.11 support? I don't know much about Flink,
> but I guess that's self contained anyway so not a big deal either way?
>
> Adam

Re: [DISCUSS] Drop Spark 2.4 and Scala 2.11 support

Posted by Adam Binford <ad...@gmail.com>.
I'll start with my support. I think it's fair to upgrade Spark versions to
get new features at this point.

Questions:
Since all the supported Spark versions are supported by a single artifact,
do you drop the -3.0 in the artifact ID? Or leave it in case it's needed in
the future?

Does Flink still need Scala 2.11 support? I don't know much about Flink,
but I guess that's self-contained anyway, so not a big deal either way?

Adam
