Posted to dev@sedona.apache.org by Martin Andersson <u....@gmail.com> on 2022/07/15 18:22:18 UTC

Re: [DISCUSS] Drop Spark 2.4 and Scala 2.11 support

Hi,

Spark 3.3 support is now merged into Sedona master.

Would this be a good time to release Sedona 1.2.1 and then drop support for
old versions of Spark and Scala?

Br,
Martin

On Fri, Jun 24, 2022 at 09:52 Jia Yu <ji...@apache.org> wrote:

> Hi Martin,
>
> I agree.
>
> 1. Currently, the geometry serializers, spatial partitioning code and some
> format reader code in Sedona-core (all in Java) are independent of Spark,
> so Sedona-Flink actually re-uses them. But the Sedona ST / RS functions
> need a refactor, as some of them depend on Spark SQL, which is not
> necessary.
>
> 2. So let's keep both Spark 2.4 and Spark 3.3 support in the next Sedona
> release (1.2.1). The Scala versions will include 2.11 and 2.12 but not
> 2.13. In the release after that (1.3.0), we will drop Spark 2.4 and Scala
> 2.11 completely, so Sedona 1.2.1 will be the last release that supports
> them.
>
> Thanks,
> Jia
>
>
> On Thu, Jun 23, 2022 at 4:03 AM Martin Andersson <
> u.martin.andersson@gmail.com> wrote:
>
> > Hi,
> >
> > I guess that the pending Spark 3.3 support is a big enough feature to
> > warrant a new Sedona release. It makes no sense to remove Spark 2.4 and
> > Scala 2.11 before the release.
> >
> > After Sedona-next is released I think that Spark 2.4 can safely be
> > removed.
> >
> > Long term, though I think that's another discussion, there are a lot of
> > benefits to moving the code shared between Sedona-Spark and Sedona-Flink
> > to a common Java-only module (sedona-common?). That would include the
> > partitioning code and probably most ST_x/RS_x functions.
> >
> > That would give Sedona-Flink first-class, Scala-free support. It would
> > also open up Sedona to other JVM data tools regardless of whether they
> > are written in Java, Scala, Kotlin, Clojure or any other JVM language,
> > possibly Sedona-Kafka, Sedona-Hive etc. That would make Scala-version
> > support a Sedona-Spark issue only and not a general Sedona issue.
> >
> > Br,
> > Martin
> >
> > On 2022/06/19 06:10:31 Jia Yu wrote:
> > > Dear all,
> > >
> > > I am proposing to drop the support of Spark 2.4 and Scala 2.11 in the
> > > next Sedona release. The version number will be 1.3.0 if we drop this
> > > support, otherwise it will be 1.2.1.
> > >
> > > Here is the status of Spark 2.4 and Sedona for Spark 2.4
> > > 1. Spark community has announced Spark 2.4 EOL on March 03 2021:
> > > https://www.mail-archive.com/dev@spark.apache.org/msg27476.html
> > > 2. Spark 3.0 was released on 06-16-2020.
> > > 3. Spark 3.3.0 was released a few days ago. And starting from Spark
> > > 3.2, Spark releases binaries for both Scala 2.12 and 2.13.
> > > 4. Only a few Sedona users are using Spark 2.4. According to the
> > > statistics of Maven Central (Scala/Java API only), only around 1K out
> > > of 100K downloads are using Sedona for Spark 2.4. (core-2.4_2.11,
> > > core-2.4_2.12, python-adapter-2.4_2.11, python-adapter-2.4_2.12)
> > >
> > > Benefits of dropping the support:
> > > 1. Reduce the complexity of maintaining the source code for different
> > > Spark versions. Currently, several files have two versions for Spark
> > > 2.4 and 3.x, controlled by "anchor" keywords. I wrote a Python script
> > > to pre-process the source code all the time:
> > > https://github.com/apache/incubator-sedona/blob/master/spark-version-converter.py
> > > 2. Reduce the overhead of releasing binary packages. Currently, the
> > > main POM.xml is quite complex in order to compile against different
> > > Spark versions. Therefore, we weren't able to release Sedona for Scala
> > > 2.13.
> > >
> > > Plan of Sedona for Spark 3.X
> > > 1. Sedona source code already supports Scala 2.13, but there is no
> > > Sedona binary release for it. We will release Sedona for both Scala
> > > 2.12 and 2.13, but not Scala 2.11.
> > > 2. Sedona already releases binaries for Spark 3.0, 3.1, 3.2
> > > 3. The two latest PRs of Sedona are adding the support for Spark 3.3.
> > > https://github.com/apache/incubator-sedona/pull/636
> > > https://github.com/apache/incubator-sedona/pull/635
> > >
> > > What do you think of this proposal? If you don't like this, what is the
> > > best time to drop the support of Spark 2.4 and Scala 2.11?
> > >
> > > I will leave this discussion open for at least 3 days. If there is no
> > > objection, I will remove Spark 2.4 from POM.xml and GitHub Actions, but
> > > leave the Spark 2.4 support in the source code. So whoever wants to use
> > > Sedona on Spark 2.4 can still compile the source code by themselves.
> > >
> > > Thanks,
> > > Jia
> > >
> >
>
-- 
Regards,
Martin

Re: [DISCUSS] Drop Spark 2.4 and Scala 2.11 support

Posted by Jia Yu <ji...@gmail.com>.
I agree. Let's start the release process of Sedona 1.2.1.
