Posted to dev@spark.apache.org by Mark Andreev <ma...@gmail.com> on 2023/01/06 18:03:38 UTC

[Suggest] Add geo function to core

Hi,

I suggest adding geographical functions to Apache Spark core, similar to
those in ClickHouse (
https://clickhouse.com/docs/en/sql-reference/functions/geo/).

- Geographical Coordinates Functions
- Geohash Functions
- H3 Indexes
- S2 Indexes

What do you think? What is the current policy on evolving the core? Should
we create a separate module (a standalone repository outside Apache) and
merge it into the main branch once it proves successful?

--
Best regards,
Mark Andreev

Re: [Suggest] Add geo function to core

Posted by Mo Sarwat <mo...@apache.org>.
Mark,

There is already another Apache project (namely Apache Sedona) that provides comprehensive support for geospatial operations in Spark. Please check it out:

GitHub: https://github.com/apache/sedona
Website: https://sedona.apache.org

Please feel free to contribute more geospatial functions to Sedona too!

Regards,
-Mo


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: [Suggest] Add geo function to core

Posted by Bjørn Jørgensen <bj...@gmail.com>.
Mosaic by Databricks Labs <https://github.com/databrickslabs/mosaic>



On Tue, Jan 17, 2023 at 15:53, Grigory Pomadchin <da...@gmail.com> wrote:


-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297

Re: [Suggest] Add geo function to core

Posted by Grigory Pomadchin <da...@gmail.com>.
Hey Mo,

That is awesome, great to hear!

Best,

Grigory

On Tue, Jan 17, 2023 at 9:03 AM Mo Sarwat <mo...@apache.org> wrote:


Re: [Suggest] Add geo function to core

Posted by Mo Sarwat <mo...@apache.org>.
Grigory,

Thanks a lot for chiming in - I really like the PostGIS to PostgreSQL analogy. That is exactly what Sedona (an Apache project) is to Spark. Spark core should remain light and generic (similar to PostgreSQL), and all spatial functionality should live in pluggable extensions (Sedona). Otherwise, the core will be unnecessarily heavy to maintain, release, and integrate.

Sedona already supports geohashing, among many other standard geospatial functions, and it works seamlessly with Spark without any issues for the end user. If there is something missing, I would highly recommend that we bring it to the Sedona community; that will directly benefit Spark users who are doing geo.

Implementing geospatial functionality in Spark core would replicate work that has already been done. Databricks, for instance, already uses Sedona internally in their geospatial capabilities.

Finally, I would like to mention that I am totally willing to be corrected on that, especially if you have tried Sedona with Spark and found that it does not serve the purpose at all. But please try it first, and let's come up with a few capabilities that it cannot provide unless they are implemented in Spark core. Then we can suggest those capabilities to the Spark community.

Thanks,
-Mo
 

On 2023/01/17 03:09:06 Grigory Pomadchin wrote:


Re: [Suggest] Add geo function to core

Posted by Grigory Pomadchin <da...@gmail.com>.
Hey folks,

Traditionally GIS functionality is distributed a bit separately - PostGIS
is a great example; and indeed for GIS needs Sedona / GeoMesa / GeoWave may
work out. I think GeoMesa implements GeoHash (see
https://www.geomesa.org/documentation/stable/user/spark/sparksql_functions.html
- it could be used as an inspiration at least).

I'm pretty sure Databricks provides some GIS functions (H3) at this point.
Could that be an argument for having something in the core / officially
supported by the Spark community?

I'd really love to see a relatively lightweight (JTS + Proj4j / SIS)
library in the wild, with basic expressions and optimization rules, that is
usable primarily through the Spark native interfaces, so there is no need to
figure out the API / way to set it up and / or resolve peculiar
dependencies. It could be a step towards Spark GIS types standardization.

Best,

Grigory

On Mon, Jan 16, 2023 at 6:21 PM Mo Sarwat <mo...@apache.org> wrote:


Re: [Suggest] Add geo function to core

Posted by Mo Sarwat <mo...@apache.org>.
Martin, thanks for chiming in and mentioning Apache SIS. However, Mark was asking about Geo in Spark, which Sedona already supports. 

Yet, I like the idea of keeping all dependencies within the Apache family. I believe a good solution would be for you (or the SIS community at large) to include Apache SIS in Sedona to replace libs like GeoTools. The Sedona community would definitely welcome your contribution :)

Regards,
-Mo

On 2023/01/16 22:24:14 Martin Desruisseaux wrote:


Re: [Suggest] Add geo function to core

Posted by Martin Desruisseaux <de...@apache.org>.
Hello Mark

Indeed Sedona is surely a serious candidate. Maybe one aspect to take into consideration, depending on how "core" the geospatial services would be, is that Sedona depends on an LGPL library (GeoTools, bundled separately) for map projections, Shapefile and GeoTIFF support. So those features could not be in core, since category X dependencies must remain optional.

Regarding referencing by coordinates (including map projections), I'm aware of three libraries with licenses compatible with Apache:

* Apache SIS (Apache license)
* PROJ4J (Apache license)
* PROJ-JNI (MIT license)

PROJ-JNI is a binding to the PROJ native library using the Java Native Interface (JNI). PROJ is the best-known map projection library, but it is difficult to bundle native code in a Java application.

I'm not in a neutral position to say that, but I believe that Apache SIS is the most powerful open source pure-Java referencing library. But it is relatively big, about 4 MB for the referencing module with its dependencies, not counting the optional EPSG geodetic dataset (because it is not compatible with the Apache license). Apache SIS is not the library with the largest number of map projections (PROJ4J has more), but it handles some difficult problems and scales well with three- or four-dimensional data (or more).

PROJ4J is a lightweight library which may be sufficient if data are mostly two-dimensional (limited 3D support also seems possible) and if uncertainty of a few metres in coordinate transformations (depending on how datum shifts are specified) is acceptable.

It is possible to write some code in an implementation-independent way using GeoAPI interfaces, which aim to do what JDBC interfaces do for databases. Apache SIS and PROJ-JNI are implementations of GeoAPI interfaces, so by using those interfaces you can let users choose between those two implementations. I think that GeoAPI wrappers could easily be contributed to PROJ4J as well if there is a desire for that.

Regarding Geohash, if we are talking about the algorithm described at https://en.wikipedia.org/wiki/Geohash, then SIS already supports it. SIS also supports the Military Grid Reference System (MGRS), which can be seen as another kind of geohash with better characteristics.
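
For readers unfamiliar with it, the encoding step of the Geohash algorithm mentioned here can be sketched in a few lines. This is only a minimal standalone illustration in Python, assuming the standard scheme (alternating longitude/latitude bisection, 5 bits per base-32 character); it is not code from Apache SIS, Sedona, or Spark, and the name geohash_encode is hypothetical.

```python
# Geohash base-32 alphabet (digits plus lowercase letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair (degrees) as a geohash string.

    Hypothetical helper for illustration only.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # bits accumulated for the current character
    bit_count = 0     # how many bits are in `bits` so far
    use_lon = True    # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Every 5 bits become one base-32 character.
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


# Coordinates used as the example in the Wikipedia article.
print(geohash_encode(57.64911, 10.40744, precision=6))
```

Because each extra character only refines the same cell, a shorter geohash is always a prefix of a longer one for the same point, which is what makes it usable as a simple spatial index key.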

Regards,

    Martin
