You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2017/06/28 01:32:18 UTC

Spatial indexes

Is anyone looking for a neat project in Calcite that would have a big
impact? I'm thinking that we could add support for spatial indexes to
Calcite in such a way that downstream projects such as Phoenix and
Flink could easily benefit from it.

GIS (Geographic Information Systems, aka Spatial database) is really
useful functionality to have in your database. To find restaurants
less than 1 km from downtown San Francisco, you could run

  select *
  from restaurants as r
  where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;

There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
and Microsoft SQL Server; and OpenGIS has standardized SQL
extensions[1].

Now, the SQL-GIS standard is rather large, and involves implementing
lots of data types and scalar functions. We could get to that
eventually. But I contend that many, many applications would be
satisfied by points and distances (like the query above) and a spatial
index to make them run quickly. And I believe that we can add spatial
index support to Calcite using a logical rewrite rule.

Rewriting spatial queries to indexes on space-filling curves is a
well-established technique [2].

Suppose that the restaurants table, above, had columns latitude and
longitude and a computed numeric column h = hilbert(latitude,
longitude). Hilbert curves are space-filling curves such that if two
points are close in space then their h values will be close. So, if
there is an index on h, we can find all restaurants close to a given
point using a range scan of the index.

So, the above query could be rewritten to something like

  select *
  from restaurants as r
  where (r.h between 123456 and 123599
      or r.h between 256789 and 259887)
    and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;

The range predicates on r.h quickly eliminate 99.9% of the rows in the
database, and the call to st_distance_internal eliminates the
remaining false positives.

That rewrite can be done using a logical rewrite rule, and the
resulting query will be faster on just about any database, but
especially one with key-sorted tables (like Phoenix/HBase) or
range-partitioned tables. The database does not need to have a
dedicated "spatial index" data structure.

Julian

[1] http://www.opengeospatial.org/standards/sfs

[2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf

Re: Spatial indexes

Posted by Atri Sharma <at...@gmail.com>.
I will log a JIRA case for this, and we can continue discussions there.

On Jun 28, 2017 10:43 AM, "Jacques Nadeau" <ja...@apache.org> wrote:

> I don't know how much I can contribute but it sounds like a great way to
> expand Calcite. Good idea Julian!
>
> On Jun 27, 2017 7:06 PM, "Atri Sharma" <at...@gmail.com> wrote:
>
> > I am all for it. This sounds really interesting.
> >
> > On Jun 28, 2017 7:02 AM, "Julian Hyde" <jh...@apache.org> wrote:
> >
> > > Is anyone looking for a neat project in Calcite that would have a big
> > > impact? I'm thinking that we could add support for spatial indexes to
> > > Calcite in such a way that downstream projects such as Phoenix and
> > > Flink could easily benefit from it.
> > >
> > > GIS (Geographic Information Systems, aka Spatial database) is really
> > > useful functionality to have in your database. To find restaurants
> > > less than 1 km from downtown San Francisco, you could run
> > >
> > >   select *
> > >   from restaurants as r
> > >   where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
> > >
> > > There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
> > > and Microsoft SQL Server; and OpenGIS has standardized SQL
> > > extensions[1].
> > >
> > > Now, the SQL-GIS standard is rather large, and involves implementing
> > > lots of data types and scalar functions. We could get to that
> > > eventually. But I contend that many, many applications would be
> > > satisfied by points and distances (like the query above) and a spatial
> > > index to make them run quickly. And I believe that we can add spatial
> > > index support to Calcite using a logical rewrite rule.
> > >
> > > Rewriting spatial queries to indexes on space-filling curves is a
> > > well-established technique [2].
> > >
> > > Suppose that the restaurants table, above, had columns latitude and
> > > longitude and a computed numeric column h = hilbert(latitude,
> > > longitude). Hilbert curves are space-filling curves such that if two
> > > points are close in space then their h values will be close. So, if
> > > there is an index on h, we can find all restaurants close to a given
> > > point using a range scan of the index.
> > >
> > > So, the above query could be rewritten to something like
> > >
> > >   select *
> > >   from restaurants as r
> > >   where (r.h between 123456 and 123599
> > >       or r.h between 256789 and 259887)
> > >     and st_distance_internal(point(-122.4194, 37.7749), r.coordinates)
> > <=
> > > 1;
> > >
> > > The range predicates on r.h quickly eliminate 99.9% of the rows in the
> > > database, and the call to st_distance_internal eliminates the
> > > remaining false positives.
> > >
> > > That rewrite can be done using a logical rewrite rule, and the
> > > resulting query will be faster on just about any database, but
> > > especially one with key-sorted tables (like Phoenix/HBase) or
> > > range-partitioned tables. The database does not need to have a
> > > dedicated "spatial index" data structure.
> > >
> > > Julian
> > >
> > > [1] http://www.opengeospatial.org/standards/sfs
> > >
> > > [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
> > >
> >
>

Re: Spatial indexes

Posted by Jacques Nadeau <ja...@apache.org>.
I don't know how much I can contribute but it sounds like a great way to
expand Calcite. Good idea Julian!

On Jun 27, 2017 7:06 PM, "Atri Sharma" <at...@gmail.com> wrote:

> I am all for it. This sounds really interesting.
>
> On Jun 28, 2017 7:02 AM, "Julian Hyde" <jh...@apache.org> wrote:
>
> > Is anyone looking for a neat project in Calcite that would have a big
> > impact? I'm thinking that we could add support for spatial indexes to
> > Calcite in such a way that downstream projects such as Phoenix and
> > Flink could easily benefit from it.
> >
> > GIS (Geographic Information Systems, aka Spatial database) is really
> > useful functionality to have in your database. To find restaurants
> > less than 1 km from downtown San Francisco, you could run
> >
> >   select *
> >   from restaurants as r
> >   where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
> >
> > There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
> > and Microsoft SQL Server; and OpenGIS has standardized SQL
> > extensions[1].
> >
> > Now, the SQL-GIS standard is rather large, and involves implementing
> > lots of data types and scalar functions. We could get to that
> > eventually. But I contend that many, many applications would be
> > satisfied by points and distances (like the query above) and a spatial
> > index to make them run quickly. And I believe that we can add spatial
> > index support to Calcite using a logical rewrite rule.
> >
> > Rewriting spatial queries to indexes on space-filling curves is a
> > well-established technique [2].
> >
> > Suppose that the restaurants table, above, had columns latitude and
> > longitude and a computed numeric column h = hilbert(latitude,
> > longitude). Hilbert curves are space-filling curves such that if two
> > points are close in space then their h values will be close. So, if
> > there is an index on h, we can find all restaurants close to a given
> > point using a range scan of the index.
> >
> > So, the above query could be rewritten to something like
> >
> >   select *
> >   from restaurants as r
> >   where (r.h between 123456 and 123599
> >       or r.h between 256789 and 259887)
> >     and st_distance_internal(point(-122.4194, 37.7749), r.coordinates)
> <=
> > 1;
> >
> > The range predicates on r.h quickly eliminate 99.9% of the rows in the
> > database, and the call to st_distance_internal eliminates the
> > remaining false positives.
> >
> > That rewrite can be done using a logical rewrite rule, and the
> > resulting query will be faster on just about any database, but
> > especially one with key-sorted tables (like Phoenix/HBase) or
> > range-partitioned tables. The database does not need to have a
> > dedicated "spatial index" data structure.
> >
> > Julian
> >
> > [1] http://www.opengeospatial.org/standards/sfs
> >
> > [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
> >
>

Re: Spatial indexes

Posted by Atri Sharma <at...@gmail.com>.
I am all for it. This sounds really interesting.

On Jun 28, 2017 7:02 AM, "Julian Hyde" <jh...@apache.org> wrote:

> Is anyone looking for a neat project in Calcite that would have a big
> impact? I'm thinking that we could add support for spatial indexes to
> Calcite in such a way that downstream projects such as Phoenix and
> Flink could easily benefit from it.
>
> GIS (Geographic Information Systems, aka Spatial database) is really
> useful functionality to have in your database. To find restaurants
> less than 1 km from downtown San Francisco, you could run
>
>   select *
>   from restaurants as r
>   where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>
> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
> and Microsoft SQL Server; and OpenGIS has standardized SQL
> extensions[1].
>
> Now, the SQL-GIS standard is rather large, and involves implementing
> lots of data types and scalar functions. We could get to that
> eventually. But I contend that many, many applications would be
> satisfied by points and distances (like the query above) and a spatial
> index to make them run quickly. And I believe that we can add spatial
> index support to Calcite using a logical rewrite rule.
>
> Rewriting spatial queries to indexes on space-filling curves is a
> well-established technique [2].
>
> Suppose that the restaurants table, above, had columns latitude and
> longitude and a computed numeric column h = hilbert(latitude,
> longitude). Hilbert curves are space-filling curves such that if two
> points are close in space then their h values will be close. So, if
> there is an index on h, we can find all restaurants close to a given
> point using a range scan of the index.
>
> So, the above query could be rewritten to something like
>
>   select *
>   from restaurants as r
>   where (r.h between 123456 and 123599
>       or r.h between 256789 and 259887)
>     and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <=
> 1;
>
> The range predicates on r.h quickly eliminate 99.9% of the rows in the
> database, and the call to st_distance_internal eliminates the
> remaining false positives.
>
> That rewrite can be done using a logical rewrite rule, and the
> resulting query will be faster on just about any database, but
> especially one with key-sorted tables (like Phoenix/HBase) or
> range-partitioned tables. The database does not need to have a
> dedicated "spatial index" data structure.
>
> Julian
>
> [1] http://www.opengeospatial.org/standards/sfs
>
> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
>

Re: Spatial indexes

Posted by Julian Hyde <jh...@apache.org>.
I would like to start with a test schema (maybe a single table backed by a CSV file), a query that finds points within a given distance of a point, and do a rewrite of that query to use a range scan of a hilbert column.

Calcite would not have a “CREATE [SPATIAL] INDEX” command, but the rewrite rule would be sufficient for other systems to create one. (Similar to how Phoenix and Druid use rewrites to target their indexes.)

Let’s stick to flat-earth Euclidean geometry for now. I.e. points have unit-less coordinates, not latitude and longitude. The distance between (0, 0) and (4, 3) is exactly 5, per Pythagoras’ theorem.

As for standards, let's reference PostGIS, but also let’s use OpenGIS SQL (reference [1] in my original email).

You are going to have some fun with Hilbert curves. You’ll need not only “long hilbert(int, int)”, but also to compute the ranges on the hilbert curve that are within a circle or rectangle. Should we work on integers or floating point numbers? Hilbert functions on integers are nice, because it’s just bit-twiddling [3].

Julian

[3] https://stackoverflow.com/questions/499166/mapping-n-dimensional-value-to-a-point-on-hilbert-curve <https://stackoverflow.com/questions/499166/mapping-n-dimensional-value-to-a-point-on-hilbert-curve>

> On Jun 28, 2017, at 11:42 AM, Atri Sharma <at...@gmail.com> wrote:
> 
> How do you propose we start here? I can start writing a spec based on
> the implementation, based on how Postgres and others implement them.
> 
> On Thu, Jun 29, 2017 at 12:08 AM, Julian Hyde <jh...@apache.org> wrote:
>> No, anyone brave enough to be an early adopter would already be on this list.
>> 
>> James (Phoenix), Jacques (Dremio), Fabian (Flink), Jinfeng/Aman (Drill) can be proxies for their their communities. And others who are building interesting stuff I don’t know about.
>> 
>> Julian
>> 
>>> On Jun 28, 2017, at 10:29 AM, Atri Sharma <at...@gmail.com> wrote:
>>> 
>>> Should we start a thread with potential users, eg Phoenix community?
>>> 
>>> On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> I’d also like to hear from potential users of this feature. They could try this functionality as it becomes available, and help us prioritize features.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Jun 28, 2017, at 9:10 AM, Atri Sharma <at...@gmail.com> wrote:
>>>>> 
>>>>> And I have created the JIRA:
>>>>> 
>>>>> https://issues.apache.org/jira/browse/CALCITE-1861
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <jh...@apache.org> wrote:
>>>>>> Is anyone looking for a neat project in Calcite that would have a big
>>>>>> impact? I'm thinking that we could add support for spatial indexes to
>>>>>> Calcite in such a way that downstream projects such as Phoenix and
>>>>>> Flink could easily benefit from it.
>>>>>> 
>>>>>> GIS (Geographic Information Systems, aka Spatial database) is really
>>>>>> useful functionality to have in your database. To find restaurants
>>>>>> less than 1 km from downtown San Francisco, you could run
>>>>>> 
>>>>>> select *
>>>>>> from restaurants as r
>>>>>> where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>>>> 
>>>>>> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
>>>>>> and Microsoft SQL Server; and OpenGIS has standardized SQL
>>>>>> extensions[1].
>>>>>> 
>>>>>> Now, the SQL-GIS standard is rather large, and involves implementing
>>>>>> lots of data types and scalar functions. We could get to that
>>>>>> eventually. But I contend that many, many applications would be
>>>>>> satisfied by points and distances (like the query above) and a spatial
>>>>>> index to make them run quickly. And I believe that we can add spatial
>>>>>> index support to Calcite using a logical rewrite rule.
>>>>>> 
>>>>>> Rewriting spatial queries to indexes on space-filling curves is a
>>>>>> well-established technique [2].
>>>>>> 
>>>>>> Suppose that the restaurants table, above, had columns latitude and
>>>>>> longitude and a computed numeric column h = hilbert(latitude,
>>>>>> longitude). Hilbert curves are space-filling curves such that if two
>>>>>> points are close in space then their h values will be close. So, if
>>>>>> there is an index on h, we can find all restaurants close to a given
>>>>>> point using a range scan of the index.
>>>>>> 
>>>>>> So, the above query could be rewritten to something like
>>>>>> 
>>>>>> select *
>>>>>> from restaurants as r
>>>>>> where (r.h between 123456 and 123599
>>>>>>    or r.h between 256789 and 259887)
>>>>>>  and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>>>> 
>>>>>> The range predicates on r.h quickly eliminate 99.9% of the rows in the
>>>>>> database, and the call to st_distance_internal eliminates the
>>>>>> remaining false positives.
>>>>>> 
>>>>>> That rewrite can be done using a logical rewrite rule, and the
>>>>>> resulting query will be faster on just about any database, but
>>>>>> especially one with key-sorted tables (like Phoenix/HBase) or
>>>>>> range-partitioned tables. The database does not need to have a
>>>>>> dedicated "spatial index" data structure.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> [1] http://www.opengeospatial.org/standards/sfs
>>>>>> 
>>>>>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Regards,
>>>>> 
>>>>> Atri
>>>>> l'apprenant
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> 
>>> Atri
>>> l'apprenant
>> 
> 
> 
> 
> -- 
> Regards,
> 
> Atri
> l'apprenant


Re: Spatial indexes

Posted by Liangfei Su <su...@gmail.com>.
+1 for the feature, sounds interesting, thanks Julian. As a flink user,
watch to see more how this would be applied to flink.

On Thu, Jun 29, 2017 at 3:22 AM, Julian Hyde <jh...@apache.org> wrote:

> PS The author of the paper referenced in that StackOverflow article[4],
> John Skilling, was my applied mathematics supervisor in Cambridge. It’s a
> small world.
>
> [4] http://ratml.org/misc/hilbert_curve.pdf <http://ratml.org/misc/
> hilbert_curve.pdf>
>
>
> > On Jun 28, 2017, at 11:42 AM, Atri Sharma <at...@gmail.com> wrote:
> >
> > How do you propose we start here? I can start writing a spec based on
> > the implementation, based on how Postgres and others implement them.
> >
> > On Thu, Jun 29, 2017 at 12:08 AM, Julian Hyde <jh...@apache.org> wrote:
> >> No, anyone brave enough to be an early adopter would already be on this
> list.
> >>
> >> James (Phoenix), Jacques (Dremio), Fabian (Flink), Jinfeng/Aman (Drill)
> can be proxies for their their communities. And others who are building
> interesting stuff I don’t know about.
> >>
> >> Julian
> >>
> >>> On Jun 28, 2017, at 10:29 AM, Atri Sharma <at...@gmail.com> wrote:
> >>>
> >>> Should we start a thread with potential users, eg Phoenix community?
> >>>
> >>> On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <jh...@apache.org>
> wrote:
> >>>> I’d also like to hear from potential users of this feature. They
> could try this functionality as it becomes available, and help us
> prioritize features.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>> On Jun 28, 2017, at 9:10 AM, Atri Sharma <at...@gmail.com>
> wrote:
> >>>>>
> >>>>> And I have created the JIRA:
> >>>>>
> >>>>> https://issues.apache.org/jira/browse/CALCITE-1861
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <jh...@apache.org>
> wrote:
> >>>>>> Is anyone looking for a neat project in Calcite that would have a
> big
> >>>>>> impact? I'm thinking that we could add support for spatial indexes
> to
> >>>>>> Calcite in such a way that downstream projects such as Phoenix and
> >>>>>> Flink could easily benefit from it.
> >>>>>>
> >>>>>> GIS (Geographic Information Systems, aka Spatial database) is really
> >>>>>> useful functionality to have in your database. To find restaurants
> >>>>>> less than 1 km from downtown San Francisco, you could run
> >>>>>>
> >>>>>> select *
> >>>>>> from restaurants as r
> >>>>>> where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
> >>>>>>
> >>>>>> There are mature SQL implementations of GIS in PostGIS, Oracle
> Spatial
> >>>>>> and Microsoft SQL Server; and OpenGIS has standardized SQL
> >>>>>> extensions[1].
> >>>>>>
> >>>>>> Now, the SQL-GIS standard is rather large, and involves implementing
> >>>>>> lots of data types and scalar functions. We could get to that
> >>>>>> eventually. But I contend that many, many applications would be
> >>>>>> satisfied by points and distances (like the query above) and a
> spatial
> >>>>>> index to make them run quickly. And I believe that we can add
> spatial
> >>>>>> index support to Calcite using a logical rewrite rule.
> >>>>>>
> >>>>>> Rewriting spatial queries to indexes on space-filling curves is a
> >>>>>> well-established technique [2].
> >>>>>>
> >>>>>> Suppose that the restaurants table, above, had columns latitude and
> >>>>>> longitude and a computed numeric column h = hilbert(latitude,
> >>>>>> longitude). Hilbert curves are space-filling curves such that if two
> >>>>>> points are close in space then their h values will be close. So, if
> >>>>>> there is an index on h, we can find all restaurants close to a given
> >>>>>> point using a range scan of the index.
> >>>>>>
> >>>>>> So, the above query could be rewritten to something like
> >>>>>>
> >>>>>> select *
> >>>>>> from restaurants as r
> >>>>>> where (r.h between 123456 and 123599
> >>>>>>    or r.h between 256789 and 259887)
> >>>>>>  and st_distance_internal(point(-122.4194, 37.7749),
> r.coordinates) <= 1;
> >>>>>>
> >>>>>> The range predicates on r.h quickly eliminate 99.9% of the rows in
> the
> >>>>>> database, and the call to st_distance_internal eliminates the
> >>>>>> remaining false positives.
> >>>>>>
> >>>>>> That rewrite can be done using a logical rewrite rule, and the
> >>>>>> resulting query will be faster on just about any database, but
> >>>>>> especially one with key-sorted tables (like Phoenix/HBase) or
> >>>>>> range-partitioned tables. The database does not need to have a
> >>>>>> dedicated "spatial index" data structure.
> >>>>>>
> >>>>>> Julian
> >>>>>>
> >>>>>> [1] http://www.opengeospatial.org/standards/sfs
> >>>>>>
> >>>>>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Regards,
> >>>>>
> >>>>> Atri
> >>>>> l'apprenant
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Regards,
> >>>
> >>> Atri
> >>> l'apprenant
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > Atri
> > l'apprenant
>
>

Re: Spatial indexes

Posted by Julian Hyde <jh...@apache.org>.
PS The author of the paper referenced in that StackOverflow article[4], John Skilling, was my applied mathematics supervisor in Cambridge. It’s a small world. 

[4] http://ratml.org/misc/hilbert_curve.pdf <http://ratml.org/misc/hilbert_curve.pdf>


> On Jun 28, 2017, at 11:42 AM, Atri Sharma <at...@gmail.com> wrote:
> 
> How do you propose we start here? I can start writing a spec based on
> the implementation, based on how Postgres and others implement them.
> 
> On Thu, Jun 29, 2017 at 12:08 AM, Julian Hyde <jh...@apache.org> wrote:
>> No, anyone brave enough to be an early adopter would already be on this list.
>> 
>> James (Phoenix), Jacques (Dremio), Fabian (Flink), Jinfeng/Aman (Drill) can be proxies for their their communities. And others who are building interesting stuff I don’t know about.
>> 
>> Julian
>> 
>>> On Jun 28, 2017, at 10:29 AM, Atri Sharma <at...@gmail.com> wrote:
>>> 
>>> Should we start a thread with potential users, eg Phoenix community?
>>> 
>>> On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> I’d also like to hear from potential users of this feature. They could try this functionality as it becomes available, and help us prioritize features.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Jun 28, 2017, at 9:10 AM, Atri Sharma <at...@gmail.com> wrote:
>>>>> 
>>>>> And I have created the JIRA:
>>>>> 
>>>>> https://issues.apache.org/jira/browse/CALCITE-1861
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <jh...@apache.org> wrote:
>>>>>> Is anyone looking for a neat project in Calcite that would have a big
>>>>>> impact? I'm thinking that we could add support for spatial indexes to
>>>>>> Calcite in such a way that downstream projects such as Phoenix and
>>>>>> Flink could easily benefit from it.
>>>>>> 
>>>>>> GIS (Geographic Information Systems, aka Spatial database) is really
>>>>>> useful functionality to have in your database. To find restaurants
>>>>>> less than 1 km from downtown San Francisco, you could run
>>>>>> 
>>>>>> select *
>>>>>> from restaurants as r
>>>>>> where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>>>> 
>>>>>> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
>>>>>> and Microsoft SQL Server; and OpenGIS has standardized SQL
>>>>>> extensions[1].
>>>>>> 
>>>>>> Now, the SQL-GIS standard is rather large, and involves implementing
>>>>>> lots of data types and scalar functions. We could get to that
>>>>>> eventually. But I contend that many, many applications would be
>>>>>> satisfied by points and distances (like the query above) and a spatial
>>>>>> index to make them run quickly. And I believe that we can add spatial
>>>>>> index support to Calcite using a logical rewrite rule.
>>>>>> 
>>>>>> Rewriting spatial queries to indexes on space-filling curves is a
>>>>>> well-established technique [2].
>>>>>> 
>>>>>> Suppose that the restaurants table, above, had columns latitude and
>>>>>> longitude and a computed numeric column h = hilbert(latitude,
>>>>>> longitude). Hilbert curves are space-filling curves such that if two
>>>>>> points are close in space then their h values will be close. So, if
>>>>>> there is an index on h, we can find all restaurants close to a given
>>>>>> point using a range scan of the index.
>>>>>> 
>>>>>> So, the above query could be rewritten to something like
>>>>>> 
>>>>>> select *
>>>>>> from restaurants as r
>>>>>> where (r.h between 123456 and 123599
>>>>>>    or r.h between 256789 and 259887)
>>>>>>  and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>>>> 
>>>>>> The range predicates on r.h quickly eliminate 99.9% of the rows in the
>>>>>> database, and the call to st_distance_internal eliminates the
>>>>>> remaining false positives.
>>>>>> 
>>>>>> That rewrite can be done using a logical rewrite rule, and the
>>>>>> resulting query will be faster on just about any database, but
>>>>>> especially one with key-sorted tables (like Phoenix/HBase) or
>>>>>> range-partitioned tables. The database does not need to have a
>>>>>> dedicated "spatial index" data structure.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> [1] http://www.opengeospatial.org/standards/sfs
>>>>>> 
>>>>>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Regards,
>>>>> 
>>>>> Atri
>>>>> l'apprenant
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> 
>>> Atri
>>> l'apprenant
>> 
> 
> 
> 
> -- 
> Regards,
> 
> Atri
> l'apprenant


Re: Spatial indexes

Posted by Atri Sharma <at...@gmail.com>.
How do you propose we start here? I can start writing a spec based on
the implementation, based on how Postgres and others implement them.

On Thu, Jun 29, 2017 at 12:08 AM, Julian Hyde <jh...@apache.org> wrote:
> No, anyone brave enough to be an early adopter would already be on this list.
>
> James (Phoenix), Jacques (Dremio), Fabian (Flink), Jinfeng/Aman (Drill) can be proxies for their their communities. And others who are building interesting stuff I don’t know about.
>
> Julian
>
>> On Jun 28, 2017, at 10:29 AM, Atri Sharma <at...@gmail.com> wrote:
>>
>> Should we start a thread with potential users, eg Phoenix community?
>>
>> On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <jh...@apache.org> wrote:
>>> I’d also like to hear from potential users of this feature. They could try this functionality as it becomes available, and help us prioritize features.
>>>
>>> Julian
>>>
>>>
>>>> On Jun 28, 2017, at 9:10 AM, Atri Sharma <at...@gmail.com> wrote:
>>>>
>>>> And I have created the JIRA:
>>>>
>>>> https://issues.apache.org/jira/browse/CALCITE-1861
>>>>
>>>>
>>>>
>>>> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <jh...@apache.org> wrote:
>>>>> Is anyone looking for a neat project in Calcite that would have a big
>>>>> impact? I'm thinking that we could add support for spatial indexes to
>>>>> Calcite in such a way that downstream projects such as Phoenix and
>>>>> Flink could easily benefit from it.
>>>>>
>>>>> GIS (Geographic Information Systems, aka Spatial database) is really
>>>>> useful functionality to have in your database. To find restaurants
>>>>> less than 1 km from downtown San Francisco, you could run
>>>>>
>>>>> select *
>>>>> from restaurants as r
>>>>> where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>>>
>>>>> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
>>>>> and Microsoft SQL Server; and OpenGIS has standardized SQL
>>>>> extensions[1].
>>>>>
>>>>> Now, the SQL-GIS standard is rather large, and involves implementing
>>>>> lots of data types and scalar functions. We could get to that
>>>>> eventually. But I contend that many, many applications would be
>>>>> satisfied by points and distances (like the query above) and a spatial
>>>>> index to make them run quickly. And I believe that we can add spatial
>>>>> index support to Calcite using a logical rewrite rule.
>>>>>
>>>>> Rewriting spatial queries to indexes on space-filling curves is a
>>>>> well-established technique [2].
>>>>>
>>>>> Suppose that the restaurants table, above, had columns latitude and
>>>>> longitude and a computed numeric column h = hilbert(latitude,
>>>>> longitude). Hilbert curves are space-filling curves such that if two
>>>>> points are close in space then their h values will be close. So, if
>>>>> there is an index on h, we can find all restaurants close to a given
>>>>> point using a range scan of the index.
>>>>>
>>>>> So, the above query could be rewritten to something like
>>>>>
>>>>> select *
>>>>> from restaurants as r
>>>>> where (r.h between 123456 and 123599
>>>>>     or r.h between 256789 and 259887)
>>>>>   and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>>>
>>>>> The range predicates on r.h quickly eliminate 99.9% of the rows in the
>>>>> database, and the call to st_distance_internal eliminates the
>>>>> remaining false positives.
>>>>>
>>>>> That rewrite can be done using a logical rewrite rule, and the
>>>>> resulting query will be faster on just about any database, but
>>>>> especially one with key-sorted tables (like Phoenix/HBase) or
>>>>> range-partitioned tables. The database does not need to have a
>>>>> dedicated "spatial index" data structure.
>>>>>
>>>>> Julian
>>>>>
>>>>> [1] http://www.opengeospatial.org/standards/sfs
>>>>>
>>>>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Atri
>>>> l'apprenant
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Atri
>> l'apprenant
>



-- 
Regards,

Atri
l'apprenant

Re: Spatial indexes

Posted by Julian Hyde <jh...@apache.org>.
No, anyone brave enough to be an early adopter would already be on this list.

James (Phoenix), Jacques (Dremio), Fabian (Flink), Jinfeng/Aman (Drill) can be proxies for their their communities. And others who are building interesting stuff I don’t know about.

Julian

> On Jun 28, 2017, at 10:29 AM, Atri Sharma <at...@gmail.com> wrote:
> 
> Should we start a thread with potential users, eg Phoenix community?
> 
> On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <jh...@apache.org> wrote:
>> I’d also like to hear from potential users of this feature. They could try this functionality as it becomes available, and help us prioritize features.
>> 
>> Julian
>> 
>> 
>>> On Jun 28, 2017, at 9:10 AM, Atri Sharma <at...@gmail.com> wrote:
>>> 
>>> And I have created the JIRA:
>>> 
>>> https://issues.apache.org/jira/browse/CALCITE-1861
>>> 
>>> 
>>> 
>>> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <jh...@apache.org> wrote:
>>>> Is anyone looking for a neat project in Calcite that would have a big
>>>> impact? I'm thinking that we could add support for spatial indexes to
>>>> Calcite in such a way that downstream projects such as Phoenix and
>>>> Flink could easily benefit from it.
>>>> 
>>>> GIS (Geographic Information Systems, aka Spatial database) is really
>>>> useful functionality to have in your database. To find restaurants
>>>> less than 1 km from downtown San Francisco, you could run
>>>> 
>>>> select *
>>>> from restaurants as r
>>>> where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>> 
>>>> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
>>>> and Microsoft SQL Server; and OpenGIS has standardized SQL
>>>> extensions[1].
>>>> 
>>>> Now, the SQL-GIS standard is rather large, and involves implementing
>>>> lots of data types and scalar functions. We could get to that
>>>> eventually. But I contend that many, many applications would be
>>>> satisfied by points and distances (like the query above) and a spatial
>>>> index to make them run quickly. And I believe that we can add spatial
>>>> index support to Calcite using a logical rewrite rule.
>>>> 
>>>> Rewriting spatial queries to indexes on space-filling curves is a
>>>> well-established technique [2].
>>>> 
>>>> Suppose that the restaurants table, above, had columns latitude and
>>>> longitude and a computed numeric column h = hilbert(latitude,
>>>> longitude). Hilbert curves are space-filling curves such that if two
>>>> points are close in space then their h values will be close. So, if
>>>> there is an index on h, we can find all restaurants close to a given
>>>> point using a range scan of the index.
>>>> 
>>>> So, the above query could be rewritten to something like
>>>> 
>>>> select *
>>>> from restaurants as r
>>>> where (r.h between 123456 and 123599
>>>>     or r.h between 256789 and 259887)
>>>>   and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>> 
>>>> The range predicates on r.h quickly eliminate 99.9% of the rows in the
>>>> database, and the call to st_distance_internal eliminates the
>>>> remaining false positives.
>>>> 
>>>> That rewrite can be done using a logical rewrite rule, and the
>>>> resulting query will be faster on just about any database, but
>>>> especially one with key-sorted tables (like Phoenix/HBase) or
>>>> range-partitioned tables. The database does not need to have a
>>>> dedicated "spatial index" data structure.
>>>> 
>>>> Julian
>>>> 
>>>> [1] http://www.opengeospatial.org/standards/sfs
>>>> 
>>>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> 
>>> Atri
>>> l'apprenant
>> 
> 
> 
> 
> -- 
> Regards,
> 
> Atri
> l'apprenant


Re: Spatial indexes

Posted by Atri Sharma <at...@gmail.com>.
Should we start a thread with potential users, eg Phoenix community?

On Wed, Jun 28, 2017 at 10:55 PM, Julian Hyde <jh...@apache.org> wrote:
> I’d also like to hear from potential users of this feature. They could try this functionality as it becomes available, and help us prioritize features.
>
> Julian
>
>
>> On Jun 28, 2017, at 9:10 AM, Atri Sharma <at...@gmail.com> wrote:
>>
>> And I have created the JIRA:
>>
>> https://issues.apache.org/jira/browse/CALCITE-1861
>>
>>
>>
>> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <jh...@apache.org> wrote:
>>> Is anyone looking for a neat project in Calcite that would have a big
>>> impact? I'm thinking that we could add support for spatial indexes to
>>> Calcite in such a way that downstream projects such as Phoenix and
>>> Flink could easily benefit from it.
>>>
>>> GIS (Geographic Information Systems, aka Spatial database) is really
>>> useful functionality to have in your database. To find restaurants
>>> less than 1 km from downtown San Francisco, you could run
>>>
>>>  select *
>>>  from restaurants as r
>>>  where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>
>>> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
>>> and Microsoft SQL Server; and OpenGIS has standardized SQL
>>> extensions[1].
>>>
>>> Now, the SQL-GIS standard is rather large, and involves implementing
>>> lots of data types and scalar functions. We could get to that
>>> eventually. But I contend that many, many applications would be
>>> satisfied by points and distances (like the query above) and a spatial
>>> index to make them run quickly. And I believe that we can add spatial
>>> index support to Calcite using a logical rewrite rule.
>>>
>>> Rewriting spatial queries to indexes on space-filling curves is a
>>> well-established technique [2].
>>>
>>> Suppose that the restaurants table, above, had columns latitude and
>>> longitude and a computed numeric column h = hilbert(latitude,
>>> longitude). Hilbert curves are space-filling curves such that if two
>>> points are close in space then their h values will be close. So, if
>>> there is an index on h, we can find all restaurants close to a given
>>> point using a range scan of the index.
>>>
>>> So, the above query could be rewritten to something like
>>>
>>>  select *
>>>  from restaurants as r
>>>  where (r.h between 123456 and 123599
>>>      or r.h between 256789 and 259887)
>>>    and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;
>>>
>>> The range predicates on r.h quickly eliminate 99.9% of the rows in the
>>> database, and the call to st_distance_internal eliminates the
>>> remaining false positives.
>>>
>>> That rewrite can be done using a logical rewrite rule, and the
>>> resulting query will be faster on just about any database, but
>>> especially one with key-sorted tables (like Phoenix/HBase) or
>>> range-partitioned tables. The database does not need to have a
>>> dedicated "spatial index" data structure.
>>>
>>> Julian
>>>
>>> [1] http://www.opengeospatial.org/standards/sfs
>>>
>>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
>>
>>
>>
>> --
>> Regards,
>>
>> Atri
>> l'apprenant
>



-- 
Regards,

Atri
l'apprenant

Re: Spatial indexes

Posted by Julian Hyde <jh...@apache.org>.
I’d also like to hear from potential users of this feature. They could try this functionality as it becomes available, and help us prioritize features.

Julian


> On Jun 28, 2017, at 9:10 AM, Atri Sharma <at...@gmail.com> wrote:
> 
> And I have created the JIRA:
> 
> https://issues.apache.org/jira/browse/CALCITE-1861
> 
> 
> 
> On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <jh...@apache.org> wrote:
>> Is anyone looking for a neat project in Calcite that would have a big
>> impact? I'm thinking that we could add support for spatial indexes to
>> Calcite in such a way that downstream projects such as Phoenix and
>> Flink could easily benefit from it.
>> 
>> GIS (Geographic Information Systems, aka Spatial database) is really
>> useful functionality to have in your database. To find restaurants
>> less than 1 km from downtown San Francisco, you could run
>> 
>>  select *
>>  from restaurants as r
>>  where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>> 
>> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
>> and Microsoft SQL Server; and OpenGIS has standardized SQL
>> extensions[1].
>> 
>> Now, the SQL-GIS standard is rather large, and involves implementing
>> lots of data types and scalar functions. We could get to that
>> eventually. But I contend that many, many applications would be
>> satisfied by points and distances (like the query above) and a spatial
>> index to make them run quickly. And I believe that we can add spatial
>> index support to Calcite using a logical rewrite rule.
>> 
>> Rewriting spatial queries to indexes on space-filling curves is a
>> well-established technique [2].
>> 
>> Suppose that the restaurants table, above, had columns latitude and
>> longitude and a computed numeric column h = hilbert(latitude,
>> longitude). Hilbert curves are space-filling curves such that if two
>> points are close in space then their h values will be close. So, if
>> there is an index on h, we can find all restaurants close to a given
>> point using a range scan of the index.
>> 
>> So, the above query could be rewritten to something like
>> 
>>  select *
>>  from restaurants as r
>>  where (r.h between 123456 and 123599
>>      or r.h between 256789 and 259887)
>>    and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;
>> 
>> The range predicates on r.h quickly eliminate 99.9% of the rows in the
>> database, and the call to st_distance_internal eliminates the
>> remaining false positives.
>> 
>> That rewrite can be done using a logical rewrite rule, and the
>> resulting query will be faster on just about any database, but
>> especially one with key-sorted tables (like Phoenix/HBase) or
>> range-partitioned tables. The database does not need to have a
>> dedicated "spatial index" data structure.
>> 
>> Julian
>> 
>> [1] http://www.opengeospatial.org/standards/sfs
>> 
>> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf
> 
> 
> 
> -- 
> Regards,
> 
> Atri
> l'apprenant


Re: Spatial indexes

Posted by Atri Sharma <at...@gmail.com>.
And I have created the JIRA:

https://issues.apache.org/jira/browse/CALCITE-1861



On Wed, Jun 28, 2017 at 7:02 AM, Julian Hyde <jh...@apache.org> wrote:
> Is anyone looking for a neat project in Calcite that would have a big
> impact? I'm thinking that we could add support for spatial indexes to
> Calcite in such a way that downstream projects such as Phoenix and
> Flink could easily benefit from it.
>
> GIS (Geographic Information Systems, aka Spatial database) is really
> useful functionality to have in your database. To find restaurants
> less than 1 km from downtown San Francisco, you could run
>
>   select *
>   from restaurants as r
>   where st_distance(point(-122.4194, 37.7749), r.coordinates) <= 1;
>
> There are mature SQL implementations of GIS in PostGIS, Oracle Spatial
> and Microsoft SQL Server; and OpenGIS has standardized SQL
> extensions[1].
>
> Now, the SQL-GIS standard is rather large, and involves implementing
> lots of data types and scalar functions. We could get to that
> eventually. But I contend that many, many applications would be
> satisfied by points and distances (like the query above) and a spatial
> index to make them run quickly. And I believe that we can add spatial
> index support to Calcite using a logical rewrite rule.
>
> Rewriting spatial queries to indexes on space-filling curves is a
> well-established technique [2].
>
> Suppose that the restaurants table, above, had columns latitude and
> longitude and a computed numeric column h = hilbert(latitude,
> longitude). Hilbert curves are space-filling curves such that if two
> points are close in space then their h values will be close. So, if
> there is an index on h, we can find all restaurants close to a given
> point using a range scan of the index.
>
> So, the above query could be rewritten to something like
>
>   select *
>   from restaurants as r
>   where (r.h between 123456 and 123599
>       or r.h between 256789 and 259887)
>     and st_distance_internal(point(-122.4194, 37.7749), r.coordinates) <= 1;
>
> The range predicates on r.h quickly eliminate 99.9% of the rows in the
> database, and the call to st_distance_internal eliminates the
> remaining false positives.
>
> That rewrite can be done using a logical rewrite rule, and the
> resulting query will be faster on just about any database, but
> especially one with key-sorted tables (like Phoenix/HBase) or
> range-partitioned tables. The database does not need to have a
> dedicated "spatial index" data structure.
>
> Julian
>
> [1] http://www.opengeospatial.org/standards/sfs
>
> [2] http://math.bme.hu/~gnagy/mmsz/eloadasok/BisztrayDenes2014.pdf



-- 
Regards,

Atri
l'apprenant