You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@asterixdb.apache.org by Tin Vu <tv...@ucr.edu> on 2020/08/24 17:55:16 UTC

Substitute SQL++ sub-query

Hi all,

I'm working on a project which aims to improve performance of spatial join
query in AsterixDB.

The problem can be described as follows:

Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom) with
geom is a spatial data type (point, rectangle, polygon). This is a join
query to list all intersected pairs of these two datasets:
```

*SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE
spatial_intersect(ps.geom, ls.geom);*
*```*

I rewrote this query in a more complicated SQL++ query(*) but has a better
performance.

I took a look at FuzzyJoinRule
<https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java>
implementation
<https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java>
(a
rule for AQL+) in AsterixDB and I'm trying to make a similar implementation
called SpatalJoinRule
<https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java>
(a
rule for SQL++). Now I'm stuck with some questions and I would
really appreciate if you can help me to answer them:

1. What is the syntax to make placeholders for input operators and
variables in a SQL++ template? In the FuzzyJoinRule
<https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java>
implementation
<https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java>,
I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
left input variable. Is it similar in SQL++?

2. Does the query template for substitution must be a (full) compilable
query? Or it can be just a part of a full query?

3. Can the substitute query contain UDF functions?

Thanks,

Tin

(*) Rewitten query: in short, it partitioned the datasets by a grid (red
statement) then we compute the intersection pairs for each cell of the
grid.
- 'mytileids' UDF function returns the corresponding cell ID of a spatial
object.
- 'referencepointtileid' UDF function is used to ignore the duplications in
the final result.
- (-180.0,83.0,180.0,90.0) is space MBR.
- (100,100) is grid size (rows x columns).

```





*SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1,
lakesPartitioned.tile as tile2,
test#referencepointtileid(parksPartitioned.geom,
lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as
ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
ParkSet as ps) AS p UNNEST p.tileids AS tile) AS parksPartitioned,(SELECT
tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE
parksPartitioned.tile = lakesPartitioned.tileAND
spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS result
WHERE result.tile1 = result.ref_tile;*
*```*

Re: Substitute SQL++ sub-query

Posted by Tin Vu <tv...@ucr.edu>.
Thanks Mike and Taewoo for your comments. I'll give it a try.

Tin

On Mon, Aug 24, 2020 at 12:17 PM Taewoo Kim <wa...@gmail.com> wrote:

> Agreed with Mike regarding a native approach to spatial joins. :-) I just
> tried to point to a location for FuzzyJoinRule.
>
> Best,
> Taewoo
>
>
> On Mon, Aug 24, 2020 at 12:10 PM Mike Carey <dt...@gmail.com> wrote:
>
> > Tin,
> >
> > 1. There is no support for SQL++ in templates. Just AQL.  And AQL is now
> > deprecated, in a mode where it is only supported to keep that one fuzzy
> > join rule alive via AQL+.  The most recent public release of AsterixDB
> > is the last one where AQL+ will work - and then AQL will completely
> > disappear from the code base (as will the scripting framework) going
> > forward, as it is totally unsupported.  (Its inventor left with his PhD
> > in 2011.  Taewoo Kim, below, did a really nice refactoring revision of
> > the framework more recently, to remove lots of replicated code between
> > AQL and AQL+, but he is also no longer focused on AsterixDB work - his
> > post-PhD day job seems to be a distraction. :-))
> >
> > 2. More information about the scripting framework and its use for the
> > fuzzy join rule is in Chapter 5 of the aforementioned PhD student's
> > thesis:
> > http://asterix.ics.uci.edu//thesis/Rares_Vernica_PhD_thesis_2011.pdf.
> >
> > 3. I suspect it would be significantly messy work to create a SQL++
> > alternative to AQL as the basis for the template framework...  If you
> > want to explore a template-based approach, I'd stick with AQL+ and just
> > be aware that it's kind of a "dead end" approach long-term in the
> > AsterixDB code base - but could be explored as a research branch using
> > 0.9.5 as the branching point.
> >
> > I think a native approach to spatial joins would be better than a
> > template-based approach - along the lines of what Preston Carmon et al
> > are doing for interval joins up at Walla Walla U in Washington state.
> >
> > Cheers,
> >
> > Mike
> >
> > On 8/24/20 11:11 AM, Taewoo Kim wrote:
> > > Hi Tin,
> > >
> > > Here is a paper that explains FuzzyJoinRule in general. Hope this
> helps.
> > >
> > >
> >
> https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG=
> > >
> > > 1. What is the syntax to make placeholders for input operators and
> > > variables in a SQL++ template? In the FuzzyJoinRule
> > > <
> > >
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > > implementation
> > > <
> > >
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > >> ,
> > > I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0
> for
> > > left input variable. Is it similar in SQL++?
> > > You need to implement an extended version of SQL++ (maybe SQL+++) that
> > > recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+.
> > >
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj
> > >
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java
> > >
> > > 2. Does the query template for substitution must be a (full) compilable
> > > query? Or it can be just a part of a full query?
> > > It has to be a complete query that is executable in SQL+++ (if you
> create
> > > one).
> > >
> > > 3. Can the substitute query contain UDF functions?
> > > Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an
> > extended
> > > version.
> > >
> > > Best,
> > > Taewoo
> > >
> > >
> > > On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <tv...@ucr.edu> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'm working on a project which aims to improve performance of spatial
> > join
> > >> query in AsterixDB.
> > >>
> > >> The problem can be described as follows:
> > >>
> > >> Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom)
> > with
> > >> geom is a spatial data type (point, rectangle, polygon). This is a
> join
> > >> query to list all intersected pairs of these two datasets:
> > >> ```
> > >>
> > >> *SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE
> > >> spatial_intersect(ps.geom, ls.geom);*
> > >> *```*
> > >>
> > >> I rewrote this query in a more complicated SQL++ query(*) but has a
> > better
> > >> performance.
> > >>
> > >> I took a look at FuzzyJoinRule
> > >> <
> > >>
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > >> implementation
> > >> <
> > >>
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > >> (a
> > >> rule for AQL+) in AsterixDB and I'm trying to make a similar
> > implementation
> > >> called SpatalJoinRule
> > >> <
> > >>
> >
> https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java
> > >> (a
> > >> rule for SQL++). Now I'm stuck with some questions and I would
> > >> really appreciate if you can help me to answer them:
> > >>
> > >> 1. What is the syntax to make placeholders for input operators and
> > >> variables in a SQL++ template? In the FuzzyJoinRule
> > >> <
> > >>
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > >> implementation
> > >> <
> > >>
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > >>> ,
> > >> I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0
> for
> > >> left input variable. Is it similar in SQL++?
> > >>
> > >> 2. Does the query template for substitution must be a (full)
> compilable
> > >> query? Or it can be just a part of a full query?
> > >>
> > >> 3. Can the substitute query contain UDF functions?
> > >>
> > >> Thanks,
> > >>
> > >> Tin
> > >>
> > >> (*) Rewitten query: in short, it partitioned the datasets by a grid
> (red
> > >> statement) then we compute the intersection pairs for each cell of the
> > >> grid.
> > >> - 'mytileids' UDF function returns the corresponding cell ID of a
> > spatial
> > >> object.
> > >> - 'referencepointtileid' UDF function is used to ignore the
> > duplications in
> > >> the final result.
> > >> - (-180.0,83.0,180.0,90.0) is space MBR.
> > >> - (100,100) is grid size (rows x columns).
> > >>
> > >> ```
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> *SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1,
> > >> lakesPartitioned.tile as tile2,
> > >> test#referencepointtileid(parksPartitioned.geom,
> > >> lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as
> > >> ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM
> > >> (SELECT
> > >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps
> > FROM
> > >> ParkSet as ps) AS p UNNEST p.tileids AS tile) AS
> > parksPartitioned,(SELECT
> > >> tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
> > >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps
> > FROM
> > >> LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE
> > >> parksPartitioned.tile = lakesPartitioned.tileAND
> > >> spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS
> > result
> > >> WHERE result.tile1 = result.ref_tile;*
> > >> *```*
> > >>
> >
>

Re: Substitute SQL++ sub-query

Posted by Taewoo Kim <wa...@gmail.com>.
Agreed with Mike regarding a native approach to spatial joins. :-) I just
tried to point to a location for FuzzyJoinRule.

Best,
Taewoo


On Mon, Aug 24, 2020 at 12:10 PM Mike Carey <dt...@gmail.com> wrote:

> Tin,
>
> 1. There is no support for SQL++ in templates. Just AQL.  And AQL is now
> deprecated, in a mode where it is only supported to keep that one fuzzy
> join rule alive via AQL+.  The most recent public release of AsterixDB
> is the last one where AQL+ will work - and then AQL will completely
> disappear from the code base (as will the scripting framework) going
> forward, as it is totally unsupported.  (Its inventor left with his PhD
> in 2011.  Taewoo Kim, below, did a really nice refactoring revision of
> the framework more recently, to remove lots of replicated code between
> AQL and AQL+, but he is also no longer focused on AsterixDB work - his
> post-PhD day job seems to be a distraction. :-))
>
> 2. More information about the scripting framework and its use for the
> fuzzy join rule is in Chapter 5 of the aforementioned PhD student's
> thesis:
> http://asterix.ics.uci.edu//thesis/Rares_Vernica_PhD_thesis_2011.pdf.
>
> 3. I suspect it would be significantly messy work to create a SQL++
> alternative to AQL as the basis for the template framework...  If you
> want to explore a template-based approach, I'd stick with AQL+ and just
> be aware that it's kind of a "dead end" approach long-term in the
> AsterixDB code base - but could be explored as a research branch using
> 0.9.5 as the branching point.
>
> I think a native approach to spatial joins would be better than a
> template-based approach - along the lines of what Preston Carmon et al
> are doing for interval joins up at Walla Walla U in Washington state.
>
> Cheers,
>
> Mike
>
> On 8/24/20 11:11 AM, Taewoo Kim wrote:
> > Hi Tin,
> >
> > Here is a paper that explains FuzzyJoinRule in general. Hope this helps.
> >
> >
> https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG=
> >
> > 1. What is the syntax to make placeholders for input operators and
> > variables in a SQL++ template? In the FuzzyJoinRule
> > <
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> > implementation
> > <
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >> ,
> > I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
> > left input variable. Is it similar in SQL++?
> > You need to implement an extended version of SQL++ (maybe SQL+++) that
> > recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+.
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java
> >
> > 2. Does the query template for substitution must be a (full) compilable
> > query? Or it can be just a part of a full query?
> > It has to be a complete query that is executable in SQL+++ (if you create
> > one).
> >
> > 3. Can the substitute query contain UDF functions?
> > Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an
> extended
> > version.
> >
> > Best,
> > Taewoo
> >
> >
> > On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <tv...@ucr.edu> wrote:
> >
> >> Hi all,
> >>
> >> I'm working on a project which aims to improve performance of spatial
> join
> >> query in AsterixDB.
> >>
> >> The problem can be described as follows:
> >>
> >> Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom)
> with
> >> geom is a spatial data type (point, rectangle, polygon). This is a join
> >> query to list all intersected pairs of these two datasets:
> >> ```
> >>
> >> *SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE
> >> spatial_intersect(ps.geom, ls.geom);*
> >> *```*
> >>
> >> I rewrote this query in a more complicated SQL++ query(*) but has a
> better
> >> performance.
> >>
> >> I took a look at FuzzyJoinRule
> >> <
> >>
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >> implementation
> >> <
> >>
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >> (a
> >> rule for AQL+) in AsterixDB and I'm trying to make a similar
> implementation
> >> called SpatalJoinRule
> >> <
> >>
> https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java
> >> (a
> >> rule for SQL++). Now I'm stuck with some questions and I would
> >> really appreciate if you can help me to answer them:
> >>
> >> 1. What is the syntax to make placeholders for input operators and
> >> variables in a SQL++ template? In the FuzzyJoinRule
> >> <
> >>
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >> implementation
> >> <
> >>
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >>> ,
> >> I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
> >> left input variable. Is it similar in SQL++?
> >>
> >> 2. Does the query template for substitution must be a (full) compilable
> >> query? Or it can be just a part of a full query?
> >>
> >> 3. Can the substitute query contain UDF functions?
> >>
> >> Thanks,
> >>
> >> Tin
> >>
> >> (*) Rewitten query: in short, it partitioned the datasets by a grid (red
> >> statement) then we compute the intersection pairs for each cell of the
> >> grid.
> >> - 'mytileids' UDF function returns the corresponding cell ID of a
> spatial
> >> object.
> >> - 'referencepointtileid' UDF function is used to ignore the
> duplications in
> >> the final result.
> >> - (-180.0,83.0,180.0,90.0) is space MBR.
> >> - (100,100) is grid size (rows x columns).
> >>
> >> ```
> >>
> >>
> >>
> >>
> >>
> >> *SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1,
> >> lakesPartitioned.tile as tile2,
> >> test#referencepointtileid(parksPartitioned.geom,
> >> lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as
> >> ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM
> >> (SELECT
> >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps
> FROM
> >> ParkSet as ps) AS p UNNEST p.tileids AS tile) AS
> parksPartitioned,(SELECT
> >> tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
> >> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps
> FROM
> >> LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE
> >> parksPartitioned.tile = lakesPartitioned.tileAND
> >> spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS
> result
> >> WHERE result.tile1 = result.ref_tile;*
> >> *```*
> >>
>

Re: Substitute SQL++ sub-query

Posted by Mike Carey <dt...@gmail.com>.
Tin,

1. There is no support for SQL++ in templates. Just AQL.  And AQL is now 
deprecated, in a mode where it is only supported to keep that one fuzzy 
join rule alive via AQL+.  The most recent public release of AsterixDB 
is the last one where AQL+ will work - and then AQL will completely 
disappear from the code base (as will the scripting framework) going 
forward, as it is totally unsupported.  (Its inventor left with his PhD 
in 2011.  Taewoo Kim, below, did a really nice refactoring revision of 
the framework more recently, to remove lots of replicated code between 
AQL and AQL+, but he is also no longer focused on AsterixDB work - his 
post-PhD day job seems to be a distraction. :-))

2. More information about the scripting framework and its use for the 
fuzzy join rule is in Chapter 5 of the aforementioned PhD student's 
thesis: 
http://asterix.ics.uci.edu//thesis/Rares_Vernica_PhD_thesis_2011.pdf.

3. I suspect it would be significantly messy work to create a SQL++ 
alternative to AQL as the basis for the template framework...  If you 
want to explore a template-based approach, I'd stick with AQL+ and just 
be aware that it's kind of a "dead end" approach long-term in the 
AsterixDB code base - but could be explored as a research branch using 
0.9.5 as the branching point.

I think a native approach to spatial joins would be better than a 
template-based approach - along the lines of what Preston Carmon et al 
are doing for interval joins up at Walla Walla U in Washington state.

Cheers,

Mike

On 8/24/20 11:11 AM, Taewoo Kim wrote:
> Hi Tin,
>
> Here is a paper that explains FuzzyJoinRule in general. Hope this helps.
>
> https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG=
>
> 1. What is the syntax to make placeholders for input operators and
> variables in a SQL++ template? In the FuzzyJoinRule
> <
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> implementation
> <
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
>> ,
> I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
> left input variable. Is it similar in SQL++?
> You need to implement an extended version of SQL++ (maybe SQL+++) that
> recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+.
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java
>
> 2. Does the query template for substitution must be a (full) compilable
> query? Or it can be just a part of a full query?
> It has to be a complete query that is executable in SQL+++ (if you create
> one).
>
> 3. Can the substitute query contain UDF functions?
> Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an extended
> version.
>
> Best,
> Taewoo
>
>
> On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <tv...@ucr.edu> wrote:
>
>> Hi all,
>>
>> I'm working on a project which aims to improve performance of spatial join
>> query in AsterixDB.
>>
>> The problem can be described as follows:
>>
>> Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom) with
>> geom is a spatial data type (point, rectangle, polygon). This is a join
>> query to list all intersected pairs of these two datasets:
>> ```
>>
>> *SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE
>> spatial_intersect(ps.geom, ls.geom);*
>> *```*
>>
>> I rewrote this query in a more complicated SQL++ query(*) but has a better
>> performance.
>>
>> I took a look at FuzzyJoinRule
>> <
>> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
>> implementation
>> <
>> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
>> (a
>> rule for AQL+) in AsterixDB and I'm trying to make a similar implementation
>> called SpatalJoinRule
>> <
>> https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java
>> (a
>> rule for SQL++). Now I'm stuck with some questions and I would
>> really appreciate if you can help me to answer them:
>>
>> 1. What is the syntax to make placeholders for input operators and
>> variables in a SQL++ template? In the FuzzyJoinRule
>> <
>> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
>> implementation
>> <
>> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
>>> ,
>> I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
>> left input variable. Is it similar in SQL++?
>>
>> 2. Does the query template for substitution must be a (full) compilable
>> query? Or it can be just a part of a full query?
>>
>> 3. Can the substitute query contain UDF functions?
>>
>> Thanks,
>>
>> Tin
>>
>> (*) Rewitten query: in short, it partitioned the datasets by a grid (red
>> statement) then we compute the intersection pairs for each cell of the
>> grid.
>> - 'mytileids' UDF function returns the corresponding cell ID of a spatial
>> object.
>> - 'referencepointtileid' UDF function is used to ignore the duplications in
>> the final result.
>> - (-180.0,83.0,180.0,90.0) is space MBR.
>> - (100,100) is grid size (rows x columns).
>>
>> ```
>>
>>
>>
>>
>>
>> *SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1,
>> lakesPartitioned.tile as tile2,
>> test#referencepointtileid(parksPartitioned.geom,
>> lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as
>> ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM
>> (SELECT
>> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
>> ParkSet as ps) AS p UNNEST p.tileids AS tile) AS parksPartitioned,(SELECT
>> tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
>> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
>> LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE
>> parksPartitioned.tile = lakesPartitioned.tileAND
>> spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS result
>> WHERE result.tile1 = result.ref_tile;*
>> *```*
>>

Re: Substitute SQL++ sub-query

Posted by Taewoo Kim <wa...@gmail.com>.
Hi Tin,

Here is a paper that explains FuzzyJoinRule in general. Hope this helps.

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Similarity+query+support+in+big+data+management+systems&btnG=

1. What is the syntax to make placeholders for input operators and
variables in a SQL++ template? In the FuzzyJoinRule
<
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
>
implementation
<
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
>,
I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
left input variable. Is it similar in SQL++?
You need to implement an extended version of SQL++ (maybe SQL+++) that
recognizes ##LEFT_0, etc. FuzzyJoinRule uses AQL+.
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/javacc/AQLPlusExtension.jj
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/translator/AqlPlusExpressionToPlanTranslator.java

2. Does the query template for substitution must be a (full) compilable
query? Or it can be just a part of a full query?
It has to be a complete query that is executable in SQL+++ (if you create
one).

3. Can the substitute query contain UDF functions?
Whatever support that SQL++ has, SQL+++ inherits since SQL++ is an extended
version.

Best,
Taewoo


On Mon, Aug 24, 2020 at 10:55 AM Tin Vu <tv...@ucr.edu> wrote:

> Hi all,
>
> I'm working on a project which aims to improve performance of spatial join
> query in AsterixDB.
>
> The problem can be described as follows:
>
> Assume that we have 2 dataset ParkSet(id, geom) and LakeSet(id, geom) with
> geom is a spatial data type (point, rectangle, polygon). This is a join
> query to list all intersected pairs of these two datasets:
> ```
>
> *SELECT COUNT(*) FROM ParkSet AS ps, LakeSet AS lsWHERE
> spatial_intersect(ps.geom, ls.geom);*
> *```*
>
> I rewrote this query in a more complicated SQL++ query(*) but has a better
> performance.
>
> I took a look at FuzzyJoinRule
> <
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >
> implementation
> <
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >
> (a
> rule for AQL+) in AsterixDB and I'm trying to make a similar implementation
> called SpatalJoinRule
> <
> https://github.com/tinvukhac/asterixdb/blob/cartilage/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/SpatialJoinRule.java
> >
> (a
> rule for SQL++). Now I'm stuck with some questions and I would
> really appreciate if you can help me to answer them:
>
> 1. What is the syntax to make placeholders for input operators and
> variables in a SQL++ template? In the FuzzyJoinRule
> <
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >
> implementation
> <
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
> >,
> I saw that it is using ##LEFT_0 for left input operator and $$LEFT_0 for
> left input variable. Is it similar in SQL++?
>
> 2. Does the query template for substitution must be a (full) compilable
> query? Or it can be just a part of a full query?
>
> 3. Can the substitute query contain UDF functions?
>
> Thanks,
>
> Tin
>
> (*) Rewitten query: in short, it partitioned the datasets by a grid (red
> statement) then we compute the intersection pairs for each cell of the
> grid.
> - 'mytileids' UDF function returns the corresponding cell ID of a spatial
> object.
> - 'referencepointtileid' UDF function is used to ignore the duplications in
> the final result.
> - (-180.0,83.0,180.0,90.0) is space MBR.
> - (100,100) is grid size (rows x columns).
>
> ```
>
>
>
>
>
> *SELECT COUNT(*) FROM (SELECT parksPartitioned.tile as tile1,
> lakesPartitioned.tile as tile2,
> test#referencepointtileid(parksPartitioned.geom,
> lakesPartitioned.geom,-180.0,83.0,180.0,90.0,100,100) as
> ref_tileFROM(SELECT tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM
> (SELECT
> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
> ParkSet as ps) AS p UNNEST p.tileids AS tile) AS parksPartitioned,(SELECT
> tile, p.ps.id <http://p.ps.id/>, p.ps.geom FROM (SELECT
> test#mytileids(ps.geom,-180.0,-83.0,180.0,90.0,100,100) AS tileids, ps FROM
> LakeSet as ps) AS p UNNEST p.tileids AS tile) AS lakesPartitionedWHERE
> parksPartitioned.tile = lakesPartitioned.tileAND
> spatial_intersect(parksPartitioned.geom, lakesPartitioned.geom)) AS result
> WHERE result.tile1 = result.ref_tile;*
> *```*
>