You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Yanjing Wang <zh...@gmail.com> on 2022/01/19 10:35:36 UTC

New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Hi, community

Here I recommend a new rule for converting UNION ALL sub plan to a single
input with an OR filter, the following is its conversion diagram.
[image: UnionAllToOrRule.jpg]

The conversion prerequisites are
1. left filter range has no intersection with right filter range.
2. Project and Input Sub Tree must be identical.

The rule will be used when Input Sub Tree is a computing-intensive or large
IO operation.

I don't know whether the community supports it or not, any suggestions will
be appreciated.

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Yanjing Wang <zh...@gmail.com>.
The previous commit is obsolete, Please use the latest commit on branch
union_all_fusion_to_single
<https://github.com/hannerwang/calcite/tree/union_all_fusion_to_single>,
thanks.

Yanjing Wang <zh...@gmail.com> 于2022年1月26日周三 12:17写道:

> Hi Community,
>
> I make a commit
> <https://github.com/hannerwang/calcite/commit/50fdf37c0c9d560cee3a1ce0d24b83e0346ecd90> for
> UNION ALL Converting, Please review if you are available, thanks.
>
> Alessandro Solimando <al...@gmail.com> 于2022年1月24日周一
> 00:42写道:
>
>> Hi Justin,
>> the same table could be used to describe different concepts/classes.
>>
>> For instance, you could have a single table "Employee" where the column
>> "job_type" is used to describe the different kinds of jobs (and therefore
>> employees) you have.
>>
>> At that point, you could have "White Collar Employee" with subclasses
>> "Secretary", "Manager", etc., and an ontology2db mapping like:
>>
>> secretary := select * from employee where job_type=x
>> ...
>> manager := select * from employee where job_type=y
>>
>> Then any request involving the "white_collar" concept would rewrite to a
>> union over the subclasses, this should benefit from this optimization if I
>> am not mistaken.
>>
>> Not sure how "IO intensive" such subqueries can be though, but apart from
>> the toy example, there are complex cases where the subquery cannot benefit
>> much from indexes and the data can be pretty massive depending on the
>> scenario.
>>
>> The problem is that, for complex databases which has been evolving over
>> several years, the logical concepts from the ontology do not easily match
>> concepts from ER model of the DB (if one ever existed :)), and in general
>> when this happens you are also hitting the RDBMs with unforeseen query
>> patterns for which the data model hasn't been optimised.
>>
>> Best regards,
>> Alessandro
>>
>> On Sun, 23 Jan 2022 at 13:33, Justin Swanhart <gr...@gmail.com>
>> wrote:
>>
>> > Hi Alessandro,
>> >
>> > That is very fascinating.  Most of those situations won't apply to this
>> > optimization though, since it requires the tables of the union to be
>> > identical rather than different tables representing unions of different
>> > ontologies, that is, this will only apply optimizations when an
>> ontology is
>> > used more than once in a union.  Unless I am mistaken.
>> >
>> > --Justin
>> >
>> > On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
>> > alessandro.solimando@gmail.com> wrote:
>> >
>> > > Hello everyone,
>> > > SQL queries automatically generated by semantic queries (see [1] for
>> > > instance) are pretty generous on UNIONs, so I'd say that in this area,
>> > you
>> > > will have many practical cases matching the patterns highlighted here.
>> > >
>> > > The main idea is that when you have a concept hierarchy in your
>> ontology
>> > > (isA relationship), where different concepts are generally mapped to
>> > > different tables (or subparts of a single table selected by changing
>> the
>> > > filter condition), when you query an ancestor in that hierarchy (say:
>> > > Person), in order to account for implicit facts, you generally UNION
>> all
>> > > the subtables (say, Employee, Customer, etc, which are also Person) in
>> > your
>> > > final SQL query.
>> > >
>> > > Best regards,
>> > > Alessandro
>> > >
>> > > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
>> > >
>> > > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <zhuangzixiaoyou@gmail.com
>> >
>> > > wrote:
>> > >
>> > > > Thanks Albert.
>> > > >
>> > > >  you could probably refer to osdi paper "resin general sub query
>> > fusion"
>> > > > > which has chapters talking about general union fusion that deals
>> with
>> > > the
>> > > > > case Julian pointed out.
>> > > > >
>> > > >
>> > > > The optimization seems very helpful, but it introduces new operators
>> > and
>> > > > the rule for new operators may be more complex.
>> > > > Hi, Community, What do you think?
>> > > >
>> > > > Btw, per my understanding cases of union fusion is not that many in
>> > > > > practice.
>> > > > >
>> > > >
>> > > > In particular situations such as VDT and BI tools which generates
>> sql
>> > > > according to a fixed pattern we will meet many such union all
>> queries.
>> > > >
>> > > > Albert <zi...@gmail.com> 于2022年1月23日周日 11:22写道:
>> > > >
>> > > > > On a more general fusion case, you could probably refer to osdi
>> paper
>> > > > > "resin general sub query fusion" which has chapters talking about
>> > > general
>> > > > > union fusion that deals with the case Julian pointed out.
>> > > > >
>> > > > > Btw, per my understanding cases of union fusion is not that many
>> in
>> > > > > practice. Curious to know the real cases
>> > > > >
>> > > > > 在 2022年1月21日星期五,Yanjing Wang <zh...@gmail.com> 写道:
>> > > > >
>> > > > > > Hi Julian, I have logged a jira CALCITE-4990
>> > > > > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this
>> new
>> > > > rule,
>> > > > > > later I will submit a pr for this.
>> > > > > >
>> > > > > > Julian Hyde <jh...@gmail.com> 于2022年1月21日周五 01:37写道:
>> > > > > >
>> > > > > > > Process logic sounds great. Please log a Jira case.
>> > > > > > >
>> > > > > > > There’s a lot here, so it’s probably wise to split into a
>> simple
>> > PR
>> > > > > that
>> > > > > > > just does the simple case (UNION ALL, no Project) and extend
>> > later.
>> > > > > > >
>> > > > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
>> > > > zhuangzixiaoyou@gmail.com
>> > > > > >
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > Thanks Julian and Justin.
>> > > > > > > >
>> > > > > > > > What do you think the rule should be called?
>> > > > > UnionFilterTransposeRule,
>> > > > > > > >> perhaps?
>> > > > > > > >>
>> > > > > > > > SetOpFilterMergeRule?
>> > > > > > > >
>> > > > > > > > Maybe that problem does not occur when applied to UNION than
>> > when
>> > > > > > applied
>> > > > > > > >> to UNION ALL.
>> > > > > > > >
>> > > > > > > > Yes, This is very important.
>> > > > > > > >
>> > > > > > > > There would seem to be analogous rules for INTERSECT
>> (combine
>> > the
>> > > > > > > >> conditions using AND) and EXCEPT (combine the conditions
>> using
>> > > AND
>> > > > > > NOT).
>> > > > > > > >>
>> > > > > > > > Excellent extensions, all the three operators process logic
>> > are:
>> > > > > > > >
>> > > > > > > > For UNION:
>> > > > > > > > New Filter = left Filter *OR* right Filter.
>> > > > > > > >
>> > > > > > > > For INTERSECT:
>> > > > > > > > New Filter = left Filter *AND* right Filter.
>> > > > > > > >
>> > > > > > > > For EXCEPT:
>> > > > > > > > If left Filter, right Filter have no overlap, transform to
>> left
>> > > > child
>> > > > > > > tree,
>> > > > > > > > Otherwise
>> > > > > > > > New Filter = left Filter *AND NOT *right Filter.
>> > > > > > > >
>> > > > > > > > For UNION ALL:
>> > > > > > > > Add prerequisites:
>> > > > > > > > left Filter, right Filter must have no overlap.
>> > > > > > > >
>> > > > > > > > For INTERSECT ALL:
>> > > > > > > > Add prerequisites:
>> > > > > > > > If left Filter, right Filter have no overlap, transform to
>> > empty
>> > > > > > values.
>> > > > > > > > Otherwise
>> > > > > > > > the rule can't be applied.
>> > > > > > > >
>> > > > > > > > For EXCEPT ALL:
>> > > > > > > > same for  EXCEPT.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > work for N-way Union, not just 2-way Union.
>> > > > > > > >>
>> > > > > > > > Yes, I will add tests for this.
>> > > > > > > >
>> > > > > > > > And I think you should make it work whether or not a
>> Project is
>> > > > > > present.
>> > > > > > > >
>> > > > > > > > Ok, It seems I need construct several different operand
>> match
>> > > trees
>> > > > > for
>> > > > > > > > plan.
>> > > > > > > >
>> > > > > > > > Hi Julian, what do you think of the above process logic?
>> > > > > > > >
>> > > > > > > > Julian Hyde <jhyde.apache@gmail.com <mailto:
>> > > jhyde.apache@gmail.com
>> > > > >>
>> > > > > > > 于2022年1月20日周四 10:18写道:
>> > > > > > > >
>> > > > > > > >> Justin,
>> > > > > > > >>
>> > > > > > > >> For planning table or index scans, I would recommend using
>> a
>> > > > single
>> > > > > > > >> TableScan with a Filter that uses a Sarg, rather than using
>> > > > multiple
>> > > > > > > >> TableScans connected by a Union. So I think this rule will
>> be
>> > > > > useful.
>> > > > > > > >>
>> > > > > > > >> But I do agree that this proposed rule is not a “no
>> brainer”.
>> > It
>> > > > may
>> > > > > > not
>> > > > > > > >> do what people want/expect in all cases, and therefore it
>> > > probably
>> > > > > > > should
>> > > > > > > >> not be enabled it by default.
>> > > > > > > >>
>> > > > > > > >> Julian
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <
>> > > > greenlion@gmail.com>
>> > > > > > > >> wrote:
>> > > > > > > >>>
>> > > > > > > >>> Hi,
>> > > > > > > >>>
>> > > > > > > >>> Note that this will negate the optimization that one
>> usually
>> > is
>> > > > > > looking
>> > > > > > > >> for
>> > > > > > > >>> when writing such queries:
>> > > > > > > >>>
>> > > > > > > >>> Select * from TAB where a = 1
>> > > > > > > >>> UNION ALL
>> > > > > > > >>> Select * from TAB where b = 1
>> > > > > > > >>>
>> > > > > > > >>> In a database with indexes (most databases) this will
>> allow
>> > > > indexes
>> > > > > > to
>> > > > > > > be
>> > > > > > > >>> used on both the a column and the b column.
>> > > > > > > >>> Databases with bitmap indexes or without indexes would
>> > benefit
>> > > > from
>> > > > > > the
>> > > > > > > >>> rule.
>> > > > > > > >>>
>> > > > > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
>> > > > > jhyde.apache@gmail.com
>> > > > > > > >> <mailto:jhyde.apache@gmail.com <mailto:
>> jhyde.apache@gmail.com
>> > > >>>
>> > > > > > wrote:
>> > > > > > > >>>
>> > > > > > > >>>> Can you log a Jira case for this?
>> > > > > > > >>>>
>> > > > > > > >>>> I think you should make your rule work for N-way Union,
>> not
>> > > just
>> > > > > > 2-way
>> > > > > > > >>>> Union. And I think you should make it work whether or
>> not a
>> > > > > Project
>> > > > > > is
>> > > > > > > >>>> present.
>> > > > > > > >>>>
>> > > > > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <
>> > > > jhyde.apache@gmail.com
>> > > > > > > <ma...@gmail.com>>
>> > > > > > > >> wrote:
>> > > > > > > >>>>>
>> > > > > > > >>>>> It sounds useful.
>> > > > > > > >>>>>
>> > > > > > > >>>>> What do you think the rule should be called?
>> > > > > > > UnionFilterTransposeRule,
>> > > > > > > >>>> perhaps?
>> > > > > > > >>>>>
>> > > > > > > >>>>> A challenge when writing the rule will be to ensure that
>> > all
>> > > of
>> > > > > the
>> > > > > > > >>>> inputs to the Union are the same. The Volcano framework
>> is
>> > not
>> > > > > very
>> > > > > > > >> good at
>> > > > > > > >>>> that.
>> > > > > > > >>>>>
>> > > > > > > >>>>> You should be careful of the case that the conditions
>> > > overlap.
>> > > > > For
>> > > > > > > >>>> example, the rewrite
>> > > > > > > >>>>>
>> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30
>> > > > > > > >>>>> UNION ALL
>> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
>> > > > > > > >>>>>
>> > > > > > > >>>>> to
>> > > > > > > >>>>>
>> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25,
>> 35,
>> > 45)
>> > > > > > > >>>>>
>> > > > > > > >>>>> Is not valid, because rows with deptno = 25 will appear
>> > twice
>> > > > in
>> > > > > > the
>> > > > > > > >>>> first query, once in the second. Maybe that problem does
>> not
>> > > > occur
>> > > > > > > when
>> > > > > > > >>>> applied to UNION than when applied to UNION ALL.
>> > > > > > > >>>>>
>> > > > > > > >>>>> There would seem to be analogous rules for INTERSECT
>> > (combine
>> > > > the
>> > > > > > > >>>> conditions using AND) and EXCEPT (combine the conditions
>> > using
>> > > > AND
>> > > > > > > NOT).
>> > > > > > > >>>> Perhaps one rule could cover all set operations (see
>> > > > > > > >>>> FilterSetOpTransposeRule).
>> > > > > > > >>>>>
>> > > > > > > >>>>> Julian
>> > > > > > > >>>>>
>> > > > > > > >>>>>
>> > > > > > > >>>>>
>> > > > > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
>> > > > > > > zhuangzixiaoyou@gmail.com <ma...@gmail.com>
>> > > > > > > >>>> <mailto:zhuangzixiaoyou@gmail.com <mailto:
>> > > > > zhuangzixiaoyou@gmail.com
>> > > > > > >
>> > > > > > > <mailto:zhuangzixiaoyou@gmail.com <mailto:
>> > > zhuangzixiaoyou@gmail.com
>> > > > > >>>>
>> > > > > > > >> wrote:
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> A simple example is converting SELECT a, b FROM t
>> WHERE c
>> > =
>> > > 1
>> > > > > > UNION
>> > > > > > > >> ALL
>> > > > > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t
>> WHERE c
>> > > in
>> > > > > (1,
>> > > > > > 2)
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <mailto:
>> > > > > > > zhuangzixiaoyou@gmail.com> <mailto:
>> > > > > > > >> zhuangzixiaoyou@gmail.com <mailto:
>> zhuangzixiaoyou@gmail.com>>
>> > > > > > <mailto:
>> > > > > > > >>>> zhuangzixiaoyou@gmail.com <mailto:
>> zhuangzixiaoyou@gmail.com
>> > > >>>
>> > > > > > > >> 于2022年1月19日周三 18:35写道:
>> > > > > > > >>>>>> Hi, community
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Here I recommend a new rule for converting UNION ALL
>> sub
>> > > plan
>> > > > > to a
>> > > > > > > >>>> single input with an OR filter, the following is its
>> > > conversion
>> > > > > > > diagram.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> The conversion prerequisites are
>> > > > > > > >>>>>> 1. left filter range has no intersection with right
>> filter
>> > > > > range.
>> > > > > > > >>>>>> 2. Project and Input Sub Tree must be identical.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> The rule will be used when Input Sub Tree is a
>> > > > > computing-intensive
>> > > > > > > or
>> > > > > > > >>>> large IO operation.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> I don't know whether the community supports it or not,
>> any
>> > > > > > > suggestions
>> > > > > > > >>>> will be appreciated.
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > ~~~~~~~~~~~~~~~
>> > > > > no mistakes
>> > > > > ~~~~~~~~~~~~~~~~~~
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Yanjing Wang <zh...@gmail.com>.
Hi Community,

I make a commit
<https://github.com/hannerwang/calcite/commit/50fdf37c0c9d560cee3a1ce0d24b83e0346ecd90>
for
UNION ALL Converting, Please review if you are available, thanks.

Alessandro Solimando <al...@gmail.com> 于2022年1月24日周一 00:42写道:

> Hi Justin,
> the same table could be used to describe different concepts/classes.
>
> For instance, you could have a single table "Employee" where the column
> "job_type" is used to describe the different kinds of jobs (and therefore
> employees) you have.
>
> At that point, you could have "White Collar Employee" with subclasses
> "Secretary", "Manager", etc., and an ontology2db mapping like:
>
> secretary := select * from employee where job_type=x
> ...
> manager := select * from employee where job_type=y
>
> Then any request involving the "white_collar" concept would rewrite to a
> union over the subclasses, this should benefit from this optimization if I
> am not mistaken.
>
> Not sure how "IO intensive" such subqueries can be though, but apart from
> the toy example, there are complex cases where the subquery cannot benefit
> much from indexes and the data can be pretty massive depending on the
> scenario.
>
> The problem is that, for complex databases which has been evolving over
> several years, the logical concepts from the ontology do not easily match
> concepts from ER model of the DB (if one ever existed :)), and in general
> when this happens you are also hitting the RDBMs with unforeseen query
> patterns for which the data model hasn't been optimised.
>
> Best regards,
> Alessandro
>
> On Sun, 23 Jan 2022 at 13:33, Justin Swanhart <gr...@gmail.com> wrote:
>
> > Hi Alessandro,
> >
> > That is very fascinating.  Most of those situations won't apply to this
> > optimization though, since it requires the tables of the union to be
> > identical rather than different tables representing unions of different
> > ontologies, that is, this will only apply optimizations when an ontology
> is
> > used more than once in a union.  Unless I am mistaken.
> >
> > --Justin
> >
> > On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
> > alessandro.solimando@gmail.com> wrote:
> >
> > > Hello everyone,
> > > SQL queries automatically generated by semantic queries (see [1] for
> > > instance) are pretty generous on UNIONs, so I'd say that in this area,
> > you
> > > will have many practical cases matching the patterns highlighted here.
> > >
> > > The main idea is that when you have a concept hierarchy in your
> ontology
> > > (isA relationship), where different concepts are generally mapped to
> > > different tables (or subparts of a single table selected by changing
> the
> > > filter condition), when you query an ancestor in that hierarchy (say:
> > > Person), in order to account for implicit facts, you generally UNION
> all
> > > the subtables (say, Employee, Customer, etc, which are also Person) in
> > your
> > > final SQL query.
> > >
> > > Best regards,
> > > Alessandro
> > >
> > > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
> > >
> > > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <zh...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Albert.
> > > >
> > > >  you could probably refer to osdi paper "resin general sub query
> > fusion"
> > > > > which has chapters talking about general union fusion that deals
> with
> > > the
> > > > > case Julian pointed out.
> > > > >
> > > >
> > > > The optimization seems very helpful, but it introduces new operators
> > and
> > > > the rule for new operators may be more complex.
> > > > Hi, Community, What do you think?
> > > >
> > > > Btw, per my understanding cases of union fusion is not that many in
> > > > > practice.
> > > > >
> > > >
> > > > In particular situations such as VDT and BI tools which generates sql
> > > > according to a fixed pattern we will meet many such union all
> queries.
> > > >
> > > > Albert <zi...@gmail.com> 于2022年1月23日周日 11:22写道:
> > > >
> > > > > On a more general fusion case, you could probably refer to osdi
> paper
> > > > > "resin general sub query fusion" which has chapters talking about
> > > general
> > > > > union fusion that deals with the case Julian pointed out.
> > > > >
> > > > > Btw, per my understanding cases of union fusion is not that many in
> > > > > practice. Curious to know the real cases
> > > > >
> > > > > 在 2022年1月21日星期五,Yanjing Wang <zh...@gmail.com> 写道:
> > > > >
> > > > > > Hi Julian, I have logged a jira CALCITE-4990
> > > > > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this
> new
> > > > rule,
> > > > > > later I will submit a pr for this.
> > > > > >
> > > > > > Julian Hyde <jh...@gmail.com> 于2022年1月21日周五 01:37写道:
> > > > > >
> > > > > > > Process logic sounds great. Please log a Jira case.
> > > > > > >
> > > > > > > There’s a lot here, so it’s probably wise to split into a
> simple
> > PR
> > > > > that
> > > > > > > just does the simple case (UNION ALL, no Project) and extend
> > later.
> > > > > > >
> > > > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> > > > zhuangzixiaoyou@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks Julian and Justin.
> > > > > > > >
> > > > > > > > What do you think the rule should be called?
> > > > > UnionFilterTransposeRule,
> > > > > > > >> perhaps?
> > > > > > > >>
> > > > > > > > SetOpFilterMergeRule?
> > > > > > > >
> > > > > > > > Maybe that problem does not occur when applied to UNION than
> > when
> > > > > > applied
> > > > > > > >> to UNION ALL.
> > > > > > > >
> > > > > > > > Yes, This is very important.
> > > > > > > >
> > > > > > > > There would seem to be analogous rules for INTERSECT (combine
> > the
> > > > > > > >> conditions using AND) and EXCEPT (combine the conditions
> using
> > > AND
> > > > > > NOT).
> > > > > > > >>
> > > > > > > > Excellent extensions, all the three operators process logic
> > are:
> > > > > > > >
> > > > > > > > For UNION:
> > > > > > > > New Filter = left Filter *OR* right Filter.
> > > > > > > >
> > > > > > > > For INTERSECT:
> > > > > > > > New Filter = left Filter *AND* right Filter.
> > > > > > > >
> > > > > > > > For EXCEPT:
> > > > > > > > If left Filter, right Filter have no overlap, transform to
> left
> > > > child
> > > > > > > tree,
> > > > > > > > Otherwise
> > > > > > > > New Filter = left Filter *AND NOT *right Filter.
> > > > > > > >
> > > > > > > > For UNION ALL:
> > > > > > > > Add prerequisites:
> > > > > > > > left Filter, right Filter must have no overlap.
> > > > > > > >
> > > > > > > > For INTERSECT ALL:
> > > > > > > > Add prerequisites:
> > > > > > > > If left Filter, right Filter have no overlap, transform to
> > empty
> > > > > > values.
> > > > > > > > Otherwise
> > > > > > > > the rule can't be applied.
> > > > > > > >
> > > > > > > > For EXCEPT ALL:
> > > > > > > > same for  EXCEPT.
> > > > > > > >
> > > > > > > >
> > > > > > > > work for N-way Union, not just 2-way Union.
> > > > > > > >>
> > > > > > > > Yes, I will add tests for this.
> > > > > > > >
> > > > > > > > And I think you should make it work whether or not a Project
> is
> > > > > > present.
> > > > > > > >
> > > > > > > > Ok, It seems I need construct several different operand match
> > > trees
> > > > > for
> > > > > > > > plan.
> > > > > > > >
> > > > > > > > Hi Julian, what do you think of the above process logic?
> > > > > > > >
> > > > > > > > Julian Hyde <jhyde.apache@gmail.com <mailto:
> > > jhyde.apache@gmail.com
> > > > >>
> > > > > > > 于2022年1月20日周四 10:18写道:
> > > > > > > >
> > > > > > > >> Justin,
> > > > > > > >>
> > > > > > > >> For planning table or index scans, I would recommend using a
> > > > single
> > > > > > > >> TableScan with a Filter that uses a Sarg, rather than using
> > > > multiple
> > > > > > > >> TableScans connected by a Union. So I think this rule will
> be
> > > > > useful.
> > > > > > > >>
> > > > > > > >> But I do agree that this proposed rule is not a “no
> brainer”.
> > It
> > > > may
> > > > > > not
> > > > > > > >> do what people want/expect in all cases, and therefore it
> > > probably
> > > > > > > should
> > > > > > > >> not be enabled it by default.
> > > > > > > >>
> > > > > > > >> Julian
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <
> > > > greenlion@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>>
> > > > > > > >>> Hi,
> > > > > > > >>>
> > > > > > > >>> Note that this will negate the optimization that one
> usually
> > is
> > > > > > looking
> > > > > > > >> for
> > > > > > > >>> when writing such queries:
> > > > > > > >>>
> > > > > > > >>> Select * from TAB where a = 1
> > > > > > > >>> UNION ALL
> > > > > > > >>> Select * from TAB where b = 1
> > > > > > > >>>
> > > > > > > >>> In a database with indexes (most databases) this will allow
> > > > indexes
> > > > > > to
> > > > > > > be
> > > > > > > >>> used on both the a column and the b column.
> > > > > > > >>> Databases with bitmap indexes or without indexes would
> > benefit
> > > > from
> > > > > > the
> > > > > > > >>> rule.
> > > > > > > >>>
> > > > > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
> > > > > jhyde.apache@gmail.com
> > > > > > > >> <mailto:jhyde.apache@gmail.com <mailto:
> jhyde.apache@gmail.com
> > > >>>
> > > > > > wrote:
> > > > > > > >>>
> > > > > > > >>>> Can you log a Jira case for this?
> > > > > > > >>>>
> > > > > > > >>>> I think you should make your rule work for N-way Union,
> not
> > > just
> > > > > > 2-way
> > > > > > > >>>> Union. And I think you should make it work whether or not
> a
> > > > > Project
> > > > > > is
> > > > > > > >>>> present.
> > > > > > > >>>>
> > > > > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <
> > > > jhyde.apache@gmail.com
> > > > > > > <ma...@gmail.com>>
> > > > > > > >> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>> It sounds useful.
> > > > > > > >>>>>
> > > > > > > >>>>> What do you think the rule should be called?
> > > > > > > UnionFilterTransposeRule,
> > > > > > > >>>> perhaps?
> > > > > > > >>>>>
> > > > > > > >>>>> A challenge when writing the rule will be to ensure that
> > all
> > > of
> > > > > the
> > > > > > > >>>> inputs to the Union are the same. The Volcano framework is
> > not
> > > > > very
> > > > > > > >> good at
> > > > > > > >>>> that.
> > > > > > > >>>>>
> > > > > > > >>>>> You should be careful of the case that the conditions
> > > overlap.
> > > > > For
> > > > > > > >>>> example, the rewrite
> > > > > > > >>>>>
> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30
> > > > > > > >>>>> UNION ALL
> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> > > > > > > >>>>>
> > > > > > > >>>>> to
> > > > > > > >>>>>
> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35,
> > 45)
> > > > > > > >>>>>
> > > > > > > >>>>> Is not valid, because rows with deptno = 25 will appear
> > twice
> > > > in
> > > > > > the
> > > > > > > >>>> first query, once in the second. Maybe that problem does
> not
> > > > occur
> > > > > > > when
> > > > > > > >>>> applied to UNION than when applied to UNION ALL.
> > > > > > > >>>>>
> > > > > > > >>>>> There would seem to be analogous rules for INTERSECT
> > (combine
> > > > the
> > > > > > > >>>> conditions using AND) and EXCEPT (combine the conditions
> > using
> > > > AND
> > > > > > > NOT).
> > > > > > > >>>> Perhaps one rule could cover all set operations (see
> > > > > > > >>>> FilterSetOpTransposeRule).
> > > > > > > >>>>>
> > > > > > > >>>>> Julian
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> > > > > > > zhuangzixiaoyou@gmail.com <ma...@gmail.com>
> > > > > > > >>>> <mailto:zhuangzixiaoyou@gmail.com <mailto:
> > > > > zhuangzixiaoyou@gmail.com
> > > > > > >
> > > > > > > <mailto:zhuangzixiaoyou@gmail.com <mailto:
> > > zhuangzixiaoyou@gmail.com
> > > > > >>>>
> > > > > > > >> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>> A simple example is converting SELECT a, b FROM t WHERE
> c
> > =
> > > 1
> > > > > > UNION
> > > > > > > >> ALL
> > > > > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t
> WHERE c
> > > in
> > > > > (1,
> > > > > > 2)
> > > > > > > >>>>>>
> > > > > > > >>>>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <mailto:
> > > > > > > zhuangzixiaoyou@gmail.com> <mailto:
> > > > > > > >> zhuangzixiaoyou@gmail.com <mailto:zhuangzixiaoyou@gmail.com
> >>
> > > > > > <mailto:
> > > > > > > >>>> zhuangzixiaoyou@gmail.com <mailto:
> zhuangzixiaoyou@gmail.com
> > > >>>
> > > > > > > >> 于2022年1月19日周三 18:35写道:
> > > > > > > >>>>>> Hi, community
> > > > > > > >>>>>>
> > > > > > > >>>>>> Here I recommend a new rule for converting UNION ALL sub
> > > plan
> > > > > to a
> > > > > > > >>>> single input with an OR filter, the following is its
> > > conversion
> > > > > > > diagram.
> > > > > > > >>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>> The conversion prerequisites are
> > > > > > > >>>>>> 1. left filter range has no intersection with right
> filter
> > > > > range.
> > > > > > > >>>>>> 2. Project and Input Sub Tree must be identical.
> > > > > > > >>>>>>
> > > > > > > >>>>>> The rule will be used when Input Sub Tree is a
> > > > > computing-intensive
> > > > > > > or
> > > > > > > >>>> large IO operation.
> > > > > > > >>>>>>
> > > > > > > >>>>>> I don't know whether the community supports it or not,
> any
> > > > > > > suggestions
> > > > > > > >>>> will be appreciated.
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > ~~~~~~~~~~~~~~~
> > > > > no mistakes
> > > > > ~~~~~~~~~~~~~~~~~~
> > > > >
> > > >
> > >
> >
>

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Alessandro Solimando <al...@gmail.com>.
Hi Justin,
the same table could be used to describe different concepts/classes.

For instance, you could have a single table "Employee" where the column
"job_type" is used to describe the different kinds of jobs (and therefore
employees) you have.

At that point, you could have "White Collar Employee" with subclasses
"Secretary", "Manager", etc., and an ontology2db mapping like:

secretary := select * from employee where job_type=x
...
manager := select * from employee where job_type=y

Then any request involving the "white_collar" concept would rewrite to a
union over the subclasses, this should benefit from this optimization if I
am not mistaken.

Not sure how "IO intensive" such subqueries can be though, but apart from
the toy example, there are complex cases where the subquery cannot benefit
much from indexes and the data can be pretty massive depending on the
scenario.

The problem is that, for complex databases which has been evolving over
several years, the logical concepts from the ontology do not easily match
concepts from ER model of the DB (if one ever existed :)), and in general
when this happens you are also hitting the RDBMs with unforeseen query
patterns for which the data model hasn't been optimised.

Best regards,
Alessandro

On Sun, 23 Jan 2022 at 13:33, Justin Swanhart <gr...@gmail.com> wrote:

> Hi Alessandro,
>
> That is very fascinating.  Most of those situations won't apply to this
> optimization though, since it requires the tables of the union to be
> identical rather than different tables representing unions of different
> ontologies, that is, this will only apply optimizations when an ontology is
> used more than once in a union.  Unless I am mistaken.
>
> --Justin
>
> On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
> alessandro.solimando@gmail.com> wrote:
>
> > Hello everyone,
> > SQL queries automatically generated by semantic queries (see [1] for
> > instance) are pretty generous on UNIONs, so I'd say that in this area,
> you
> > will have many practical cases matching the patterns highlighted here.
> >
> > The main idea is that when you have a concept hierarchy in your ontology
> > (isA relationship), where different concepts are generally mapped to
> > different tables (or subparts of a single table selected by changing the
> > filter condition), when you query an ancestor in that hierarchy (say:
> > Person), in order to account for implicit facts, you generally UNION all
> > the subtables (say, Employee, Customer, etc, which are also Person) in
> your
> > final SQL query.
> >
> > Best regards,
> > Alessandro
> >
> > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
> >
> > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <zh...@gmail.com>
> > wrote:
> >
> > > Thanks Albert.
> > >
> > >  you could probably refer to osdi paper "resin general sub query
> fusion"
> > > > which has chapters talking about general union fusion that deals with
> > the
> > > > case Julian pointed out.
> > > >
> > >
> > > The optimization seems very helpful, but it introduces new operators
> and
> > > the rule for new operators may be more complex.
> > > Hi, Community, What do you think?
> > >
> > > Btw, per my understanding cases of union fusion is not that many in
> > > > practice.
> > > >
> > >
> > > In particular situations such as VDT and BI tools which generates sql
> > > according to a fixed pattern we will meet many such union all queries.
> > >
> > > Albert <zi...@gmail.com> 于2022年1月23日周日 11:22写道:
> > >
> > > > On a more general fusion case, you could probably refer to osdi paper
> > > > "resin general sub query fusion" which has chapters talking about
> > general
> > > > union fusion that deals with the case Julian pointed out.
> > > >
> > > > Btw, per my understanding cases of union fusion is not that many in
> > > > practice. Curious to know the real cases
> > > >
> > > > 在 2022年1月21日星期五,Yanjing Wang <zh...@gmail.com> 写道:
> > > >
> > > > > Hi Julian, I have logged a jira CALCITE-4990
> > > > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this new
> > > rule,
> > > > > later I will submit a pr for this.
> > > > >
> > > > > Julian Hyde <jh...@gmail.com> 于2022年1月21日周五 01:37写道:
> > > > >
> > > > > > Process logic sounds great. Please log a Jira case.
> > > > > >
> > > > > > There’s a lot here, so it’s probably wise to split into a simple
> PR
> > > > that
> > > > > > just does the simple case (UNION ALL, no Project) and extend
> later.
> > > > > >
> > > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> > > zhuangzixiaoyou@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > Thanks Julian and Justin.
> > > > > > >
> > > > > > > What do you think the rule should be called?
> > > > UnionFilterTransposeRule,
> > > > > > >> perhaps?
> > > > > > >>
> > > > > > > SetOpFilterMergeRule?
> > > > > > >
> > > > > > > Maybe that problem does not occur when applied to UNION than
> when
> > > > > applied
> > > > > > >> to UNION ALL.
> > > > > > >
> > > > > > > Yes, This is very important.
> > > > > > >
> > > > > > > There would seem to be analogous rules for INTERSECT (combine
> the
> > > > > > >> conditions using AND) and EXCEPT (combine the conditions using
> > AND
> > > > > NOT).
> > > > > > >>
> > > > > > > Excellent extensions, all the three operators process logic
> are:
> > > > > > >
> > > > > > > For UNION:
> > > > > > > New Filter = left Filter *OR* right Filter.
> > > > > > >
> > > > > > > For INTERSECT:
> > > > > > > New Filter = left Filter *AND* right Filter.
> > > > > > >
> > > > > > > For EXCEPT:
> > > > > > > If left Filter, right Filter have no overlap, transform to left
> > > child
> > > > > > tree,
> > > > > > > Otherwise
> > > > > > > New Filter = left Filter *AND NOT *right Filter.
> > > > > > >
> > > > > > > For UNION ALL:
> > > > > > > Add prerequisites:
> > > > > > > left Filter, right Filter must have no overlap.
> > > > > > >
> > > > > > > For INTERSECT ALL:
> > > > > > > Add prerequisites:
> > > > > > > If left Filter, right Filter have no overlap, transform to
> empty
> > > > > values.
> > > > > > > Otherwise
> > > > > > > the rule can't be applied.
> > > > > > >
> > > > > > > For EXCEPT ALL:
> > > > > > > same for  EXCEPT.
> > > > > > >
> > > > > > >
> > > > > > > work for N-way Union, not just 2-way Union.
> > > > > > >>
> > > > > > > Yes, I will add tests for this.
> > > > > > >
> > > > > > > And I think you should make it work whether or not a Project is
> > > > > present.
> > > > > > >
> > > > > > > Ok, It seems I need construct several different operand match
> > trees
> > > > for
> > > > > > > plan.
> > > > > > >
> > > > > > > Hi Julian, what do you think of the above process logic?
> > > > > > >
> > > > > > > Julian Hyde <jhyde.apache@gmail.com <mailto:
> > jhyde.apache@gmail.com
> > > >>
> > > > > > 于2022年1月20日周四 10:18写道:
> > > > > > >
> > > > > > >> Justin,
> > > > > > >>
> > > > > > >> For planning table or index scans, I would recommend using a
> > > single
> > > > > > >> TableScan with a Filter that uses a Sarg, rather than using
> > > multiple
> > > > > > >> TableScans connected by a Union. So I think this rule will be
> > > > useful.
> > > > > > >>
> > > > > > >> But I do agree that this proposed rule is not a “no brainer”.
> It
> > > may
> > > > > not
> > > > > > >> do what people want/expect in all cases, and therefore it
> > probably
> > > > > > should
> > > > > > >> not be enabled it by default.
> > > > > > >>
> > > > > > >> Julian
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <
> > > greenlion@gmail.com>
> > > > > > >> wrote:
> > > > > > >>>
> > > > > > >>> Hi,
> > > > > > >>>
> > > > > > >>> Note that this will negate the optimization that one usually
> is
> > > > > looking
> > > > > > >> for
> > > > > > >>> when writing such queries:
> > > > > > >>>
> > > > > > >>> Select * from TAB where a = 1
> > > > > > >>> UNION ALL
> > > > > > >>> Select * from TAB where b = 1
> > > > > > >>>
> > > > > > >>> In a database with indexes (most databases) this will allow
> > > indexes
> > > > > to
> > > > > > be
> > > > > > >>> used on both the a column and the b column.
> > > > > > >>> Databases with bitmap indexes or without indexes would
> benefit
> > > from
> > > > > the
> > > > > > >>> rule.
> > > > > > >>>
> > > > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
> > > > jhyde.apache@gmail.com
> > > > > > >> <mailto:jhyde.apache@gmail.com <mailto:jhyde.apache@gmail.com
> > >>>
> > > > > wrote:
> > > > > > >>>
> > > > > > >>>> Can you log a Jira case for this?
> > > > > > >>>>
> > > > > > >>>> I think you should make your rule work for N-way Union, not
> > just
> > > > > 2-way
> > > > > > >>>> Union. And I think you should make it work whether or not a
> > > > Project
> > > > > is
> > > > > > >>>> present.
> > > > > > >>>>
> > > > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <
> > > jhyde.apache@gmail.com
> > > > > > <ma...@gmail.com>>
> > > > > > >> wrote:
> > > > > > >>>>>
> > > > > > >>>>> It sounds useful.
> > > > > > >>>>>
> > > > > > >>>>> What do you think the rule should be called?
> > > > > > UnionFilterTransposeRule,
> > > > > > >>>> perhaps?
> > > > > > >>>>>
> > > > > > >>>>> A challenge when writing the rule will be to ensure that
> all
> > of
> > > > the
> > > > > > >>>> inputs to the Union are the same. The Volcano framework is
> not
> > > > very
> > > > > > >> good at
> > > > > > >>>> that.
> > > > > > >>>>>
> > > > > > >>>>> You should be careful of the case that the conditions
> > overlap.
> > > > For
> > > > > > >>>> example, the rewrite
> > > > > > >>>>>
> > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30
> > > > > > >>>>> UNION ALL
> > > > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> > > > > > >>>>>
> > > > > > >>>>> to
> > > > > > >>>>>
> > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35,
> 45)
> > > > > > >>>>>
> > > > > > >>>>> Is not valid, because rows with deptno = 25 will appear
> twice
> > > in
> > > > > the
> > > > > > >>>> first query, once in the second. Maybe that problem does not
> > > occur
> > > > > > when
> > > > > > >>>> applied to UNION than when applied to UNION ALL.
> > > > > > >>>>>
> > > > > > >>>>> There would seem to be analogous rules for INTERSECT
> (combine
> > > the
> > > > > > >>>> conditions using AND) and EXCEPT (combine the conditions
> using
> > > AND
> > > > > > NOT).
> > > > > > >>>> Perhaps one rule could cover all set operations (see
> > > > > > >>>> FilterSetOpTransposeRule).
> > > > > > >>>>>
> > > > > > >>>>> Julian
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> > > > > > zhuangzixiaoyou@gmail.com <ma...@gmail.com>
> > > > > > >>>> <mailto:zhuangzixiaoyou@gmail.com <mailto:
> > > > zhuangzixiaoyou@gmail.com
> > > > > >
> > > > > > <mailto:zhuangzixiaoyou@gmail.com <mailto:
> > zhuangzixiaoyou@gmail.com
> > > > >>>>
> > > > > > >> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>> A simple example is converting SELECT a, b FROM t WHERE c
> =
> > 1
> > > > > UNION
> > > > > > >> ALL
> > > > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c
> > in
> > > > (1,
> > > > > 2)
> > > > > > >>>>>>
> > > > > > >>>>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <mailto:
> > > > > > zhuangzixiaoyou@gmail.com> <mailto:
> > > > > > >> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>
> > > > > <mailto:
> > > > > > >>>> zhuangzixiaoyou@gmail.com <mailto:zhuangzixiaoyou@gmail.com
> > >>>
> > > > > > >> 于2022年1月19日周三 18:35写道:
> > > > > > >>>>>> Hi, community
> > > > > > >>>>>>
> > > > > > >>>>>> Here I recommend a new rule for converting UNION ALL sub
> > plan
> > > > to a
> > > > > > >>>> single input with an OR filter, the following is its
> > conversion
> > > > > > diagram.
> > > > > > >>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>> The conversion prerequisites are
> > > > > > >>>>>> 1. left filter range has no intersection with right filter
> > > > range.
> > > > > > >>>>>> 2. Project and Input Sub Tree must be identical.
> > > > > > >>>>>>
> > > > > > >>>>>> The rule will be used when Input Sub Tree is a
> > > > computing-intensive
> > > > > > or
> > > > > > >>>> large IO operation.
> > > > > > >>>>>>
> > > > > > >>>>>> I don't know whether the community supports it or not, any
> > > > > > suggestions
> > > > > > >>>> will be appreciated.
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > ~~~~~~~~~~~~~~~
> > > > no mistakes
> > > > ~~~~~~~~~~~~~~~~~~
> > > >
> > >
> >
>

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Justin Swanhart <gr...@gmail.com>.
Hi Alessandro,

That is very fascinating.  Most of those situations won't apply to this
optimization though, since it requires the tables of the union to be
identical rather than different tables representing unions of different
ontologies, that is, this will only apply optimizations when an ontology is
used more than once in a union.  Unless I am mistaken.

--Justin

On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
alessandro.solimando@gmail.com> wrote:

> Hello everyone,
> SQL queries automatically generated by semantic queries (see [1] for
> instance) are pretty generous on UNIONs, so I'd say that in this area, you
> will have many practical cases matching the patterns highlighted here.
>
> The main idea is that when you have a concept hierarchy in your ontology
> (isA relationship), where different concepts are generally mapped to
> different tables (or subparts of a single table selected by changing the
> filter condition), when you query an ancestor in that hierarchy (say:
> Person), in order to account for implicit facts, you generally UNION all
> the subtables (say, Employee, Customer, etc, which are also Person) in your
> final SQL query.
>
> Best regards,
> Alessandro
>
> [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
>
> On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <zh...@gmail.com>
> wrote:
>
> > Thanks Albert.
> >
> >  you could probably refer to osdi paper "resin general sub query fusion"
> > > which has chapters talking about general union fusion that deals with
> the
> > > case Julian pointed out.
> > >
> >
> > The optimization seems very helpful, but it introduces new operators and
> > the rule for new operators may be more complex.
> > Hi, Community, What do you think?
> >
> > Btw, per my understanding cases of union fusion is not that many in
> > > practice.
> > >
> >
> > In particular situations such as VDT and BI tools which generates sql
> > according to a fixed pattern we will meet many such union all queries.
> >
> > Albert <zi...@gmail.com> 于2022年1月23日周日 11:22写道:
> >
> > > On a more general fusion case, you could probably refer to osdi paper
> > > "resin general sub query fusion" which has chapters talking about
> general
> > > union fusion that deals with the case Julian pointed out.
> > >
> > > Btw, per my understanding cases of union fusion is not that many in
> > > practice. Curious to know the real cases
> > >
> > > 在 2022年1月21日星期五,Yanjing Wang <zh...@gmail.com> 写道:
> > >
> > > > Hi Julian, I have logged a jira CALCITE-4990
> > > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this new
> > rule,
> > > > later I will submit a pr for this.
> > > >
> > > > Julian Hyde <jh...@gmail.com> 于2022年1月21日周五 01:37写道:
> > > >
> > > > > Process logic sounds great. Please log a Jira case.
> > > > >
> > > > > There’s a lot here, so it’s probably wise to split into a simple PR
> > > that
> > > > > just does the simple case (UNION ALL, no Project) and extend later.
> > > > >
> > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> > zhuangzixiaoyou@gmail.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > > Thanks Julian and Justin.
> > > > > >
> > > > > > What do you think the rule should be called?
> > > UnionFilterTransposeRule,
> > > > > >> perhaps?
> > > > > >>
> > > > > > SetOpFilterMergeRule?
> > > > > >
> > > > > > Maybe that problem does not occur when applied to UNION than when
> > > > applied
> > > > > >> to UNION ALL.
> > > > > >
> > > > > > Yes, This is very important.
> > > > > >
> > > > > > There would seem to be analogous rules for INTERSECT (combine the
> > > > > >> conditions using AND) and EXCEPT (combine the conditions using
> AND
> > > > NOT).
> > > > > >>
> > > > > > Excellent extensions, all the three operators process logic are:
> > > > > >
> > > > > > For UNION:
> > > > > > New Filter = left Filter *OR* right Filter.
> > > > > >
> > > > > > For INTERSECT:
> > > > > > New Filter = left Filter *AND* right Filter.
> > > > > >
> > > > > > For EXCEPT:
> > > > > > If left Filter, right Filter have no overlap, transform to left
> > child
> > > > > tree,
> > > > > > Otherwise
> > > > > > New Filter = left Filter *AND NOT *right Filter.
> > > > > >
> > > > > > For UNION ALL:
> > > > > > Add prerequisites:
> > > > > > left Filter, right Filter must have no overlap.
> > > > > >
> > > > > > For INTERSECT ALL:
> > > > > > Add prerequisites:
> > > > > > If left Filter, right Filter have no overlap, transform to empty
> > > > values.
> > > > > > Otherwise
> > > > > > the rule can't be applied.
> > > > > >
> > > > > > For EXCEPT ALL:
> > > > > > same for  EXCEPT.
> > > > > >
> > > > > >
> > > > > > work for N-way Union, not just 2-way Union.
> > > > > >>
> > > > > > Yes, I will add tests for this.
> > > > > >
> > > > > > And I think you should make it work whether or not a Project is
> > > > present.
> > > > > >
> > > > > > Ok, It seems I need construct several different operand match
> trees
> > > for
> > > > > > plan.
> > > > > >
> > > > > > Hi Julian, what do you think of the above process logic?
> > > > > >
> > > > > > Julian Hyde <jhyde.apache@gmail.com <mailto:
> jhyde.apache@gmail.com
> > >>
> > > > > 于2022年1月20日周四 10:18写道:
> > > > > >
> > > > > >> Justin,
> > > > > >>
> > > > > >> For planning table or index scans, I would recommend using a
> > single
> > > > > >> TableScan with a Filter that uses a Sarg, rather than using
> > multiple
> > > > > >> TableScans connected by a Union. So I think this rule will be
> > > useful.
> > > > > >>
> > > > > >> But I do agree that this proposed rule is not a “no brainer”. It
> > may
> > > > not
> > > > > >> do what people want/expect in all cases, and therefore it
> probably
> > > > > should
> > > > > >> not be enabled it by default.
> > > > > >>
> > > > > >> Julian
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <
> > greenlion@gmail.com>
> > > > > >> wrote:
> > > > > >>>
> > > > > >>> Hi,
> > > > > >>>
> > > > > >>> Note that this will negate the optimization that one usually is
> > > > looking
> > > > > >> for
> > > > > >>> when writing such queries:
> > > > > >>>
> > > > > >>> Select * from TAB where a = 1
> > > > > >>> UNION ALL
> > > > > >>> Select * from TAB where b = 1
> > > > > >>>
> > > > > >>> In a database with indexes (most databases) this will allow
> > indexes
> > > > to
> > > > > be
> > > > > >>> used on both the a column and the b column.
> > > > > >>> Databases with bitmap indexes or without indexes would benefit
> > from
> > > > the
> > > > > >>> rule.
> > > > > >>>
> > > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
> > > jhyde.apache@gmail.com
> > > > > >> <mailto:jhyde.apache@gmail.com <mailto:jhyde.apache@gmail.com
> >>>
> > > > wrote:
> > > > > >>>
> > > > > >>>> Can you log a Jira case for this?
> > > > > >>>>
> > > > > >>>> I think you should make your rule work for N-way Union, not
> just
> > > > 2-way
> > > > > >>>> Union. And I think you should make it work whether or not a
> > > Project
> > > > is
> > > > > >>>> present.
> > > > > >>>>
> > > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <
> > jhyde.apache@gmail.com
> > > > > <ma...@gmail.com>>
> > > > > >> wrote:
> > > > > >>>>>
> > > > > >>>>> It sounds useful.
> > > > > >>>>>
> > > > > >>>>> What do you think the rule should be called?
> > > > > UnionFilterTransposeRule,
> > > > > >>>> perhaps?
> > > > > >>>>>
> > > > > >>>>> A challenge when writing the rule will be to ensure that all
> of
> > > the
> > > > > >>>> inputs to the Union are the same. The Volcano framework is not
> > > very
> > > > > >> good at
> > > > > >>>> that.
> > > > > >>>>>
> > > > > >>>>> You should be careful of the case that the conditions
> overlap.
> > > For
> > > > > >>>> example, the rewrite
> > > > > >>>>>
> > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30
> > > > > >>>>> UNION ALL
> > > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> > > > > >>>>>
> > > > > >>>>> to
> > > > > >>>>>
> > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> > > > > >>>>>
> > > > > >>>>> Is not valid, because rows with deptno = 25 will appear twice
> > in
> > > > the
> > > > > >>>> first query, once in the second. Maybe that problem does not
> > occur
> > > > > when
> > > > > >>>> applied to UNION than when applied to UNION ALL.
> > > > > >>>>>
> > > > > >>>>> There would seem to be analogous rules for INTERSECT (combine
> > the
> > > > > >>>> conditions using AND) and EXCEPT (combine the conditions using
> > AND
> > > > > NOT).
> > > > > >>>> Perhaps one rule could cover all set operations (see
> > > > > >>>> FilterSetOpTransposeRule).
> > > > > >>>>>
> > > > > >>>>> Julian
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> > > > > zhuangzixiaoyou@gmail.com <ma...@gmail.com>
> > > > > >>>> <mailto:zhuangzixiaoyou@gmail.com <mailto:
> > > zhuangzixiaoyou@gmail.com
> > > > >
> > > > > <mailto:zhuangzixiaoyou@gmail.com <mailto:
> zhuangzixiaoyou@gmail.com
> > > >>>>
> > > > > >> wrote:
> > > > > >>>>>>
> > > > > >>>>>> A simple example is converting SELECT a, b FROM t WHERE c =
> 1
> > > > UNION
> > > > > >> ALL
> > > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c
> in
> > > (1,
> > > > 2)
> > > > > >>>>>>
> > > > > >>>>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <mailto:
> > > > > zhuangzixiaoyou@gmail.com> <mailto:
> > > > > >> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>
> > > > <mailto:
> > > > > >>>> zhuangzixiaoyou@gmail.com <mailto:zhuangzixiaoyou@gmail.com
> >>>
> > > > > >> 于2022年1月19日周三 18:35写道:
> > > > > >>>>>> Hi, community
> > > > > >>>>>>
> > > > > >>>>>> Here I recommend a new rule for converting UNION ALL sub
> plan
> > > to a
> > > > > >>>> single input with an OR filter, the following is its
> conversion
> > > > > diagram.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> The conversion prerequisites are
> > > > > >>>>>> 1. left filter range has no intersection with right filter
> > > range.
> > > > > >>>>>> 2. Project and Input Sub Tree must be identical.
> > > > > >>>>>>
> > > > > >>>>>> The rule will be used when Input Sub Tree is a
> > > computing-intensive
> > > > > or
> > > > > >>>> large IO operation.
> > > > > >>>>>>
> > > > > >>>>>> I don't know whether the community supports it or not, any
> > > > > suggestions
> > > > > >>>> will be appreciated.
> > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > ~~~~~~~~~~~~~~~
> > > no mistakes
> > > ~~~~~~~~~~~~~~~~~~
> > >
> >
>

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Alessandro Solimando <al...@gmail.com>.
Hello everyone,
SQL queries automatically generated by semantic queries (see [1] for
instance) are pretty generous on UNIONs, so I'd say that in this area, you
will have many practical cases matching the patterns highlighted here.

The main idea is that when you have a concept hierarchy in your ontology
(isA relationship), where different concepts are generally mapped to
different tables (or subparts of a single table selected by changing the
filter condition), when you query an ancestor in that hierarchy (say:
Person), in order to account for implicit facts, you generally UNION all
the subtables (say, Employee, Customer, etc, which are also Person) in your
final SQL query.

Best regards,
Alessandro

[1] https://en.wikipedia.org/wiki/Ontology-based_data_integration

On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <zh...@gmail.com>
wrote:

> Thanks Albert.
>
>  you could probably refer to osdi paper "resin general sub query fusion"
> > which has chapters talking about general union fusion that deals with the
> > case Julian pointed out.
> >
>
> The optimization seems very helpful, but it introduces new operators and
> the rule for new operators may be more complex.
> Hi, Community, What do you think?
>
> Btw, per my understanding cases of union fusion is not that many in
> > practice.
> >
>
> In particular situations such as VDT and BI tools which generates sql
> according to a fixed pattern we will meet many such union all queries.
>
> Albert <zi...@gmail.com> 于2022年1月23日周日 11:22写道:
>
> > On a more general fusion case, you could probably refer to osdi paper
> > "resin general sub query fusion" which has chapters talking about general
> > union fusion that deals with the case Julian pointed out.
> >
> > Btw, per my understanding cases of union fusion is not that many in
> > practice. Curious to know the real cases
> >
> > 在 2022年1月21日星期五,Yanjing Wang <zh...@gmail.com> 写道:
> >
> > > Hi Julian, I have logged a jira CALCITE-4990
> > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this new
> rule,
> > > later I will submit a pr for this.
> > >
> > > Julian Hyde <jh...@gmail.com> 于2022年1月21日周五 01:37写道:
> > >
> > > > Process logic sounds great. Please log a Jira case.
> > > >
> > > > There’s a lot here, so it’s probably wise to split into a simple PR
> > that
> > > > just does the simple case (UNION ALL, no Project) and extend later.
> > > >
> > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> zhuangzixiaoyou@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > > Thanks Julian and Justin.
> > > > >
> > > > > What do you think the rule should be called?
> > UnionFilterTransposeRule,
> > > > >> perhaps?
> > > > >>
> > > > > SetOpFilterMergeRule?
> > > > >
> > > > > Maybe that problem does not occur when applied to UNION than when
> > > applied
> > > > >> to UNION ALL.
> > > > >
> > > > > Yes, This is very important.
> > > > >
> > > > > There would seem to be analogous rules for INTERSECT (combine the
> > > > >> conditions using AND) and EXCEPT (combine the conditions using AND
> > > NOT).
> > > > >>
> > > > > Excellent extensions, all the three operators process logic are:
> > > > >
> > > > > For UNION:
> > > > > New Filter = left Filter *OR* right Filter.
> > > > >
> > > > > For INTERSECT:
> > > > > New Filter = left Filter *AND* right Filter.
> > > > >
> > > > > For EXCEPT:
> > > > > If left Filter, right Filter have no overlap, transform to left
> child
> > > > tree,
> > > > > Otherwise
> > > > > New Filter = left Filter *AND NOT *right Filter.
> > > > >
> > > > > For UNION ALL:
> > > > > Add prerequisites:
> > > > > left Filter, right Filter must have no overlap.
> > > > >
> > > > > For INTERSECT ALL:
> > > > > Add prerequisites:
> > > > > If left Filter, right Filter have no overlap, transform to empty
> > > values.
> > > > > Otherwise
> > > > > the rule can't be applied.
> > > > >
> > > > > For EXCEPT ALL:
> > > > > same for  EXCEPT.
> > > > >
> > > > >
> > > > > work for N-way Union, not just 2-way Union.
> > > > >>
> > > > > Yes, I will add tests for this.
> > > > >
> > > > > And I think you should make it work whether or not a Project is
> > > present.
> > > > >
> > > > > Ok, It seems I need construct several different operand match trees
> > for
> > > > > plan.
> > > > >
> > > > > Hi Julian, what do you think of the above process logic?
> > > > >
> > > > > Julian Hyde <jhyde.apache@gmail.com <mailto:jhyde.apache@gmail.com
> >>
> > > > 于2022年1月20日周四 10:18写道:
> > > > >
> > > > >> Justin,
> > > > >>
> > > > >> For planning table or index scans, I would recommend using a
> single
> > > > >> TableScan with a Filter that uses a Sarg, rather than using
> multiple
> > > > >> TableScans connected by a Union. So I think this rule will be
> > useful.
> > > > >>
> > > > >> But I do agree that this proposed rule is not a “no brainer”. It
> may
> > > not
> > > > >> do what people want/expect in all cases, and therefore it probably
> > > > should
> > > > >> not be enabled it by default.
> > > > >>
> > > > >> Julian
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <
> greenlion@gmail.com>
> > > > >> wrote:
> > > > >>>
> > > > >>> Hi,
> > > > >>>
> > > > >>> Note that this will negate the optimization that one usually is
> > > looking
> > > > >> for
> > > > >>> when writing such queries:
> > > > >>>
> > > > >>> Select * from TAB where a = 1
> > > > >>> UNION ALL
> > > > >>> Select * from TAB where b = 1
> > > > >>>
> > > > >>> In a database with indexes (most databases) this will allow
> indexes
> > > to
> > > > be
> > > > >>> used on both the a column and the b column.
> > > > >>> Databases with bitmap indexes or without indexes would benefit
> from
> > > the
> > > > >>> rule.
> > > > >>>
> > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
> > jhyde.apache@gmail.com
> > > > >> <mailto:jhyde.apache@gmail.com <ma...@gmail.com>>>
> > > wrote:
> > > > >>>
> > > > >>>> Can you log a Jira case for this?
> > > > >>>>
> > > > >>>> I think you should make your rule work for N-way Union, not just
> > > 2-way
> > > > >>>> Union. And I think you should make it work whether or not a
> > Project
> > > is
> > > > >>>> present.
> > > > >>>>
> > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <
> jhyde.apache@gmail.com
> > > > <ma...@gmail.com>>
> > > > >> wrote:
> > > > >>>>>
> > > > >>>>> It sounds useful.
> > > > >>>>>
> > > > >>>>> What do you think the rule should be called?
> > > > UnionFilterTransposeRule,
> > > > >>>> perhaps?
> > > > >>>>>
> > > > >>>>> A challenge when writing the rule will be to ensure that all of
> > the
> > > > >>>> inputs to the Union are the same. The Volcano framework is not
> > very
> > > > >> good at
> > > > >>>> that.
> > > > >>>>>
> > > > >>>>> You should be careful of the case that the conditions overlap.
> > For
> > > > >>>> example, the rewrite
> > > > >>>>>
> > > > >>>>> SELECT * FROM Emp WHERE deptno < 30
> > > > >>>>> UNION ALL
> > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> > > > >>>>>
> > > > >>>>> to
> > > > >>>>>
> > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> > > > >>>>>
> > > > >>>>> Is not valid, because rows with deptno = 25 will appear twice
> in
> > > the
> > > > >>>> first query, once in the second. Maybe that problem does not
> occur
> > > > when
> > > > >>>> applied to UNION than when applied to UNION ALL.
> > > > >>>>>
> > > > >>>>> There would seem to be analogous rules for INTERSECT (combine
> the
> > > > >>>> conditions using AND) and EXCEPT (combine the conditions using
> AND
> > > > NOT).
> > > > >>>> Perhaps one rule could cover all set operations (see
> > > > >>>> FilterSetOpTransposeRule).
> > > > >>>>>
> > > > >>>>> Julian
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> > > > zhuangzixiaoyou@gmail.com <ma...@gmail.com>
> > > > >>>> <mailto:zhuangzixiaoyou@gmail.com <mailto:
> > zhuangzixiaoyou@gmail.com
> > > >
> > > > <mailto:zhuangzixiaoyou@gmail.com <mailto:zhuangzixiaoyou@gmail.com
> > >>>>
> > > > >> wrote:
> > > > >>>>>>
> > > > >>>>>> A simple example is converting SELECT a, b FROM t WHERE c = 1
> > > UNION
> > > > >> ALL
> > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in
> > (1,
> > > 2)
> > > > >>>>>>
> > > > >>>>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <mailto:
> > > > zhuangzixiaoyou@gmail.com> <mailto:
> > > > >> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>
> > > <mailto:
> > > > >>>> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>>
> > > > >> 于2022年1月19日周三 18:35写道:
> > > > >>>>>> Hi, community
> > > > >>>>>>
> > > > >>>>>> Here I recommend a new rule for converting UNION ALL sub plan
> > to a
> > > > >>>> single input with an OR filter, the following is its conversion
> > > > diagram.
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> The conversion prerequisites are
> > > > >>>>>> 1. left filter range has no intersection with right filter
> > range.
> > > > >>>>>> 2. Project and Input Sub Tree must be identical.
> > > > >>>>>>
> > > > >>>>>> The rule will be used when Input Sub Tree is a
> > computing-intensive
> > > > or
> > > > >>>> large IO operation.
> > > > >>>>>>
> > > > >>>>>> I don't know whether the community supports it or not, any
> > > > suggestions
> > > > >>>> will be appreciated.
> > > >
> > > >
> > >
> >
> >
> > --
> > ~~~~~~~~~~~~~~~
> > no mistakes
> > ~~~~~~~~~~~~~~~~~~
> >
>

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Yanjing Wang <zh...@gmail.com>.
Thanks Albert.

 you could probably refer to osdi paper "resin general sub query fusion"
> which has chapters talking about general union fusion that deals with the
> case Julian pointed out.
>

The optimization seems very helpful, but it introduces new operators and
the rule for new operators may be more complex.
Hi, Community, What do you think?

Btw, per my understanding cases of union fusion is not that many in
> practice.
>

In particular situations such as VDT and BI tools which generates sql
according to a fixed pattern we will meet many such union all queries.

Albert <zi...@gmail.com> 于2022年1月23日周日 11:22写道:

> On a more general fusion case, you could probably refer to osdi paper
> "resin general sub query fusion" which has chapters talking about general
> union fusion that deals with the case Julian pointed out.
>
> Btw, per my understanding cases of union fusion is not that many in
> practice. Curious to know the real cases
>
> 在 2022年1月21日星期五,Yanjing Wang <zh...@gmail.com> 写道:
>
> > Hi Julian, I have logged a jira CALCITE-4990
> > <https://issues.apache.org/jira/browse/CALCITE-4990> for this new rule,
> > later I will submit a pr for this.
> >
> > Julian Hyde <jh...@gmail.com> 于2022年1月21日周五 01:37写道:
> >
> > > Process logic sounds great. Please log a Jira case.
> > >
> > > There’s a lot here, so it’s probably wise to split into a simple PR
> that
> > > just does the simple case (UNION ALL, no Project) and extend later.
> > >
> > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <zhuangzixiaoyou@gmail.com
> >
> > > wrote:
> > > >
> > > > Thanks Julian and Justin.
> > > >
> > > > What do you think the rule should be called?
> UnionFilterTransposeRule,
> > > >> perhaps?
> > > >>
> > > > SetOpFilterMergeRule?
> > > >
> > > > Maybe that problem does not occur when applied to UNION than when
> > applied
> > > >> to UNION ALL.
> > > >
> > > > Yes, This is very important.
> > > >
> > > > There would seem to be analogous rules for INTERSECT (combine the
> > > >> conditions using AND) and EXCEPT (combine the conditions using AND
> > NOT).
> > > >>
> > > > Excellent extensions, all the three operators process logic are:
> > > >
> > > > For UNION:
> > > > New Filter = left Filter *OR* right Filter.
> > > >
> > > > For INTERSECT:
> > > > New Filter = left Filter *AND* right Filter.
> > > >
> > > > For EXCEPT:
> > > > If left Filter, right Filter have no overlap, transform to left child
> > > tree,
> > > > Otherwise
> > > > New Filter = left Filter *AND NOT *right Filter.
> > > >
> > > > For UNION ALL:
> > > > Add prerequisites:
> > > > left Filter, right Filter must have no overlap.
> > > >
> > > > For INTERSECT ALL:
> > > > Add prerequisites:
> > > > If left Filter, right Filter have no overlap, transform to empty
> > values.
> > > > Otherwise
> > > > the rule can't be applied.
> > > >
> > > > For EXCEPT ALL:
> > > > same for  EXCEPT.
> > > >
> > > >
> > > > work for N-way Union, not just 2-way Union.
> > > >>
> > > > Yes, I will add tests for this.
> > > >
> > > > And I think you should make it work whether or not a Project is
> > present.
> > > >
> > > > Ok, It seems I need construct several different operand match trees
> for
> > > > plan.
> > > >
> > > > Hi Julian, what do you think of the above process logic?
> > > >
> > > > Julian Hyde <jhyde.apache@gmail.com <ma...@gmail.com>>
> > > 于2022年1月20日周四 10:18写道:
> > > >
> > > >> Justin,
> > > >>
> > > >> For planning table or index scans, I would recommend using a single
> > > >> TableScan with a Filter that uses a Sarg, rather than using multiple
> > > >> TableScans connected by a Union. So I think this rule will be
> useful.
> > > >>
> > > >> But I do agree that this proposed rule is not a “no brainer”. It may
> > not
> > > >> do what people want/expect in all cases, and therefore it probably
> > > should
> > > >> not be enabled it by default.
> > > >>
> > > >> Julian
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <gr...@gmail.com>
> > > >> wrote:
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> Note that this will negate the optimization that one usually is
> > looking
> > > >> for
> > > >>> when writing such queries:
> > > >>>
> > > >>> Select * from TAB where a = 1
> > > >>> UNION ALL
> > > >>> Select * from TAB where b = 1
> > > >>>
> > > >>> In a database with indexes (most databases) this will allow indexes
> > to
> > > be
> > > >>> used on both the a column and the b column.
> > > >>> Databases with bitmap indexes or without indexes would benefit from
> > the
> > > >>> rule.
> > > >>>
> > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
> jhyde.apache@gmail.com
> > > >> <mailto:jhyde.apache@gmail.com <ma...@gmail.com>>>
> > wrote:
> > > >>>
> > > >>>> Can you log a Jira case for this?
> > > >>>>
> > > >>>> I think you should make your rule work for N-way Union, not just
> > 2-way
> > > >>>> Union. And I think you should make it work whether or not a
> Project
> > is
> > > >>>> present.
> > > >>>>
> > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <jhyde.apache@gmail.com
> > > <ma...@gmail.com>>
> > > >> wrote:
> > > >>>>>
> > > >>>>> It sounds useful.
> > > >>>>>
> > > >>>>> What do you think the rule should be called?
> > > UnionFilterTransposeRule,
> > > >>>> perhaps?
> > > >>>>>
> > > >>>>> A challenge when writing the rule will be to ensure that all of
> the
> > > >>>> inputs to the Union are the same. The Volcano framework is not
> very
> > > >> good at
> > > >>>> that.
> > > >>>>>
> > > >>>>> You should be careful of the case that the conditions overlap.
> For
> > > >>>> example, the rewrite
> > > >>>>>
> > > >>>>> SELECT * FROM Emp WHERE deptno < 30
> > > >>>>> UNION ALL
> > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> > > >>>>>
> > > >>>>> to
> > > >>>>>
> > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> > > >>>>>
> > > >>>>> Is not valid, because rows with deptno = 25 will appear twice in
> > the
> > > >>>> first query, once in the second. Maybe that problem does not occur
> > > when
> > > >>>> applied to UNION than when applied to UNION ALL.
> > > >>>>>
> > > >>>>> There would seem to be analogous rules for INTERSECT (combine the
> > > >>>> conditions using AND) and EXCEPT (combine the conditions using AND
> > > NOT).
> > > >>>> Perhaps one rule could cover all set operations (see
> > > >>>> FilterSetOpTransposeRule).
> > > >>>>>
> > > >>>>> Julian
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> > > zhuangzixiaoyou@gmail.com <ma...@gmail.com>
> > > >>>> <mailto:zhuangzixiaoyou@gmail.com <mailto:
> zhuangzixiaoyou@gmail.com
> > >
> > > <mailto:zhuangzixiaoyou@gmail.com <mailto:zhuangzixiaoyou@gmail.com
> >>>>
> > > >> wrote:
> > > >>>>>>
> > > >>>>>> A simple example is converting SELECT a, b FROM t WHERE c = 1
> > UNION
> > > >> ALL
> > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in
> (1,
> > 2)
> > > >>>>>>
> > > >>>>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <mailto:
> > > zhuangzixiaoyou@gmail.com> <mailto:
> > > >> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>
> > <mailto:
> > > >>>> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>>
> > > >> 于2022年1月19日周三 18:35写道:
> > > >>>>>> Hi, community
> > > >>>>>>
> > > >>>>>> Here I recommend a new rule for converting UNION ALL sub plan
> to a
> > > >>>> single input with an OR filter, the following is its conversion
> > > diagram.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> The conversion prerequisites are
> > > >>>>>> 1. left filter range has no intersection with right filter
> range.
> > > >>>>>> 2. Project and Input Sub Tree must be identical.
> > > >>>>>>
> > > >>>>>> The rule will be used when Input Sub Tree is a
> computing-intensive
> > > or
> > > >>>> large IO operation.
> > > >>>>>>
> > > >>>>>> I don't know whether the community supports it or not, any
> > > suggestions
> > > >>>> will be appreciated.
> > >
> > >
> >
>
>
> --
> ~~~~~~~~~~~~~~~
> no mistakes
> ~~~~~~~~~~~~~~~~~~
>

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Albert <zi...@gmail.com>.
On a more general fusion case, you could probably refer to osdi paper
"resin general sub query fusion" which has chapters talking about general
union fusion that deals with the case Julian pointed out.

Btw, per my understanding cases of union fusion is not that many in
practice. Curious to know the real cases

在 2022年1月21日星期五,Yanjing Wang <zh...@gmail.com> 写道:

> Hi Julian, I have logged a jira CALCITE-4990
> <https://issues.apache.org/jira/browse/CALCITE-4990> for this new rule,
> later I will submit a pr for this.
>
> Julian Hyde <jh...@gmail.com> 于2022年1月21日周五 01:37写道:
>
> > Process logic sounds great. Please log a Jira case.
> >
> > There’s a lot here, so it’s probably wise to split into a simple PR that
> > just does the simple case (UNION ALL, no Project) and extend later.
> >
> > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <zh...@gmail.com>
> > wrote:
> > >
> > > Thanks Julian and Justin.
> > >
> > > What do you think the rule should be called? UnionFilterTransposeRule,
> > >> perhaps?
> > >>
> > > SetOpFilterMergeRule?
> > >
> > > Maybe that problem does not occur when applied to UNION than when
> applied
> > >> to UNION ALL.
> > >
> > > Yes, This is very important.
> > >
> > > There would seem to be analogous rules for INTERSECT (combine the
> > >> conditions using AND) and EXCEPT (combine the conditions using AND
> NOT).
> > >>
> > > Excellent extensions, all the three operators process logic are:
> > >
> > > For UNION:
> > > New Filter = left Filter *OR* right Filter.
> > >
> > > For INTERSECT:
> > > New Filter = left Filter *AND* right Filter.
> > >
> > > For EXCEPT:
> > > If left Filter, right Filter have no overlap, transform to left child
> > tree,
> > > Otherwise
> > > New Filter = left Filter *AND NOT *right Filter.
> > >
> > > For UNION ALL:
> > > Add prerequisites:
> > > left Filter, right Filter must have no overlap.
> > >
> > > For INTERSECT ALL:
> > > Add prerequisites:
> > > If left Filter, right Filter have no overlap, transform to empty
> values.
> > > Otherwise
> > > the rule can't be applied.
> > >
> > > For EXCEPT ALL:
> > > same for  EXCEPT.
> > >
> > >
> > > work for N-way Union, not just 2-way Union.
> > >>
> > > Yes, I will add tests for this.
> > >
> > > And I think you should make it work whether or not a Project is
> present.
> > >
> > > Ok, It seems I need construct several different operand match trees for
> > > plan.
> > >
> > > Hi Julian, what do you think of the above process logic?
> > >
> > > Julian Hyde <jhyde.apache@gmail.com <ma...@gmail.com>>
> > 于2022年1月20日周四 10:18写道:
> > >
> > >> Justin,
> > >>
> > >> For planning table or index scans, I would recommend using a single
> > >> TableScan with a Filter that uses a Sarg, rather than using multiple
> > >> TableScans connected by a Union. So I think this rule will be useful.
> > >>
> > >> But I do agree that this proposed rule is not a “no brainer”. It may
> not
> > >> do what people want/expect in all cases, and therefore it probably
> > should
> > >> not be enabled it by default.
> > >>
> > >> Julian
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <gr...@gmail.com>
> > >> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> Note that this will negate the optimization that one usually is
> looking
> > >> for
> > >>> when writing such queries:
> > >>>
> > >>> Select * from TAB where a = 1
> > >>> UNION ALL
> > >>> Select * from TAB where b = 1
> > >>>
> > >>> In a database with indexes (most databases) this will allow indexes
> to
> > be
> > >>> used on both the a column and the b column.
> > >>> Databases with bitmap indexes or without indexes would benefit from
> the
> > >>> rule.
> > >>>
> > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <jhyde.apache@gmail.com
> > >> <mailto:jhyde.apache@gmail.com <ma...@gmail.com>>>
> wrote:
> > >>>
> > >>>> Can you log a Jira case for this?
> > >>>>
> > >>>> I think you should make your rule work for N-way Union, not just
> 2-way
> > >>>> Union. And I think you should make it work whether or not a Project
> is
> > >>>> present.
> > >>>>
> > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <jhyde.apache@gmail.com
> > <ma...@gmail.com>>
> > >> wrote:
> > >>>>>
> > >>>>> It sounds useful.
> > >>>>>
> > >>>>> What do you think the rule should be called?
> > UnionFilterTransposeRule,
> > >>>> perhaps?
> > >>>>>
> > >>>>> A challenge when writing the rule will be to ensure that all of the
> > >>>> inputs to the Union are the same. The Volcano framework is not very
> > >> good at
> > >>>> that.
> > >>>>>
> > >>>>> You should be careful of the case that the conditions overlap. For
> > >>>> example, the rewrite
> > >>>>>
> > >>>>> SELECT * FROM Emp WHERE deptno < 30
> > >>>>> UNION ALL
> > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> > >>>>>
> > >>>>> to
> > >>>>>
> > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> > >>>>>
> > >>>>> Is not valid, because rows with deptno = 25 will appear twice in
> the
> > >>>> first query, once in the second. Maybe that problem does not occur
> > when
> > >>>> applied to UNION than when applied to UNION ALL.
> > >>>>>
> > >>>>> There would seem to be analogous rules for INTERSECT (combine the
> > >>>> conditions using AND) and EXCEPT (combine the conditions using AND
> > NOT).
> > >>>> Perhaps one rule could cover all set operations (see
> > >>>> FilterSetOpTransposeRule).
> > >>>>>
> > >>>>> Julian
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> > zhuangzixiaoyou@gmail.com <ma...@gmail.com>
> > >>>> <mailto:zhuangzixiaoyou@gmail.com <mailto:zhuangzixiaoyou@gmail.com
> >
> > <mailto:zhuangzixiaoyou@gmail.com <ma...@gmail.com>>>>
> > >> wrote:
> > >>>>>>
> > >>>>>> A simple example is converting SELECT a, b FROM t WHERE c = 1
> UNION
> > >> ALL
> > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1,
> 2)
> > >>>>>>
> > >>>>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <mailto:
> > zhuangzixiaoyou@gmail.com> <mailto:
> > >> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>
> <mailto:
> > >>>> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>>
> > >> 于2022年1月19日周三 18:35写道:
> > >>>>>> Hi, community
> > >>>>>>
> > >>>>>> Here I recommend a new rule for converting UNION ALL sub plan to a
> > >>>> single input with an OR filter, the following is its conversion
> > diagram.
> > >>>>>>
> > >>>>>>
> > >>>>>> The conversion prerequisites are
> > >>>>>> 1. left filter range has no intersection with right filter range.
> > >>>>>> 2. Project and Input Sub Tree must be identical.
> > >>>>>>
> > >>>>>> The rule will be used when Input Sub Tree is a computing-intensive
> > or
> > >>>> large IO operation.
> > >>>>>>
> > >>>>>> I don't know whether the community supports it or not, any
> > suggestions
> > >>>> will be appreciated.
> >
> >
>


-- 
~~~~~~~~~~~~~~~
no mistakes
~~~~~~~~~~~~~~~~~~

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Yanjing Wang <zh...@gmail.com>.
Hi Julian, I have logged a jira CALCITE-4990
<https://issues.apache.org/jira/browse/CALCITE-4990> for this new rule,
later I will submit a pr for this.

Julian Hyde <jh...@gmail.com> 于2022年1月21日周五 01:37写道:

> Process logic sounds great. Please log a Jira case.
>
> There’s a lot here, so it’s probably wise to split into a simple PR that
> just does the simple case (UNION ALL, no Project) and extend later.
>
> > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <zh...@gmail.com>
> wrote:
> >
> > Thanks Julian and Justin.
> >
> > What do you think the rule should be called? UnionFilterTransposeRule,
> >> perhaps?
> >>
> > SetOpFilterMergeRule?
> >
> > Maybe that problem does not occur when applied to UNION than when applied
> >> to UNION ALL.
> >
> > Yes, This is very important.
> >
> > There would seem to be analogous rules for INTERSECT (combine the
> >> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
> >>
> > Excellent extensions, all the three operators process logic are:
> >
> > For UNION:
> > New Filter = left Filter *OR* right Filter.
> >
> > For INTERSECT:
> > New Filter = left Filter *AND* right Filter.
> >
> > For EXCEPT:
> > If left Filter, right Filter have no overlap, transform to left child
> tree,
> > Otherwise
> > New Filter = left Filter *AND NOT *right Filter.
> >
> > For UNION ALL:
> > Add prerequisites:
> > left Filter, right Filter must have no overlap.
> >
> > For INTERSECT ALL:
> > Add prerequisites:
> > If left Filter, right Filter have no overlap, transform to empty values.
> > Otherwise
> > the rule can't be applied.
> >
> > For EXCEPT ALL:
> > same for  EXCEPT.
> >
> >
> > work for N-way Union, not just 2-way Union.
> >>
> > Yes, I will add tests for this.
> >
> > And I think you should make it work whether or not a Project is present.
> >
> > Ok, It seems I need construct several different operand match trees for
> > plan.
> >
> > Hi Julian, what do you think of the above process logic?
> >
> > Julian Hyde <jhyde.apache@gmail.com <ma...@gmail.com>>
> 于2022年1月20日周四 10:18写道:
> >
> >> Justin,
> >>
> >> For planning table or index scans, I would recommend using a single
> >> TableScan with a Filter that uses a Sarg, rather than using multiple
> >> TableScans connected by a Union. So I think this rule will be useful.
> >>
> >> But I do agree that this proposed rule is not a “no brainer”. It may not
> >> do what people want/expect in all cases, and therefore it probably
> should
> >> not be enabled it by default.
> >>
> >> Julian
> >>
> >>
> >>
> >>
> >>
> >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <gr...@gmail.com>
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Note that this will negate the optimization that one usually is looking
> >> for
> >>> when writing such queries:
> >>>
> >>> Select * from TAB where a = 1
> >>> UNION ALL
> >>> Select * from TAB where b = 1
> >>>
> >>> In a database with indexes (most databases) this will allow indexes to
> be
> >>> used on both the a column and the b column.
> >>> Databases with bitmap indexes or without indexes would benefit from the
> >>> rule.
> >>>
> >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <jhyde.apache@gmail.com
> >> <mailto:jhyde.apache@gmail.com <ma...@gmail.com>>> wrote:
> >>>
> >>>> Can you log a Jira case for this?
> >>>>
> >>>> I think you should make your rule work for N-way Union, not just 2-way
> >>>> Union. And I think you should make it work whether or not a Project is
> >>>> present.
> >>>>
> >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <jhyde.apache@gmail.com
> <ma...@gmail.com>>
> >> wrote:
> >>>>>
> >>>>> It sounds useful.
> >>>>>
> >>>>> What do you think the rule should be called?
> UnionFilterTransposeRule,
> >>>> perhaps?
> >>>>>
> >>>>> A challenge when writing the rule will be to ensure that all of the
> >>>> inputs to the Union are the same. The Volcano framework is not very
> >> good at
> >>>> that.
> >>>>>
> >>>>> You should be careful of the case that the conditions overlap. For
> >>>> example, the rewrite
> >>>>>
> >>>>> SELECT * FROM Emp WHERE deptno < 30
> >>>>> UNION ALL
> >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> >>>>>
> >>>>> to
> >>>>>
> >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> >>>>>
> >>>>> Is not valid, because rows with deptno = 25 will appear twice in the
> >>>> first query, once in the second. Maybe that problem does not occur
> when
> >>>> applied to UNION than when applied to UNION ALL.
> >>>>>
> >>>>> There would seem to be analogous rules for INTERSECT (combine the
> >>>> conditions using AND) and EXCEPT (combine the conditions using AND
> NOT).
> >>>> Perhaps one rule could cover all set operations (see
> >>>> FilterSetOpTransposeRule).
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> zhuangzixiaoyou@gmail.com <ma...@gmail.com>
> >>>> <mailto:zhuangzixiaoyou@gmail.com <ma...@gmail.com>
> <mailto:zhuangzixiaoyou@gmail.com <ma...@gmail.com>>>>
> >> wrote:
> >>>>>>
> >>>>>> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION
> >> ALL
> >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
> >>>>>>
> >>>>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <mailto:
> zhuangzixiaoyou@gmail.com> <mailto:
> >> zhuangzixiaoyou@gmail.com <ma...@gmail.com>> <mailto:
> >>>> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>>
> >> 于2022年1月19日周三 18:35写道:
> >>>>>> Hi, community
> >>>>>>
> >>>>>> Here I recommend a new rule for converting UNION ALL sub plan to a
> >>>> single input with an OR filter, the following is its conversion
> diagram.
> >>>>>>
> >>>>>>
> >>>>>> The conversion prerequisites are
> >>>>>> 1. left filter range has no intersection with right filter range.
> >>>>>> 2. Project and Input Sub Tree must be identical.
> >>>>>>
> >>>>>> The rule will be used when Input Sub Tree is a computing-intensive
> or
> >>>> large IO operation.
> >>>>>>
> >>>>>> I don't know whether the community supports it or not, any
> suggestions
> >>>> will be appreciated.
>
>

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Julian Hyde <jh...@gmail.com>.
Process logic sounds great. Please log a Jira case.

There’s a lot here, so it’s probably wise to split into a simple PR that just does the simple case (UNION ALL, no Project) and extend later.

> On Jan 20, 2022, at 7:07 AM, Yanjing Wang <zh...@gmail.com> wrote:
> 
> Thanks Julian and Justin.
> 
> What do you think the rule should be called? UnionFilterTransposeRule,
>> perhaps?
>> 
> SetOpFilterMergeRule?
> 
> Maybe that problem does not occur when applied to UNION than when applied
>> to UNION ALL.
> 
> Yes, This is very important.
> 
> There would seem to be analogous rules for INTERSECT (combine the
>> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
>> 
> Excellent extensions, all the three operators process logic are:
> 
> For UNION:
> New Filter = left Filter *OR* right Filter.
> 
> For INTERSECT:
> New Filter = left Filter *AND* right Filter.
> 
> For EXCEPT:
> If left Filter, right Filter have no overlap, transform to left child tree,
> Otherwise
> New Filter = left Filter *AND NOT *right Filter.
> 
> For UNION ALL:
> Add prerequisites:
> left Filter, right Filter must have no overlap.
> 
> For INTERSECT ALL:
> Add prerequisites:
> If left Filter, right Filter have no overlap, transform to empty values.
> Otherwise
> the rule can't be applied.
> 
> For EXCEPT ALL:
> same for  EXCEPT.
> 
> 
> work for N-way Union, not just 2-way Union.
>> 
> Yes, I will add tests for this.
> 
> And I think you should make it work whether or not a Project is present.
> 
> Ok, It seems I need construct several different operand match trees for
> plan.
> 
> Hi Julian, what do you think of the above process logic?
> 
> Julian Hyde <jhyde.apache@gmail.com <ma...@gmail.com>> 于2022年1月20日周四 10:18写道:
> 
>> Justin,
>> 
>> For planning table or index scans, I would recommend using a single
>> TableScan with a Filter that uses a Sarg, rather than using multiple
>> TableScans connected by a Union. So I think this rule will be useful.
>> 
>> But I do agree that this proposed rule is not a “no brainer”. It may not
>> do what people want/expect in all cases, and therefore it probably should
>> not be enabled it by default.
>> 
>> Julian
>> 
>> 
>> 
>> 
>> 
>>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <gr...@gmail.com>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> Note that this will negate the optimization that one usually is looking
>> for
>>> when writing such queries:
>>> 
>>> Select * from TAB where a = 1
>>> UNION ALL
>>> Select * from TAB where b = 1
>>> 
>>> In a database with indexes (most databases) this will allow indexes to be
>>> used on both the a column and the b column.
>>> Databases with bitmap indexes or without indexes would benefit from the
>>> rule.
>>> 
>>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <jhyde.apache@gmail.com
>> <mailto:jhyde.apache@gmail.com <ma...@gmail.com>>> wrote:
>>> 
>>>> Can you log a Jira case for this?
>>>> 
>>>> I think you should make your rule work for N-way Union, not just 2-way
>>>> Union. And I think you should make it work whether or not a Project is
>>>> present.
>>>> 
>>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <jhyde.apache@gmail.com <ma...@gmail.com>>
>> wrote:
>>>>> 
>>>>> It sounds useful.
>>>>> 
>>>>> What do you think the rule should be called? UnionFilterTransposeRule,
>>>> perhaps?
>>>>> 
>>>>> A challenge when writing the rule will be to ensure that all of the
>>>> inputs to the Union are the same. The Volcano framework is not very
>> good at
>>>> that.
>>>>> 
>>>>> You should be careful of the case that the conditions overlap. For
>>>> example, the rewrite
>>>>> 
>>>>> SELECT * FROM Emp WHERE deptno < 30
>>>>> UNION ALL
>>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
>>>>> 
>>>>> to
>>>>> 
>>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
>>>>> 
>>>>> Is not valid, because rows with deptno = 25 will appear twice in the
>>>> first query, once in the second. Maybe that problem does not occur when
>>>> applied to UNION than when applied to UNION ALL.
>>>>> 
>>>>> There would seem to be analogous rules for INTERSECT (combine the
>>>> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
>>>> Perhaps one rule could cover all set operations (see
>>>> FilterSetOpTransposeRule).
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <zhuangzixiaoyou@gmail.com <ma...@gmail.com>
>>>> <mailto:zhuangzixiaoyou@gmail.com <ma...@gmail.com> <mailto:zhuangzixiaoyou@gmail.com <ma...@gmail.com>>>>
>> wrote:
>>>>>> 
>>>>>> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION
>> ALL
>>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
>>>>>> 
>>>>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <ma...@gmail.com> <mailto:
>> zhuangzixiaoyou@gmail.com <ma...@gmail.com>> <mailto:
>>>> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>>
>> 于2022年1月19日周三 18:35写道:
>>>>>> Hi, community
>>>>>> 
>>>>>> Here I recommend a new rule for converting UNION ALL sub plan to a
>>>> single input with an OR filter, the following is its conversion diagram.
>>>>>> 
>>>>>> 
>>>>>> The conversion prerequisites are
>>>>>> 1. left filter range has no intersection with right filter range.
>>>>>> 2. Project and Input Sub Tree must be identical.
>>>>>> 
>>>>>> The rule will be used when Input Sub Tree is a computing-intensive or
>>>> large IO operation.
>>>>>> 
>>>>>> I don't know whether the community supports it or not, any suggestions
>>>> will be appreciated.


Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Yanjing Wang <zh...@gmail.com>.
Thanks Julian and Justin.

What do you think the rule should be called? UnionFilterTransposeRule,
> perhaps?
>
SetOpFilterMergeRule?

 Maybe that problem does not occur when applied to UNION than when applied
> to UNION ALL.

Yes, This is very important.

There would seem to be analogous rules for INTERSECT (combine the
> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
>
Excellent extensions, all the three operators process logic are:

For UNION:
New Filter = left Filter *OR* right Filter.

For INTERSECT:
New Filter = left Filter *AND* right Filter.

For EXCEPT:
If left Filter, right Filter have no overlap, transform to left child tree,
Otherwise
New Filter = left Filter *AND NOT *right Filter.

For UNION ALL:
Add prerequisites:
left Filter, right Filter must have no overlap.

For INTERSECT ALL:
Add prerequisites:
If left Filter, right Filter have no overlap, transform to empty values.
Otherwise
the rule can't be applied.

For EXCEPT ALL:
same for  EXCEPT.


work for N-way Union, not just 2-way Union.
>
Yes, I will add tests for this.

 And I think you should make it work whether or not a Project is present.

Ok, It seems I need construct several different operand match trees for
plan.

Hi Julian, what do you think of the above process logic?

Julian Hyde <jh...@gmail.com> 于2022年1月20日周四 10:18写道:

> Justin,
>
> For planning table or index scans, I would recommend using a single
> TableScan with a Filter that uses a Sarg, rather than using multiple
> TableScans connected by a Union. So I think this rule will be useful.
>
> But I do agree that this proposed rule is not a “no brainer”. It may not
> do what people want/expect in all cases, and therefore it probably should
> not be enabled it by default.
>
> Julian
>
>
>
>
>
> > On Jan 19, 2022, at 3:38 PM, Justin Swanhart <gr...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > Note that this will negate the optimization that one usually is looking
> for
> > when writing such queries:
> >
> > Select * from TAB where a = 1
> > UNION ALL
> > Select * from TAB where b = 1
> >
> > In a database with indexes (most databases) this will allow indexes to be
> > used on both the a column and the b column.
> > Databases with bitmap indexes or without indexes would benefit from the
> > rule.
> >
> > On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <jhyde.apache@gmail.com
> <ma...@gmail.com>> wrote:
> >
> >> Can you log a Jira case for this?
> >>
> >> I think you should make your rule work for N-way Union, not just 2-way
> >> Union. And I think you should make it work whether or not a Project is
> >> present.
> >>
> >>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <jh...@gmail.com>
> wrote:
> >>>
> >>> It sounds useful.
> >>>
> >>> What do you think the rule should be called? UnionFilterTransposeRule,
> >> perhaps?
> >>>
> >>> A challenge when writing the rule will be to ensure that all of the
> >> inputs to the Union are the same. The Volcano framework is not very
> good at
> >> that.
> >>>
> >>> You should be careful of the case that the conditions overlap. For
> >> example, the rewrite
> >>>
> >>>  SELECT * FROM Emp WHERE deptno < 30
> >>>  UNION ALL
> >>>  SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> >>>
> >>> to
> >>>
> >>>  SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> >>>
> >>> Is not valid, because rows with deptno = 25 will appear twice in the
> >> first query, once in the second. Maybe that problem does not occur when
> >> applied to UNION than when applied to UNION ALL.
> >>>
> >>> There would seem to be analogous rules for INTERSECT (combine the
> >> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
> >> Perhaps one rule could cover all set operations (see
> >> FilterSetOpTransposeRule).
> >>>
> >>> Julian
> >>>
> >>>
> >>>
> >>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <zhuangzixiaoyou@gmail.com
> >> <mailto:zhuangzixiaoyou@gmail.com <ma...@gmail.com>>>
> wrote:
> >>>>
> >>>> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION
> ALL
> >> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
> >>>>
> >>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <mailto:
> zhuangzixiaoyou@gmail.com> <mailto:
> >> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>>
> 于2022年1月19日周三 18:35写道:
> >>>> Hi, community
> >>>>
> >>>> Here I recommend a new rule for converting UNION ALL sub plan to a
> >> single input with an OR filter, the following is its conversion diagram.
> >>>>
> >>>>
> >>>> The conversion prerequisites are
> >>>> 1. left filter range has no intersection with right filter range.
> >>>> 2. Project and Input Sub Tree must be identical.
> >>>>
> >>>> The rule will be used when Input Sub Tree is a computing-intensive or
> >> large IO operation.
> >>>>
> >>>> I don't know whether the community supports it or not, any suggestions
> >> will be appreciated.
>
>

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Julian Hyde <jh...@gmail.com>.
Justin,

For planning table or index scans, I would recommend using a single TableScan with a Filter that uses a Sarg, rather than using multiple TableScans connected by a Union. So I think this rule will be useful.

But I do agree that this proposed rule is not a “no brainer”. It may not do what people want/expect in all cases, and therefore it probably should not be enabled it by default.

Julian





> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <gr...@gmail.com> wrote:
> 
> Hi,
> 
> Note that this will negate the optimization that one usually is looking for
> when writing such queries:
> 
> Select * from TAB where a = 1
> UNION ALL
> Select * from TAB where b = 1
> 
> In a database with indexes (most databases) this will allow indexes to be
> used on both the a column and the b column.
> Databases with bitmap indexes or without indexes would benefit from the
> rule.
> 
> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <jhyde.apache@gmail.com <ma...@gmail.com>> wrote:
> 
>> Can you log a Jira case for this?
>> 
>> I think you should make your rule work for N-way Union, not just 2-way
>> Union. And I think you should make it work whether or not a Project is
>> present.
>> 
>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <jh...@gmail.com> wrote:
>>> 
>>> It sounds useful.
>>> 
>>> What do you think the rule should be called? UnionFilterTransposeRule,
>> perhaps?
>>> 
>>> A challenge when writing the rule will be to ensure that all of the
>> inputs to the Union are the same. The Volcano framework is not very good at
>> that.
>>> 
>>> You should be careful of the case that the conditions overlap. For
>> example, the rewrite
>>> 
>>>  SELECT * FROM Emp WHERE deptno < 30
>>>  UNION ALL
>>>  SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
>>> 
>>> to
>>> 
>>>  SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
>>> 
>>> Is not valid, because rows with deptno = 25 will appear twice in the
>> first query, once in the second. Maybe that problem does not occur when
>> applied to UNION than when applied to UNION ALL.
>>> 
>>> There would seem to be analogous rules for INTERSECT (combine the
>> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
>> Perhaps one rule could cover all set operations (see
>> FilterSetOpTransposeRule).
>>> 
>>> Julian
>>> 
>>> 
>>> 
>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <zhuangzixiaoyou@gmail.com
>> <mailto:zhuangzixiaoyou@gmail.com <ma...@gmail.com>>> wrote:
>>>> 
>>>> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION ALL
>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
>>>> 
>>>> Yanjing Wang <zhuangzixiaoyou@gmail.com <ma...@gmail.com> <mailto:
>> zhuangzixiaoyou@gmail.com <ma...@gmail.com>>> 于2022年1月19日周三 18:35写道:
>>>> Hi, community
>>>> 
>>>> Here I recommend a new rule for converting UNION ALL sub plan to a
>> single input with an OR filter, the following is its conversion diagram.
>>>> 
>>>> 
>>>> The conversion prerequisites are
>>>> 1. left filter range has no intersection with right filter range.
>>>> 2. Project and Input Sub Tree must be identical.
>>>> 
>>>> The rule will be used when Input Sub Tree is a computing-intensive or
>> large IO operation.
>>>> 
>>>> I don't know whether the community supports it or not, any suggestions
>> will be appreciated.


Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Justin Swanhart <gr...@gmail.com>.
Hi,

Note that this will negate the optimization that one usually is looking for
when writing such queries:

Select * from TAB where a = 1
UNION ALL
Select * from TAB where b = 1

In a database with indexes (most databases) this will allow indexes to be
used on both the a column and the b column.
Databases with bitmap indexes or without indexes would benefit from the
rule.

On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <jh...@gmail.com> wrote:

> Can you log a Jira case for this?
>
> I think you should make your rule work for N-way Union, not just 2-way
> Union. And I think you should make it work whether or not a Project is
> present.
>
> > On Jan 19, 2022, at 1:25 PM, Julian Hyde <jh...@gmail.com> wrote:
> >
> > It sounds useful.
> >
> > What do you think the rule should be called? UnionFilterTransposeRule,
> perhaps?
> >
> > A challenge when writing the rule will be to ensure that all of the
> inputs to the Union are the same. The Volcano framework is not very good at
> that.
> >
> > You should be careful of the case that the conditions overlap. For
> example, the rewrite
> >
> >   SELECT * FROM Emp WHERE deptno < 30
> >   UNION ALL
> >   SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> >
> > to
> >
> >   SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> >
> > Is not valid, because rows with deptno = 25 will appear twice in the
> first query, once in the second. Maybe that problem does not occur when
> applied to UNION than when applied to UNION ALL.
> >
> > There would seem to be analogous rules for INTERSECT (combine the
> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
> Perhaps one rule could cover all set operations (see
> FilterSetOpTransposeRule).
> >
> > Julian
> >
> >
> >
> >> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <zhuangzixiaoyou@gmail.com
> <ma...@gmail.com>> wrote:
> >>
> >> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION ALL
> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
> >>
> >> Yanjing Wang <zhuangzixiaoyou@gmail.com <mailto:
> zhuangzixiaoyou@gmail.com>> 于2022年1月19日周三 18:35写道:
> >> Hi, community
> >>
> >> Here I recommend a new rule for converting UNION ALL sub plan to a
> single input with an OR filter, the following is its conversion diagram.
> >>
> >>
> >> The conversion prerequisites are
> >> 1. left filter range has no intersection with right filter range.
> >> 2. Project and Input Sub Tree must be identical.
> >>
> >> The rule will be used when Input Sub Tree is a computing-intensive or
> large IO operation.
> >>
> >> I don't know whether the community supports it or not, any suggestions
> will be appreciated.
> >>
> >>
> >
>
>

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Julian Hyde <jh...@gmail.com>.
Can you log a Jira case for this?

I think you should make your rule work for N-way Union, not just 2-way Union. And I think you should make it work whether or not a Project is present.

> On Jan 19, 2022, at 1:25 PM, Julian Hyde <jh...@gmail.com> wrote:
> 
> It sounds useful.
> 
> What do you think the rule should be called? UnionFilterTransposeRule, perhaps?
> 
> A challenge when writing the rule will be to ensure that all of the inputs to the Union are the same. The Volcano framework is not very good at that.
> 
> You should be careful of the case that the conditions overlap. For example, the rewrite
> 
>   SELECT * FROM Emp WHERE deptno < 30
>   UNION ALL
>   SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> 
> to
> 
>   SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45) 
> 
> Is not valid, because rows with deptno = 25 will appear twice in the first query, once in the second. Maybe that problem does not occur when applied to UNION than when applied to UNION ALL.
> 
> There would seem to be analogous rules for INTERSECT (combine the conditions using AND) and EXCEPT (combine the conditions using AND NOT). Perhaps one rule could cover all set operations (see FilterSetOpTransposeRule).
> 
> Julian
> 
> 
> 
>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <zhuangzixiaoyou@gmail.com <ma...@gmail.com>> wrote:
>> 
>> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION ALL SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
>> 
>> Yanjing Wang <zhuangzixiaoyou@gmail.com <ma...@gmail.com>> 于2022年1月19日周三 18:35写道:
>> Hi, community
>> 
>> Here I recommend a new rule for converting UNION ALL sub plan to a single input with an OR filter, the following is its conversion diagram.
>> 
>>  
>> The conversion prerequisites are 
>> 1. left filter range has no intersection with right filter range.
>> 2. Project and Input Sub Tree must be identical.
>> 
>> The rule will be used when Input Sub Tree is a computing-intensive or large IO operation.
>> 
>> I don't know whether the community supports it or not, any suggestions will be appreciated.
>> 
>> 
> 


Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Julian Hyde <jh...@gmail.com>.
It sounds useful.

What do you think the rule should be called? UnionFilterTransposeRule, perhaps?

A challenge when writing the rule will be to ensure that all of the inputs to the Union are the same. The Volcano framework is not very good at that.

You should be careful of the case that the conditions overlap. For example, the rewrite

  SELECT * FROM Emp WHERE deptno < 30
  UNION ALL
  SELECT * FROM Emp WHERE deptno IN (25, 35, 45)

to

  SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45) 

Is not valid, because rows with deptno = 25 will appear twice in the first query, once in the second. Maybe that problem does not occur when applied to UNION than when applied to UNION ALL.

There would seem to be analogous rules for INTERSECT (combine the conditions using AND) and EXCEPT (combine the conditions using AND NOT). Perhaps one rule could cover all set operations (see FilterSetOpTransposeRule).

Julian



> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <zh...@gmail.com> wrote:
> 
> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION ALL SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
> 
> Yanjing Wang <zhuangzixiaoyou@gmail.com <ma...@gmail.com>> 于2022年1月19日周三 18:35写道:
> Hi, community
> 
> Here I recommend a new rule for converting UNION ALL sub plan to a single input with an OR filter, the following is its conversion diagram.
> 
>  
> The conversion prerequisites are 
> 1. left filter range has no intersection with right filter range.
> 2. Project and Input Sub Tree must be identical.
> 
> The rule will be used when Input Sub Tree is a computing-intensive or large IO operation.
> 
> I don't know whether the community supports it or not, any suggestions will be appreciated.
> 
> 


Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

Posted by Yanjing Wang <zh...@gmail.com>.
A simple example is converting *SELECT a, b FROM t WHERE c = 1 UNION
ALL SELECT a, b FROM t WHERE c = 2 *to *SELECT a, b FROM t WHERE c in (1,
2)*

Yanjing Wang <zh...@gmail.com> 于2022年1月19日周三 18:35写道:

> Hi, community
>
> Here I recommend a new rule for converting UNION ALL sub plan to a single
> input with an OR filter, the following is its conversion diagram.
> [image: UnionAllToOrRule.jpg]
>
> The conversion prerequisites are
> 1. left filter range has no intersection with right filter range.
> 2. Project and Input Sub Tree must be identical.
>
> The rule will be used when Input Sub Tree is a computing-intensive or
> large IO operation.
>
> I don't know whether the community supports it or not, any suggestions
> will be appreciated.
>
>
>