You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ignite.apache.org by Roman Kondakov <ko...@mail.ru.INVALID> on 2019/12/10 12:55:52 UTC

Adding support for Ignite secondary indexes to Apache Calcite planner

Hi all!

As you may know there is an activity on integration of Apache Calcite
query optimizer into Ignite codebase is being carried out [1],[2].

One of a bunch of problems in this integration is the absence of
out-of-the-box support for secondary indexes in Apache Calcite. After
some research I came to conclusion that this problem has a couple of
workarounds. Let's name them
1. Phoenix-style approach - representing secondary indexes as
materialized views which are natively supported by Calcite engine [3]
2. Drill-style approach - pushing filters into the table scans and
choose appropriate index for lookups when possible [4]

Both these approaches have advantages and disadvantages:

Phoenix style pros:
- natural way of adding indexes as an alternative source of rows: index
can be considered as a kind of sorted materialized view.
- possibility of using index sortedness for stream aggregates,
deduplication (DISTINCT operator), merge joins, etc.
- ability to support other types of indexes (i.e. functional indexes).

Phoenix style cons:
- polluting optimizer's search space extra table scans hence increasing
the planning time.

Drill style pros:
- easier to implement (although it's questionable).
- search space is not inflated.

Drill style cons:
- missed opportunity to exploit sortedness.

There is a good discussion about using both approaches can be found in [5].

I made a small sketch [6] in order to demonstrate the applicability of
the Phoenix approach to Ignite. Key design concepts are:
1. On creating indexes are registered as tables in Calcite schema. This
step is needed for internal Calcite's routines.
2. On planner initialization we register these indexes as materialized
views in Calcite's optimizer using VolcanoPlanner#addMaterialization method.
3. Right before the query execution Calcite selects all materialized
views (indexes) which can be potentially used in query.
4. During the query optimization indexes are registered by planner as
usual TableScans and hence can be chosen by optimizer if they have lower
cost.

This sketch shows the ability to exploit index sortedness only. So the
future work in this direction should be focused on using indexes for
fast index lookups. At first glance FilterableTable and
FilterTableScanRule are good points to start. We can push Filter into
the TableScan and then use FilterableTable for fast index lookups
avoiding reading the whole index on TableScan step and then filtering
its output on the Filter step.

What do you think?

[1]
http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
[2]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
[3] https://issues.apache.org/jira/browse/PHOENIX-2047
[4] https://issues.apache.org/jira/browse/DRILL-6381
[5] https://issues.apache.org/jira/browse/DRILL-3929
[6] https://github.com/apache/ignite/pull/7115

--
Kind Regards
Roman Kondakov

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Posted by Roman Kondakov <ko...@mail.ru.INVALID>.

Alexey,

from my point of view Drill's approach looks like somewhat a hack:
sortedness and index lookups added to a removed from the query plan by
the special rules (which look very messy and complicated). Compare it to
the Phoneix approach where index is added to optimizer as a sorted view
of a table.


-- 
Kind Regards
Roman Kondakov


On 10.12.2019 17:44, Alexey Zinoviev wrote:
> I'd like Drill approach, worked and debugged with something similar, it's
> more easy to support
> 
> 
> Buuut, you have an implemented prototype (it votes for Phoenix in my mind)
> 
> вт, 10 дек. 2019 г. в 17:19, Vladimir Ozerov <pp...@gmail.com>:
> 
>> Hi Roman,
>>
>> Why do you think that Drill-style will not let you exploit collation?
>> Collation should be propagated from the index scan in the same way as in
>> other sorted operators, such as merge join or streaming aggregate. Provided
>> that you use converter-hack (or any alternative solution to trigger parent
>> re-analysis).
>> In other words, propagation of collation from Drill-style indexes should be
>> no different from other sorted operators.
>>
>> Regards,
>> Vladimir.
>>
>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky <arzamas123@mail.ru.invalid
>>> :
>>
>>>
>>> Roman just as fast remark, Phoenix builds their approach on
>>> already existing monolith HBase architecture, most cases it`s just a stub
>>> for someone who wants use secondary indexes with a base with no
>>> native support of it. Don`t think it`s good idea here.
>>>
>>>>
>>>>
>>>> ------- Forwarded message -------
>>>> From: "Roman Kondakov" < kondakov87@mail.ru.invalid >
>>>> To:  dev@ignite.apache.org
>>>> Cc:
>>>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
>>>> planner
>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>>
>>>> Hi all!
>>>>
>>>> As you may know there is an activity on integration of Apache Calcite
>>>> query optimizer into Ignite codebase is being carried out [1],[2].
>>>>
>>>> One of a bunch of problems in this integration is the absence of
>>>> out-of-the-box support for secondary indexes in Apache Calcite. After
>>>> some research I came to conclusion that this problem has a couple of
>>>> workarounds. Let's name them
>>>> 1. Phoenix-style approach - representing secondary indexes as
>>>> materialized views which are natively supported by Calcite engine [3]
>>>> 2. Drill-style approach - pushing filters into the table scans and
>>>> choose appropriate index for lookups when possible [4]
>>>>
>>>> Both these approaches have advantages and disadvantages:
>>>>
>>>> Phoenix style pros:
>>>> - natural way of adding indexes as an alternative source of rows: index
>>>> can be considered as a kind of sorted materialized view.
>>>> - possibility of using index sortedness for stream aggregates,
>>>> deduplication (DISTINCT operator), merge joins, etc.
>>>> - ability to support other types of indexes (i.e. functional indexes).
>>>>
>>>> Phoenix style cons:
>>>> - polluting optimizer's search space extra table scans hence increasing
>>>> the planning time.
>>>>
>>>> Drill style pros:
>>>> - easier to implement (although it's questionable).
>>>> - search space is not inflated.
>>>>
>>>> Drill style cons:
>>>> - missed opportunity to exploit sortedness.
>>>>
>>>> There is a good discussion about using both approaches can be found in
>>> [5].
>>>>
>>>> I made a small sketch [6] in order to demonstrate the applicability of
>>>> the Phoenix approach to Ignite. Key design concepts are:
>>>> 1. On creating indexes are registered as tables in Calcite schema. This
>>>> step is needed for internal Calcite's routines.
>>>> 2. On planner initialization we register these indexes as materialized
>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>>>> method.
>>>> 3. Right before the query execution Calcite selects all materialized
>>>> views (indexes) which can be potentially used in query.
>>>> 4. During the query optimization indexes are registered by planner as
>>>> usual TableScans and hence can be chosen by optimizer if they have lower
>>>> cost.
>>>>
>>>> This sketch shows the ability to exploit index sortedness only. So the
>>>> future work in this direction should be focused on using indexes for
>>>> fast index lookups. At first glance FilterableTable and
>>>> FilterTableScanRule are good points to start. We can push Filter into
>>>> the TableScan and then use FilterableTable for fast index lookups
>>>> avoiding reading the whole index on TableScan step and then filtering
>>>> its output on the Filter step.
>>>>
>>>> What do you think?
>>>>
>>>>
>>>>
>>>> [1]
>>>>
>>>
>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>>> [2]
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>>>> [6]  https://github.com/apache/ignite/pull/7115
>>>
>>>
>>>
>>>
>>
>

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Posted by Alexey Zinoviev <za...@gmail.com>.

I'd like Drill approach, worked and debugged with something similar, it's
more easy to support


Buuut, you have an implemented prototype (it votes for Phoenix in my mind)

вт, 10 дек. 2019 г. в 17:19, Vladimir Ozerov <pp...@gmail.com>:

> Hi Roman,
>
> Why do you think that Drill-style will not let you exploit collation?
> Collation should be propagated from the index scan in the same way as in
> other sorted operators, such as merge join or streaming aggregate. Provided
> that you use converter-hack (or any alternative solution to trigger parent
> re-analysis).
> In other words, propagation of collation from Drill-style indexes should be
> no different from other sorted operators.
>
> Regards,
> Vladimir.
>
> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky <arzamas123@mail.ru.invalid
> >:
>
> >
> > Roman just as fast remark, Phoenix builds their approach on
> > already existing monolith HBase architecture, most cases it`s just a stub
> > for someone who wants use secondary indexes with a base with no
> > native support of it. Don`t think it`s good idea here.
> >
> > >
> > >
> > >------- Forwarded message -------
> > >From: "Roman Kondakov" < kondakov87@mail.ru.invalid >
> > >To:  dev@ignite.apache.org
> > >Cc:
> > >Subject: Adding support for Ignite secondary indexes to Apache Calcite
> > >planner
> > >Date: Tue, 10 Dec 2019 15:55:52 +0300
> > >
> > >Hi all!
> > >
> > >As you may know there is an activity on integration of Apache Calcite
> > >query optimizer into Ignite codebase is being carried out [1],[2].
> > >
> > >One of a bunch of problems in this integration is the absence of
> > >out-of-the-box support for secondary indexes in Apache Calcite. After
> > >some research I came to conclusion that this problem has a couple of
> > >workarounds. Let's name them
> > >1. Phoenix-style approach - representing secondary indexes as
> > >materialized views which are natively supported by Calcite engine [3]
> > >2. Drill-style approach - pushing filters into the table scans and
> > >choose appropriate index for lookups when possible [4]
> > >
> > >Both these approaches have advantages and disadvantages:
> > >
> > >Phoenix style pros:
> > >- natural way of adding indexes as an alternative source of rows: index
> > >can be considered as a kind of sorted materialized view.
> > >- possibility of using index sortedness for stream aggregates,
> > >deduplication (DISTINCT operator), merge joins, etc.
> > >- ability to support other types of indexes (i.e. functional indexes).
> > >
> > >Phoenix style cons:
> > >- polluting optimizer's search space extra table scans hence increasing
> > >the planning time.
> > >
> > >Drill style pros:
> > >- easier to implement (although it's questionable).
> > >- search space is not inflated.
> > >
> > >Drill style cons:
> > >- missed opportunity to exploit sortedness.
> > >
> > >There is a good discussion about using both approaches can be found in
> > [5].
> > >
> > >I made a small sketch [6] in order to demonstrate the applicability of
> > >the Phoenix approach to Ignite. Key design concepts are:
> > >1. On creating indexes are registered as tables in Calcite schema. This
> > >step is needed for internal Calcite's routines.
> > >2. On planner initialization we register these indexes as materialized
> > >views in Calcite's optimizer using VolcanoPlanner#addMaterialization
> > >method.
> > >3. Right before the query execution Calcite selects all materialized
> > >views (indexes) which can be potentially used in query.
> > >4. During the query optimization indexes are registered by planner as
> > >usual TableScans and hence can be chosen by optimizer if they have lower
> > >cost.
> > >
> > >This sketch shows the ability to exploit index sortedness only. So the
> > >future work in this direction should be focused on using indexes for
> > >fast index lookups. At first glance FilterableTable and
> > >FilterTableScanRule are good points to start. We can push Filter into
> > >the TableScan and then use FilterableTable for fast index lookups
> > >avoiding reading the whole index on TableScan step and then filtering
> > >its output on the Filter step.
> > >
> > >What do you think?
> > >
> > >
> > >
> > >[1]
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
> > >[2]
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
> > >[3]  https://issues.apache.org/jira/browse/PHOENIX-2047
> > >[4]  https://issues.apache.org/jira/browse/DRILL-6381
> > >[5]  https://issues.apache.org/jira/browse/DRILL-3929
> > >[6]  https://github.com/apache/ignite/pull/7115
> >
> >
> >
> >
>

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Posted by Vladimir Ozerov <pp...@gmail.com>.

Hi Roman,

Why do you think that Drill-style will not let you exploit collation?
Collation should be propagated from the index scan in the same way as in
other sorted operators, such as merge join or streaming aggregate. Provided
that you use converter-hack (or any alternative solution to trigger parent
re-analysis).
In other words, propagation of collation from Drill-style indexes should be
no different from other sorted operators.

Regards,
Vladimir.

вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky <arzamas123@mail.ru.invalid
>:

>
> Roman just as fast remark, Phoenix builds their approach on
> already existing monolith HBase architecture, most cases it`s just a stub
> for someone who wants use secondary indexes with a base with no
> native support of it. Don`t think it`s good idea here.
>
> >
> >
> >------- Forwarded message -------
> >From: "Roman Kondakov" < kondakov87@mail.ru.invalid >
> >To:  dev@ignite.apache.org
> >Cc:
> >Subject: Adding support for Ignite secondary indexes to Apache Calcite
> >planner
> >Date: Tue, 10 Dec 2019 15:55:52 +0300
> >
> >Hi all!
> >
> >As you may know there is an activity on integration of Apache Calcite
> >query optimizer into Ignite codebase is being carried out [1],[2].
> >
> >One of a bunch of problems in this integration is the absence of
> >out-of-the-box support for secondary indexes in Apache Calcite. After
> >some research I came to conclusion that this problem has a couple of
> >workarounds. Let's name them
> >1. Phoenix-style approach - representing secondary indexes as
> >materialized views which are natively supported by Calcite engine [3]
> >2. Drill-style approach - pushing filters into the table scans and
> >choose appropriate index for lookups when possible [4]
> >
> >Both these approaches have advantages and disadvantages:
> >
> >Phoenix style pros:
> >- natural way of adding indexes as an alternative source of rows: index
> >can be considered as a kind of sorted materialized view.
> >- possibility of using index sortedness for stream aggregates,
> >deduplication (DISTINCT operator), merge joins, etc.
> >- ability to support other types of indexes (i.e. functional indexes).
> >
> >Phoenix style cons:
> >- polluting optimizer's search space extra table scans hence increasing
> >the planning time.
> >
> >Drill style pros:
> >- easier to implement (although it's questionable).
> >- search space is not inflated.
> >
> >Drill style cons:
> >- missed opportunity to exploit sortedness.
> >
> >There is a good discussion about using both approaches can be found in
> [5].
> >
> >I made a small sketch [6] in order to demonstrate the applicability of
> >the Phoenix approach to Ignite. Key design concepts are:
> >1. On creating indexes are registered as tables in Calcite schema. This
> >step is needed for internal Calcite's routines.
> >2. On planner initialization we register these indexes as materialized
> >views in Calcite's optimizer using VolcanoPlanner#addMaterialization
> >method.
> >3. Right before the query execution Calcite selects all materialized
> >views (indexes) which can be potentially used in query.
> >4. During the query optimization indexes are registered by planner as
> >usual TableScans and hence can be chosen by optimizer if they have lower
> >cost.
> >
> >This sketch shows the ability to exploit index sortedness only. So the
> >future work in this direction should be focused on using indexes for
> >fast index lookups. At first glance FilterableTable and
> >FilterTableScanRule are good points to start. We can push Filter into
> >the TableScan and then use FilterableTable for fast index lookups
> >avoiding reading the whole index on TableScan step and then filtering
> >its output on the Filter step.
> >
> >What do you think?
> >
> >
> >
> >[1]
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
> >[2]
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
> >[3]  https://issues.apache.org/jira/browse/PHOENIX-2047
> >[4]  https://issues.apache.org/jira/browse/DRILL-6381
> >[5]  https://issues.apache.org/jira/browse/DRILL-3929
> >[6]  https://github.com/apache/ignite/pull/7115
>
>
>
>

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Posted by Roman Kondakov <ko...@mail.ru.INVALID>.

Zhenya,

there is nothing in common in implementation of Ignite indexes and
Phoenix indexes. I just borrowed the idea how Phoenix supplies the index
metadata (index name, columns, sorting, etc.) to Calcite optimizer. It's
not about index implementation, it's about metadata handling.


-- 
Kind Regards
Roman Kondakov


On 10.12.2019 16:40, Zhenya Stanilovsky wrote:
> 
> Roman just as fast remark, Phoenix builds their approach on already existing monolith HBase architecture, most cases it`s just a stub for someone who wants use secondary indexes with a base with no native support of it. Don`t think it`s good idea here.
>    
>>
>>
>> ------- Forwarded message -------
>> From: "Roman Kondakov" < kondakov87@mail.ru.invalid >
>> To:  dev@ignite.apache.org
>> Cc:
>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
>> planner
>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>
>> Hi all!
>>
>> As you may know there is an activity on integration of Apache Calcite
>> query optimizer into Ignite codebase is being carried out [1],[2].
>>
>> One of a bunch of problems in this integration is the absence of
>> out-of-the-box support for secondary indexes in Apache Calcite. After
>> some research I came to conclusion that this problem has a couple of
>> workarounds. Let's name them
>> 1. Phoenix-style approach - representing secondary indexes as
>> materialized views which are natively supported by Calcite engine [3]
>> 2. Drill-style approach - pushing filters into the table scans and
>> choose appropriate index for lookups when possible [4]
>>
>> Both these approaches have advantages and disadvantages:
>>
>> Phoenix style pros:
>> - natural way of adding indexes as an alternative source of rows: index
>> can be considered as a kind of sorted materialized view.
>> - possibility of using index sortedness for stream aggregates,
>> deduplication (DISTINCT operator), merge joins, etc.
>> - ability to support other types of indexes (i.e. functional indexes).
>>
>> Phoenix style cons:
>> - polluting optimizer's search space extra table scans hence increasing
>> the planning time.
>>
>> Drill style pros:
>> - easier to implement (although it's questionable).
>> - search space is not inflated.
>>
>> Drill style cons:
>> - missed opportunity to exploit sortedness.
>>
>> There is a good discussion about using both approaches can be found in [5].
>>
>> I made a small sketch [6] in order to demonstrate the applicability of
>> the Phoenix approach to Ignite. Key design concepts are:
>> 1. On creating indexes are registered as tables in Calcite schema. This
>> step is needed for internal Calcite's routines.
>> 2. On planner initialization we register these indexes as materialized
>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>> method.
>> 3. Right before the query execution Calcite selects all materialized
>> views (indexes) which can be potentially used in query.
>> 4. During the query optimization indexes are registered by planner as
>> usual TableScans and hence can be chosen by optimizer if they have lower
>> cost.
>>
>> This sketch shows the ability to exploit index sortedness only. So the
>> future work in this direction should be focused on using indexes for
>> fast index lookups. At first glance FilterableTable and
>> FilterTableScanRule are good points to start. We can push Filter into
>> the TableScan and then use FilterableTable for fast index lookups
>> avoiding reading the whole index on TableScan step and then filtering
>> its output on the Filter step.
>>
>> What do you think?
>>
>>
>>
>> [1]
>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>> [2]
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>> [6]  https://github.com/apache/ignite/pull/7115 
>  
>  
>  
>  
>

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Posted by Zhenya Stanilovsky <ar...@mail.ru.INVALID>.

Roman just as fast remark, Phoenix builds their approach on already existing monolith HBase architecture, most cases it`s just a stub for someone who wants use secondary indexes with a base with no native support of it. Don`t think it`s good idea here.
   
>
>
>------- Forwarded message -------
>From: "Roman Kondakov" < kondakov87@mail.ru.invalid >
>To:  dev@ignite.apache.org
>Cc:
>Subject: Adding support for Ignite secondary indexes to Apache Calcite
>planner
>Date: Tue, 10 Dec 2019 15:55:52 +0300
>
>Hi all!
>
>As you may know there is an activity on integration of Apache Calcite
>query optimizer into Ignite codebase is being carried out [1],[2].
>
>One of a bunch of problems in this integration is the absence of
>out-of-the-box support for secondary indexes in Apache Calcite. After
>some research I came to conclusion that this problem has a couple of
>workarounds. Let's name them
>1. Phoenix-style approach - representing secondary indexes as
>materialized views which are natively supported by Calcite engine [3]
>2. Drill-style approach - pushing filters into the table scans and
>choose appropriate index for lookups when possible [4]
>
>Both these approaches have advantages and disadvantages:
>
>Phoenix style pros:
>- natural way of adding indexes as an alternative source of rows: index
>can be considered as a kind of sorted materialized view.
>- possibility of using index sortedness for stream aggregates,
>deduplication (DISTINCT operator), merge joins, etc.
>- ability to support other types of indexes (i.e. functional indexes).
>
>Phoenix style cons:
>- polluting optimizer's search space extra table scans hence increasing
>the planning time.
>
>Drill style pros:
>- easier to implement (although it's questionable).
>- search space is not inflated.
>
>Drill style cons:
>- missed opportunity to exploit sortedness.
>
>There is a good discussion about using both approaches can be found in [5].
>
>I made a small sketch [6] in order to demonstrate the applicability of
>the Phoenix approach to Ignite. Key design concepts are:
>1. On creating indexes are registered as tables in Calcite schema. This
>step is needed for internal Calcite's routines.
>2. On planner initialization we register these indexes as materialized
>views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>method.
>3. Right before the query execution Calcite selects all materialized
>views (indexes) which can be potentially used in query.
>4. During the query optimization indexes are registered by planner as
>usual TableScans and hence can be chosen by optimizer if they have lower
>cost.
>
>This sketch shows the ability to exploit index sortedness only. So the
>future work in this direction should be focused on using indexes for
>fast index lookups. At first glance FilterableTable and
>FilterTableScanRule are good points to start. We can push Filter into
>the TableScan and then use FilterableTable for fast index lookups
>avoiding reading the whole index on TableScan step and then filtering
>its output on the Filter step.
>
>What do you think?
>
>
>
>[1]
>http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>[2]
>https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>[3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>[4]  https://issues.apache.org/jira/browse/DRILL-6381
>[5]  https://issues.apache.org/jira/browse/DRILL-3929
>[6]  https://github.com/apache/ignite/pull/7115