Posted to dev@calcite.apache.org by Maryann Xue <ma...@gmail.com> on 2015/11/04 20:13:34 UTC

Modeling Phoenix ordered and unordered TableScan

Hi,

Last week I became aware of a requirement in Phoenix using Calcite: for
some Phoenix tables, we'd like to have two slightly different table scan
strategies, "ordered" and "unordered". The latter does not guarantee that
rows are returned in primary key order, but it can be significantly
faster.

At first I thought this would be easy: have two physical TableScan
operators and some rules that replace the default one with the
alternative. But this didn't work, regardless of which scan is the default,
basically because the parent rel is not aware of a new subset of its child
rel. That means we can't build a chain of new rels over the new subsets
bottom-up, the way we build a chain of new rels with new subsets top-down
using ConverterRules. To further justify that RelOptRules couldn't work: a
rule that matches Sort over TableScan wouldn't cover all cases, because we
have operators like PhoenixServerJoin that do not require a specific
collation from their children but instead surface the collation of one
child as their own collation trait.
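
To make the visibility problem concrete, here is a toy sketch in plain
Java. It is not Calcite code: Rel, EquivalenceSet and Parent are
hypothetical stand-ins that only mimic the idea of a parent being bound to
one child subset, and the costs are made up. It shows that an alternative
registered later under a different collation lands in a subset the parent
never looks at.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and are NOT Calcite's
// planner classes; they just mimic "parent is bound to one child subset".
final class SubsetVisibilitySketch {

    /** A physical alternative and the collation it delivers ("" = unordered). */
    static final class Rel {
        final String name;
        final String collation;
        final double cost;

        Rel(String name, String collation, double cost) {
            this.name = name;
            this.collation = collation;
            this.cost = cost;
        }
    }

    /** Equivalent rels, grouped into subsets keyed by collation. */
    static final class EquivalenceSet {
        private final Map<String, List<Rel>> subsets = new HashMap<>();

        void register(Rel rel) {
            subsets.computeIfAbsent(rel.collation, k -> new ArrayList<>()).add(rel);
        }

        /** Cheapest rel in one subset, or null if that subset is empty. */
        Rel cheapestIn(String collation) {
            Rel best = null;
            for (Rel rel : subsets.getOrDefault(collation, new ArrayList<>())) {
                if (best == null || rel.cost < best.cost) {
                    best = rel;
                }
            }
            return best;
        }
    }

    /** A parent that was created against one specific child subset. */
    static final class Parent {
        final EquivalenceSet input;
        final String inputCollation;

        Parent(EquivalenceSet input, String inputCollation) {
            this.input = input;
            this.inputCollation = inputCollation;
        }

        Rel bestInput() {
            return input.cheapestIn(inputCollation);
        }
    }

    /** Returns the scan the parent ends up with after the alternative is added. */
    static String demo() {
        EquivalenceSet scans = new EquivalenceSet();
        scans.register(new Rel("OrderedScan(A)", "pk", 100.0));
        Parent join = new Parent(scans, "pk"); // bound to the "pk" subset
        // A rule fires later and adds the faster, unordered alternative.
        scans.register(new Rel("UnorderedScan(A)", "", 40.0));
        return join.bestInput().name;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "OrderedScan(A)"
    }
}
```

The cheaper scan exists in the equivalence set, but the parent still
resolves its input through the "pk" subset, so nothing above it is ever
replanned around the unordered scan.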

However, I found that SubstitutionVisitor worked for exactly this case. I
ended up modeling the alternative table scan as a materialization, with a
virtual table name, defined as an identity projection of the original
table. For example, table A has an unordered version A' (which does not
physically exist), and A' is registered as a materialization of A.
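
As a rough illustration of why substitution behaves differently, here is
another toy sketch in plain Java. Again, none of this is Calcite's API:
Node, Scan, PassThroughJoin, substitute and choose are hypothetical names,
and the costs are invented. The point is only that replacing A with A'
rebuilds every ancestor, so a join that surfaces its child's collation is
re-derived with the new trait, and two complete plans can then be compared.

```java
import java.util.function.UnaryOperator;

// Toy model only: hypothetical classes, NOT Calcite's SubstitutionVisitor.
final class SubstitutionSketch {

    interface Node {
        String collation();                  // "" means no guaranteed order
        double cost();
        Node rewrite(UnaryOperator<Node> f); // rebuilds the tree bottom-up
    }

    static final class Scan implements Node {
        final String table;
        final String collation;
        final double cost;

        Scan(String table, String collation, double cost) {
            this.table = table;
            this.collation = collation;
            this.cost = cost;
        }

        public String collation() { return collation; }
        public double cost() { return cost; }
        public Node rewrite(UnaryOperator<Node> f) { return f.apply(this); }
    }

    /** Join that requires no collation and surfaces its left child's. */
    static final class PassThroughJoin implements Node {
        final Node left;
        final Node right;

        PassThroughJoin(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        public String collation() { return left.collation(); }
        public double cost() { return left.cost() + right.cost() + 10.0; }
        public Node rewrite(UnaryOperator<Node> f) {
            return f.apply(new PassThroughJoin(left.rewrite(f), right.rewrite(f)));
        }
    }

    /** Replaces every scan of the given table and rebuilds all ancestors. */
    static Node substitute(Node root, String table, Scan replacement) {
        return root.rewrite(n ->
            n instanceof Scan && ((Scan) n).table.equals(table) ? replacement : n);
    }

    /** Picks the cheaper complete plan that satisfies the required collation;
     *  falls back to the original if the substituted plan does not qualify. */
    static Node choose(Node original, Node substituted, String required) {
        boolean origOk = required.isEmpty() || required.equals(original.collation());
        boolean substOk = required.isEmpty() || required.equals(substituted.collation());
        if (origOk && substOk) {
            return original.cost() <= substituted.cost() ? original : substituted;
        }
        return substOk ? substituted : original;
    }

    /** Reports which plan wins for a required root collation. */
    static String demo(String required) {
        Node original = new PassThroughJoin(
            new Scan("A", "pk", 100.0), new Scan("B", "", 50.0));
        // A' plays the role of the virtual, unordered version of A.
        Node substituted = substitute(original, "A", new Scan("A'", "", 40.0));
        return choose(original, substituted, required) == original
            ? "original" : "substituted";
    }

    public static void main(String[] args) {
        System.out.println(demo(""));   // no order required
        System.out.println(demo("pk")); // primary key order required
    }
}
```

With no required order the substituted plan over A' wins on cost; when
primary key order is required at the root, the original plan over A is
kept.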

My conclusions were:
1. When using RelOptRules, you shouldn't expect the transformed new rel to
be in a different subset from the original rel. It can have its own subset,
but that subset should be a subset of the original subset; otherwise its
parent won't be aware of its existence. And even if it is a sub-subset,
you can't expect it to affect the planning at higher levels of the tree.
2. The SubstitutionVisitor (or materialization substitution) can be useful
in such cases where you want to change the planning of the tree from
bottom-up, while volcano rules can't.

This coincided with a conversation Julian, James and I had last week about
the SubstitutionVisitor, and I thought it might be helpful in further
discussions. Please let me know whether my solution sounds reasonable and
whether the statements above are correct.



Thanks,
Maryann

Re: Modeling Phoenix ordered and unordered TableScan

Posted by Maryann Xue <ma...@gmail.com>.

A correction to point 2, which wasn't accurate:

2. The SubstitutionVisitor (or materialization substitution) can be useful
in cases where you want to change the planning of the tree from the
bottom up by switching to traits the parent doesn't know about, which
volcano rules can't do.
