You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Jess Balint <jb...@gmail.com> on 2016/10/14 21:33:18 UTC

how does SortRemoveRule work?

Just doing a bit of code reading here. Looking at the SortRemoveRule
implementation:

    // Express the "sortedness" requirement in terms of a collation trait
and
    // we can get rid of the sort. This allows us to use rels that just
happen
    // to be sorted but get the same effect.
    final RelCollation collation = sort.getCollation();
    assert collation == sort.getTraitSet()
        .getTrait(RelCollationTraitDef.INSTANCE);
    final RelTraitSet traits =
sort.getInput().getTraitSet().replace(collation);
    call.transformTo(convert(sort.getInput(), traits));

I don't understand how this is correct without checking that the input
collation is the same as the collation requested by the sort. It looks to
me like it just replaces the input's collation with that of the sort. Any
thoughts?

Thanks.

Jess

Re: how does SortRemoveRule work?

Posted by Vladimir Sitnikov <si...@gmail.com>.

Jess> Is my test incorrect? This also fails:

The test looks like a correct one, however its correctness depends on
the implementation of runDuplicateSortCheck.
It turns out the implementation was wrong, so the top-level sort was
removed not by SortRemoveRule, but it was removed by a mere fact that
the test
dd ask Planner to produce a result with empty collation.

I've fixed that in https://issues.apache.org/jira/browse/CALCITE-2768,
and I've added your test there, so top-level order by is not removed:
https://gitbox.apache.org/repos/asf?p=calcite.git;a=commitdiff;h=b54f6de9d7f87e9853fc9ec01b586555a089b913

Vladimir

Re: how does SortRemoveRule work?

Posted by Jess Balint <jb...@gmail.com>.

Is my test incorrect? This also fails:

  runDuplicateSortCheck("select empid "
                        + "from emps "
                        + "order by name ",
                        "EnumerableSort(sort0=[$1], dir0=[ASC])\n"
                        + "  EnumerableProject(empid=[$0], name=[$2])\n"
                        + "    EnumerableTableScan(table=[[hr, emps]])\n");

The resulting plan is:

EnumerableProject(empid=[$0], name=[$2])
  EnumerableTableScan(table=[[hr, emps]])

No sort... The scan is not sorted. I get the same plan when changing the
sort to any other column.

Jess

On Fri, Oct 14, 2016 at 7:19 PM, Julian Hyde <jh...@apache.org> wrote:

> If the sort is trivial (i.e. its input is already sorted) then the outcome
> of calling call.transformTo(…) will be to put the sort and its input into
> the same subset. Thus the input could be used as a (cheaper) alternative.
>
> If the sort is not trivial, the effect of the rule will be put the sort
> and its input in different RelSubsets of the same RelSet. They were
> probably in different RelSets before the rule was called, in which case it
> will trigger a merge of those sets: the planner now knows that any
> relational expression in either of those sets is now equivalent to any
> other.
>
> > On Oct 14, 2016, at 3:34 PM, Jess Balint <jb...@gmail.com> wrote:
> >
> > On Fri, Oct 14, 2016 at 5:16 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> Note the important expression:
> >>
> >>  convert(sort.getInput(), traits)
> >>
> >> This creates a new relational expression (or returns an existing one)
> that
> >> is “input" sorted by the required fields. It doesn’t change “input”.
> >>
> >>
> > I dug into getOrCreateSubset/addAbstractConverters but I still don't see
> > how this could work.
> >
> >
> >> By the way, the Volcano paper talks about physical properties (what we
> >> call traits) and enforcers. An enforcer is a relational operator that
> >> doesn’t change the logical content of the data, just changes its
> physical
> >> properties.
> >>
> >>
> > Isn't the sort node acting as the enforcer here? I don't see a reason to
> > remove it and then add it back via convert(). I'm specifically looking at
> > the case where the sort is necessary (i.e. should not be removed by this
> > optimization).
> >
> >
> >> Every physical property has a corresponding enforcer: Sort is the
> enforcer
> >> of the Collation trait; Shuffle is the enforcer of the Distribution
> trait;
> >> Convert is the enforcer of the CallingConvention trait.
> >>
> >> The short piece of code you quoted may seem almost trivial, but it is
> >> stating something profound: the relationship between Sort and Collation.
> >>
> >> The trick, of course, is to find a way of achieving a sort order without
> >> using a sort (e.g. finding a copy of the table that contains the columns
> >> you want and is sorted in the order you want, i.e. an index). The whole
> >> reason we have the concept of physical properties is to make that kind
> of
> >> optimization (sometimes) possible.
> >>
> >>
> > I created a simple test:
> >
> >  @Test public void testTwoSortDontRemove() throws Exception {
> >    runDuplicateSortCheck("select empid+deptno from ( "
> >                          + "select empid, deptno "
> >                          + "from emps "
> >                          + "order by empid) "
> >                          + "order by deptno",
> >                          "EnumerableSort(sort0=[$1], dir0=[ASC])\n"
> >                          + "  EnumerableProject(EXPR$0=[+($0, $1)],
> > deptno=[$1])\n"
> >                          + "    EnumerableSort(sort0=[$0], dir0=[ASC])\n"
> >                          + "      EnumerableProject(empid=[$0],
> > deptno=[$1])\n"
> >                          + "        EnumerableTableScan(table=[[hr,
> > emps]])\n");
> >  }
> >
> > When I run this, I get an error because the sort is removed:
> >
> > EnumerableProject(EXPR$0=[+($0, $1)], deptno=[$1])
> >  EnumerableSort(sort0=[$0], dir0=[ASC])
> >    EnumerableProject(empid=[$0], deptno=[$1])
> >      EnumerableTableScan(table=[[hr, emps]])
> >
> > I changed the original method in question to only remove the sort if the
> > requested collation matches that of it's input:
> >
> >    if
> > (sort.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE)
> .equals(collation))
> > {
> >      final RelTraitSet traits =
> > sort.getInput().getTraitSet().replace(collation);
> >      call.transformTo(convert(sort.getInput(), traits));
> >    }
> >
> > This will cause the test to pass. Isn't this a bug?
> >
> > Jess
> >
> >
> >> Julian
> >>
> >>
> >>> On Oct 14, 2016, at 2:33 PM, Jess Balint <jb...@gmail.com> wrote:
> >>>
> >>> Just doing a bit of code reading here. Looking at the SortRemoveRule
> >>> implementation:
> >>>
> >>>   // Express the "sortedness" requirement in terms of a collation trait
> >>> and
> >>>   // we can get rid of the sort. This allows us to use rels that just
> >>> happen
> >>>   // to be sorted but get the same effect.
> >>>   final RelCollation collation = sort.getCollation();
> >>>   assert collation == sort.getTraitSet()
> >>>       .getTrait(RelCollationTraitDef.INSTANCE);
> >>>   final RelTraitSet traits =
> >>> sort.getInput().getTraitSet().replace(collation);
> >>>   call.transformTo(convert(sort.getInput(), traits));
> >>>
> >>> I don't understand how this is correct without checking that the input
> >>> collation is the same as the collation requested by the sort. It looks
> to
> >>> me like it just replaces the input's collation with that of the sort.
> Any
> >>> thoughts?
> >>>
> >>> Thanks.
> >>>
> >>> Jess
> >>
> >>
>
>

Re: how does SortRemoveRule work?

Posted by Julian Hyde <jh...@apache.org>.

If the sort is trivial (i.e. its input is already sorted) then the outcome of calling call.transformTo(…) will be to put the sort and its input into the same subset. Thus the input could be used as a (cheaper) alternative.

If the sort is not trivial, the effect of the rule will be put the sort and its input in different RelSubsets of the same RelSet. They were probably in different RelSets before the rule was called, in which case it will trigger a merge of those sets: the planner now knows that any relational expression in either of those sets is now equivalent to any other.

> On Oct 14, 2016, at 3:34 PM, Jess Balint <jb...@gmail.com> wrote:
> 
> On Fri, Oct 14, 2016 at 5:16 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> Note the important expression:
>> 
>>  convert(sort.getInput(), traits)
>> 
>> This creates a new relational expression (or returns an existing one) that
>> is “input" sorted by the required fields. It doesn’t change “input”.
>> 
>> 
> I dug into getOrCreateSubset/addAbstractConverters but I still don't see
> how this could work.
> 
> 
>> By the way, the Volcano paper talks about physical properties (what we
>> call traits) and enforcers. An enforcer is a relational operator that
>> doesn’t change the logical content of the data, just changes its physical
>> properties.
>> 
>> 
> Isn't the sort node acting as the enforcer here? I don't see a reason to
> remove it and then add it back via convert(). I'm specifically looking at
> the case where the sort is necessary (i.e. should not be removed by this
> optimization).
> 
> 
>> Every physical property has a corresponding enforcer: Sort is the enforcer
>> of the Collation trait; Shuffle is the enforcer of the Distribution trait;
>> Convert is the enforcer of the CallingConvention trait.
>> 
>> The short piece of code you quoted may seem almost trivial, but it is
>> stating something profound: the relationship between Sort and Collation.
>> 
>> The trick, of course, is to find a way of achieving a sort order without
>> using a sort (e.g. finding a copy of the table that contains the columns
>> you want and is sorted in the order you want, i.e. an index). The whole
>> reason we have the concept of physical properties is to make that kind of
>> optimization (sometimes) possible.
>> 
>> 
> I created a simple test:
> 
>  @Test public void testTwoSortDontRemove() throws Exception {
>    runDuplicateSortCheck("select empid+deptno from ( "
>                          + "select empid, deptno "
>                          + "from emps "
>                          + "order by empid) "
>                          + "order by deptno",
>                          "EnumerableSort(sort0=[$1], dir0=[ASC])\n"
>                          + "  EnumerableProject(EXPR$0=[+($0, $1)],
> deptno=[$1])\n"
>                          + "    EnumerableSort(sort0=[$0], dir0=[ASC])\n"
>                          + "      EnumerableProject(empid=[$0],
> deptno=[$1])\n"
>                          + "        EnumerableTableScan(table=[[hr,
> emps]])\n");
>  }
> 
> When I run this, I get an error because the sort is removed:
> 
> EnumerableProject(EXPR$0=[+($0, $1)], deptno=[$1])
>  EnumerableSort(sort0=[$0], dir0=[ASC])
>    EnumerableProject(empid=[$0], deptno=[$1])
>      EnumerableTableScan(table=[[hr, emps]])
> 
> I changed the original method in question to only remove the sort if the
> requested collation matches that of it's input:
> 
>    if
> (sort.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE).equals(collation))
> {
>      final RelTraitSet traits =
> sort.getInput().getTraitSet().replace(collation);
>      call.transformTo(convert(sort.getInput(), traits));
>    }
> 
> This will cause the test to pass. Isn't this a bug?
> 
> Jess
> 
> 
>> Julian
>> 
>> 
>>> On Oct 14, 2016, at 2:33 PM, Jess Balint <jb...@gmail.com> wrote:
>>> 
>>> Just doing a bit of code reading here. Looking at the SortRemoveRule
>>> implementation:
>>> 
>>>   // Express the "sortedness" requirement in terms of a collation trait
>>> and
>>>   // we can get rid of the sort. This allows us to use rels that just
>>> happen
>>>   // to be sorted but get the same effect.
>>>   final RelCollation collation = sort.getCollation();
>>>   assert collation == sort.getTraitSet()
>>>       .getTrait(RelCollationTraitDef.INSTANCE);
>>>   final RelTraitSet traits =
>>> sort.getInput().getTraitSet().replace(collation);
>>>   call.transformTo(convert(sort.getInput(), traits));
>>> 
>>> I don't understand how this is correct without checking that the input
>>> collation is the same as the collation requested by the sort. It looks to
>>> me like it just replaces the input's collation with that of the sort. Any
>>> thoughts?
>>> 
>>> Thanks.
>>> 
>>> Jess
>> 
>>

Re: how does SortRemoveRule work?

Posted by Jess Balint <jb...@gmail.com>.

On Fri, Oct 14, 2016 at 5:16 PM, Julian Hyde <jh...@apache.org> wrote:

> Note the important expression:
>
>   convert(sort.getInput(), traits)
>
> This creates a new relational expression (or returns an existing one) that
> is “input" sorted by the required fields. It doesn’t change “input”.
>
>
I dug into getOrCreateSubset/addAbstractConverters but I still don't see
how this could work.


> By the way, the Volcano paper talks about physical properties (what we
> call traits) and enforcers. An enforcer is a relational operator that
> doesn’t change the logical content of the data, just changes its physical
> properties.
>
>
Isn't the sort node acting as the enforcer here? I don't see a reason to
remove it and then add it back via convert(). I'm specifically looking at
the case where the sort is necessary (i.e. should not be removed by this
optimization).


> Every physical property has a corresponding enforcer: Sort is the enforcer
> of the Collation trait; Shuffle is the enforcer of the Distribution trait;
> Convert is the enforcer of the CallingConvention trait.
>
> The short piece of code you quoted may seem almost trivial, but it is
> stating something profound: the relationship between Sort and Collation.
>
> The trick, of course, is to find a way of achieving a sort order without
> using a sort (e.g. finding a copy of the table that contains the columns
> you want and is sorted in the order you want, i.e. an index). The whole
> reason we have the concept of physical properties is to make that kind of
> optimization (sometimes) possible.
>
>
I created a simple test:

  @Test public void testTwoSortDontRemove() throws Exception {
    runDuplicateSortCheck("select empid+deptno from ( "
                          + "select empid, deptno "
                          + "from emps "
                          + "order by empid) "
                          + "order by deptno",
                          "EnumerableSort(sort0=[$1], dir0=[ASC])\n"
                          + "  EnumerableProject(EXPR$0=[+($0, $1)],
deptno=[$1])\n"
                          + "    EnumerableSort(sort0=[$0], dir0=[ASC])\n"
                          + "      EnumerableProject(empid=[$0],
deptno=[$1])\n"
                          + "        EnumerableTableScan(table=[[hr,
emps]])\n");
  }

When I run this, I get an error because the sort is removed:

EnumerableProject(EXPR$0=[+($0, $1)], deptno=[$1])
  EnumerableSort(sort0=[$0], dir0=[ASC])
    EnumerableProject(empid=[$0], deptno=[$1])
      EnumerableTableScan(table=[[hr, emps]])

I changed the original method in question to only remove the sort if the
requested collation matches that of it's input:

    if
(sort.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE).equals(collation))
{
      final RelTraitSet traits =
sort.getInput().getTraitSet().replace(collation);
      call.transformTo(convert(sort.getInput(), traits));
    }

This will cause the test to pass. Isn't this a bug?

Jess


> Julian
>
>
> > On Oct 14, 2016, at 2:33 PM, Jess Balint <jb...@gmail.com> wrote:
> >
> > Just doing a bit of code reading here. Looking at the SortRemoveRule
> > implementation:
> >
> >    // Express the "sortedness" requirement in terms of a collation trait
> > and
> >    // we can get rid of the sort. This allows us to use rels that just
> > happen
> >    // to be sorted but get the same effect.
> >    final RelCollation collation = sort.getCollation();
> >    assert collation == sort.getTraitSet()
> >        .getTrait(RelCollationTraitDef.INSTANCE);
> >    final RelTraitSet traits =
> > sort.getInput().getTraitSet().replace(collation);
> >    call.transformTo(convert(sort.getInput(), traits));
> >
> > I don't understand how this is correct without checking that the input
> > collation is the same as the collation requested by the sort. It looks to
> > me like it just replaces the input's collation with that of the sort. Any
> > thoughts?
> >
> > Thanks.
> >
> > Jess
>
>

Re: how does SortRemoveRule work?

Posted by Julian Hyde <jh...@apache.org>.

Note the important expression:

  convert(sort.getInput(), traits)

This creates a new relational expression (or returns an existing one) that is “input" sorted by the required fields. It doesn’t change “input”.

By the way, the Volcano paper talks about physical properties (what we call traits) and enforcers. An enforcer is a relational operator that doesn’t change the logical content of the data, just changes its physical properties.

Every physical property has a corresponding enforcer: Sort is the enforcer of the Collation trait; Shuffle is the enforcer of the Distribution trait; Convert is the enforcer of the CallingConvention trait.

The short piece of code you quoted may seem almost trivial, but it is stating something profound: the relationship between Sort and Collation.

The trick, of course, is to find a way of achieving a sort order without using a sort (e.g. finding a copy of the table that contains the columns you want and is sorted in the order you want, i.e. an index). The whole reason we have the concept of physical properties is to make that kind of optimization (sometimes) possible.

Julian

> On Oct 14, 2016, at 2:33 PM, Jess Balint <jb...@gmail.com> wrote:
> 
> Just doing a bit of code reading here. Looking at the SortRemoveRule
> implementation:
> 
>    // Express the "sortedness" requirement in terms of a collation trait
> and
>    // we can get rid of the sort. This allows us to use rels that just
> happen
>    // to be sorted but get the same effect.
>    final RelCollation collation = sort.getCollation();
>    assert collation == sort.getTraitSet()
>        .getTrait(RelCollationTraitDef.INSTANCE);
>    final RelTraitSet traits =
> sort.getInput().getTraitSet().replace(collation);
>    call.transformTo(convert(sort.getInput(), traits));
> 
> I don't understand how this is correct without checking that the input
> collation is the same as the collation requested by the sort. It looks to
> me like it just replaces the input's collation with that of the sort. Any
> thoughts?
> 
> Thanks.
> 
> Jess