You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by "Tzolov, Christian" <ct...@pivotal.io> on 2016/05/13 15:22:47 UTC

How to check for SELECT DISTINCT ...?

Hi there, 

I am working on a Calcite adapter for a datastore that supports a limited subset of SQL expressions. My goal is to push down the supported expression. 

This datastore supports "ORDER BY” only if the SELECT DISTINCT expression is used. 

What would be the best way to check if the select DISTINCT keyword is used from within my SortRule implementation? 

Thanks,
Christian

Re: How to check for SELECT DISTINCT ...?

Posted by Julian Hyde <jh...@apache.org>.

The “keys” are fields in the operators:
* Aggregate.groupSet is a bit map, where "group by x, y” would be encoded by setting bits 2 and 4 if x was the 2nd column (0-based) of the input and y was the 4th column. Note that “group by y, x” is encoded exactly the same way. “select distinct” will have all bits set.
* Sort.collation.getFieldCollations() returns a list of sort keys, each a RelFieldCollation whose fieldIndex is the 0-based offset of the underlying column.

Also remember that the output of Aggregate consists of the group keys followed by the aggregate functions. Other input columns to Aggregate do not appear in its output.

For your case do you really need SELECT DISTINCT (i.e. aggregating on all fields) or is it sufficient that the fields in Sort.collation are a superset of the Aggregate keys? If the latter, and supposing that n = Aggregate.groupSet.cardinality(), then you should check that all sort keys are less than n (i.e. on the leading edge of the columns output by Sort).

> On May 13, 2016, at 11:52 PM, Christian Tzolov <ct...@pivotal.io> wrote:
> 
> Thanks for the explanation and the suggestions Julian!
> 
> After some debugging and enough "explain plan for" i figured out the same
> :)  - that DISTINCT is represented as a GROUP BY over the  projected fields.
> 
>>>> You should write a rule that matches a Sort on top of an Aggregate that
> uses the same keys.
> 
> I've created such rule (operand(Sort.class, operand(Aggregate.class, any())),
> but i am not sure what is the safest way to determine whether the keys are
> the same? Would it be enough to compare their row types or i need check the
> collations ?
> 
> The suggestion to expand the select in case of unique keys is excellent.
> But i have to investigate whether the backend system in they way i'm using
> it allows primary keys.
> 
> How does RelMdUniqueKeys determines whether the keys are unique? Is this a
> meta info provided by the table (or row type) definition or is computed on
> the fly?
> 
> 
> 
> 
> 
> 
> On 13 May 2016 at 18:54, Julian Hyde <jh...@apache.org> wrote:
> 
>> By the time the query is translated into relational algebra, SELECT
>> DISTINCT will have been converted into an Aggregate with no aggregate
>> functions.
>> 
>> So, for instance,
>> 
>>  select distinct deptno from emp
>> 
>> and
>> 
>>  select deptno from emp group by deptno
>> 
>> will be indistinguishable if you look at the relational algebra.
>> 
>> You should write a rule that matches a Sort on top of an Aggregate
>> that uses the same keys.
>> 
>> Also, I don't know whether your system supports primary keys. But if,
>> for example, empno is unique, then you would want to be able to
>> evaluate
>> 
>>  select empno from emp order by empno
>> 
>> even though there is no explicit SELECT DISTINCT. You can use the
>> RelMdUniqueKeys metadata to figure out whether the sort key is already
>> unique.
>> 
>> Julian
>> 
>> 
>> On Fri, May 13, 2016 at 8:22 AM, Tzolov, Christian <ct...@pivotal.io>
>> wrote:
>>> Hi there,
>>> 
>>> I am working on a Calcite adapter for a datastore that supports a
>> limited subset of SQL expressions. My goal is to push down the supported
>> expression.
>>> 
>>> This datastore supports "ORDER BY” only if the SELECT DISTINCT
>> expression is used.
>>> 
>>> What would be the best way to check if the select DISTINCT keyword is
>> used from within my SortRule implementation?
>>> 
>>> Thanks,
>>> Christian
>>> 
>> 
> 
> 
> 
> -- 
> Christian Tzolov <http://www.linkedin.com/in/tzolov> | Solution Architect,
> EMEA Practice Team | Pivotal <http://pivotal.io/>
> ctzolov@pivotal.io|+31610285517

Re: How to check for SELECT DISTINCT ...?

Posted by Christian Tzolov <ct...@pivotal.io>.

Thanks for the explanation and the suggestions Julian!

After some debugging and enough "explain plan for" i figured out the same
:)  - that DISTINCT is represented as a GROUP BY over the  projected fields.

>>> You should write a rule that matches a Sort on top of an Aggregate that
uses the same keys.

I've created such rule (operand(Sort.class, operand(Aggregate.class, any())),
but i am not sure what is the safest way to determine whether the keys are
the same? Would it be enough to compare their row types or i need check the
collations ?

The suggestion to expand the select in case of unique keys is excellent.
But i have to investigate whether the backend system in they way i'm using
it allows primary keys.

How does RelMdUniqueKeys determines whether the keys are unique? Is this a
meta info provided by the table (or row type) definition or is computed on
the fly?

On 13 May 2016 at 18:54, Julian Hyde <jh...@apache.org> wrote:

> By the time the query is translated into relational algebra, SELECT
> DISTINCT will have been converted into an Aggregate with no aggregate
> functions.
>
> So, for instance,
>
>   select distinct deptno from emp
>
> and
>
>   select deptno from emp group by deptno
>
> will be indistinguishable if you look at the relational algebra.
>
> You should write a rule that matches a Sort on top of an Aggregate
> that uses the same keys.
>
> Also, I don't know whether your system supports primary keys. But if,
> for example, empno is unique, then you would want to be able to
> evaluate
>
>   select empno from emp order by empno
>
> even though there is no explicit SELECT DISTINCT. You can use the
> RelMdUniqueKeys metadata to figure out whether the sort key is already
> unique.
>
> Julian
>
>
> On Fri, May 13, 2016 at 8:22 AM, Tzolov, Christian <ct...@pivotal.io>
> wrote:
> > Hi there,
> >
> > I am working on a Calcite adapter for a datastore that supports a
> limited subset of SQL expressions. My goal is to push down the supported
> expression.
> >
> > This datastore supports "ORDER BY” only if the SELECT DISTINCT
> expression is used.
> >
> > What would be the best way to check if the select DISTINCT keyword is
> used from within my SortRule implementation?
> >
> > Thanks,
> > Christian
> >
>

-- 
Christian Tzolov <http://www.linkedin.com/in/tzolov> | Solution Architect,
EMEA Practice Team | Pivotal <http://pivotal.io/>
ctzolov@pivotal.io|+31610285517

Re: How to check for SELECT DISTINCT ...?

Posted by Julian Hyde <jh...@apache.org>.

By the time the query is translated into relational algebra, SELECT
DISTINCT will have been converted into an Aggregate with no aggregate
functions.

So, for instance,

  select distinct deptno from emp

and

  select deptno from emp group by deptno

will be indistinguishable if you look at the relational algebra.

You should write a rule that matches a Sort on top of an Aggregate
that uses the same keys.

Also, I don't know whether your system supports primary keys. But if,
for example, empno is unique, then you would want to be able to
evaluate

  select empno from emp order by empno

even though there is no explicit SELECT DISTINCT. You can use the
RelMdUniqueKeys metadata to figure out whether the sort key is already
unique.

Julian

On Fri, May 13, 2016 at 8:22 AM, Tzolov, Christian <ct...@pivotal.io> wrote:
> Hi there,
>
> I am working on a Calcite adapter for a datastore that supports a limited subset of SQL expressions. My goal is to push down the supported expression.
>
> This datastore supports "ORDER BY” only if the SELECT DISTINCT expression is used.
>
> What would be the best way to check if the select DISTINCT keyword is used from within my SortRule implementation?
>
> Thanks,
> Christian
>