You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Itiel Sadeh <it...@sqreamtech.com.INVALID> on 2022/08/03 20:40:48 UTC

Percentile_Disc return type

Hello,

First of all I just wanted to thank all of you for the work you are doing
on Calcite.

We want to add support for the percentile_disc aggregate function, but we
encounter a problem.
Our percentile_disc return type is dependent on the column of the "order
by" clause (just like postgresql
<https://www.postgresql.org/docs/current/functions-aggregate.html>).
However, I don't see how to achieve that on calcite. The issue is that the
sort column argument is not passed to the percentile_disc function, Rather,
it is stored outside of it as a collation. Therefore, I cannot use the
SqlReturnTypeInference mechanism.

If someone has an idea on how to achieve this it would be very much
appreciated.
Thank you,

Itiel

Re: Percentile_Disc return type

Posted by Julian Hyde <jh...@gmail.com>.
My mistake. I was thinking of CUME_DIST, which is the inverse of PERCENTILE_DIST and returns a fraction.

I agree that the return type of “PERCENTILE_DIST(fraction) WITHIN GROUP (ORDER BY x)” should be the same as the type of “x”.

> On Aug 7, 2022, at 12:59 AM, Itiel Sadeh <it...@sqreamtech.com.INVALID> wrote:
> 
> opened a ticket in https://issues.apache.org/jira/browse/CALCITE-5230 <https://issues.apache.org/jira/browse/CALCITE-5230>.
> Maybe I'm missing something, but I think that in your example, the
> percentile_disc will calculate the percentile that you specify and the
> output will be the appropriate value from the input column.
> 
> We can continue this discussion in the Jira ticket.
> 
> Itiel
> On Fri, Aug 5, 2022 at 9:00 PM Julian Hyde <jhyde.apache@gmail.com <ma...@gmail.com>> wrote:
> 
>> I’m not sure what Postgres has done, but it doesn’t make sense to derive
>> the return type of PERCENTILE_DISC from the argument type, even though both
>> are typically numeric types. (Strictly speaking, the argument to
>> PERCENTILE_DISC can be any type that has a ‘-‘ operation, so it ought to
>> work on DATE, TIMESTAMP, INTERVAL as well as numeric.)
>> 
>> To take an extreme example, if I am a geologist, the argument to
>> PERCENTILE_DISC might be the age of my rock samples in millions of years,
>> and the output will be a number between 0 and 1.
>> 
>> I think your problem is that you don’t like the default data type returned
>> by PERCENTILE_DISC. The solution would be to make the policy customizable
>> via RelDataTypeSystem, as we did for other aggregate functions in
>> https://issues.apache.org/jira/browse/CALCITE-1945 <
>> https://issues.apache.org/jira/browse/CALCITE-1945 <https://issues.apache.org/jira/browse/CALCITE-1945>>.
>> 
>> Can you please log a Jira case for this, and we can discuss further there?
>> 
>> Julian
>> 
>>> On Aug 5, 2022, at 4:23 AM, Itiel Sadeh <it...@sqreamtech.com.INVALID>
>> wrote:
>>> 
>>> Hi Julian, thanks for your email.
>>> 
>>> I don't think those tickets will solve the problem as from what I
>>> understand, they are referring to the input types, while my issue is with
>>> the return type (although it might have the same root cause).
>>> 
>>> As you can see in the code
>>> <
>> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L2304
>>> ,
>>> right now the return type is hard coded as DOUBLE. The problem is that in
>>> the following query:
>>> 
>>>> select percentile_disc(0.12) within group(order by x) from t;
>>> 
>>> The PERCENTILE_DISC argument will be only 0.12, and the argument "x" will
>>> be outside of it (in the WITHIN_GROUP call). so the return type inference
>>> cannot use the type of the column "x" when it's inferring the return
>> type.
>>> 
>>> Itiel
>>> 
>>> On Thu, Aug 4, 2022 at 9:43 PM Julian Hyde <jh...@gmail.com>
>> wrote:
>>> 
>>>> I believer that PERCENTILE_DISC was implemented in
>>>> https://issues.apache.org/jira/browse/CALCITE-4644 <
>>>> https://issues.apache.org/jira/browse/CALCITE-4644>. There are open
>>>> issues to support any sortable type [
>>>> https://issues.apache.org/jira/browse/CALCITE-4670 <
>>>> https://issues.apache.org/jira/browse/CALCITE-4670> ] and also to
>> change
>>>> the implementation strategy [
>>>> https://issues.apache.org/jira/browse/CALCITE-4666 <
>>>> https://issues.apache.org/jira/browse/CALCITE-4666> ].
>>>> 
>>>> Does your case map onto any of those?
>>>> 
>>>> Is PERCENTILE_DISC able to deduce return types from its arguments today?
>>>> If so, how does it do it?
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Aug 3, 2022, at 1:40 PM, Itiel Sadeh <itiels@sqreamtech.com.INVALID
>>> 
>>>> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> First of all I just wanted to thank all of you for the work you are
>> doing
>>>>> on Calcite.
>>>>> 
>>>>> We want to add support for the percentile_disc aggregate function, but
>> we
>>>>> encounter a problem.
>>>>> Our percentile_disc return type is dependent on the column of the
>> "order
>>>>> by" clause (just like postgresql
>>>>> <https://www.postgresql.org/docs/current/functions-aggregate.html>).
>>>>> However, I don't see how to achieve that on calcite. The issue is that
>>>> the
>>>>> sort column argument is not passed to the percentile_disc function,
>>>> Rather,
>>>>> it is stored outside of it as a collation. Therefore, I cannot use the
>>>>> SqlReturnTypeInference mechanism.
>>>>> 
>>>>> If someone has an idea on how to achieve this it would be very much
>>>>> appreciated.
>>>>> Thank you,
>>>>> 
>>>>> Itiel


Re: Percentile_Disc return type

Posted by Itiel Sadeh <it...@sqreamtech.com.INVALID>.
opened a ticket in https://issues.apache.org/jira/browse/CALCITE-5230.
Maybe I'm missing something, but I think that in your example, the
percentile_disc will calculate the percentile that you specify and the
output will be the appropriate value from the input column.

We can continue this discussion in the Jira ticket.

Itiel
On Fri, Aug 5, 2022 at 9:00 PM Julian Hyde <jh...@gmail.com> wrote:

> I’m not sure what Postgres has done, but it doesn’t make sense to derive
> the return type of PERCENTILE_DISC from the argument type, even though both
> are typically numeric types. (Strictly speaking, the argument to
> PERCENTILE_DISC can be any type that has a ‘-‘ operation, so it ought to
> work on DATE, TIMESTAMP, INTERVAL as well as numeric.)
>
> To take an extreme example, if I am a geologist, the argument to
> PERCENTILE_DISC might be the age of my rock samples in millions of years,
> and the output will be a number between 0 and 1.
>
> I think your problem is that you don’t like the default data type returned
> by PERCENTILE_DISC. The solution would be to make the policy customizable
> via RelDataTypeSystem, as we did for other aggregate functions in
> https://issues.apache.org/jira/browse/CALCITE-1945 <
> https://issues.apache.org/jira/browse/CALCITE-1945>.
>
> Can you please log a Jira case for this, and we can discuss further there?
>
> Julian
>
> > On Aug 5, 2022, at 4:23 AM, Itiel Sadeh <it...@sqreamtech.com.INVALID>
> wrote:
> >
> > Hi Julian, thanks for your email.
> >
> > I don't think those tickets will solve the problem as from what I
> > understand, they are referring to the input types, while my issue is with
> > the return type (although it might have the same root cause).
> >
> > As you can see in the code
> > <
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L2304
> >,
> > right now the return type is hard coded as DOUBLE. The problem is that in
> > the following query:
> >
> >> select percentile_disc(0.12) within group(order by x) from t;
> >
> > The PERCENTILE_DISC argument will be only 0.12, and the argument "x" will
> > be outside of it (in the WITHIN_GROUP call). so the return type inference
> > cannot use the type of the column "x" when it's inferring the return
> type.
> >
> > Itiel
> >
> > On Thu, Aug 4, 2022 at 9:43 PM Julian Hyde <jh...@gmail.com>
> wrote:
> >
> >> I believer that PERCENTILE_DISC was implemented in
> >> https://issues.apache.org/jira/browse/CALCITE-4644 <
> >> https://issues.apache.org/jira/browse/CALCITE-4644>. There are open
> >> issues to support any sortable type [
> >> https://issues.apache.org/jira/browse/CALCITE-4670 <
> >> https://issues.apache.org/jira/browse/CALCITE-4670> ] and also to
> change
> >> the implementation strategy [
> >> https://issues.apache.org/jira/browse/CALCITE-4666 <
> >> https://issues.apache.org/jira/browse/CALCITE-4666> ].
> >>
> >> Does your case map onto any of those?
> >>
> >> Is PERCENTILE_DISC able to deduce return types from its arguments today?
> >> If so, how does it do it?
> >>
> >> Julian
> >>
> >>
> >>> On Aug 3, 2022, at 1:40 PM, Itiel Sadeh <itiels@sqreamtech.com.INVALID
> >
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> First of all I just wanted to thank all of you for the work you are
> doing
> >>> on Calcite.
> >>>
> >>> We want to add support for the percentile_disc aggregate function, but
> we
> >>> encounter a problem.
> >>> Our percentile_disc return type is dependent on the column of the
> "order
> >>> by" clause (just like postgresql
> >>> <https://www.postgresql.org/docs/current/functions-aggregate.html>).
> >>> However, I don't see how to achieve that on calcite. The issue is that
> >> the
> >>> sort column argument is not passed to the percentile_disc function,
> >> Rather,
> >>> it is stored outside of it as a collation. Therefore, I cannot use the
> >>> SqlReturnTypeInference mechanism.
> >>>
> >>> If someone has an idea on how to achieve this it would be very much
> >>> appreciated.
> >>> Thank you,
> >>>
> >>> Itiel
> >>
> >>
>
>

Re: Percentile_Disc return type

Posted by Julian Hyde <jh...@gmail.com>.
I’m not sure what Postgres has done, but it doesn’t make sense to derive the return type of PERCENTILE_DISC from the argument type, even though both are typically numeric types. (Strictly speaking, the argument to PERCENTILE_DISC can be any type that has a ‘-‘ operation, so it ought to work on DATE, TIMESTAMP, INTERVAL as well as numeric.)

To take an extreme example, if I am a geologist, the argument to PERCENTILE_DISC might be the age of my rock samples in millions of years, and the output will be a number between 0 and 1.

I think your problem is that you don’t like the default data type returned by PERCENTILE_DISC. The solution would be to make the policy customizable via RelDataTypeSystem, as we did for other aggregate functions in https://issues.apache.org/jira/browse/CALCITE-1945 <https://issues.apache.org/jira/browse/CALCITE-1945>.

Can you please log a Jira case for this, and we can discuss further there?

Julian

> On Aug 5, 2022, at 4:23 AM, Itiel Sadeh <it...@sqreamtech.com.INVALID> wrote:
> 
> Hi Julian, thanks for your email.
> 
> I don't think those tickets will solve the problem as from what I
> understand, they are referring to the input types, while my issue is with
> the return type (although it might have the same root cause).
> 
> As you can see in the code
> <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L2304>,
> right now the return type is hard coded as DOUBLE. The problem is that in
> the following query:
> 
>> select percentile_disc(0.12) within group(order by x) from t;
> 
> The PERCENTILE_DISC argument will be only 0.12, and the argument "x" will
> be outside of it (in the WITHIN_GROUP call). so the return type inference
> cannot use the type of the column "x" when it's inferring the return type.
> 
> Itiel
> 
> On Thu, Aug 4, 2022 at 9:43 PM Julian Hyde <jh...@gmail.com> wrote:
> 
>> I believer that PERCENTILE_DISC was implemented in
>> https://issues.apache.org/jira/browse/CALCITE-4644 <
>> https://issues.apache.org/jira/browse/CALCITE-4644>. There are open
>> issues to support any sortable type [
>> https://issues.apache.org/jira/browse/CALCITE-4670 <
>> https://issues.apache.org/jira/browse/CALCITE-4670> ] and also to change
>> the implementation strategy [
>> https://issues.apache.org/jira/browse/CALCITE-4666 <
>> https://issues.apache.org/jira/browse/CALCITE-4666> ].
>> 
>> Does your case map onto any of those?
>> 
>> Is PERCENTILE_DISC able to deduce return types from its arguments today?
>> If so, how does it do it?
>> 
>> Julian
>> 
>> 
>>> On Aug 3, 2022, at 1:40 PM, Itiel Sadeh <it...@sqreamtech.com.INVALID>
>> wrote:
>>> 
>>> Hello,
>>> 
>>> First of all I just wanted to thank all of you for the work you are doing
>>> on Calcite.
>>> 
>>> We want to add support for the percentile_disc aggregate function, but we
>>> encounter a problem.
>>> Our percentile_disc return type is dependent on the column of the "order
>>> by" clause (just like postgresql
>>> <https://www.postgresql.org/docs/current/functions-aggregate.html>).
>>> However, I don't see how to achieve that on calcite. The issue is that
>> the
>>> sort column argument is not passed to the percentile_disc function,
>> Rather,
>>> it is stored outside of it as a collation. Therefore, I cannot use the
>>> SqlReturnTypeInference mechanism.
>>> 
>>> If someone has an idea on how to achieve this it would be very much
>>> appreciated.
>>> Thank you,
>>> 
>>> Itiel
>> 
>> 


Re: Percentile_Disc return type

Posted by Itiel Sadeh <it...@sqreamtech.com.INVALID>.
Hi Julian, thanks for your email.

I don't think those tickets will solve the problem as from what I
understand, they are referring to the input types, while my issue is with
the return type (although it might have the same root cause).

As you can see in the code
<https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L2304>,
right now the return type is hard coded as DOUBLE. The problem is that in
the following query:

> select percentile_disc(0.12) within group(order by x) from t;

The PERCENTILE_DISC argument will be only 0.12, and the argument "x" will
be outside of it (in the WITHIN_GROUP call). so the return type inference
cannot use the type of the column "x" when it's inferring the return type.

Itiel

On Thu, Aug 4, 2022 at 9:43 PM Julian Hyde <jh...@gmail.com> wrote:

> I believer that PERCENTILE_DISC was implemented in
> https://issues.apache.org/jira/browse/CALCITE-4644 <
> https://issues.apache.org/jira/browse/CALCITE-4644>. There are open
> issues to support any sortable type [
> https://issues.apache.org/jira/browse/CALCITE-4670 <
> https://issues.apache.org/jira/browse/CALCITE-4670> ] and also to change
> the implementation strategy [
> https://issues.apache.org/jira/browse/CALCITE-4666 <
> https://issues.apache.org/jira/browse/CALCITE-4666> ].
>
> Does your case map onto any of those?
>
> Is PERCENTILE_DISC able to deduce return types from its arguments today?
> If so, how does it do it?
>
> Julian
>
>
> > On Aug 3, 2022, at 1:40 PM, Itiel Sadeh <it...@sqreamtech.com.INVALID>
> wrote:
> >
> > Hello,
> >
> > First of all I just wanted to thank all of you for the work you are doing
> > on Calcite.
> >
> > We want to add support for the percentile_disc aggregate function, but we
> > encounter a problem.
> > Our percentile_disc return type is dependent on the column of the "order
> > by" clause (just like postgresql
> > <https://www.postgresql.org/docs/current/functions-aggregate.html>).
> > However, I don't see how to achieve that on calcite. The issue is that
> the
> > sort column argument is not passed to the percentile_disc function,
> Rather,
> > it is stored outside of it as a collation. Therefore, I cannot use the
> > SqlReturnTypeInference mechanism.
> >
> > If someone has an idea on how to achieve this it would be very much
> > appreciated.
> > Thank you,
> >
> > Itiel
>
>

Re: Percentile_Disc return type

Posted by Julian Hyde <jh...@gmail.com>.
I believer that PERCENTILE_DISC was implemented in https://issues.apache.org/jira/browse/CALCITE-4644 <https://issues.apache.org/jira/browse/CALCITE-4644>. There are open issues to support any sortable type [ https://issues.apache.org/jira/browse/CALCITE-4670 <https://issues.apache.org/jira/browse/CALCITE-4670> ] and also to change the implementation strategy [ https://issues.apache.org/jira/browse/CALCITE-4666 <https://issues.apache.org/jira/browse/CALCITE-4666> ].

Does your case map onto any of those?

Is PERCENTILE_DISC able to deduce return types from its arguments today? If so, how does it do it?

Julian


> On Aug 3, 2022, at 1:40 PM, Itiel Sadeh <it...@sqreamtech.com.INVALID> wrote:
> 
> Hello,
> 
> First of all I just wanted to thank all of you for the work you are doing
> on Calcite.
> 
> We want to add support for the percentile_disc aggregate function, but we
> encounter a problem.
> Our percentile_disc return type is dependent on the column of the "order
> by" clause (just like postgresql
> <https://www.postgresql.org/docs/current/functions-aggregate.html>).
> However, I don't see how to achieve that on calcite. The issue is that the
> sort column argument is not passed to the percentile_disc function, Rather,
> it is stored outside of it as a collation. Therefore, I cannot use the
> SqlReturnTypeInference mechanism.
> 
> If someone has an idea on how to achieve this it would be very much
> appreciated.
> Thank you,
> 
> Itiel