You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Marc Prud'hommeaux <ma...@glimpse.io> on 2017/11/01 16:41:43 UTC

Re: Rounding with AVG aggregate

Ahh, numeric with unspecified precision has the special meaning that it will retain whatever precision is stored in the column. From https://www.postgresql.org/docs/9.1/static/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL :

  "Specifying: NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale. (The SQL standard requires a default scale of 0, i.e., coercion to integer precision. We find this a bit useless. If you're concerned about portability, always specify the precision and scale explicitly.)"

So even though the JDBC driver reports a precision of zero, it actually means arbitrary precision when it is on a numeric/decimal column. I'm guessing that extending SqlDialect.getCastSpec(RelDataType) in PostgresqlSqlDialect is the right place to fix this?

	-Marc



> On Oct 31, 2017, at 11:38 AM, Julian Hyde <jh...@apache.org> wrote:
> 
> I don’t recall whether DECIMAL without precision + scale is even valid. (Or, if Calcite treats it a “valid”, maybe Calcite is wrong, and should be giving an error.)
> 
>> On Oct 31, 2017, at 7:07 AM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>> 
>> 
>> I understand, and I agree that the behavior is a sensible compromise. But in this case, price is a decimal column, and so the average call should also be a decimal, wheres it is being rounded to an integer (actually, a DECIMAL(19, 0) as you can see in the server log).
>> 
>> However, it looks like this might only be an issue with PostgreSQL:
>> 
>> 0: jdbc:calcite:schemaType=JDBC> select "price" from "products" limit 1;
>> +---------------------+
>> |        price        |
>> +---------------------+
>> | 25.99               | <-- price is a decimal...
>> +---------------------+
>> 1 row selected (0.217 seconds)
>> 
>> 0: jdbc:calcite:schemaType=JDBC> select avg("price") from "products";
>> +---------------------+
>> |       EXPR$0        |
>> +---------------------+
>> | 20                  | <-- ... but the average isn't
>> +---------------------+
>> 
>> 1 row selected (0.063 seconds)
>> 
>> 0: jdbc:calcite:schemaType=JDBC> select avg(cast("price" as decimal)) from "products";
>> +---------------------+
>> |       EXPR$0        |
>> +---------------------+
>> | 20                  | <-- casting to a non-precision decimal doesn't help...
>> +---------------------+
>> 1 row selected (0.067 seconds)
>> 
>> 0: jdbc:calcite:schemaType=JDBC> select avg(cast("price" as decimal(6,4))) from "products";
>> +--------+
>> | EXPR$0 |
>> +--------+
>> | 20.0151| <-- but specifying the precision does
>> +--------+
>> 1 row selected (0.066 seconds)
>> 
>> 
>> 
>> Perhaps Calcite is missing the precision of the column when it reads the metadata, or the driver is misreporting the precision? Because executing directly against the PostgreSQL driver yields the correct behavior:
>> 
>> 0: jdbc:postgresql://localhost/dvdstore> select avg(cast("price" as decimal)) from "products";
>> +-----------------------+
>> | 20.0151000000000000   |
>> +-----------------------+
>> 1 row selected (0.022 seconds)
>> 
>> 
>> I’ll dig a bit further and submit a PR if I can find a fix.
>> 
>> Thanks for your help!
>> 
>> 	-Marc
>> 
>> 
>> 
>>> On Oct 30, 2017, at 11:16 AM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>> The bug explains the current behavior. The behavior is not what everyone would like, but it is what it is. I happen to like it because it is simple. The behavior is this: if have a column c of type T, then AVG(c) will have type T. If c is an INTEGER, then AVG will return an INTEGER. If you cast that result to DOUBLE, surprise surprise, that DOUBLE has no fractional part.
>>> 
>>> The solution is to convert the column before applying AVG: AVG(CAST(c AS DOUBLE)) will return DOUBLE.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Oct 30, 2017, at 5:54 AM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>>>> 
>>>> 
>>>> I had noticed that issue, but it purports to be closed for 1.14.0, which I am using.
>>>> 
>>>> It only seems to affect AVG; other aggregates don’t appear to be rounded. E.g.:
>>>> 
>>>> 0: jdbc:calcite:schemaType=JDBC> select min("price"), sum("price"), avg("price") from "products";
>>>> +---------------------+---------------------+---------------------+
>>>> |       EXPR$0        |       EXPR$1        |       EXPR$2        |
>>>> +---------------------+---------------------+---------------------+
>>>> | 9.99                | 200151.00           | 20                  |
>>>> +---------------------+---------------------+---------------------+
>>>> 
>>>> yields the server-side log:
>>>> 
>>>> 2017-10-30 08:16:00 EDT [4272-13] dgdemo@dvdstore LOG:  execute <unnamed>: SELECT MIN("price"), CASE WHEN COUNT(*) = 0 THEN NULL ELSE SUM("price") END, CAST(CASE WHEN COUNT(*) = 0 THEN NULL ELSE SUM("price") END / COUNT(*) AS DECIMAL(19, 0)) FROM “products"
>>>> 
>>>> Is this something I can get around by implementing my own RelDataTypeSystem? If so, I’ll experiment with that.
>>>> 
>>>> 	-Marc
>>>> 
>>>> 
>>>> 
>>>>> On Oct 29, 2017, at 7:11 PM, Julian Hyde <jh...@apache.org> wrote:
>>>>> 
>>>>> See https://issues.apache.org/jira/browse/CALCITE-1945 <https://issues.apache.org/jira/browse/CALCITE-1945>.
>>>>> 
>>>>>> On Oct 29, 2017, at 3:31 PM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>>>>>> 
>>>>>> 
>>>>>> When I run the following against a Calcite connection containing the PostgreSQL "dvdstore" sample database:
>>>>>> 
>>>>>> select avg(products.price) from dvdstore.products group by products.category
>>>>>> 
>>>>>> The following SQL is executed on the server:
>>>>>> 
>>>>>> SELECT CAST(SUM("price") / COUNT(*) AS DECIMAL(19, 0)) FROM "products" GROUP BY “category"
>>>>>> 
>>>>>> Is there some way I can prevent Calcite from rounding it (price is a decimal type)? Is there some reason it isn’t just sending the aggregate as an AVG?
>>>>>> 
>>>>>> 	-Marc
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


Re: Rounding with AVG aggregate

Posted by Marc Prud'hommeaux <ma...@glimpse.io>.
Filed: https://issues.apache.org/jira/browse/CALCITE-2030


> On Nov 1, 2017, at 1:20 PM, Julian Hyde <jh...@apache.org> wrote:
> 
> Sounds plausible. You know far more of the details than I do.
> 
> Please log a JIRA case and let’s continue discussion there.
> 
>> On Nov 1, 2017, at 9:41 AM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>> 
>> 
>> Ahh, numeric with unspecified precision has the special meaning that it will retain whatever precision is stored in the column. From https://www.postgresql.org/docs/9.1/static/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL :
>> 
>> "Specifying: NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale. (The SQL standard requires a default scale of 0, i.e., coercion to integer precision. We find this a bit useless. If you're concerned about portability, always specify the precision and scale explicitly.)"
>> 
>> So even though the JDBC driver reports a precision of zero, it actually means arbitrary precision when it is on a numeric/decimal column. I'm guessing that extending SqlDialect.getCastSpec(RelDataType) in PostgresqlSqlDialect is the right place to fix this?
>> 
>> 	-Marc
>> 
>> 
>> 
>>> On Oct 31, 2017, at 11:38 AM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>> I don’t recall whether DECIMAL without precision + scale is even valid. (Or, if Calcite treats it a “valid”, maybe Calcite is wrong, and should be giving an error.)
>>> 
>>>> On Oct 31, 2017, at 7:07 AM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>>>> 
>>>> 
>>>> I understand, and I agree that the behavior is a sensible compromise. But in this case, price is a decimal column, and so the average call should also be a decimal, wheres it is being rounded to an integer (actually, a DECIMAL(19, 0) as you can see in the server log).
>>>> 
>>>> However, it looks like this might only be an issue with PostgreSQL:
>>>> 
>>>> 0: jdbc:calcite:schemaType=JDBC> select "price" from "products" limit 1;
>>>> +---------------------+
>>>> |        price        |
>>>> +---------------------+
>>>> | 25.99               | <-- price is a decimal...
>>>> +---------------------+
>>>> 1 row selected (0.217 seconds)
>>>> 
>>>> 0: jdbc:calcite:schemaType=JDBC> select avg("price") from "products";
>>>> +---------------------+
>>>> |       EXPR$0        |
>>>> +---------------------+
>>>> | 20                  | <-- ... but the average isn't
>>>> +---------------------+
>>>> 
>>>> 1 row selected (0.063 seconds)
>>>> 
>>>> 0: jdbc:calcite:schemaType=JDBC> select avg(cast("price" as decimal)) from "products";
>>>> +---------------------+
>>>> |       EXPR$0        |
>>>> +---------------------+
>>>> | 20                  | <-- casting to a non-precision decimal doesn't help...
>>>> +---------------------+
>>>> 1 row selected (0.067 seconds)
>>>> 
>>>> 0: jdbc:calcite:schemaType=JDBC> select avg(cast("price" as decimal(6,4))) from "products";
>>>> +--------+
>>>> | EXPR$0 |
>>>> +--------+
>>>> | 20.0151| <-- but specifying the precision does
>>>> +--------+
>>>> 1 row selected (0.066 seconds)
>>>> 
>>>> 
>>>> 
>>>> Perhaps Calcite is missing the precision of the column when it reads the metadata, or the driver is misreporting the precision? Because executing directly against the PostgreSQL driver yields the correct behavior:
>>>> 
>>>> 0: jdbc:postgresql://localhost/dvdstore> select avg(cast("price" as decimal)) from "products";
>>>> +-----------------------+
>>>> | 20.0151000000000000   |
>>>> +-----------------------+
>>>> 1 row selected (0.022 seconds)
>>>> 
>>>> 
>>>> I’ll dig a bit further and submit a PR if I can find a fix.
>>>> 
>>>> Thanks for your help!
>>>> 
>>>> 	-Marc
>>>> 
>>>> 
>>>> 
>>>>> On Oct 30, 2017, at 11:16 AM, Julian Hyde <jh...@apache.org> wrote:
>>>>> 
>>>>> The bug explains the current behavior. The behavior is not what everyone would like, but it is what it is. I happen to like it because it is simple. The behavior is this: if have a column c of type T, then AVG(c) will have type T. If c is an INTEGER, then AVG will return an INTEGER. If you cast that result to DOUBLE, surprise surprise, that DOUBLE has no fractional part.
>>>>> 
>>>>> The solution is to convert the column before applying AVG: AVG(CAST(c AS DOUBLE)) will return DOUBLE.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Oct 30, 2017, at 5:54 AM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>>>>>> 
>>>>>> 
>>>>>> I had noticed that issue, but it purports to be closed for 1.14.0, which I am using.
>>>>>> 
>>>>>> It only seems to affect AVG; other aggregates don’t appear to be rounded. E.g.:
>>>>>> 
>>>>>> 0: jdbc:calcite:schemaType=JDBC> select min("price"), sum("price"), avg("price") from "products";
>>>>>> +---------------------+---------------------+---------------------+
>>>>>> |       EXPR$0        |       EXPR$1        |       EXPR$2        |
>>>>>> +---------------------+---------------------+---------------------+
>>>>>> | 9.99                | 200151.00           | 20                  |
>>>>>> +---------------------+---------------------+---------------------+
>>>>>> 
>>>>>> yields the server-side log:
>>>>>> 
>>>>>> 2017-10-30 08:16:00 EDT [4272-13] dgdemo@dvdstore LOG:  execute <unnamed>: SELECT MIN("price"), CASE WHEN COUNT(*) = 0 THEN NULL ELSE SUM("price") END, CAST(CASE WHEN COUNT(*) = 0 THEN NULL ELSE SUM("price") END / COUNT(*) AS DECIMAL(19, 0)) FROM “products"
>>>>>> 
>>>>>> Is this something I can get around by implementing my own RelDataTypeSystem? If so, I’ll experiment with that.
>>>>>> 
>>>>>> 	-Marc
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Oct 29, 2017, at 7:11 PM, Julian Hyde <jh...@apache.org> wrote:
>>>>>>> 
>>>>>>> See https://issues.apache.org/jira/browse/CALCITE-1945 <https://issues.apache.org/jira/browse/CALCITE-1945>.
>>>>>>> 
>>>>>>>> On Oct 29, 2017, at 3:31 PM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> When I run the following against a Calcite connection containing the PostgreSQL "dvdstore" sample database:
>>>>>>>> 
>>>>>>>> select avg(products.price) from dvdstore.products group by products.category
>>>>>>>> 
>>>>>>>> The following SQL is executed on the server:
>>>>>>>> 
>>>>>>>> SELECT CAST(SUM("price") / COUNT(*) AS DECIMAL(19, 0)) FROM "products" GROUP BY “category"
>>>>>>>> 
>>>>>>>> Is there some way I can prevent Calcite from rounding it (price is a decimal type)? Is there some reason it isn’t just sending the aggregate as an AVG?
>>>>>>>> 
>>>>>>>> 	-Marc
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


Re: Rounding with AVG aggregate

Posted by Julian Hyde <jh...@apache.org>.
Sounds plausible. You know far more of the details than I do.

Please log a JIRA case and let’s continue discussion there.

> On Nov 1, 2017, at 9:41 AM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
> 
> 
> Ahh, numeric with unspecified precision has the special meaning that it will retain whatever precision is stored in the column. From https://www.postgresql.org/docs/9.1/static/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL :
> 
>  "Specifying: NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale. (The SQL standard requires a default scale of 0, i.e., coercion to integer precision. We find this a bit useless. If you're concerned about portability, always specify the precision and scale explicitly.)"
> 
> So even though the JDBC driver reports a precision of zero, it actually means arbitrary precision when it is on a numeric/decimal column. I'm guessing that extending SqlDialect.getCastSpec(RelDataType) in PostgresqlSqlDialect is the right place to fix this?
> 
> 	-Marc
> 
> 
> 
>> On Oct 31, 2017, at 11:38 AM, Julian Hyde <jh...@apache.org> wrote:
>> 
>> I don’t recall whether DECIMAL without precision + scale is even valid. (Or, if Calcite treats it a “valid”, maybe Calcite is wrong, and should be giving an error.)
>> 
>>> On Oct 31, 2017, at 7:07 AM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>>> 
>>> 
>>> I understand, and I agree that the behavior is a sensible compromise. But in this case, price is a decimal column, and so the average call should also be a decimal, wheres it is being rounded to an integer (actually, a DECIMAL(19, 0) as you can see in the server log).
>>> 
>>> However, it looks like this might only be an issue with PostgreSQL:
>>> 
>>> 0: jdbc:calcite:schemaType=JDBC> select "price" from "products" limit 1;
>>> +---------------------+
>>> |        price        |
>>> +---------------------+
>>> | 25.99               | <-- price is a decimal...
>>> +---------------------+
>>> 1 row selected (0.217 seconds)
>>> 
>>> 0: jdbc:calcite:schemaType=JDBC> select avg("price") from "products";
>>> +---------------------+
>>> |       EXPR$0        |
>>> +---------------------+
>>> | 20                  | <-- ... but the average isn't
>>> +---------------------+
>>> 
>>> 1 row selected (0.063 seconds)
>>> 
>>> 0: jdbc:calcite:schemaType=JDBC> select avg(cast("price" as decimal)) from "products";
>>> +---------------------+
>>> |       EXPR$0        |
>>> +---------------------+
>>> | 20                  | <-- casting to a non-precision decimal doesn't help...
>>> +---------------------+
>>> 1 row selected (0.067 seconds)
>>> 
>>> 0: jdbc:calcite:schemaType=JDBC> select avg(cast("price" as decimal(6,4))) from "products";
>>> +--------+
>>> | EXPR$0 |
>>> +--------+
>>> | 20.0151| <-- but specifying the precision does
>>> +--------+
>>> 1 row selected (0.066 seconds)
>>> 
>>> 
>>> 
>>> Perhaps Calcite is missing the precision of the column when it reads the metadata, or the driver is misreporting the precision? Because executing directly against the PostgreSQL driver yields the correct behavior:
>>> 
>>> 0: jdbc:postgresql://localhost/dvdstore> select avg(cast("price" as decimal)) from "products";
>>> +-----------------------+
>>> | 20.0151000000000000   |
>>> +-----------------------+
>>> 1 row selected (0.022 seconds)
>>> 
>>> 
>>> I’ll dig a bit further and submit a PR if I can find a fix.
>>> 
>>> Thanks for your help!
>>> 
>>> 	-Marc
>>> 
>>> 
>>> 
>>>> On Oct 30, 2017, at 11:16 AM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>> The bug explains the current behavior. The behavior is not what everyone would like, but it is what it is. I happen to like it because it is simple. The behavior is this: if have a column c of type T, then AVG(c) will have type T. If c is an INTEGER, then AVG will return an INTEGER. If you cast that result to DOUBLE, surprise surprise, that DOUBLE has no fractional part.
>>>> 
>>>> The solution is to convert the column before applying AVG: AVG(CAST(c AS DOUBLE)) will return DOUBLE.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Oct 30, 2017, at 5:54 AM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>>>>> 
>>>>> 
>>>>> I had noticed that issue, but it purports to be closed for 1.14.0, which I am using.
>>>>> 
>>>>> It only seems to affect AVG; other aggregates don’t appear to be rounded. E.g.:
>>>>> 
>>>>> 0: jdbc:calcite:schemaType=JDBC> select min("price"), sum("price"), avg("price") from "products";
>>>>> +---------------------+---------------------+---------------------+
>>>>> |       EXPR$0        |       EXPR$1        |       EXPR$2        |
>>>>> +---------------------+---------------------+---------------------+
>>>>> | 9.99                | 200151.00           | 20                  |
>>>>> +---------------------+---------------------+---------------------+
>>>>> 
>>>>> yields the server-side log:
>>>>> 
>>>>> 2017-10-30 08:16:00 EDT [4272-13] dgdemo@dvdstore LOG:  execute <unnamed>: SELECT MIN("price"), CASE WHEN COUNT(*) = 0 THEN NULL ELSE SUM("price") END, CAST(CASE WHEN COUNT(*) = 0 THEN NULL ELSE SUM("price") END / COUNT(*) AS DECIMAL(19, 0)) FROM “products"
>>>>> 
>>>>> Is this something I can get around by implementing my own RelDataTypeSystem? If so, I’ll experiment with that.
>>>>> 
>>>>> 	-Marc
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Oct 29, 2017, at 7:11 PM, Julian Hyde <jh...@apache.org> wrote:
>>>>>> 
>>>>>> See https://issues.apache.org/jira/browse/CALCITE-1945 <https://issues.apache.org/jira/browse/CALCITE-1945>.
>>>>>> 
>>>>>>> On Oct 29, 2017, at 3:31 PM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> When I run the following against a Calcite connection containing the PostgreSQL "dvdstore" sample database:
>>>>>>> 
>>>>>>> select avg(products.price) from dvdstore.products group by products.category
>>>>>>> 
>>>>>>> The following SQL is executed on the server:
>>>>>>> 
>>>>>>> SELECT CAST(SUM("price") / COUNT(*) AS DECIMAL(19, 0)) FROM "products" GROUP BY “category"
>>>>>>> 
>>>>>>> Is there some way I can prevent Calcite from rounding it (price is a decimal type)? Is there some reason it isn’t just sending the aggregate as an AVG?
>>>>>>> 
>>>>>>> 	-Marc
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>