You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Stamatis Zampetakis <za...@gmail.com> on 2022/11/22 10:26:22 UTC

Re: Issue for discussion (slight change in behavior): HIVE-26683

Hello,

Regarding the inconsistency you describe in the window function, indeed it
seems to be a bug. However, I would double-check with the SQL standard
to be sure there is no intentional deviation and/or test the query in
different DBMS.

As far as it concerns the behavior of the aggregate function SUM on
string/varchar types the SQL standard forbids this operation (small extract
below).

10.9 <aggregate function>

Syntax Rules
5g) If SUM or AVG is specified, then:
i) DT shall be a numeric type or an interval type.

General Rules
6)d)v) If SUM is specified, then the result is the sum of the values in
TXA. If the sum is not within the
range of the declared type of the result, then an exception condition is
raised: data exception — numeric value out of range.

As you observed, Postgres is inline with the standard and forbids this
operation but this is not the case for every DBMS. Note that Hive is closer
to MySQL than it is to Postgres so in many cases it makes sense to use it
as a reference.
Below, I outline the results on 8.0.27 MySQL Community Server.

select sum('a') from tblstrcol;
+----------+
| sum('a') |
+----------+
|        0 |
+----------+

select sum('a') from tblstrcol where false;
+----------+
| sum('a') |
+----------+
|     NULL |
+----------+

When there are rows the result of SUM is zero, and NULL when the result set
is empty thus I am a bit skeptical about changing the existing behavior.

Best,
Stamatis


On Mon, Nov 21, 2022 at 3:53 PM Stephen Carlin <sc...@cloudera.com> wrote:

> Wanted to throw this one out for discussion for a bug I found  and how to
> fix it...
>
> So we are inconsistent with how we handle sum() on windowing functions.
> If all the rows are null and the rows are all on "preceding" rows, we
> return NULL.  On "following" rows, however, if all the rows are null, we
> return 0.  This is inconsistent and I have a fix for that so that we always
> return null.  The fix I have is here (not yet reviewed):
> https://github.com/apache/hive/pull/3789
>
> My discussion though lies in a different problem which you can see in the
> patch I uploaded.  My current fix changes behavior of the following
> statement:  "select sum('a') from my_table".  If my_table has rows, right
> now we are return 0.0.
>
> I've looked on postgres and it doesn't even allow a sum on a string column
> so I can't really compare to that database.  My current fix doesn't disable
> this, but it does change the behavior to return NULL on this select.
>
> I kinda feel that returning NULL is more correct than return 0, but I
> wanted to throw this out there to see what y'all think.  This would be a
> change in behavior and that makes me nervous.
>
> Thanks!
>

Re: Issue for discussion (slight change in behavior): HIVE-26683

Posted by Pau Tallada <ta...@pic.es>.
Hi,

I agree that for the windowing function, in the absence of any value the
sum shall be NULL, not 0.

Cheers,

Pau.

Missatge de Stamatis Zampetakis <za...@gmail.com> del dia dt., 22 de nov.
2022 a les 11:26:

> Hello,
>
> Regarding the inconsistency you describe in the window function, indeed it
> seems to be a bug. However, I would double-check with the SQL standard
> to be sure there is no intentional deviation and/or test the query in
> different DBMS.
>
> As far as it concerns the behavior of the aggregate function SUM on
> string/varchar types the SQL standard forbids this operation (small extract
> below).
>
> 10.9 <aggregate function>
>
> Syntax Rules
> 5g) If SUM or AVG is specified, then:
> i) DT shall be a numeric type or an interval type.
>
> General Rules
> 6)d)v) If SUM is specified, then the result is the sum of the values in
> TXA. If the sum is not within the
> range of the declared type of the result, then an exception condition is
> raised: data exception — numeric value out of range.
>
> As you observed, Postgres is inline with the standard and forbids this
> operation but this is not the case for every DBMS. Note that Hive is closer
> to MySQL than it is to Postgres so in many cases it makes sense to use it
> as a reference.
> Below, I outline the results on 8.0.27 MySQL Community Server.
>
> select sum('a') from tblstrcol;
> +----------+
> | sum('a') |
> +----------+
> |        0 |
> +----------+
>
> select sum('a') from tblstrcol where false;
> +----------+
> | sum('a') |
> +----------+
> |     NULL |
> +----------+
>
> When there are rows the result of SUM is zero, and NULL when the result
> set is empty thus I am a bit skeptical about changing the existing behavior.
>
> Best,
> Stamatis
>
>
> On Mon, Nov 21, 2022 at 3:53 PM Stephen Carlin <sc...@cloudera.com>
> wrote:
>
>> Wanted to throw this one out for discussion for a bug I found  and how to
>> fix it...
>>
>> So we are inconsistent with how we handle sum() on windowing functions.
>> If all the rows are null and the rows are all on "preceding" rows, we
>> return NULL.  On "following" rows, however, if all the rows are null, we
>> return 0.  This is inconsistent and I have a fix for that so that we always
>> return null.  The fix I have is here (not yet reviewed):
>> https://github.com/apache/hive/pull/3789
>>
>> My discussion though lies in a different problem which you can see in the
>> patch I uploaded.  My current fix changes behavior of the following
>> statement:  "select sum('a') from my_table".  If my_table has rows, right
>> now we are return 0.0.
>>
>> I've looked on postgres and it doesn't even allow a sum on a string
>> column so I can't really compare to that database.  My current fix doesn't
>> disable this, but it does change the behavior to return NULL on this select.
>>
>>
>> I kinda feel that returning NULL is more correct than return 0, but I
>> wanted to throw this out there to see what y'all think.  This would be a
>> change in behavior and that makes me nervous.
>>
>> Thanks!
>>
>

-- 
----------------------------------
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
----------------------------------