You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2020/04/24 21:55:00 UTC

[jira] [Updated] (IMPALA-5121) avg() on timestamp col is wrong with -use_local_tz_for_unix_timestamp_conversions

     [ https://issues.apache.org/jira/browse/IMPALA-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell updated IMPALA-5121:
----------------------------------
    Target Version: Impala 4.0  (was: Impala 3.4.0)

> avg() on timestamp col is wrong with -use_local_tz_for_unix_timestamp_conversions
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-5121
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5121
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.5.0, Impala 2.2.10, Impala 2.3.4
>            Reporter: Matthew Jacobs
>            Priority: Major
>              Labels: timestamp
>
> The flag '-use_local_tz_for_unix_timestamp_conversions' was added for IMPALA-97. Enabling it results in timestamps sometimes being converted into localtime, but unfortunately this doesn't seem to be well defined when/where this conversion will happen.
> I've noticed that its use seems to break the avg() aggregate function on timestamp types (despite being an odd function on timestamps, it should still work).
> Impala by default, i.e. not enabling this flag:
> {code}
> [localhost:21000] > select timestamp_col from functional.alltypestiny;
> Query: select timestamp_col from functional.alltypestiny
> Query submitted at: 2017-03-27 18:50:57 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
> Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=8242bb6012948f06:143961ed00000000
> +---------------------+
> | timestamp_col       |
> +---------------------+
> | 2009-01-01 00:00:00 |
> | 2009-01-01 00:01:00 |
> | 2009-02-01 00:00:00 |
> | 2009-02-01 00:01:00 |
> | 2009-03-01 00:00:00 |
> | 2009-03-01 00:01:00 |
> | 2009-04-01 00:00:00 |
> | 2009-04-01 00:01:00 |
> +---------------------+
> Fetched 8 row(s) in 0.02s
> [localhost:21000] > select avg(timestamp_col) from functional.alltypestiny;
> Query: select avg(timestamp_col) from functional.alltypestiny
> Query submitted at: 2017-03-27 18:50:59 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
> Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=534f6ab59b201b5e:40e2a86d00000000
> +---------------------+
> | avg(timestamp_col)  |
> +---------------------+
> | 2009-02-14 23:45:30 |
> +---------------------+
> {code}
> Then enabling the flag results in the same timestamps returned when scanning, but evaluating them in avg() results in them being converted:
> {code}
> [localhost:21000] > select timestamp_col from functional.alltypestiny;
> Query: select timestamp_col from functional.alltypestiny
> Query submitted at: 2017-03-27 18:51:17 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
> Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=ac4ab8fd8caf4be9:ebb0834d00000000
> +---------------------+
> | timestamp_col       |
> +---------------------+
> | 2009-01-01 00:00:00 |
> | 2009-01-01 00:01:00 |
> | 2009-02-01 00:00:00 |
> | 2009-02-01 00:01:00 |
> | 2009-03-01 00:00:00 |
> | 2009-03-01 00:01:00 |
> | 2009-04-01 00:00:00 |
> | 2009-04-01 00:01:00 |
> +---------------------+
> Fetched 8 row(s) in 0.30s
> [localhost:21000] > select avg(timestamp_col) from functional.alltypestiny;
> Query: select avg(timestamp_col) from functional.alltypestiny
> Query submitted at: 2017-03-27 18:51:25 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
> Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=9e4e2c16896090f7:8922c4f200000000
> +---------------------+
> | avg(timestamp_col)  |
> +---------------------+
> | 2009-02-15 00:00:30 |
> +---------------------+
> Fetched 1 row(s) in 0.12s
> {code}
> This behavior seems inconsistent and I'm pretty sure is not intentional. There are two misleading functions on TimestampValue that will do this conversion when the flag is set: ToUnixTime() and ToSubsecondUnixTime(). avg() seems to have started using ToSubsecondUnixTime() after IMPALA-2914.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org