You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Ryan Harris (JIRA)" <ji...@apache.org> on 2013/12/17 05:53:06 UTC

[jira] [Updated] (HIVE-6040) to_utc_timestamp() not intuitive when cluster timezone is not set to UTC

     [ https://issues.apache.org/jira/browse/HIVE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Harris updated HIVE-6040:
------------------------------

    Description: 
unix_timestamp() is the root of string to timestamp conversion operations.

Therefore, when local cluster timezone is NOT set to UTC the results produced by to_utc_timestamp() may be confusing to the user.

The query below was run on a cluster with the local timezone set to 'America/Denver'
For reference, 1386000000 =  2013-12-02 16:00:00 GMT

SELECT
from_unixtime(1386000000) as test01,
  -- test01 =  2013-12-02 09:00:00 
  -- local timezone applies to from_unixtime()
unix_timestamp(from_unixtime(1386000000)) as test02, 
  -- test02 = 1386000000
  -- local timezone applies to both unix_timestamp and from_unixtime()
unix_timestamp('2013-12-02 16:00:00') as test03,
  -- test03 =  1386025200
  -- local timezone applies to from_unixtime()
from_utc_timestamp(1386000000, 'UTC') as test04,
  -- test04 = 1970-01-16 18:00:00
  -- This demonstrates the bug from HIVE-3454
cast(from_utc_timestamp(cast(1386000000 as DOUBLE),'UTC') as BIGINT) as test05, 
  -- test05 = 1386000000
  -- one work-around to HIVE-3454
cast(from_utc_timestamp(1386000000 * 1.0,'UTC') as BIGINT) as test06, 
  -- test06 = 1386000000
  -- a second work-around to HIVE-3454
cast(from_utc_timestamp(cast(cast(1386000000 as DOUBLE) as TIMESTAMP),'UTC') as BIGINT) as test07,
  --test07 = 1386000000
  --direct cast conversion of DOUBLE to TIMESTAMP works
cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test08,
  --test08 = 1386025200
  -- same result as test03, cast conversion uses local timezone setting
cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'UTC') as BIGINT) as test09,
  --test09 = 1386025200
  --same as test08 demonstrating consistency even when using from_utc_timestamp()
cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'America/Denver') as BIGINT) as test10,
  --test10 = 1386000000
  --not exactly intuitive, but this is the correct result
cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00'),'America/Denver') as BIGINT) as test11,
  --test11= 1360825
  --similar to test10, but producing HIVE-3454 bug
cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00') * 1.0,'America/Denver') as BIGINT) as test12,
  --test12=1386000000
  --like test10/test11 with HIVE-3454 work-around.
cast(to_utc_timestamp('2013-12-02 16:00:00','UTC') as BIGINT) as test13,
  --test13=1386025200
  --intuitively this seems to be the correct approach
  --the result is wrong.
cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test14,
  --test14=1386025200
  --this result could be confusing to the user
cast( from_utc_timestamp( cast( '2013-12-02 16:00:00' as TIMESTAMP ), '\${system:user.timezone}' ) as BIGINT ) as test15,
  --test15=1386000000
  --like test10, using hiveconf system variable substitution
cast( from_utc_timestamp( unix_timestamp( '2013-12-02 16:00:00') * 1.0, '\${system:user.timezone}' ) as BIGINT) as test16,
  --test16=1386000000
  --like test12, using hiveconf system variable substitution
cast( to_utc_timestamp( from_utc_timestamp( unix_timestamp( '2013-12-02 16:00:00' ) * 1.0, '\${system:user.timezone}' ), 'UTC') as BIGINT) as test17
  --test17=1386000000
  --universal work-around, produces results expected by test13
FROM dummy

To work with datetime string conversions using to_utc_timestamp in a way that produces consistent results, regardless of cluster timezone configuration, the following work-around is required:

to_utc_timestamp(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00')*1.0,'\${system:user.timezone}'), '<logsource_timezone>')

One solution could be to add a utc_timestamp() udf to hive and for to_utc_timestamp() to use that for implicit datetime string conversions.


  was:
unix_timestamp() is the root of string to timestamp conversion operations.

Therefore, when local cluster timezone is NOT set to UTC the results produced by to_utc_timestamp() may be confusing to the user.

The query below was run on a cluster with the local timezone set to 'America/Denver'
For reference, 1386000000 =  2013-12-02 16:00:00 GMT

SELECT
from_unixtime(1386000000) as test01,
  -- test01 =  2013-12-02 09:00:00 
  -- local timezone applies to from_unixtime()
unix_timestamp(from_unixtime(1386000000)) as test02, 
  -- test02 = 1386000000
  -- local timezone applies to both unix_timestamp and from_unixtime()
unix_timestamp('2013-12-02 16:00:00') as test03,
  -- test03 =  1386025200
  -- local timezone applies to from_unixtime()
from_utc_timestamp(1386000000, 'UTC') as test04,
  -- test04 = 1970-01-16 18:00:00
  -- This demonstrates the bug from HIVE-3454
cast(from_utc_timestamp(cast(1386000000 as DOUBLE),'UTC') as BIGINT) as test05, 
  -- test05 = 1386000000
  -- one work-around to HIVE-3454
cast(from_utc_timestamp(1386000000 * 1.0,'UTC') as BIGINT) as test06, 
  -- test06 = 1386000000
  -- a second work-around to HIVE-3454
cast(from_utc_timestamp(cast(cast(1386000000 as DOUBLE) as TIMESTAMP),'UTC') as BIGINT) as test07,
  --test07 = 1386000000
  --direct cast conversion of DOUBLE to TIMESTAMP works
cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test08,
  --test08 = 1386025200
  -- same result as test03, cast conversion uses local timezone setting
cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'UTC') as BIGINT) as test09,
  --test09 = 1386025200
  --same as test08 demonstrating consistency even when using from_utc_timestamp()
cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'America/Denver') as BIGINT) as test10,
  --test10 = 1386000000
  --not exactly intuitive, but this is the correct result
cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00'),'America/Denver') as BIGINT) as test11,
  --test11= 1360825
  --similar to test10, but producing HIVE-3454 bug
cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00') * 1.0,'America/Denver') as BIGINT) as test12,
  --test12=1386000000
  --like test10/test11 with HIVE-3454 work-around.
cast(to_utc_timestamp('2013-12-02 16:00:00','UTC') as BIGINT) as test13,
  --test13=1386025200
  --intuitively this seems to be the correct approach
  --the result is wrong.
cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test14,
  --test14=1386025200
  --this result could be confusing to the user
cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'\${system:user.timezone}') as BIGINT) as test15,
  --test15=1386000000
  --like test10, using hiveconf system variable substitution
cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00')*1.0,'\${system:user.timezone}') as BIGINT) as test16,
  --test16=1386000000
  --like test12, using hiveconf system variable substitution
cast(to_utc_timestamp(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00')*1.0,'\${system:user.timezone}'), 'UTC') as BIGINT) as test17
  --test17=1386000000
  --universal work-around, produces results expected by test13
FROM dummy

To work with datetime string conversions using to_utc_timestamp in a way that produces consistent results, regardless of cluster timezone configuration, the following work-around is required:

to_utc_timestamp(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00')*1.0,'\${system:user.timezone}'), '<logsource_timezone>')

One solution could be to add a utc_timestamp() udf to hive and for to_utc_timestamp() to use that for implicit datetime string conversions.



> to_utc_timestamp() not intuitive when cluster timezone is not set to UTC
> ------------------------------------------------------------------------
>
>                 Key: HIVE-6040
>                 URL: https://issues.apache.org/jira/browse/HIVE-6040
>             Project: Hive
>          Issue Type: Bug
>          Components: Types, UDF
>    Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.12.1
>            Reporter: Ryan Harris
>            Priority: Minor
>
> unix_timestamp() is the root of string to timestamp conversion operations.
> Therefore, when local cluster timezone is NOT set to UTC the results produced by to_utc_timestamp() may be confusing to the user.
> The query below was run on a cluster with the local timezone set to 'America/Denver'
> For reference, 1386000000 =  2013-12-02 16:00:00 GMT
> SELECT
> from_unixtime(1386000000) as test01,
>   -- test01 =  2013-12-02 09:00:00 
>   -- local timezone applies to from_unixtime()
> unix_timestamp(from_unixtime(1386000000)) as test02, 
>   -- test02 = 1386000000
>   -- local timezone applies to both unix_timestamp and from_unixtime()
> unix_timestamp('2013-12-02 16:00:00') as test03,
>   -- test03 =  1386025200
>   -- local timezone applies to from_unixtime()
> from_utc_timestamp(1386000000, 'UTC') as test04,
>   -- test04 = 1970-01-16 18:00:00
>   -- This demonstrates the bug from HIVE-3454
> cast(from_utc_timestamp(cast(1386000000 as DOUBLE),'UTC') as BIGINT) as test05, 
>   -- test05 = 1386000000
>   -- one work-around to HIVE-3454
> cast(from_utc_timestamp(1386000000 * 1.0,'UTC') as BIGINT) as test06, 
>   -- test06 = 1386000000
>   -- a second work-around to HIVE-3454
> cast(from_utc_timestamp(cast(cast(1386000000 as DOUBLE) as TIMESTAMP),'UTC') as BIGINT) as test07,
>   --test07 = 1386000000
>   --direct cast conversion of DOUBLE to TIMESTAMP works
> cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test08,
>   --test08 = 1386025200
>   -- same result as test03, cast conversion uses local timezone setting
> cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'UTC') as BIGINT) as test09,
>   --test09 = 1386025200
>   --same as test08 demonstrating consistency even when using from_utc_timestamp()
> cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'America/Denver') as BIGINT) as test10,
>   --test10 = 1386000000
>   --not exactly intuitive, but this is the correct result
> cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00'),'America/Denver') as BIGINT) as test11,
>   --test11= 1360825
>   --similar to test10, but producing HIVE-3454 bug
> cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00') * 1.0,'America/Denver') as BIGINT) as test12,
>   --test12=1386000000
>   --like test10/test11 with HIVE-3454 work-around.
> cast(to_utc_timestamp('2013-12-02 16:00:00','UTC') as BIGINT) as test13,
>   --test13=1386025200
>   --intuitively this seems to be the correct approach
>   --the result is wrong.
> cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test14,
>   --test14=1386025200
>   --this result could be confusing to the user
> cast( from_utc_timestamp( cast( '2013-12-02 16:00:00' as TIMESTAMP ), '\${system:user.timezone}' ) as BIGINT ) as test15,
>   --test15=1386000000
>   --like test10, using hiveconf system variable substitution
> cast( from_utc_timestamp( unix_timestamp( '2013-12-02 16:00:00') * 1.0, '\${system:user.timezone}' ) as BIGINT) as test16,
>   --test16=1386000000
>   --like test12, using hiveconf system variable substitution
> cast( to_utc_timestamp( from_utc_timestamp( unix_timestamp( '2013-12-02 16:00:00' ) * 1.0, '\${system:user.timezone}' ), 'UTC') as BIGINT) as test17
>   --test17=1386000000
>   --universal work-around, produces results expected by test13
> FROM dummy
> To work with datetime string conversions using to_utc_timestamp in a way that produces consistent results, regardless of cluster timezone configuration, the following work-around is required:
> to_utc_timestamp(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00')*1.0,'\${system:user.timezone}'), '<logsource_timezone>')
> One solution could be to add a utc_timestamp() udf to hive and for to_utc_timestamp() to use that for implicit datetime string conversions.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)