Posted to issues@spark.apache.org by "Vitaly Gerasimov (JIRA)" <ji...@apache.org> on 2017/10/27 08:35:00 UTC

[jira] [Commented] (SPARK-17971) Unix timestamp handling in Spark SQL not allowing calculations on UTC times

    [ https://issues.apache.org/jira/browse/SPARK-17971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221941#comment-16221941 ] 

Vitaly Gerasimov commented on SPARK-17971:
------------------------------------------

This seems unexpected to me too. I don't understand why *cast(unix_timestamp() as timestamp)* should be rendered in the server's local timezone. People who use the Spark SQL API don't necessarily know what default timezone the server has, so it may be surprising for them when *cast(unix_timestamp() as timestamp)* is converted to the server's local timezone and shown without a timezone offset or abbreviation.
{quote}
unix_timestamp - 1509089047
cast(unix_timestamp() as timestamp) - 2017-10-27 14:24:07.0
{quote}
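
To make the ambiguity concrete, here is a minimal sketch in plain Python (not Spark) that renders the same epoch value from the quote above in two zones. The fixed UTC+7 offset is only an assumption, chosen because it reproduces the 14:24:07 output shown above; the helper name render is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Epoch value taken from the unix_timestamp output quoted above.
EPOCH_SECONDS = 1509089047

def render(epoch_seconds, tz):
    """Format an epoch value as a wall-clock string with no zone marker."""
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y-%m-%d %H:%M:%S")

# Same instant, two renderings.
utc_text = render(EPOCH_SECONDS, timezone.utc)
# Assumption: a server running at a fixed UTC+7 offset.
plus7_text = render(EPOCH_SECONDS, timezone(timedelta(hours=7)))

print(utc_text)    # 2017-10-27 07:24:07
print(plus7_text)  # 2017-10-27 14:24:07
```

Neither string carries an offset, so a reader cannot tell which one is UTC without knowing the server configuration.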

Perhaps we should reconsider what the “fundamental” problem really is.

> Unix timestamp handling in Spark SQL not allowing calculations on UTC times
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-17971
>                 URL: https://issues.apache.org/jira/browse/SPARK-17971
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2
>         Environment: MacOS X JDK 7
>            Reporter: Gabriele Del Prete
>
> In our Spark data pipeline we store timed events using a bigint column called 'timestamp', the values contained being Unix timestamp time points.
> Our datacenter servers' Java VMs are all set up to start with the timezone set to UTC, while developers' computers are all in the US Eastern timezone.
> Given how Spark SQL datetime functions work, it's impossible to do calculations (e.g. extract and compare hours, year-month-date triplets) using UTC values:
> - from_unixtime takes a bigint unix timestamp and forces it to the computer's local timezone;
> - casting the bigint column to timestamp does the same (it converts it to the local timezone)
> - from_utc_timestamp works in the same way, the only difference being that it gets a string as input instead of a bigint.
> The result of all of this is that it's impossible to extract individual fields of a UTC timestamp, since all timestamps always get converted to the local timezone.
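
For comparison, the kind of calculation the issue description asks for (extracting the hour and the year-month-day triplet in UTC) can be sketched outside Spark in plain Python. This is only an illustration under stated assumptions, not a Spark workaround; the helper name utc_fields is hypothetical. The hour is derived by integer arithmetic on the epoch value, so it does not depend on the process timezone, and time.gmtime is used as a cross-check.

```python
import time

SECONDS_PER_DAY = 86400
SECONDS_PER_HOUR = 3600

def utc_fields(epoch_seconds):
    """Return (year, month, day, hour) in UTC for a Unix timestamp."""
    # time.gmtime always interprets the value as UTC, regardless of the
    # timezone the process was started with.
    t = time.gmtime(epoch_seconds)
    return (t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour)

epoch = 1509089047  # value from the comment above

# UTC hour of day by pure arithmetic on the bigint value.
hour_by_arithmetic = (epoch % SECONDS_PER_DAY) // SECONDS_PER_HOUR

fields = utc_fields(epoch)
print(fields)              # (2017, 10, 27, 7)
print(hour_by_arithmetic)  # 7
```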


