Posted to issues@spark.apache.org by "Gabriele Del Prete (JIRA)" <ji...@apache.org> on 2016/10/17 17:28:58 UTC

[jira] [Created] (SPARK-17971) Unix timestamp handling in Spark SQL not allowing calculations on UTC times

Gabriele Del Prete created SPARK-17971:
------------------------------------------

             Summary: Unix timestamp handling in Spark SQL not allowing calculations on UTC times
                 Key: SPARK-17971
                 URL: https://issues.apache.org/jira/browse/SPARK-17971
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 1.6.2
          Environment: Mac OS X, JDK 7
            Reporter: Gabriele Del Prete


In our Spark data pipeline we store timed events in a bigint column called 'timestamp', whose values are Unix timestamps (seconds since the epoch).

The Java VMs on our datacenter servers all start with the timezone set to UTC, while developers' machines are all in the US Eastern timezone.

Given how Spark SQL's datetime functions work, it's impossible to do calculations (e.g. extract and compare hours, or year-month-day triplets) on UTC values (see the sketch after this list):
- from_unixtime takes a bigint Unix timestamp and renders it in the JVM's local timezone;
- casting the bigint column to timestamp does the same (it converts the value to the local timezone);
- from_utc_timestamp works in the same way, the only difference being that it takes a string as input instead of a bigint.
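
A minimal spark-shell sketch of the first two behaviors (Spark 1.6.x; the literal 1476723600, i.e. 2016-10-17 17:00:00 UTC, and the column name are only illustrative):

{code:scala}
import org.apache.spark.sql.functions._

// 1476723600 = 2016-10-17 17:00:00 UTC
val df = sqlContext.range(1).select(lit(1476723600L).as("timestamp"))

// from_unixtime renders the value in the JVM's default timezone, not in UTC:
df.select(from_unixtime(col("timestamp"), "yyyy-MM-dd HH:mm:ss")).show()
// JVM in UTC:        2016-10-17 17:00:00
// JVM in US/Eastern: 2016-10-17 13:00:00

// Casting the bigint column to timestamp behaves the same way:
df.select(col("timestamp").cast("timestamp")).show()
{code}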

The result of all of this is that it's impossible to extract individual fields of a UTC timestamp, since all timestamps always get converted to the local timezone.
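
For example (again only a sketch, with an illustrative column name), extracting the hour of the same instant gives a different answer depending on the JVM's timezone, and none of the field-extraction functions accept a timezone argument:

{code:scala}
import org.apache.spark.sql.functions._

val df = sqlContext.range(1)
  .select(lit(1476723600L).cast("timestamp").as("ts")) // 2016-10-17 17:00:00 UTC

df.select(hour(col("ts")), year(col("ts")), month(col("ts")), dayofmonth(col("ts"))).show()
// JVM in UTC:        hour = 17
// JVM in US/Eastern: hour = 13
{code}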



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org