Posted to issues@spark.apache.org by "Bill Schneider (Jira)" <ji...@apache.org> on 2022/07/05 17:08:00 UTC
[jira] [Comment Edited] (SPARK-35662) Support Timestamp without time zone data type

    [ https://issues.apache.org/jira/browse/SPARK-35662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562724#comment-17562724 ] 

Bill Schneider edited comment on SPARK-35662 at 7/5/22 5:07 PM:
----------------------------------------------------------------

Is this delayed until 3.4.0? It did not appear to work in Spark 3.3.

However, Spark 3.4.0-SNAPSHOT appears to do exactly what I wanted:

`cast(string, DataTypes.TimestampNTZType)`

A value produced this way and written to Parquet reads back as exactly the same timestamp from a Spark session in a different time zone.
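
A minimal standard-library model of the round trip described above, under stated assumptions: an existing TIMESTAMP (local time zone) value is assumed to be stored as UTC microseconds since the epoch, and a TimestampNTZType value as its wall-clock fields encoded as if they were UTC, which is the distinction Parquet draws with its `isAdjustedToUTC` timestamp flag. This is not Spark's code; session time zones are fixed UTC offsets so no tz database is needed, and the helper names (`write_ltz`, `read_ntz`, etc.) are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as an aware UTC datetime; stored values are microseconds relative to it.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICRO = timedelta(microseconds=1)


def to_micros(aware: datetime) -> int:
    """Microseconds since the Unix epoch for a timezone-aware datetime."""
    return (aware - EPOCH) // ONE_MICRO


def from_micros(micros: int) -> datetime:
    """Inverse of to_micros; returns an aware datetime in UTC."""
    return EPOCH + timedelta(microseconds=micros)


def write_ltz(wall_clock: datetime, session_tz: timezone) -> int:
    """LTZ model (hypothetical): interpret the wall clock in the writer's session zone, store UTC micros."""
    return to_micros(wall_clock.replace(tzinfo=session_tz))


def read_ltz(stored: int, session_tz: timezone) -> datetime:
    """LTZ model (hypothetical): render the stored instant in the reader's session zone."""
    return from_micros(stored).astimezone(session_tz).replace(tzinfo=None)


def write_ntz(wall_clock: datetime) -> int:
    """NTZ model (hypothetical): encode the wall-clock fields as if UTC; no session zone involved."""
    return to_micros(wall_clock.replace(tzinfo=timezone.utc))


def read_ntz(stored: int) -> datetime:
    """NTZ model (hypothetical): decode the wall-clock fields; the reader's zone is ignored."""
    return from_micros(stored).replace(tzinfo=None)


if __name__ == "__main__":
    writer_tz = timezone(timedelta(hours=-4))  # writer session at UTC-4
    reader_tz = timezone(timedelta(hours=2))   # reader session at UTC+2
    ts = datetime(2022, 7, 5, 17, 8, 0)

    print("LTZ:", read_ltz(write_ltz(ts, writer_tz), reader_tz))  # LTZ: 2022-07-05 23:08:00
    print("NTZ:", read_ntz(write_ntz(ts)))                        # NTZ: 2022-07-05 17:08:00
```

In this model, writing 17:08 from a UTC-4 session and reading it from a UTC+2 session shifts the local-time-zone value by six hours, while the NTZ value comes back unchanged, which is the behavior reported above.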


was (Author: wrschneider99):
Is this delayed until 3.4.0?  It did not appear to work in Spark 3.3

> Support Timestamp without time zone data type
> ---------------------------------------------
>
>                 Key: SPARK-35662
>                 URL: https://issues.apache.org/jira/browse/SPARK-35662
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Gengliang Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> Spark SQL today supports the TIMESTAMP data type. However, the semantics it provides actually match TIMESTAMP WITH LOCAL TIME ZONE as defined by Oracle: timestamps embedded in a SQL query or passed through JDBC are presumed to be in the session-local time zone and are converted to UTC before being processed.
>  These are desirable semantics in many cases, such as when dealing with calendars.
>  In many other (arguably more) cases, such as when dealing with log files, it is desirable that the provided timestamps not be altered.
>  SQL users expect to be able to model either behavior, using TIMESTAMP WITHOUT TIME ZONE for time-zone-insensitive data and TIMESTAMP WITH LOCAL TIME ZONE for time-zone-sensitive data.
>  Most traditional RDBMSs map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE, so their users will be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that does not exist in the SQL standard.
> In this new feature, we will introduce TIMESTAMP WITH LOCAL TIME ZONE to describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for the standard semantics.
>  Using these two types will provide clarity.
>  We will also allow users to set the default behavior for TIMESTAMP to either use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.
> h3. Milestone 1 – Spark Timestamp equivalency (the new Timestamp type TimestampWithoutTZ meets or exceeds all functionality of the existing SQL Timestamp):
>  * Add a new DataType implementation for TimestampWithoutTZ.
>  * Support TimestampWithoutTZ in Dataset/UDF.
>  * TimestampWithoutTZ literals
>  * TimestampWithoutTZ arithmetic (e.g. TimestampWithoutTZ - TimestampWithoutTZ, TimestampWithoutTZ - Date)
>  * Datetime functions/operators: dayofweek, weekofyear, year, etc.
>  * Cast to and from TimestampWithoutTZ: cast String/Timestamp to TimestampWithoutTZ and cast TimestampWithoutTZ to String (pretty printing)/Timestamp, with SQL syntax to specify the types
>  * Support sorting TimestampWithoutTZ.
> h3. Milestone 2 – Persistence:
>  * Ability to create tables of type TimestampWithoutTZ
>  * Ability to write to common file formats such as Parquet and JSON.
>  * INSERT, SELECT, UPDATE, MERGE
>  * Discovery
> h3. Milestone 3 – Client support
>  * JDBC support
>  * Hive Thrift server
> h3. Milestone 4 – PySpark and SparkR integration
>  * Python UDF can take and return TimestampWithoutTZ
>  * DataFrame support
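
The description above implies that calendar fields and differences of TIMESTAMP WITHOUT TIME ZONE values depend only on their wall-clock fields, while for the existing (local time zone) type they depend on the session time zone. The standard-library sketch below illustrates that difference; it models the existing type as a UTC instant rendered in the session zone and the new type as a naive `datetime`, which is an assumption for illustration rather than Spark code. `spark_dayofweek` follows Spark SQL's `dayofweek` numbering (1 = Sunday through 7 = Saturday); the other function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def spark_dayofweek(wall_clock: datetime) -> int:
    """Day of week numbered like Spark SQL's dayofweek: 1 = Sunday ... 7 = Saturday."""
    return wall_clock.isoweekday() % 7 + 1


def ltz_dayofweek(instant: datetime, session_tz: timezone) -> int:
    """Local-time-zone model (hypothetical): fields come from the instant rendered in the session zone."""
    return spark_dayofweek(instant.astimezone(session_tz))


def ntz_dayofweek(wall_clock: datetime) -> int:
    """NTZ model (hypothetical): fields come straight from the stored wall clock."""
    return spark_dayofweek(wall_clock)


def ntz_diff(end: datetime, start: datetime) -> timedelta:
    """NTZ - NTZ model (hypothetical): a plain difference of wall-clock values, no session zone."""
    return end - start


if __name__ == "__main__":
    instant = datetime(2022, 7, 5, 23, 30, tzinfo=timezone.utc)  # a Tuesday in UTC
    utc_plus_8 = timezone(timedelta(hours=8))

    print(ltz_dayofweek(instant, timezone.utc))         # 3 (Tuesday)
    print(ltz_dayofweek(instant, utc_plus_8))           # 4 (Wednesday): depends on the session zone
    print(ntz_dayofweek(datetime(2022, 7, 5, 23, 30)))  # 3 in every session
    print(ntz_diff(datetime(2022, 7, 6, 1, 0), datetime(2022, 7, 5, 23, 30)))  # 1:30:00
```

The same stored instant falls on different days depending on the session zone under the local-time-zone model, whereas the wall-clock value always yields the same field values and differences.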



