Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2015/08/25 10:01:46 UTC

[jira] [Resolved] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+

     [ https://issues.apache.org/jira/browse/SPARK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian resolved SPARK-10177.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 8400
[https://github.com/apache/spark/pull/8400]

> Parquet support interprets timestamp values differently from Hive 0.14.0+
> -------------------------------------------------------------------------
>
>                 Key: SPARK-10177
>                 URL: https://issues.apache.org/jira/browse/SPARK-10177
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Cheng Lian
>            Assignee: Davies Liu
>            Priority: Blocker
>             Fix For: 1.5.0
>
>         Attachments: 000000_0
>
>
> Running the following SQL under Hive 0.14.0+ (tested against 0.14.0 and 1.2.1):
> {code:sql}
> CREATE TABLE ts_test STORED AS PARQUET
> AS SELECT CAST("2015-01-01 00:00:00" AS TIMESTAMP);
> {code}
> Then read the Parquet file generated by Hive with Spark SQL (note the unexpected 12-hour shift in the result):
> {noformat}
> scala> sqlContext.read.parquet("hdfs://localhost:9000/user/hive/warehouse_hive14/ts_test").collect()
> res1: Array[org.apache.spark.sql.Row] = Array([2015-01-01 12:00:00.0])
> {noformat}
> This issue can be easily reproduced with [this test case in PR #8392|https://github.com/apache/spark/pull/8392/files#diff-1e55698cc579cbae676f827a89c2dc2eR116].
> Spark 1.4.1 works as expected in this case.
> ----
> Update:
> Seems that the problem is that the Julian day conversion in {{DateTimeUtils}} is wrong.  The following {{spark-shell}} session illustrates it:
> {code}
> import java.sql._
> import java.util._
> import org.apache.hadoop.hive.ql.io.parquet.timestamp._
> import org.apache.spark.sql.catalyst.util._
> TimeZone.setDefault(TimeZone.getTimeZone("GMT"))
> val ts = Timestamp.valueOf("1970-01-01 00:00:00")
> val nt = NanoTimeUtils.getNanoTime(ts, false)
> val jts = DateTimeUtils.fromJulianDay(nt.getJulianDay, nt.getTimeOfDayNanos)
> DateTimeUtils.toJavaTimestamp(jts)
> // ==> java.sql.Timestamp = 1970-01-01 12:00:00.0 (expected 1970-01-01 00:00:00.0)
> {code}
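The 12-hour shift is consistent with mixing two Julian-day conventions: astronomical Julian days begin at *noon* UTC, while the INT96 timestamps Hive writes to Parquet store an integer Julian day number plus nanoseconds counted from midnight. The following is a minimal Python sketch, not Spark's actual code (the function names here are made up for illustration), showing how a noon-based reader misinterprets midnight-based data by exactly half a day:

```python
SECONDS_PER_DAY = 86_400
NANOS_PER_SECOND = 1_000_000_000
JDN_OF_EPOCH = 2440588  # Julian day number of 1970-01-01

def to_julian(epoch_seconds):
    # Midnight-based writer convention: nanos count from 00:00 UTC.
    days, rem = divmod(epoch_seconds, SECONDS_PER_DAY)
    return JDN_OF_EPOCH + days, rem * NANOS_PER_SECOND

def from_julian_midnight(day, nanos):
    # Reader using the same midnight-based convention round-trips correctly.
    return (day - JDN_OF_EPOCH) * SECONDS_PER_DAY + nanos // NANOS_PER_SECOND

def from_julian_noon(day, nanos):
    # Astronomical convention: the Julian day begins at noon UTC, so a
    # reader assuming it adds half a day to midnight-based data.
    return ((day - JDN_OF_EPOCH) * SECONDS_PER_DAY
            + SECONDS_PER_DAY // 2
            + nanos // NANOS_PER_SECOND)

day, nanos = to_julian(0)            # 1970-01-01 00:00:00 UTC
print(from_julian_midnight(day, nanos))  # 0     -> round-trips correctly
print(from_julian_noon(day, nanos))      # 43200 -> 12-hour shift
```

The 43200-second (12-hour) offset matches the {{1970-01-01 12:00:00.0}} result in the session above.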



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
