Posted to issues@spark.apache.org by "Jason C Lee (JIRA)" <ji...@apache.org> on 2016/01/12 21:32:39 UTC
[jira] [Commented] (SPARK-12683) SQL timestamp is wrong when accessed as Python datetime
[ https://issues.apache.org/jira/browse/SPARK-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094877#comment-15094877 ]
Jason C Lee commented on SPARK-12683:
-------------------------------------
On my machine the difference is one hour. It does not always occur; it depends on the specified year, month, and day. I will trace through the code and get to the bottom of it.
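A minimal sketch of one way the hour could shift (this is my assumption about the code path, not necessarily Spark's exact implementation): PySpark stores timestamps as epoch values and rebuilds them with the platform's local-time conversion. If the platform's timezone/DST rules are inconsistent between the encode and decode steps (as they can be for far-future dates on some systems), the round trip moves the hour.

```python
import time
from datetime import datetime

# Start from the naive datetime the user expects to get back.
ts = datetime(2100, 9, 9, 12, 11, 10, 90000)

# Encode to epoch seconds using the platform's local calendar rules...
epoch = time.mktime(ts.timetuple()) + ts.microsecond / 1e6

# ...then decode back the same way a local-time round trip would.
round_trip = datetime.fromtimestamp(epoch)

# On a well-behaved platform this round-trips; on a platform with broken
# or missing DST data for dates this far in the future, the hour shifts.
print('original:  ', ts)
print('round trip:', round_trip)
```

Note that `time.mktime` may raise `OverflowError` for year 2100 on platforms with a 32-bit `time_t`, which is itself a hint that far-future dates exercise unusual code paths.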
> SQL timestamp is wrong when accessed as Python datetime
> -------------------------------------------------------
>
> Key: SPARK-12683
> URL: https://issues.apache.org/jira/browse/SPARK-12683
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.5.1, 1.5.2, 1.6.0
> Environment: Windows 7 Pro x64
> Python 3.4.3
> py4j 0.9
> Reporter: Gerhard Fiedler
> Attachments: spark_bug_date.py
>
>
> When accessing SQL timestamp data through {{.show()}}, it looks correct, but when accessing it (as Python {{datetime}}) through {{.collect()}}, it is wrong.
> {code}
> from datetime import datetime
> from pyspark import SparkContext
> from pyspark.sql import SQLContext
> if __name__ == "__main__":
>     spark_context = SparkContext(appName='SparkBugTimestampHour')
>     sql_context = SQLContext(spark_context)
>     sql_text = """select cast('2100-09-09 12:11:10.09' as timestamp) as ts"""
>     data_frame = sql_context.sql(sql_text)
>     data_frame.show(truncate=False)
>     # Result from .show() (as expected, looks correct):
>     # +----------------------+
>     # |ts                    |
>     # +----------------------+
>     # |2100-09-09 12:11:10.09|
>     # +----------------------+
>     rows = data_frame.collect()
>     row = rows[0]
>     ts = row[0]
>     print('ts={ts}'.format(ts=ts))
>     # Expected result from this print statement:
>     # ts=2100-09-09 12:11:10.090000
>     #
>     # Actual, wrong result (note the hours being 18 instead of 12):
>     # ts=2100-09-09 18:11:10.090000
>     #
>     # This error seems to be dependent on some characteristic of the system. We couldn't reproduce
>     # this on all of our systems, but it is not clear what the differences are. One difference is
>     # the processor: it failed on Intel Xeon E5-2687W v2.
>     assert isinstance(ts, datetime)
>     assert ts.year == 2100 and ts.month == 9 and ts.day == 9
>     assert ts.minute == 11 and ts.second == 10 and ts.microsecond == 90000
>     if ts.hour != 12:
>         print('hour is not correct; should be 12, is actually {hour}'.format(hour=ts.hour))
>     spark_context.stop()
> {code}
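Until the root cause is identified, one possible workaround (a sketch, not an approach confirmed by this report) is to cast the timestamp to a string on the SQL side and parse it in Python, bypassing the epoch round-trip entirely. This keeps `.collect()` consistent with `.show()`, since both then go through the same string rendering:

```python
from datetime import datetime

# Hypothetical workaround: change the query to return a string, e.g.
#   select cast(cast('2100-09-09 12:11:10.09' as timestamp) as string) as ts
# and parse the collected value in Python instead of receiving a datetime.
raw = '2100-09-09 12:11:10.09'  # the value as Spark renders it in .show()
ts = datetime.strptime(raw, '%Y-%m-%d %H:%M:%S.%f')
print(ts)
```

The `%f` directive accepts one to six fractional-second digits, so the two-digit `.09` parses as 90000 microseconds.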
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org