Posted to issues@spark.apache.org by "Davies Liu (JIRA)" <ji...@apache.org> on 2016/04/28 00:30:13 UTC
[jira] [Commented] (SPARK-13837) SQL Context function to_date() returns wrong date
[ https://issues.apache.org/jira/browse/SPARK-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261084#comment-15261084 ]
Davies Liu commented on SPARK-13837:
------------------------------------
@Arnaud Caruso I'm in the same time zone as you, but I can't reproduce your issue in either the 1.6 branch or master.
{code}
>>> import datetime
>>> from pyspark.sql.types import StructField, StructType, TimestampType
>>> data = [[datetime.datetime(2015, 2, 20, 0, 0, 2)],[datetime.datetime(2015, 10, 9, 0, 0, 2)]]
>>> rddData = sc.parallelize(data)
>>> fields=[StructField('timestamp', TimestampType(), True)]
>>> schema=StructType(fields)
>>> data_table=sqlCtx.createDataFrame(data,schema)
>>> sqlCtx.registerDataFrameAsTable(data_table,"data")
>>> query="SELECT timestamp, TO_DATE(timestamp) FROM data "
>>> df=sqlCtx.sql(query)
>>> df.collect()
[Row(timestamp=datetime.datetime(2015, 2, 20, 0, 0, 2), to_date(CAST(timestamp AS DATE))=datetime.date(2015, 2, 20)), Row(timestamp=datetime.datetime(2015, 10, 9, 0, 0, 2), to_date(CAST(timestamp AS DATE))=datetime.date(2015, 10, 9))]
{code}
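For anyone comparing results across environments, a small check like the one below may help. It is only a sketch in plain Python (no Spark needed); `find_date_mismatches` is a hypothetical helper name, not part of PySpark. It flags rows whose returned date differs from the timestamp's own calendar date, using the values from the original report and from the session above.

```python
import datetime


def find_date_mismatches(rows):
    """Return the (timestamp, date) pairs whose date is not the timestamp's calendar date.

    `rows` is any iterable of 2-item sequences (timestamp, date), for example
    the Row objects returned by df.collect() for the query above.
    """
    return [(ts, d) for ts, d in rows if ts.date() != d]


# Values from the original report: the second row is off by one day.
reported = [
    (datetime.datetime(2015, 2, 20, 0, 0, 2), datetime.date(2015, 2, 20)),
    (datetime.datetime(2015, 10, 9, 0, 0, 2), datetime.date(2015, 10, 8)),
]

# Values from the session above: both dates match their timestamps.
observed = [
    (datetime.datetime(2015, 2, 20, 0, 0, 2), datetime.date(2015, 2, 20)),
    (datetime.datetime(2015, 10, 9, 0, 0, 2), datetime.date(2015, 10, 9)),
]

print(find_date_mismatches(reported))
print(find_date_mismatches(observed))
```

Passing `df.collect()` directly should work as well, since each Row here has exactly two fields.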
> SQL Context function to_date() returns wrong date
> -------------------------------------------------
>
> Key: SPARK-13837
> URL: https://issues.apache.org/jira/browse/SPARK-13837
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.1
> Environment: Python version:
> 2.7.6 (default, Mar 22 2014, 22:59:56)
> [GCC 4.8.2]
> System timezone:
> PDT
> Reporter: Arnaud Caruso
> Fix For: 2.0.0
>
>
> When using the SQL Context function to_date on a timestamp, it sometimes returns the wrong date.
> Here's how to reproduce the bug in Python:
> import datetime
> from pyspark.sql.types import StructField, StructType, TimestampType
> data = [[datetime.datetime(2015, 2, 20, 0, 0, 2)],[datetime.datetime(2015, 10, 9, 0, 0, 2)]]
> rddData = sc.parallelize(data)
> fields=[StructField('timestamp', TimestampType(), True)]
> schema=StructType(fields)
> data_table=sqlCtx.createDataFrame(data,schema)
> sqlCtx.registerDataFrameAsTable(data_table,"data")
> query="SELECT timestamp, TO_DATE(timestamp) FROM data "
> df=sqlCtx.sql(query)
> df.collect()
> Here are the results I get:
> [Row(timestamp=datetime.datetime(2015, 2, 20, 0, 0, 2), _c1=datetime.date(2015, 2, 20)),
> Row(timestamp=datetime.datetime(2015, 10, 9, 0, 0, 2), _c1=datetime.date(2015, 10, 8))]
> The first date is right, but the second is wrong: it returns October 8th instead of October 9th.