Posted to issues@spark.apache.org by "Davies Liu (JIRA)" <ji...@apache.org> on 2016/04/28 00:30:13 UTC

[jira] [Commented] (SPARK-13837) SQL Context function to_date() returns wrong date

    [ https://issues.apache.org/jira/browse/SPARK-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261084#comment-15261084 ] 

Davies Liu commented on SPARK-13837:
------------------------------------

@Arnaud Caruso I'm in the same time zone as you, but I can't reproduce your issue on either the 1.6 branch or master.

{code}
>>> import datetime
>>> from pyspark.sql.types import StructField, StructType, TimestampType
>>> data = [[datetime.datetime(2015, 2, 20, 0, 0, 2)],[datetime.datetime(2015, 10, 9, 0, 0, 2)]]
>>> rddData = sc.parallelize(data)
>>> fields=[StructField('timestamp', TimestampType(), True)]
>>> schema=StructType(fields)
>>> data_table=sqlCtx.createDataFrame(data,schema)
>>> sqlCtx.registerDataFrameAsTable(data_table,"data")
>>> query="SELECT timestamp, TO_DATE(timestamp) FROM data "
>>> df=sqlCtx.sql(query)
>>> df.collect()
[Row(timestamp=datetime.datetime(2015, 2, 20, 0, 0, 2), to_date(CAST(timestamp AS DATE))=datetime.date(2015, 2, 20)), Row(timestamp=datetime.datetime(2015, 10, 9, 0, 0, 2), to_date(CAST(timestamp AS DATE))=datetime.date(2015, 10, 9))]
{code}
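
Since the report depends on the system time zone, it may help to compare the UTC offset each environment applies to the two sample timestamps. Below is a minimal sketch in plain Python 3, with no Spark involved. The helper name `local_utc_offset_hours` is made up for illustration, and it relies on `tm_gmtoff`, which the Python 2.7 from the report does not provide. Whether the offset has anything to do with the wrong date is an assumption, not something established in this thread.

```python
import datetime
import time


def local_utc_offset_hours(naive_dt):
    """Return the UTC offset, in hours, that the OS applies to a naive local datetime."""
    # timetuple() leaves tm_isdst at -1, so mktime() decides whether DST applies.
    epoch = time.mktime(naive_dt.timetuple())
    # tm_gmtoff is the offset east of UTC in seconds (negative for US zones).
    return time.localtime(epoch).tm_gmtoff / 3600.0


if __name__ == "__main__":
    # The two timestamps from the reproduction above.
    samples = [
        datetime.datetime(2015, 2, 20, 0, 0, 2),
        datetime.datetime(2015, 10, 9, 0, 0, 2),
    ]
    print("system time zone names:", time.tzname)
    for ts in samples:
        print(ts.isoformat(), "UTC offset (hours):", local_utc_offset_hours(ts))
```

Under US Pacific rules the February timestamp falls in standard time (UTC-8) and the October one in daylight time (UTC-7), so two machines that print different offsets for the same timestamp are not really configured alike.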

> SQL Context function to_date() returns wrong date
> -------------------------------------------------
>
>                 Key: SPARK-13837
>                 URL: https://issues.apache.org/jira/browse/SPARK-13837
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>         Environment: Python version:
> 2.7.6 (default, Mar 22 2014, 22:59:56) 
> [GCC 4.8.2]
> System timezone:
> PDT
>            Reporter: Arnaud Caruso
>             Fix For: 2.0.0
>
>
> When using the SQL Context function to_date on a timestamp, it sometimes returns the wrong date.
> Here's how to reproduce the bug in Python:
> data = [[datetime.datetime(2015, 2, 20, 0, 0, 2)],[datetime.datetime(2015, 10, 9, 0, 0, 2)]]
> rddData = sc.parallelize(data)
> fields=[StructField('timestamp', TimestampType(), True)]
> schema=StructType(fields)
> data_table=sqlCtx.createDataFrame(data,schema)
> sqlCtx.registerDataFrameAsTable(data_table,"data")
> query="SELECT timestamp, TO_DATE(timestamp) FROM data "
> df=sqlCtx.sql(query)
> df.collect()
> Here are the results I get:
> [Row(timestamp=datetime.datetime(2015, 2, 20, 0, 0, 2), _c1=datetime.date(2015, 2, 20)),
>  Row(timestamp=datetime.datetime(2015, 10, 9, 0, 0, 2), _c1=datetime.date(2015, 10, 8))]
> The first date is right, but the second is wrong: it returns October 8th instead of October 9th.



