Posted to issues@spark.apache.org by "Florentino Sainz (JIRA)" <ji...@apache.org> on 2017/08/02 11:10:00 UTC

[jira] [Comment Edited] (SPARK-21613) Wrong unix_timestamp when parsing Dates

    [ https://issues.apache.org/jira/browse/SPARK-21613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110719#comment-16110719 ] 

Florentino Sainz edited comment on SPARK-21613 at 8/2/17 11:09 AM:
-------------------------------------------------------------------

One addition: in my main development, this unix_timestamp is in the WHERE clause (for now it's much simpler) and it seems to work fine there... the problem appears only when selecting the column


was (Author: florentino):
One addition, on my main development, this unix_timestamp is in the WHERE clause (for now it's much simpler) and it seems to work fine there... problem is only when selecting the column

> Wrong unix_timestamp when parsing Dates
> ---------------------------------------
>
>                 Key: SPARK-21613
>                 URL: https://issues.apache.org/jira/browse/SPARK-21613
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.6.1, 2.1.0
>            Reporter: Florentino Sainz
>
> Hi,
> I'm trying to perform some SQL queries on my database and I'm getting very weird behaviour:
> sqlContext.sql('SELECT date, time, to_utc_timestamp(from_unixtime(unix_timestamp( CONCAT (date, " ", time),"dd-MMM-YYYY HH:mm:ss")),"Europe/Madrid") FROM MYTABLE WHERE YEAR=2017  AND MONTH=01 LIMIT 10').show();
> With this query, no matter the month, we always get the same TIMESTAMP (corresponding to 01/01/2017), as if month and day were not being processed at all. I've tested it on 1.6 and 2.1 (at my company's cluster and at Databricks). I also tested by setting a nonexistent month (e.g. JJA) and it gives NULL, so the month is being processed at some point.
> I've isolated the problem with the following snippet (not using Spark SQL directly):
> I also tested using numeric months (first part of the snippet), just in case, but there was no difference at all.
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4270191918950108/1394884673460602/1201575579760496/latest.html
> {code}
> from pyspark.sql.functions import concat, col, lit, from_utc_timestamp,to_utc_timestamp, unix_timestamp, from_unixtime
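> # first part: numeric month pattern ("dd-MM-YYYY")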
> df = sqlContext.createDataFrame([ ("11-08-2016", "20:11:18.222"),  ("20-11-2016", "20:11:18.222")], ('Date', 'Time'))
> df.select('Date',unix_timestamp(timestamp=col("Date"),format="dd-MM-YYYY")).show()
> df.select('Date',to_utc_timestamp(from_unixtime(unix_timestamp(concat(col("Date"), lit(" "), col("Time")),"dd-MM-YYYY HH:mm:ss")),"Europe/Madrid")).show()
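> # second part: textual month pattern ("dd-MMM-YYYY"); no difference in the result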
> df = sqlContext.createDataFrame([("20-SEP-2017", "20:00:18.111"), ("20-OCT-2017", "20:11:18.222"),  ("11-AUG-2016", "20:11:18.222"),  ("20-OCT-2016", "20:11:18.222")], ('Date', 'Time'))
> df.select('Date',unix_timestamp(timestamp=col("Date"),format="dd-MMM-YYYY")).show()
> df.select('Date',to_utc_timestamp(from_unixtime(unix_timestamp(concat(col("Date"), lit(" "), col("Time")),"dd-MMM-YYYY HH:mm:ss")),"Europe/Madrid")).show()
> {code}
> Apart from the bug report, can anyone suggest a workaround? (I'm trying to stick with DataFrames, as my data size is quite big and I want to avoid serializing my items back into Python at this stage.)
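> A note on a possible cause, in case it helps: as far as I can tell, unix_timestamp goes through Java's SimpleDateFormat, where uppercase "YYYY" means week-based year and the calendar year is lowercase "yyyy". If that is what happens here, a pattern like "dd-MMM-YYYY" would resolve every date to the start of its week year, which would match the constant 01/01/2017 above. A minimal sketch of that workaround (only the pattern changes; untested beyond this snippet):
> {code}
> from pyspark.sql.functions import concat, col, lit, to_utc_timestamp, unix_timestamp, from_unixtime
> df = sqlContext.createDataFrame([("20-SEP-2017", "20:00:18.111"), ("11-AUG-2016", "20:11:18.222")], ('Date', 'Time'))
> # lowercase "yyyy" (calendar year) instead of uppercase "YYYY" (week year)
> df.select('Date', unix_timestamp(timestamp=col("Date"), format="dd-MMM-yyyy")).show()
> df.select('Date', to_utc_timestamp(from_unixtime(unix_timestamp(concat(col("Date"), lit(" "), col("Time")), "dd-MMM-yyyy HH:mm:ss")), "Europe/Madrid")).show()
> {code}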
> Thanks,


