You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/02/08 06:12:42 UTC

[jira] [Comment Edited] (SPARK-19496) to_date with format has weird behavior

    [ https://issues.apache.org/jira/browse/SPARK-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857459#comment-15857459 ] 

Hyukjin Kwon edited comment on SPARK-19496 at 2/8/17 6:12 AM:
--------------------------------------------------------------

- Hive

{code}
hive> SELECT to_date('2014-31-12');
2016-07-12
...

hive> SELECT to_date('2014-12-32');
2015-01-01
...

hive> SELECT to_date('2014-12-31');
2014-12-31
{code}

{code}
hive> SELECT to_date('2015-07-22', 'yyyy-dd-MM')
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:7 Arguments length mismatch ''yyyy-dd-MM'': to_date() requires 1 argument, got 2
...
{code}

- Postgres

{code}
postgres=# SELECT to_date('2014-12-31');
ERROR:  function to_date(unknown) does not exist
LINE 1: SELECT to_date('2014-12-31');
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
postgres=# SELECT to_date('2014-12-31', 'yyyy-MM-dd');
  to_date
------------
 2014-12-31
(1 row)

postgres=# SELECT to_date('2014-13-31', 'yyyy-MM-dd');
  to_date
------------
 2015-01-31
(1 row)
{code}


- Spark

{code}
spark-sql> SELECT to_date('2014-31-12');
NULL
...

spark-sql> SELECT to_date('2014-12-32');
NULL
...

spark-sql> SELECT to_date('2014-12-31');
2014-12-31
...
{code}

{code}
spark-sql> SELECT to_date('2015-07-22', 'yyyy-dd-MM')
2016-10-07
...
{code}

- MySQL

{code}
mysql> SELECT str_to_date('2014-12-31', '%Y-%m-%d');
+---------------------------------------+
| str_to_date('2014-12-31', '%Y-%m-%d') |
+---------------------------------------+
| 2014-12-31                            |
+---------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT str_to_date('2014-13-31', '%Y-%m-%d');
+---------------------------------------+
| str_to_date('2014-13-31', '%Y-%m-%d') |
+---------------------------------------+
| NULL                                  |
+---------------------------------------+
1 row in set, 1 warning (0.00 sec)
{code}


MySQL/SparkSQL - it seems returning {{NULL}}. MySQL seems not supporting omitting the format.

Hive/Postgres - it seems returning calculated dates. Hive seems not supporting the format. Postgres seems not supporting omitting the format.


was (Author: hyukjin.kwon):
- Hive

{code}
hive> SELECT to_date('2014-31-12');
2016-07-12
...

hive> SELECT to_date('2014-12-32');
2015-01-01
...

hive> SELECT to_date('2014-12-31');
2014-12-31
{code}


- Postgres

{code}
postgres=# SELECT to_date('2014-12-31');
ERROR:  function to_date(unknown) does not exist
LINE 1: SELECT to_date('2014-12-31');
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
postgres=# SELECT to_date('2014-12-31', 'yyyy-MM-dd');
  to_date
------------
 2014-12-31
(1 row)

postgres=# SELECT to_date('2014-13-31', 'yyyy-MM-dd');
  to_date
------------
 2015-01-31
(1 row)
{code}


- Spark

{code}
spark-sql> SELECT to_date('2014-31-12');
NULL
...

spark-sql> SELECT to_date('2014-12-32');
NULL
...

spark-sql> SELECT to_date('2014-12-31');
2014-12-31
...
{code}


- MySQL

{code}
mysql> SELECT str_to_date('2014-12-31', '%Y-%m-%d');
+---------------------------------------+
| str_to_date('2014-12-31', '%Y-%m-%d') |
+---------------------------------------+
| 2014-12-31                            |
+---------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT str_to_date('2014-13-31', '%Y-%m-%d');
+---------------------------------------+
| str_to_date('2014-13-31', '%Y-%m-%d') |
+---------------------------------------+
| NULL                                  |
+---------------------------------------+
1 row in set, 1 warning (0.00 sec)
{code}


MySQL/SparkSQL - it seems returning {{NULL}}. 

Hive/Postgres - it seems returning calculated dates.

> to_date with format has weird behavior
> --------------------------------------
>
>                 Key: SPARK-19496
>                 URL: https://issues.apache.org/jira/browse/SPARK-19496
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Wenchen Fan
>
> Today, if we run
> {code}
> SELECT to_date('2015-07-22', 'yyyy-dd-MM')
> {code}
> will result to `2016-10-07`, while running
> {code}
> SELECT to_date('2014-31-12')   # default format
> {code}
> will return null.
> this behavior is weird and we should check other systems like hive to see if this is expected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org