Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/08/23 01:51:20 UTC
[jira] [Commented] (SPARK-17174) Provide support for Timestamp type Column in add_months function to return HH:mm:ss

    [ https://issues.apache.org/jira/browse/SPARK-17174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431965#comment-15431965 ] 

Hyukjin Kwon commented on SPARK-17174:
--------------------------------------

I took a look at other systems for reference.

It seems Hive also does this: https://github.com/apache/hive/blob/26b5c7b56a4f28ce3eabc0207566cce46b29b558/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java#L48-L51

Oracle's ADD_MONTHS also returns a DATE type: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions004.htm
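To make the referenced semantics concrete, here is a minimal plain-Scala sketch (java.time only, no Spark) of the rule as I read those two pages. The Hive description notes that the time part of the start value is ignored. The Oracle page says that if the start date is the last day of its month, or the target month has fewer days than the start day, the result is the last day of the target month. `AddMonthsSketch.addMonthsHiveStyle` is a hypothetical helper for illustration only, not a Spark or Hive API.

```scala
import java.time.{LocalDate, LocalDateTime, YearMonth}

object AddMonthsSketch {
  // Hypothetical helper: mirrors the documented Hive/Oracle rule, not a real Spark/Hive API.
  def addMonthsHiveStyle(start: LocalDateTime, months: Int): LocalDate = {
    // The time part of the start value is ignored.
    val date = start.toLocalDate
    val startMonth = YearMonth.from(date)
    val target = startMonth.plusMonths(months.toLong)
    val isLastDay = date.getDayOfMonth == startMonth.lengthOfMonth
    if (isLastDay || date.getDayOfMonth > target.lengthOfMonth)
      target.atEndOfMonth()            // result is the last day of the target month
    else
      target.atDay(date.getDayOfMonth) // otherwise keep the same day of month
  }

  def main(args: Array[String]): Unit = {
    println(addMonthsHiveStyle(LocalDateTime.parse("2012-07-16T12:12:12"), 1)) // 2012-08-16
    println(addMonthsHiveStyle(LocalDateTime.parse("2015-02-28T08:00:00"), 1)) // 2015-03-31
    println(addMonthsHiveStyle(LocalDateTime.parse("2016-01-31T23:59:59"), 1)) // 2016-02-29
  }
}
```

The second call shows why the rule matters: 2015-02-28 is the last day of February, so the result moves to the last day of March rather than to March 28.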

It seems the input timestamp is cast to a date, as shown below:

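{code}
import java.sql.Timestamp

// In spark-shell, spark.implicits._ (needed for toDF) is already in scope
Seq(Tuple1(Timestamp.valueOf("2012-07-16 12:12:12"))).toDF("ts")
  .selectExpr("add_months(ts, 1)", "date_add(ts, 1)")
  .show()
{code}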

This prints:

{code}
+-------------------------------+-----------------------------+
|add_months(CAST(ts AS DATE), 1)|date_add(CAST(ts AS DATE), 1)|
+-------------------------------+-----------------------------+
|                     2012-08-16|                   2012-07-17|
+-------------------------------+-----------------------------+
{code}
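The column names above show the implicit CAST(ts AS DATE) applied before the arithmetic. Below is a plain-Scala sketch of the same effect using java.time, next to a hypothetical timestamp-preserving variant of the kind this issue asks for. All helper names are made up for illustration, and `plusMonths` is only an approximation of `add_months`: it clamps invalid days but does not apply the last-day-of-month rule.

```scala
import java.sql.Timestamp
import java.time.LocalDate

object CastThenAddSketch {
  // Approximates add_months(CAST(ts AS DATE), n): the time of day is dropped first.
  def addMonthsViaDate(ts: Timestamp, months: Int): LocalDate =
    ts.toLocalDateTime.toLocalDate.plusMonths(months.toLong)

  // Approximates date_add(CAST(ts AS DATE), n).
  def addDaysViaDate(ts: Timestamp, days: Int): LocalDate =
    ts.toLocalDateTime.toLocalDate.plusDays(days.toLong)

  // Hypothetical timestamp-preserving variant (what this issue asks for).
  // plusMonths only clamps invalid days, e.g. 2016-01-31 -> 2016-02-29;
  // it does NOT apply the Oracle/Hive last-day-of-month rule.
  def addMonthsKeepTime(ts: Timestamp, months: Int): Timestamp =
    Timestamp.valueOf(ts.toLocalDateTime.plusMonths(months.toLong))

  def main(args: Array[String]): Unit = {
    val ts = Timestamp.valueOf("2012-07-16 12:12:12")
    println(addMonthsViaDate(ts, 1))  // 2012-08-16
    println(addDaysViaDate(ts, 1))    // 2012-07-17
    println(addMonthsKeepTime(ts, 1)) // 2012-08-16 12:12:12.0
  }
}
```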

There is a related discussion here: https://github.com/apache/spark/pull/7589#discussion_r35186500

So I believe it would make sense to document this behaviour in the expression description, as Hive does: https://github.com/apache/hive/blob/26b5c7b56a4f28ce3eabc0207566cce46b29b558/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java#L48-L51

Do you mind if I submit a PR to document this?


> Provide support for Timestamp type Column in add_months function to return HH:mm:ss
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-17174
>                 URL: https://issues.apache.org/jira/browse/SPARK-17174
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 2.0.0
>            Reporter: Amit Baghel
>            Priority: Minor
>
> The add_months function currently supports Date types. If the Column is of Timestamp type, it adds the months to the date but does not return the time part (HH:mm:ss). See the code below.
> {code}
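> import java.util.Calendar
> import org.apache.spark.sql.functions.add_months
> val now = Calendar.getInstance().getTime()
> // One row per month (January to April); setMonth mutates `now`, keeping the current day and time of day
> val df = sc.parallelize((0 to 3).map(i => {now.setMonth(i); (i, new java.sql.Timestamp(now.getTime))})).toDF("ID", "DateWithTS")
> df.withColumn("NewDateWithTS", add_months(df("DateWithTS"), 1)).show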
> {code}
> The above code gives the following output. Note that HH:mm:ss is missing from the NewDateWithTS column.
> {code}
> +---+--------------------+-------------+
> | ID|          DateWithTS|NewDateWithTS|
> +---+--------------------+-------------+
> |  0|2016-01-21 09:38:...|   2016-02-21|
> |  1|2016-02-21 09:38:...|   2016-03-21|
> |  2|2016-03-21 09:38:...|   2016-04-21|
> |  3|2016-04-21 09:38:...|   2016-05-21|
> +---+--------------------+-------------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
