You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2021/02/10 03:28:00 UTC

[jira] [Assigned] (SPARK-33995) Make datetime addition easier for years, weeks, hours, minutes, and seconds

     [ https://issues.apache.org/jira/browse/SPARK-33995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-33995:
-----------------------------------

    Assignee: Matthew Powers

> Make datetime addition easier for years, weeks, hours, minutes, and seconds
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-33995
>                 URL: https://issues.apache.org/jira/browse/SPARK-33995
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Matthew Powers
>            Assignee: Matthew Powers
>            Priority: Minor
>
> There are add_months and date_add functions that make it easy to perform datetime addition with months and days, but there isn't an easy way to perform datetime addition with years, weeks, hours, minutes, or seconds with the Scala/Python/R APIs.
> Users need to write code like expr("first_datetime + INTERVAL 2 hours") to add two hours to a timestamp with the Scala API, which isn't desirable.  We don't want to make Scala users manipulate SQL strings.
> We can expose the [make_interval SQL function|https://github.com/apache/spark/pull/26446/files] to make any combination of datetime addition possible.  That'll make tons of different datetime addition operations possible and will be valuable for a wide array of users.
> make_interval takes 7 arguments: years, months, weeks, days, hours, mins, and secs.
> There are different ways to expose the make_interval functionality to Scala/Python/R users:
>  * Option 1: Single make_interval function that takes 7 arguments
>  * Option 2: expose a few interval functions
>  ** make_date_interval function that takes years, months, days
>  ** make_time_interval function that takes hours, minutes, seconds
>  ** make_datetime_interval function that takes years, months, days, hours, minutes, seconds
>  * Option 3: expose add_years, add_months, add_days, add_weeks, add_hours, add_minutes, and add_seconds as Column methods.  
>  * Option 4: Expose the add_years, add_hours, etc. as column functions.  add_weeks and date_add have already been exposed in this manner.  
> Option 1 is nice from a maintenance perspective cause it's a single function, but it's not standard from a user perspective.  Most languages support datetime instantiation with these arguments: years, months, days, hours, minutes, seconds.  Mixing weeks into the equation is not standard.
> As a user, Option 3 would be my preference.  col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember and type.  col("first_datetime") + make_time_interval(lit(2), lit(0), lit(30)) isn't as nice.  col("first_datetime") + make_interval(lit(0), lit(0), lit(0), lit(0), lit(2), lit(0), lit(30)) is harder still.
> Any of these options is an improvement to the status quo.  Let me know what option you think is best and then I'll make a PR to implement it, building off of Max's foundational work of course ;)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org