You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/05 10:27:24 UTC

[GitHub] [spark] EnricoMi opened a new pull request #27805: Add CalendarIntervals division

EnricoMi opened a new pull request #27805: Add CalendarIntervals division
URL: https://github.com/apache/spark/pull/27805
 
 
   ### What changes were proposed in this pull request?
   This defines the devision of two `CalendarInterval`s as the ratio between the time duration of both. The `CalendarInterval` consists of three time components: `months`, `days` and `microseconds`. The division is only defined between intervals that have a single non-zero time component, while both intervals have the same non-zero time component. Otherwise the division expression would be ambiguous.
   
   ### Why are the changes needed?
   Putting interval into relation is a natural expression: 1 day / 1 hour = number of hours per day.
   
   This allows to evaluate the magnitude of `CalendarInterval`s in SQL expressions:
   
   ```
   Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 13:30:25")))
     .toDF("start", "end")
     .withColumn("interval", $"end" - $"start")
     .withColumn("interval [h]", $"interval" / lit("1 hour").cast(CalendarIntervalType))
     .withColumn("rate [€/h]", lit(1.45))
     .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
     .show(false)
   +-------------------+-------------------+-----------------------------+------------------+----------+------------------+
   |start              |end                |interval                     |interval [h]      |rate [€/h]|price [€]         |
   +-------------------+-------------------+-----------------------------+------------------+----------+------------------+
   |2020-02-01 12:00:00|2020-02-01 13:30:25|1 hours 30 minutes 25 seconds|1.5069444444444444|1.45      |2.1850694444444443|
   +-------------------+-------------------+-----------------------------+------------------+----------+------------------+
   ```
   
   The currently available approach is
   
   ```
   Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 13:30:25")))
     .toDF("start", "end")
     .withColumn("interval [s]", unix_timestamp($"end") - unix_timestamp($"start"))
     .withColumn("interval [h]", $"interval [s]" / 3600)
     .withColumn("rate [€/h]", lit(1.45))
     .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
     .show(false)
   ```
   
   Going through `unix_timestamp` is a hack and it pollutes the SQL query with unrelated semantics (unix timestamp is completely irrelevant for this computation). It is merely there because there is currently no way to access the length of an `CalendarInterval`. Dividing an interval by another interval provides means to measure the length in an arbitrary unit (minutes, hours, quarter hours).
   
   ### Does this PR introduce any user-facing change?
   It allows interval division in SQL expressions and adds the `DivideIntervals` catalyst expression.
   
   ### How was this patch tested?
   Unit tests in `IntervalExpressionsSuite`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on issue #27805: Add CalendarIntervals division

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on issue #27805: Add CalendarIntervals division
URL: https://github.com/apache/spark/pull/27805#issuecomment-595205178
 
 
   @EnricoMi, please review https://spark.apache.org/contributing.html (e.g., filing a JIRA and PR titie)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27805: Add CalendarIntervals division

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27805: Add CalendarIntervals division
URL: https://github.com/apache/spark/pull/27805#issuecomment-595155303
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27805: Add CalendarIntervals division

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27805: Add CalendarIntervals division
URL: https://github.com/apache/spark/pull/27805#issuecomment-595156011
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27805: Add CalendarIntervals division

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27805: Add CalendarIntervals division
URL: https://github.com/apache/spark/pull/27805#issuecomment-595155303
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org