You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/28 19:19:35 UTC

[GitHub] [arrow] jonkeane commented on a change in pull request #12707: ARROW-14810 [R] Implement bindings for lubridate's `date_decimal()` and `decimal_date()`

jonkeane commented on a change in pull request #12707:
URL: https://github.com/apache/arrow/pull/12707#discussion_r836749769



##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -300,6 +312,34 @@ register_bindings_duration <- function() {
 
     build_expr("cast", x, options = cast_options(to_type = duration(unit = "s")))
   })
+  register_binding("decimal_date", function(date) {
+    # browser()
+    y <- build_expr("year", date)
+    # timezone <- call_binding("tz", date)
+    start <- call_binding("make_datetime", year = y, tz = "UTC")
+    end <- call_binding("make_datetime", year = y + 1L, tz = "UTC")
+    # maybe use yday here

Review comment:
       Yeah, I htink `yday / if (leapyrea(...)) 366 else 365` would probably be a bit cleaner (unless | until the number of days in a year changes, I guess)

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -300,6 +312,34 @@ register_bindings_duration <- function() {
 
     build_expr("cast", x, options = cast_options(to_type = duration(unit = "s")))
   })
+  register_binding("decimal_date", function(date) {
+    # browser()
+    y <- build_expr("year", date)
+    # timezone <- call_binding("tz", date)

Review comment:
       ```suggestion
   ```

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -300,6 +312,34 @@ register_bindings_duration <- function() {
 
     build_expr("cast", x, options = cast_options(to_type = duration(unit = "s")))
   })
+  register_binding("decimal_date", function(date) {
+    # browser()
+    y <- build_expr("year", date)
+    # timezone <- call_binding("tz", date)
+    start <- call_binding("make_datetime", year = y, tz = "UTC")
+    end <- call_binding("make_datetime", year = y + 1L, tz = "UTC")
+    # maybe use yday here
+    sofar <- call_binding("difftime", date, start, units = "secs")
+    total <- call_binding("difftime", end, start, units = "secs")
+    y + sofar$cast(int64()) / total$cast(int64())
+  })
+  register_binding("date_decimal", function(decimal, tz = "UTC") {
+    y <- build_expr(
+      "cast",
+      decimal,
+      options = cast_options(to_type = int32(), safe = FALSE)
+    )
+
+    start <- call_binding("make_datetime", year = y, tz = tz)
+    end <- call_binding("make_datetime", year = y + 1, tz = tz)
+
+    seconds <- call_binding("difftime", end, start, units = "secs")
+    fraction <- decimal - y
+    delta <- seconds$cast(int64()) * fraction
+    delta <- delta$cast(int64(), safe = FALSE)

Review comment:
       You're looking for `floor` or `trunc` here, yeah? `safe = FALSE` will do truncation, but will also overflow and other things we probably don't want.

##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -1130,6 +1130,17 @@ test_that("difftime works correctly", {
     test_df_with_tz
   )
 
+  test_df_with_tz %>%
+    arrow_table() %>%
+    mutate(
+      secs2 = difftime(
+        as.POSIXct("2022-03-07", tz = "Europe/Bucharest"),
+        time1,
+        units = "secs"
+      )
+    ) %>%
+    collect()

Review comment:
       Is there an expectation for this test? Also, is the timezone here something important about Bucharest or intended to be not-UTC default?

##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -1213,3 +1224,25 @@ test_that("as.difftime()", {
       collect()
   )
 })
+
+test_that("`decimal_date()` and `date_decimal()`", {
+  test_df <- tibble(
+    a = c(2007.38998954347, 1970.77732069883, 2020.96061799722,
+          2009.43465948477, 1975.71251467871, NA),
+    b = as.POSIXct(
+      c("2007-05-23 08:18:30", "1970-10-11 17:19:45", "2020-12-17 14:04:06",
+        "2009-06-08 15:37:01", "1975-09-18 01:37:42", NA)
+    )

Review comment:
       Could we also add a test case that has `Date`s as inputs?

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -300,6 +312,34 @@ register_bindings_duration <- function() {
 
     build_expr("cast", x, options = cast_options(to_type = duration(unit = "s")))
   })
+  register_binding("decimal_date", function(date) {
+    # browser()

Review comment:
       ```suggestion
   ```

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -261,6 +261,18 @@ register_bindings_duration <- function() {
       time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp(timezone = "UTC")))
     }
 
+    # if time1 or time2 are timestamps they cannot be expressed in "s" /seconds
+    # otherwise they cannot be added subtracted with durations
+    if (inherits(time1, "Expression") &&
+        time1$type_id() %in% Type[c("TIMESTAMP")] && time1$type()$unit() != 2L) {
+      time1 <- build_expr("cast", time1, options = cast_options(to_type = timestamp("us")))
+    }
+
+    if (inherits(time2, "Expression") &&
+        time2$type_id() %in% Type[c("TIMESTAMP")] && time2$type()$unit() != 2L) {
+      time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp("us")))
+    }
+

Review comment:
       Is there a jira for clearing this up in the C++?

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -300,6 +312,34 @@ register_bindings_duration <- function() {
 
     build_expr("cast", x, options = cast_options(to_type = duration(unit = "s")))
   })
+  register_binding("decimal_date", function(date) {
+    # browser()
+    y <- build_expr("year", date)
+    # timezone <- call_binding("tz", date)
+    start <- call_binding("make_datetime", year = y, tz = "UTC")
+    end <- call_binding("make_datetime", year = y + 1L, tz = "UTC")
+    # maybe use yday here
+    sofar <- call_binding("difftime", date, start, units = "secs")
+    total <- call_binding("difftime", end, start, units = "secs")
+    y + sofar$cast(int64()) / total$cast(int64())
+  })
+  register_binding("date_decimal", function(decimal, tz = "UTC") {
+    y <- build_expr(
+      "cast",
+      decimal,
+      options = cast_options(to_type = int32(), safe = FALSE)
+    )
+
+    start <- call_binding("make_datetime", year = y, tz = tz)
+    end <- call_binding("make_datetime", year = y + 1, tz = tz)
+
+    seconds <- call_binding("difftime", end, start, units = "secs")

Review comment:
       `seconds` here is effectively the number of seconds in a year, yeah? Would it be better or worse to have 31536000 (or 31622400 for leap years) instead of calculating and constructing it here?

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -300,6 +312,34 @@ register_bindings_duration <- function() {
 
     build_expr("cast", x, options = cast_options(to_type = duration(unit = "s")))
   })
+  register_binding("decimal_date", function(date) {
+    # browser()
+    y <- build_expr("year", date)
+    # timezone <- call_binding("tz", date)
+    start <- call_binding("make_datetime", year = y, tz = "UTC")
+    end <- call_binding("make_datetime", year = y + 1L, tz = "UTC")
+    # maybe use yday here
+    sofar <- call_binding("difftime", date, start, units = "secs")
+    total <- call_binding("difftime", end, start, units = "secs")
+    y + sofar$cast(int64()) / total$cast(int64())
+  })
+  register_binding("date_decimal", function(decimal, tz = "UTC") {
+    y <- build_expr(
+      "cast",
+      decimal,
+      options = cast_options(to_type = int32(), safe = FALSE)

Review comment:
       You're looking for floor or trunc here, yeah? safe = FALSE will do truncation, but will also overflow and other things we probably don't want.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org