You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/16 11:49:44 UTC

[GitHub] [arrow] rok opened a new pull request, #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

rok opened a new pull request, #13627:
URL: https://github.com/apache/arrow/pull/13627

   This is to resolve [ARROW-16395](https://issues.apache.org/jira/browse/ARROW-16395).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot merged pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
paleolimbot merged PR #13627:
URL: https://github.com/apache/arrow/pull/13627


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13627:
URL: https://github.com/apache/arrow/pull/13627#issuecomment-1186163836

   https://issues.apache.org/jira/browse/ARROW-16395


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13627:
URL: https://github.com/apache/arrow/pull/13627#issuecomment-1192031123

   @nealrichardson do you think we could get this into the release?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13627:
URL: https://github.com/apache/arrow/pull/13627#discussion_r925682120


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -571,11 +571,20 @@ register_bindings_datetime_parsers <- function() {
     }
   })
 
-  ymd_parser_vec <- c("ymd", "ydm", "mdy", "myd", "dmy", "dym", "ym", "my", "yq")
+  parser_vec <- c(
+    "ymd", "ydm", "mdy", "myd", "dmy", "dym", "ym", "my", "yq",
+    "ymd_HMS", "ymd_HM", "ymd_H", "dmy_HMS", "dmy_HM", "dmy_H",
+    "mdy_HMS", "mdy_HM", "mdy_H", "ydm_HMS", "ydm_HM", "ydm_H"
+  )
 
-  ymd_parser_map_factory <- function(order) {
+  parser_map_factory <- function(order) {
     force(order)
     function(x, tz = NULL) {

Review Comment:
   Opened [ARROW-17146](https://issues.apache.org/jira/browse/ARROW-17146) and [ARROW-17147](https://issues.apache.org/jira/browse/ARROW-17147).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13627:
URL: https://github.com/apache/arrow/pull/13627#issuecomment-1189322466

   @dragosmg is this missing something obvious?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13627:
URL: https://github.com/apache/arrow/pull/13627#issuecomment-1192862032

   Thanks for reviews and merge @dragosmg & @paleolimbot !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] dragosmg commented on a diff in pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
dragosmg commented on code in PR #13627:
URL: https://github.com/apache/arrow/pull/13627#discussion_r924875872


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -571,11 +571,20 @@ register_bindings_datetime_parsers <- function() {
     }
   })
 
-  ymd_parser_vec <- c("ymd", "ydm", "mdy", "myd", "dmy", "dym", "ym", "my", "yq")
+  parser_vec <- c(
+    "ymd", "ydm", "mdy", "myd", "dmy", "dym", "ym", "my", "yq",
+    "ymd_HMS", "ymd_HM", "ymd_H", "dmy_HMS", "dmy_HM", "dmy_H",
+    "mdy_HMS", "mdy_HM", "mdy_H", "ydm_HMS", "ydm_HM", "ydm_H"
+  )
 
-  ymd_parser_map_factory <- function(order) {
+  parser_map_factory <- function(order) {
     force(order)
     function(x, tz = NULL) {

Review Comment:
   `lubridate::parse_date_time()` has a `truncated` argument (which we also support with the `parse_date_time` binding). What do you think about exposing it to the `ymd_...()` family of functions? The same goes for `quiet` where the binding has a different default (is not chatty, `quiet = TRUE`). 



##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -2474,4 +2534,24 @@ test_that("build_formats() and build_format_from_order()", {
       "%y%b%d%H%M%S", "%Y%b%d%H%M%S"
     )
   )
+
+  expect_equal(
+    build_format_from_order("ymdHM"),
+    c(
+      "%y-%m-%d-%H-%M", "%Y-%m-%d-%H-%M", "%y-%B-%d-%H-%M",
+      "%Y-%B-%d-%H-%M", "%y-%b-%d-%H-%M", "%Y-%b-%d-%H-%M",
+      "%y%m%d%H%M", "%Y%m%d%H%M", "%y%B%d%H%M", "%Y%B%d%H%M",
+      "%y%b%d%H%M", "%Y%b%d%H%M"
+    )
+  )
+
+  expect_equal(
+    build_format_from_order("ymdH"),
+    c(
+      "%y-%m-%d-%H", "%Y-%m-%d-%H", "%y-%B-%d-%H",
+      "%Y-%B-%d-%H", "%y-%b-%d-%H", "%Y-%b-%d-%H",
+      "%y%m%d%H", "%Y%m%d%H", "%y%B%d%H", "%Y%B%d%H",
+      "%y%b%d%H", "%Y%b%d%H"
+    )
+  )
 })

Review Comment:
   Maybe in a separate `test_that` block we could have a few parsing tests with regular R objects. That's totally on me as I should've added them a while ago. For example, something like:
   ```r
     compare_dplyr_binding(
        .input %>%
          mutate(
            ymd_hms_dttm = ymd_hms("2022-07-19 20:24:43"),
           ...
          ) %>%
          collect(),
        test_dates_times2
      )
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] dragosmg commented on a diff in pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
dragosmg commented on code in PR #13627:
URL: https://github.com/apache/arrow/pull/13627#discussion_r924875872


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -571,11 +571,20 @@ register_bindings_datetime_parsers <- function() {
     }
   })
 
-  ymd_parser_vec <- c("ymd", "ydm", "mdy", "myd", "dmy", "dym", "ym", "my", "yq")
+  parser_vec <- c(
+    "ymd", "ydm", "mdy", "myd", "dmy", "dym", "ym", "my", "yq",
+    "ymd_HMS", "ymd_HM", "ymd_H", "dmy_HMS", "dmy_HM", "dmy_H",
+    "mdy_HMS", "mdy_HM", "mdy_H", "ydm_HMS", "ydm_HM", "ydm_H"
+  )
 
-  ymd_parser_map_factory <- function(order) {
+  parser_map_factory <- function(order) {
     force(order)
     function(x, tz = NULL) {

Review Comment:
   `lubridate::parse_date_time()` has a `truncated` argument (which we also support with the `parse_date_time` binding). What do you think about exposing it to the `ymd_...()` family of functions? The same goes for `quiet` where the binding has a different default (is not chatty, `quiet = TRUE`). 
   I think I added support for these 2 arguments after the first iteration of the `ymd_parser_map_factory()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13627:
URL: https://github.com/apache/arrow/pull/13627#discussion_r925090444


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -2474,4 +2534,24 @@ test_that("build_formats() and build_format_from_order()", {
       "%y%b%d%H%M%S", "%Y%b%d%H%M%S"
     )
   )
+
+  expect_equal(
+    build_format_from_order("ymdHM"),
+    c(
+      "%y-%m-%d-%H-%M", "%Y-%m-%d-%H-%M", "%y-%B-%d-%H-%M",
+      "%Y-%B-%d-%H-%M", "%y-%b-%d-%H-%M", "%Y-%b-%d-%H-%M",
+      "%y%m%d%H%M", "%Y%m%d%H%M", "%y%B%d%H%M", "%Y%B%d%H%M",
+      "%y%b%d%H%M", "%Y%b%d%H%M"
+    )
+  )
+
+  expect_equal(
+    build_format_from_order("ymdH"),
+    c(
+      "%y-%m-%d-%H", "%Y-%m-%d-%H", "%y-%B-%d-%H",
+      "%Y-%B-%d-%H", "%y-%b-%d-%H", "%Y-%b-%d-%H",
+      "%y%m%d%H", "%Y%m%d%H", "%y%B%d%H", "%Y%B%d%H",
+      "%y%b%d%H", "%Y%b%d%H"
+    )
+  )
 })

Review Comment:
   [Added](https://github.com/apache/arrow/pull/13627/commits/a4e5059eb4900e2fb974bd40eb1dbcf1470df522), do you think more is needed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] dragosmg commented on a diff in pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
dragosmg commented on code in PR #13627:
URL: https://github.com/apache/arrow/pull/13627#discussion_r924875872


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -571,11 +571,20 @@ register_bindings_datetime_parsers <- function() {
     }
   })
 
-  ymd_parser_vec <- c("ymd", "ydm", "mdy", "myd", "dmy", "dym", "ym", "my", "yq")
+  parser_vec <- c(
+    "ymd", "ydm", "mdy", "myd", "dmy", "dym", "ym", "my", "yq",
+    "ymd_HMS", "ymd_HM", "ymd_H", "dmy_HMS", "dmy_HM", "dmy_H",
+    "mdy_HMS", "mdy_HM", "mdy_H", "ydm_HMS", "ydm_HM", "ydm_H"
+  )
 
-  ymd_parser_map_factory <- function(order) {
+  parser_map_factory <- function(order) {
     force(order)
     function(x, tz = NULL) {

Review Comment:
   `lubridate::parse_date_time()` has a `truncated` argument (which we also support with the `parse_date_time` binding). What do you think about exposing it to the `ymd_...()` family of functions? The same goes for `quiet` where the binding has a different default (is not chatty, `quiet = TRUE`). 
   I think I added support for these 2 arguments in `parse_date_time()` after the first iteration of the `ymd_parser_map_factory()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13627:
URL: https://github.com/apache/arrow/pull/13627#issuecomment-1193238016

   Benchmark runs are scheduled for baseline = 9442e1ce8d976cdec771c2dbb15704e0cb2018ff and contender = 3c7a0cad0e25ed66e4c555d9da49f320f803573c. 3c7a0cad0e25ed66e4c555d9da49f320f803573c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/34a0c5e3de00469c8923339a3ea67e86...f43c5e3114554b0282dfb1b68e249624/)
   [Failed :arrow_down:0.21% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/acdb18a558fc4f2391e3e2fcca4bc1ab...023aa95d3d61476fba2bff60f7925aa8/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/2ddbe3ca732d43c7b7638981b28f7dcc...84c83ef14f2d49b29462681b44085d84/)
   [Finished :arrow_down:0.11% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/3704d8110e1f4002a683f879791ff6a0...f764f300f550404d8c1269f45c437813/)
   Buildkite builds:
   [Failed] [`3c7a0cad` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1186)
   [Failed] [`3c7a0cad` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1198)
   [Failed] [`3c7a0cad` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1180)
   [Finished] [`3c7a0cad` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1200)
   [Failed] [`9442e1ce` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1185)
   [Failed] [`9442e1ce` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1197)
   [Failed] [`9442e1ce` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1179)
   [Finished] [`9442e1ce` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1199)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13627:
URL: https://github.com/apache/arrow/pull/13627#discussion_r925090139


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -571,11 +571,20 @@ register_bindings_datetime_parsers <- function() {
     }
   })
 
-  ymd_parser_vec <- c("ymd", "ydm", "mdy", "myd", "dmy", "dym", "ym", "my", "yq")
+  parser_vec <- c(
+    "ymd", "ydm", "mdy", "myd", "dmy", "dym", "ym", "my", "yq",
+    "ymd_HMS", "ymd_HM", "ymd_H", "dmy_HMS", "dmy_HM", "dmy_H",
+    "mdy_HMS", "mdy_HM", "mdy_H", "ydm_HMS", "ydm_HM", "ydm_H"
+  )
 
-  ymd_parser_map_factory <- function(order) {
+  parser_map_factory <- function(order) {
     force(order)
     function(x, tz = NULL) {

Review Comment:
   Added `quiet `, `locale` and `truncated`. However `quiet` and `locale` currently just error if changed from default. I'll open a Jira to address this later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] dragosmg commented on a diff in pull request #13627: ARROW-16395: [R] Implement lubridate's parsers with year, month, and day, hour, minute, and second components

Posted by GitBox <gi...@apache.org>.
dragosmg commented on code in PR #13627:
URL: https://github.com/apache/arrow/pull/13627#discussion_r925346214


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -2474,4 +2534,24 @@ test_that("build_formats() and build_format_from_order()", {
       "%y%b%d%H%M%S", "%Y%b%d%H%M%S"
     )
   )
+
+  expect_equal(
+    build_format_from_order("ymdHM"),
+    c(
+      "%y-%m-%d-%H-%M", "%Y-%m-%d-%H-%M", "%y-%B-%d-%H-%M",
+      "%Y-%B-%d-%H-%M", "%y-%b-%d-%H-%M", "%Y-%b-%d-%H-%M",
+      "%y%m%d%H%M", "%Y%m%d%H%M", "%y%B%d%H%M", "%Y%B%d%H%M",
+      "%y%b%d%H%M", "%Y%b%d%H%M"
+    )
+  )
+
+  expect_equal(
+    build_format_from_order("ymdH"),
+    c(
+      "%y-%m-%d-%H", "%Y-%m-%d-%H", "%y-%B-%d-%H",
+      "%Y-%B-%d-%H", "%y-%b-%d-%H", "%Y-%b-%d-%H",
+      "%y%m%d%H", "%Y%m%d%H", "%y%B%d%H", "%Y%B%d%H",
+      "%y%b%d%H", "%Y%b%d%H"
+    )
+  )
 })

Review Comment:
   I'd say maybe a couple of tests covering the additional args (`quiet`, `locale` & `truncated`). I would only test for `ymd_hms()`, for example, not all of the individual parsers. 
   
   Something like (maybe added to [this block](https://github.com/apache/arrow/blob/0330353a93eff799616bf66e0e994236393458f7/r/tests/testthat/test-dplyr-funcs-datetime.R#L2342-L2386), where you already have a data frame defined for testing with truncated formats):
   ```r
   compare_dplyr_binding(
       .input %>%
         mutate(dttm = ymd_hms(truncated_ymd_string, truncated = 3)) %>%
         collect(),
       test_truncation_df
     )
   ```
   
   There is also a [block](https://github.com/apache/arrow/blob/0330353a93eff799616bf66e0e994236393458f7/r/tests/testthat/test-dplyr-funcs-datetime.R#L2324-L2340) for testing `quiet = FALSE` not supported.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org