Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/08 19:55:48 UTC

[GitHub] [arrow] jonkeane commented on a change in pull request #12353: ARROW-14471: [R] Implement lubridate's date/time parsing functions

jonkeane commented on a change in pull request #12353:
URL: https://github.com/apache/arrow/pull/12353#discussion_r802006466



##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -148,4 +148,56 @@ register_bindings_datetime <- function() {
     !call_binding("am", x)
   })
 
+  register_binding("ymd", function(x) {
+    format_map <-
+      list(
+        ymd_hyphen1 = "%Y-%m-%d",
+        ymd_hyphen2 = "%y-%m-%d",
+        ymd_hyphen3 = "%Y-%B-%d",
+        ymd_hyphen4 = "%y-%B-%d",
+        ymd_hyphen5 = "%Y-%b-%d",
+        ymd_hyphen6 = "%y-%b-%d",

Review comment:
       Could we use a single set of six formats like this and pre-process the strings with a regex first, as suggested in https://issues.apache.org/jira/browse/ARROW-14471?focusedCommentId=17446011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17446011 ?
   
   Something kind of like:
   
   ```
   format_map <- list(
     ymd_hyphen1 = "%Y-%m-%d",
     ymd_hyphen2 = "%y-%m-%d",
     ymd_hyphen3 = "%Y-%B-%d",
     ymd_hyphen4 = "%y-%B-%d",
     ymd_hyphen5 = "%Y-%b-%d",
     ymd_hyphen6 = "%y-%b-%d"
   )
   
   # Replace every character that is not a letter, digit, or period with a hyphen
   x <- gsub("[^A-Za-z0-9.]", "-", x)
   # Try each format in turn; coalesce keeps the first non-null result
   call_binding(
     "coalesce",
     call_binding("strptime", x, format = format_map[[1]], unit = "s"),
     call_binding("strptime", x, format = format_map[[2]], unit = "s"),
     call_binding("strptime", x, format = format_map[[3]], unit = "s"),
     call_binding("strptime", x, format = format_map[[4]], unit = "s"),
     call_binding("strptime", x, format = format_map[[5]], unit = "s"),
     call_binding("strptime", x, format = format_map[[6]], unit = "s")
   )
   ```
   
   You might need to use `call_binding("gsub", ...)` rather than calling `gsub()` directly.
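   
   To make the idea concrete outside of Arrow, here is a rough base-R sketch of the same "normalize the separators, then take the first format that parses" approach. It is only an illustration: `parse_ymd_sketch` is a made-up helper name, it uses `as.Date()` instead of the `strptime` binding, and the loop stands in for `coalesce`.
   
   ```r
   # Hypothetical helper for illustration only; not part of arrow or lubridate.
   parse_ymd_sketch <- function(x) {
     formats <- c(
       "%Y-%m-%d", "%y-%m-%d",
       "%Y-%B-%d", "%y-%B-%d",
       "%Y-%b-%d", "%y-%b-%d"
     )
     # Replace every character that is not a letter, digit, or period with a hyphen
     x <- gsub("[^A-Za-z0-9.]", "-", x)
     # Start with all-NA dates, then fill in from each format in turn
     out <- rep(as.Date(NA), length(x))
     for (fmt in formats) {
       parsed <- as.Date(x, format = fmt)
       still_missing <- is.na(out)
       # Only fill elements that no earlier format managed to parse
       out[still_missing] <- parsed[still_missing]
     }
     out
   }
   
   parse_ymd_sketch(c("2021/01/15", "2021 01 15", "not a date"))
   ```
   
   The first two inputs both normalize to `2021-01-15` and the last one stays `NA`. The order of the formats matters here, since the first one that parses wins, so two-digit years may be picked up by `%Y` before `%y` gets a chance.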


Review comment:
       This would at least cut down on the number of formats we need to try, and would _mostly_ give the right answers.



