Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2022/06/29 23:31:00 UTC

[jira] [Commented] (ARROW-15805) [R] Update the as.Date() binding

    [ https://issues.apache.org/jira/browse/ARROW-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17560679#comment-17560679 ] 

Jonathan Keane commented on ARROW-15805:
----------------------------------------

This is alluded to in the PR comments, but taking a step back and thinking about the behavior:

{code}
dates_dash_first <- c("2022-01-01", "2022/02/02", "2022/02/02", "2022/02/02", "2022-01-01", "2022-01-01")
dates_slash_first <- c("2022/02/02", "2022-01-01", "2022/02/02", "2022/02/02", "2022-01-01", "2022-01-01")

as.Date(dates_dash_first, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"))
#> [1] "2022-01-01" NA           NA           NA           "2022-01-01"
#> [6] "2022-01-01"

as.Date(dates_slash_first, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"))
#> [1] "2022-02-02" NA           "2022-02-02" "2022-02-02" NA          
#> [6] NA
{code}

Which format is chosen depends on the underlying data and, critically, on the order that data is in: the first format in {{tryFormats}} that parses the first non-NA element is applied to the whole vector, and every element in a different format becomes NA. Given that we can't always guarantee the order of the data we are processing[1], we should not attempt to implement this behavior right now. Instead, we should raise an error if someone specifies {{tryFormats}}. The message should suggest one of two options: use {{lubridate::as_date()}} if they want to specify multiple formats (and can accept that elements in formats other than the first match are parsed rather than returned as NA), or pick the single format they want and use that.


[1] and even if we could, it would take some tricky expression writing to pick the right format
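
To make the order dependence concrete, here is a minimal base R sketch. The helper names {{first_matching_format()}} and {{parse_any_format()}} are hypothetical, introduced only for illustration; they are not part of arrow, lubridate, or base R. The first helper mirrors how the format appears to be chosen (by testing the first non-NA element), and the second shows one order-independent alternative that tries each format per element. It is not a proposal for how the binding should be written.

```r
dates_dash_first <- c("2022-01-01", "2022/02/02", "2022/02/02",
                      "2022/02/02", "2022-01-01", "2022-01-01")
dates_slash_first <- c("2022/02/02", "2022-01-01", "2022/02/02",
                       "2022/02/02", "2022-01-01", "2022-01-01")
fmts <- c("%Y-%m-%d", "%Y/%m/%d")

# Hypothetical helper: return the first format that parses the first
# non-NA element. This is the choice that makes the result depend on
# the order of the data.
first_matching_format <- function(x, formats) {
  first <- x[!is.na(x)][1]
  for (f in formats) {
    if (!is.na(strptime(first, f, tz = "GMT"))) {
      return(f)
    }
  }
  NA_character_
}

# Hypothetical helper: try every format on each element that is still
# unparsed, so the result does not depend on which element comes first.
parse_any_format <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (f in formats) {
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

chosen_dash <- first_matching_format(dates_dash_first, fmts)
chosen_slash <- first_matching_format(dates_slash_first, fmts)

parsed_dash <- parse_any_format(dates_dash_first, fmts)
parsed_slash <- parse_any_format(dates_slash_first, fmts)
```

With the two vectors above, {{chosen_dash}} and {{chosen_slash}} differ even though both vectors hold the same values, while {{parse_any_format()}} returns no NAs for either. Passing a single explicit {{format}} to {{as.Date()}} avoids the ambiguity altogether, which is the second option suggested in the error message.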

> [R] Update the as.Date() binding
> --------------------------------
>
>                 Key: ARROW-15805
>                 URL: https://issues.apache.org/jira/browse/ARROW-15805
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dragoș Moldovan-Grünfeld
>            Priority: Major
>             Fix For: 9.0.0
>
>



