Posted to github@arrow.apache.org by "lgaborini (via GitHub)" <gi...@apache.org> on 2023/11/30 13:07:53 UTC

Re: [I] [R] arrow implementation of lubridate::dmy parses invalid date "00001976" as date [arrow]

lgaborini commented on issue #33425:
URL: https://github.com/apache/arrow/issues/33425#issuecomment-1833755499

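   This is still open with arrow 14.0.0.1 on Windows.
   I have included more cases. All five strings are invalid dates. Out-of-range values such as month `13` or day `32` correctly give `NA`, but inputs with a `00` day or month are silently turned into a valid date.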
   
   ``` r
   library(arrow)
   
   df <- data.frame(
      d = c(
         "11-00-2022",
         "00-12-2022",
         "11-13-2022",
         "00-13-2022",
         "32-10-2022"
      )
   )
   
   # Base/lubridate R
   
   df$d |> lubridate::dmy()
   #> Warning: All formats failed to parse. No formats found.
   #> [1] NA NA NA NA NA
   df$d |> strptime("%d-%m-%Y")
   #> [1] NA NA NA NA NA
   df$d |> lubridate::parse_date_time("dmY")
   #> Warning: All formats failed to parse. No formats found.
   #> [1] NA NA NA NA NA
   df$d |> lubridate::parse_date_time("dmY", truncated = 0)
   #> Warning: All formats failed to parse. No formats found.
   #> [1] NA NA NA NA NA
   
   dt <- df |>
      arrow::arrow_table()
   
   dt |> dplyr::collect()
   #> # A tibble: 5 × 1
   #>   d         
   #>   <chr>     
   #> 1 11-00-2022
   #> 2 00-12-2022
   #> 3 11-13-2022
   #> 4 00-13-2022
   #> 5 32-10-2022
   
   dt |>
      dplyr::mutate(
         dt_1 = strptime(d, "%d-%m-%Y"),
         dt_2 = dmy(d),
         dt_3 = parse_date_time(d, "%d-%m-%Y", truncated = 0),
         dt_4 = parse_date_time(d, "dmY"),
      ) |>
      dplyr::collect()
   #> # A tibble: 5 × 5
   #>   d       dt_1                dt_2       dt_3                dt_4               
   #>   <chr>   <dttm>              <date>     <dttm>              <dttm>             
   #> 1 11-00-… 2021-12-11 00:00:00 2021-12-11 2021-12-11 00:00:00 2021-12-11 00:00:00
   #> 2 00-12-… 2022-12-01 00:00:00 2022-12-01 2022-12-01 00:00:00 2022-12-01 00:00:00
   #> 3 11-13-… NA                  NA         NA                  NA                 
   #> 4 00-13-… NA                  NA         NA                  NA                 
   #> 5 32-10-… NA                  NA         NA                  NA
   
   arrow_table(x = '00001976') |>
      dplyr::mutate(y = dmy(x)) |>
      dplyr::collect()
   #> # A tibble: 1 × 2
   #>   x        y         
   #>   <chr>    <date>    
   #> 1 00001976 1975-12-01
   ```
   
   <sup>Created on 2023-11-30 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
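
   For comparison, the strict rule that base R applies can be sketched as a small validator: a `00` or out-of-range day or month gives `FALSE`, not a rolled-over date. `is_valid_dmy()` is a hypothetical helper written for illustration only. It is not part of arrow or lubridate. It assumes the `dd-mm-yyyy` layout used above.

   ```r
   # Hypothetical helper (not part of arrow or lubridate): TRUE only for
   # "dd-mm-yyyy" strings whose day and month are real calendar values.
   is_valid_dmy <- function(x) {
     # Capture day, month and year; non-matching strings give character(0).
     # The anchors matter because strptime ignores trailing characters.
     parts <- regmatches(x, regexec("^([0-9]{2})-([0-9]{2})-([0-9]{4})$", x))
     vapply(parts, function(p) {
       if (length(p) != 4) return(FALSE)
       day <- as.integer(p[2])
       month <- as.integer(p[3])
       # Reject zero or out-of-range fields instead of rolling them over
       if (day < 1 || day > 31 || month < 1 || month > 12) return(FALSE)
       # Defer the remaining calendar checks (e.g. 31-02) to base R's parser
       !is.na(as.Date(p[1], format = "%d-%m-%Y"))
     }, logical(1), USE.NAMES = FALSE)
   }

   cases <- c("11-00-2022", "00-12-2022", "11-13-2022", "00-13-2022",
              "32-10-2022", "11-10-2022")
   is_valid_dmy(cases)
   #> [1] FALSE FALSE FALSE FALSE FALSE  TRUE
   ```

   The last string is a valid control case added for contrast. Under the same rule, `"00001976"` is rejected because it does not match the `dd-mm-yyyy` layout.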
   


-- 
This is an automated message from the Apache Git Service.