Posted to jira@arrow.apache.org by "Mauricio 'Pachá' Vargas Sepúlveda (Jira)" <ji...@apache.org> on 2021/06/07 22:43:00 UTC

[jira] [Commented] (ARROW-12994) [R] stringr tests: 4 hours of difference between arrow and strptime

    [ https://issues.apache.org/jira/browse/ARROW-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358898#comment-17358898 ] 

Mauricio 'Pachá' Vargas Sepúlveda commented on ARROW-12994:
-----------------------------------------------------------

{code:r}
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.1.0 (2021-05-18)
 os       Ubuntu 20.04.2 LTS          
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Santiago            
 date     2021-06-07                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version    date       lib source        
 arrow       * 4.0.1.9000 2021-06-07 [1] local         
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
 bit           4.0.4      2020-08-04 [1] CRAN (R 4.1.0)
 bit64         4.0.5      2020-08-30 [1] CRAN (R 4.1.0)
 cachem        1.0.5      2021-05-15 [1] CRAN (R 4.1.0)
 callr         3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
 cli           2.5.0      2021-04-26 [1] CRAN (R 4.1.0)
 crayon        1.4.1      2021-02-08 [1] CRAN (R 4.1.0)
 DBI           1.1.1      2021-01-15 [1] CRAN (R 4.1.0)
 desc          1.3.0      2021-03-05 [1] CRAN (R 4.1.0)
 devtools    * 2.4.1      2021-05-05 [1] CRAN (R 4.1.0)
 dplyr       * 1.0.6      2021-05-05 [1] CRAN (R 4.1.0)
 ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
 fansi         0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
 fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
 fs            1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
 generics      0.1.0      2020-10-31 [1] CRAN (R 4.1.0)
 glue          1.4.2      2020-08-27 [1] CRAN (R 4.1.0)
 lifecycle     1.0.0      2021-02-15 [1] CRAN (R 4.1.0)
 magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
 memoise       2.0.0      2021-01-26 [1] CRAN (R 4.1.0)
 pillar        1.6.1      2021-05-16 [1] CRAN (R 4.1.0)
 pkgbuild      1.2.0      2020-12-15 [1] CRAN (R 4.1.0)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
 pkgload       1.2.1      2021-04-06 [1] CRAN (R 4.1.0)
 prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
 processx      3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
 ps            1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
 purrr         0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
 R6            2.5.0      2020-10-28 [1] CRAN (R 4.1.0)
 remotes       2.3.0      2021-04-01 [1] CRAN (R 4.1.0)
 rlang         0.4.11     2021-04-30 [1] CRAN (R 4.1.0)
 rprojroot     2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
 rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
 sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.1.0)
 testthat    * 3.0.2      2021-02-14 [1] CRAN (R 4.1.0)
 tibble        3.1.2      2021-05-16 [1] CRAN (R 4.1.0)
 tidyselect    1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
 usethis     * 2.0.1      2021-02-10 [1] CRAN (R 4.1.0)
 utf8          1.2.1      2021-03-12 [1] CRAN (R 4.1.0)
 vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
 withr         2.4.2      2021-04-18 [1] CRAN (R 4.1.0)

[1] /home/pacha/R/x86_64-pc-linux-gnu-library/4.1
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
{code}
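The session info above reports tz America/Santiago. In August 2008, Santiago was on standard time (UTC-4, southern-hemisphere winter, no daylight saving). So the 14400 seconds reported in the quoted issue below equals exactly that offset. The following base-R sketch does not call arrow. It assumes the system's tz database includes America/Santiago, and it parses the same string once as local time and once as UTC to show the gap:

{code:r}
# Parse the same wall-clock string in the session time zone and in UTC.
# Assumes the system tz database includes America/Santiago (UTC-4 in Aug 2008).
local_ts <- as.POSIXct(strptime("08-05-2008", format = "%m-%d-%Y", tz = "America/Santiago"))
utc_ts <- as.POSIXct(strptime("08-05-2008", format = "%m-%d-%Y", tz = "UTC"))

# POSIXct values are stored as seconds since the Unix epoch, so this
# difference is in seconds, the same unit as the 14400 in the error message.
diff_secs <- as.numeric(local_ts) - as.numeric(utc_ts)
diff_secs         # 14400
diff_secs / 3600  # 4
{code}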

> [R] stringr tests: 4 hours of difference between arrow and strptime
> -------------------------------------------------------------------
>
>                 Key: ARROW-12994
>                 URL: https://issues.apache.org/jira/browse/ARROW-12994
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: R
>    Affects Versions: 4.0.1
>            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
>            Priority: Major
>
> Here's the problem I detected while triaging tickets.
> This was run locally after merging from apache/arrow at commit 8773b9d and rebuilding both the Arrow library and the Arrow R package.
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #>     timestamp
> library(dplyr)
> #> 
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #> 
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #> 
> #>     intersect, setdiff, setequal, union
> library(testthat)
> #> 
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #> 
> #>     matches
> #> The following object is masked from 'package:arrow':
> #> 
> #>     matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> expect_equal(
>   tstring %>%
>     Table$create() %>%
>     mutate(
>       x = strptime(x, format = "%m-%d-%Y")
>     ) %>%
>     collect(),
>   tstamp,
>   check.tzone = FALSE
> )
> #> Error: `%>%`(...) not equal to `tstamp`.
> #> Component "x": Mean absolute difference: 14400
> {code}
> Removing the expectation and printing both values shows that the dates differ by exactly 4 hours:
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #>     timestamp
> library(dplyr)
> #> 
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #> 
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #> 
> #>     intersect, setdiff, setequal, union
> library(testthat)
> #> 
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #> 
> #>     matches
> #> The following object is masked from 'package:arrow':
> #> 
> #>     matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> tstring %>%
>   Table$create() %>%
>   mutate(
>     x = strptime(x, format = "%m-%d-%Y")
>   ) %>%
>   collect()
> #> # A tibble: 2 x 1
> #>   x                  
> #>   <dttm>             
> #> 1 2008-08-04 20:00:00
> #> 2 NA
> tstamp
> #> # A tibble: 2 x 1
> #>   x                  
> #>   <dttm>             
> #> 1 2008-08-05 00:00:00
> #> 2 NA
> {code}
> _Created on 2021-06-07 by the [reprex package|https://reprex.tidyverse.org] (v2.0.0)_
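When Santiago is at UTC-4, the Arrow result printed as 2008-08-04 20:00:00 is the instant 2008-08-05 00:00:00 UTC. The base-R sketch below retypes the printed value by hand and does not call arrow. It assumes the session time zone America/Santiago from the session info and re-displays that instant in UTC. This suggests, but does not confirm, that the Arrow strptime binding interpreted the string as UTC, while base strptime() used the local time zone:

{code:r}
# Reinterpret the value Arrow printed, using the session time zone.
arrow_shown <- as.POSIXct("2008-08-04 20:00:00", tz = "America/Santiago")

# Display the same instant in UTC.
format(arrow_shown, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2008-08-05 00:00:00"
{code}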



--
This message was sent by Atlassian Jira
(v8.3.4#803005)