Posted to jira@arrow.apache.org by "Mauricio 'Pachá' Vargas Sepúlveda (Jira)" <ji...@apache.org> on 2021/06/07 22:43:00 UTC
[jira] [Commented] (ARROW-12994) [R] stringr tests: 4 hours of difference between arrow and strptime
[ https://issues.apache.org/jira/browse/ARROW-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358898#comment-17358898 ]
Mauricio 'Pachá' Vargas Sepúlveda commented on ARROW-12994:
-----------------------------------------------------------
{code:r}
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting  value
version  R version 4.1.0 (2021-05-18)
os       Ubuntu 20.04.2 LTS
system   x86_64, linux-gnu
ui       RStudio
language (EN)
collate  en_US.UTF-8
ctype    en_US.UTF-8
tz       America/Santiago
date     2021-06-07
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package     * version    date       lib source
arrow       * 4.0.1.9000 2021-06-07 [1] local
assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
bit           4.0.4      2020-08-04 [1] CRAN (R 4.1.0)
bit64         4.0.5      2020-08-30 [1] CRAN (R 4.1.0)
cachem        1.0.5      2021-05-15 [1] CRAN (R 4.1.0)
callr         3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
cli           2.5.0      2021-04-26 [1] CRAN (R 4.1.0)
crayon        1.4.1      2021-02-08 [1] CRAN (R 4.1.0)
DBI           1.1.1      2021-01-15 [1] CRAN (R 4.1.0)
desc          1.3.0      2021-03-05 [1] CRAN (R 4.1.0)
devtools    * 2.4.1      2021-05-05 [1] CRAN (R 4.1.0)
dplyr       * 1.0.6      2021-05-05 [1] CRAN (R 4.1.0)
ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
fansi         0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
fs            1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
generics      0.1.0      2020-10-31 [1] CRAN (R 4.1.0)
glue          1.4.2      2020-08-27 [1] CRAN (R 4.1.0)
lifecycle     1.0.0      2021-02-15 [1] CRAN (R 4.1.0)
magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
memoise       2.0.0      2021-01-26 [1] CRAN (R 4.1.0)
pillar        1.6.1      2021-05-16 [1] CRAN (R 4.1.0)
pkgbuild      1.2.0      2020-12-15 [1] CRAN (R 4.1.0)
pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
pkgload       1.2.1      2021-04-06 [1] CRAN (R 4.1.0)
prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
processx      3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
ps            1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
purrr         0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
R6            2.5.0      2020-10-28 [1] CRAN (R 4.1.0)
remotes       2.3.0      2021-04-01 [1] CRAN (R 4.1.0)
rlang         0.4.11     2021-04-30 [1] CRAN (R 4.1.0)
rprojroot     2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.1.0)
testthat    * 3.0.2      2021-02-14 [1] CRAN (R 4.1.0)
tibble        3.1.2      2021-05-16 [1] CRAN (R 4.1.0)
tidyselect    1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
usethis     * 2.0.1      2021-02-10 [1] CRAN (R 4.1.0)
utf8          1.2.1      2021-03-12 [1] CRAN (R 4.1.0)
vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
withr         2.4.2      2021-04-18 [1] CRAN (R 4.1.0)
[1] /home/pacha/R/x86_64-pc-linux-gnu-library/4.1
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
{code}
> [R] stringr tests: 4 hours of difference between arrow and strptime
> -------------------------------------------------------------------
>
> Key: ARROW-12994
> URL: https://issues.apache.org/jira/browse/ARROW-12994
> Project: Apache Arrow
> Issue Type: Task
> Components: R
> Affects Versions: 4.0.1
> Reporter: Mauricio 'Pachá' Vargas Sepúlveda
> Priority: Major
>
> Here's the problem I detected while triaging tickets.
> This was run locally after merging from apache/arrow at commit 8773b9d and rebuilding both the Arrow library and the Arrow R package.
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #> timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #> filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #> intersect, setdiff, setequal, union
> library(testthat)
> #>
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #>
> #> matches
> #> The following object is masked from 'package:arrow':
> #>
> #> matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> expect_equal(
>   tstring %>%
>     Table$create() %>%
>     mutate(
>       x = strptime(x, format = "%m-%d-%Y")
>     ) %>%
>     collect(),
>   tstamp,
>   check.tzone = FALSE
> )
> #> Error: `%>%`(...) not equal to `tstamp`.
> #> Component "x": Mean absolute difference: 14400
> {code}
> Removing the expectation and printing both tibbles shows that the timestamps differ by exactly 4 hours (the 14400 seconds reported above):
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #> timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #> filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #> intersect, setdiff, setequal, union
> library(testthat)
> #>
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #>
> #> matches
> #> The following object is masked from 'package:arrow':
> #>
> #> matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> tstring %>%
>   Table$create() %>%
>   mutate(
>     x = strptime(x, format = "%m-%d-%Y")
>   ) %>%
>   collect()
> #> # A tibble: 2 x 1
> #> x
> #> <dttm>
> #> 1 2008-08-04 20:00:00
> #> 2 NA
> tstamp
> #> # A tibble: 2 x 1
> #> x
> #> <dttm>
> #> 1 2008-08-05 00:00:00
> #> 2 NA
> {code}
> _Created on 2021-06-07 by the [reprex package|https://reprex.tidyverse.org] (v2.0.0)_
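A minimal base-R sketch of one plausible explanation for the gap. This is an assumption, not a diagnosis confirmed in the ticket: if Arrow's strptime reads the string as midnight UTC while base R's strptime() uses the session time zone (America/Santiago, which was UTC-4 in August 2008), the two instants differ by exactly that offset. The sketch parses the same string in both zones without Arrow; the names `fmt`, `utc`, `local` and `diff_secs` are illustrative only.

```r
# Hypothesis (not confirmed in the ticket): Arrow parses the string as UTC,
# while base strptime() uses the session time zone (America/Santiago here).
fmt <- "%m-%d-%Y"

# Midnight on 2008-08-05 interpreted in each time zone
utc   <- as.POSIXct(strptime("08-05-2008", format = fmt, tz = "UTC"))
local <- as.POSIXct(strptime("08-05-2008", format = fmt, tz = "America/Santiago"))

# Gap between the two instants, in seconds and in hours
diff_secs <- as.numeric(difftime(local, utc, units = "secs"))
print(diff_secs)         # 14400, the "Mean absolute difference" in the failed expectation
print(diff_secs / 3600)  # 4

# Midnight UTC rendered in Santiago local time matches the Arrow output in the reprex
print(format(utc, "%Y-%m-%d %H:%M:%S", tz = "America/Santiago"))
```

If this hypothesis holds, re-running the reprex with the session time zone set to UTC (for example with `Sys.setenv(TZ = "UTC")`) should make the gap disappear; that is an untested suggestion.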
--
This message was sent by Atlassian Jira
(v8.3.4#803005)