Posted to jira@arrow.apache.org by "Will Jones (Jira)" <ji...@apache.org> on 2021/11/16 23:23:00 UTC
[jira] [Commented] (ARROW-14722) [R] dplyr::arrange converts number values to negative
[ https://issues.apache.org/jira/browse/ARROW-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444873#comment-17444873 ]
Will Jones commented on ARROW-14722:
------------------------------------
Thank you for the reproducible example! This is very odd behavior indeed.
From what I can tell, base::xtfrm (which dplyr::desc() calls, so this arrange() call goes through it) seems to have a side effect on Arrow ALTREP arrays: negating the result of xtfrm() modifies the original array in place. Here's what I found so far while debugging your example:
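{code}
# my_table is the table read with read_parquet() in the example below
my_array <- my_table$a
vroom::vroom_str(my_array)
# altrep:true type:arrow::arrow::array_int_vector length:10 materialized:false
my_array
# [1] 1 2 3 4 5 6 7 8 9 10
# Negate the result of xtfrm(), as dplyr::desc() does...
-xtfrm(my_array)
# [1] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
# ...and my_array itself has now been negated
my_array
# [1] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
# Doing it a second time flips the values back
-xtfrm(my_array)
# [1] 1 2 3 4 5 6 7 8 9 10
my_array
# [1] 1 2 3 4 5 6 7 8 9 10
{code}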
> [R] dplyr::arrange converts number values to negative
> ------------------------------------------------------
>
> Key: ARROW-14722
> URL: https://issues.apache.org/jira/browse/ARROW-14722
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 6.0.0
> Reporter: Mislav Zorko
> Priority: Blocker
>
> {code}
> # Load libraries
> library(arrow)
> library(dplyr)
> # Store table
> file_path <- tempfile()
> write_parquet(dplyr::tibble(a = 1:10, b = "A"), file_path)
> # Read table
> my_table <- read_parquet(file_path)
> # Table looks normal
> my_table
> # A tibble: 10 x 2
> #        a b
> #    <int> <chr>
> #  1     1 A
> #  2     2 A
> #  3     3 A
> #  4     4 A
> #  5     5 A
> #  6     6 A
> #  7     7 A
> #  8     8 A
> #  9     9 A
> # 10    10 A
> # First arrange changes number values to negative
> my_table %>% arrange(dplyr::desc(a))
> # A tibble: 10 x 2
> #        a b
> #    <int> <chr>
> #  1   -10 A
> #  2    -9 A
> #  3    -8 A
> #  4    -7 A
> #  5    -6 A
> #  6    -5 A
> #  7    -4 A
> #  8    -3 A
> #  9    -2 A
> # 10    -1 A
> # Even underlying data is changed!!!
> my_table
> # A tibble: 10 x 2
> #        a b
> #    <int> <chr>
> #  1   -10 A
> #  2    -9 A
> #  3    -8 A
> #  4    -7 A
> #  5    -6 A
> #  6    -5 A
> #  7    -4 A
> #  8    -3 A
> #  9    -2 A
> # 10    -1 A
> # Second arrange changes it back
> my_table %>% arrange(dplyr::desc(a))
> my_table
> # A tibble: 10 x 2
> #        a b
> #    <int> <chr>
> #  1     1 A
> #  2     2 A
> #  3     3 A
> #  4     4 A
> #  5     5 A
> #  6     6 A
> #  7     7 A
> #  8     8 A
> #  9     9 A
> # 10    10 A
> {code}