
Posted to issues@arrow.apache.org by "leoniedu (via GitHub)" <gi...@apache.org> on 2023/06/13 17:05:18 UTC

[GitHub] [arrow] leoniedu opened a new issue, #36053: R and dplyr, summarizing a variable results in NA at random, while there is no NA in the subset of data.

leoniedu opened a new issue, #36053:
URL: https://github.com/apache/arrow/issues/36053

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ```
   td <- tempdir()
   tzip <- file.path(td, "reprex.zip")
   download.file("https://drive.google.com/uc?export=download&id=1-KefpiALDtUg0PrCUgpMAaE0903jVWWm", destfile = tzip)
   unzip(tzip, exdir = td)
   tlink <- file.path(td, "co_ano_mes=1997-01-01")
   library(dplyr)
   arrow_dset <- arrow::open_dataset(
     tlink,
     format = "parquet"
   )
   arrow_dset %>%
     count(fluxo, vl_frete_miss = is.na(vl_frete)) %>%
     collect()
   ```
   
   No missing values for vl_frete when fluxo=="imp"
   
   >   fluxo vl_frete_miss     n
   >   <chr> <lgl>         <int>
   >   exp   TRUE          35546
   >   imp   FALSE         42332
   
   
   ```
   replicate(
     10,
     arrow_dset %>%
       group_by(fluxo) %>%
       summarise(vl_frete = sum(vl_frete)) %>%
       collect() %>%
       filter(fluxo == "imp") %>%
       pull(vl_frete)
   )
   ```
   
   >      [1]        NA        NA 154149785        NA 154149785        NA 154149785 154149785
   >      [9] 154149785 154149785
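
   One way to cross-check a result like this is to compare the sum computed by Arrow with the same sum computed in R after `collect()`. The sketch below is only an illustration under stated assumptions: it does not use the reprex data, the small data frame `df` is a made-up stand-in that mimics the shape of the report (all `vl_frete` missing for `exp`, none missing for `imp`), and the names `df`, `path`, `dset`, `arrow_sum` and `local_sum` are hypothetical. A tiny dataset like this may well not trigger the intermittent `NA` at all.

   ```r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)

   # Hypothetical stand-in data (NOT the reprex data): vl_frete is always
   # missing for "exp" and never missing for "imp", as in the report.
   df <- data.frame(
     fluxo = rep(c("exp", "imp"), each = 5),
     vl_frete = c(rep(NA_real_, 5), 1, 2, 3, 4, 5)
   )

   # Write the stand-in data as a Parquet dataset and open it again
   path <- tempfile()
   write_dataset(df, path, format = "parquet")
   dset <- open_dataset(path, format = "parquet")

   # Sum computed by Arrow before collecting
   arrow_sum <- dset %>%
     group_by(fluxo) %>%
     summarise(vl_frete = sum(vl_frete)) %>%
     collect() %>%
     arrange(fluxo)

   # Same sum computed by dplyr in R after collecting the raw rows
   local_sum <- dset %>%
     collect() %>%
     group_by(fluxo) %>%
     summarise(vl_frete = sum(vl_frete)) %>%
     arrange(fluxo)

   # The two results should agree; a mismatch would point at the Arrow side
   all.equal(as.data.frame(arrow_sum), as.data.frame(local_sum))
   ```

   With the real data, `arrow_dset` would take the place of `dset`.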
   
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] leoniedu commented on issue #36053: [C++][R] summarizing a variable results in NA at random, while there is no NA in the subset of data.

Posted by "leoniedu (via GitHub)" <gi...@apache.org>.

leoniedu commented on issue #36053:
URL: https://github.com/apache/arrow/issues/36053#issuecomment-1615204080

   Thanks @westonpace! And now I know the warm fuzzy feeling of reporting an important bug in important open source software and having it promptly addressed! Thanks to you and everyone on the team for the incredible work.



[GitHub] [arrow] bkietz closed issue #36053: [C++][R] summarizing a variable results in NA at random, while there is no NA in the subset of data.

Posted by "bkietz (via GitHub)" <gi...@apache.org>.

bkietz closed issue #36053: [C++][R] summarizing a variable results in NA at random, while there is no NA in the subset of data.
URL: https://github.com/apache/arrow/issues/36053



[GitHub] [arrow] paleolimbot commented on issue #36053: R and dplyr, summarizing a variable results in NA at random, while there is no NA in the subset of data.

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.

paleolimbot commented on issue #36053:
URL: https://github.com/apache/arrow/issues/36053#issuecomment-1597674737

   Thank you for reporting! I can replicate this on macOS (M1), although I needed far more than 10 tries to reliably replicate the last bit:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)
   
   td <- tempfile()
   dir.create(td)
   tzip <- file.path(td, "reprex.zip")
   download.file("https://drive.google.com/uc?export=download&id=1-KefpiALDtUg0PrCUgpMAaE0903jVWWm", destfile = tzip)
   unzip(tzip, exdir = td)
   tlink <- file.path(td, "co_ano_mes=1997-01-01")
   
   arrow_dset <- arrow::open_dataset(
     tlink,
     format = "parquet"
   )
   
   arrow_dset %>%
     count(fluxo, vl_frete_miss = is.na(vl_frete)) %>%
     collect()
   #> # A tibble: 2 × 3
   #>   fluxo vl_frete_miss     n
   #>   <chr> <lgl>         <int>
   #> 1 exp   TRUE          35546
   #> 2 imp   FALSE         42332
   
   replicate(
     1000,  
     arrow_dset %>% 
       group_by(fluxo) %>% 
       summarise(vl_frete = sum(vl_frete)) %>% 
       collect %>% 
       filter(fluxo=="imp") %>%
       pull(vl_frete)
   )
   #>    [1]        NA        NA        NA        NA        NA        NA        NA
   #>    [8]        NA        NA        NA        NA        NA        NA 154149785
   #>   [15]        NA        NA        NA        NA        NA        NA        NA
   #>   [22]        NA        NA        NA        NA        NA        NA        NA
   #>   [29]        NA        NA        NA        NA        NA        NA        NA
   #>   [36]        NA        NA        NA        NA        NA        NA        NA
   #>   [43]        NA        NA        NA        NA        NA        NA        NA
   #>   [50]        NA        NA        NA        NA        NA        NA        NA
   #>   [57]        NA        NA        NA        NA        NA        NA        NA
   #>   [64]        NA        NA 154149785        NA        NA        NA        NA
   #>   [71]        NA        NA        NA        NA        NA        NA        NA
   #>   [78]        NA        NA        NA        NA        NA        NA        NA
   #>   [85]        NA        NA        NA        NA        NA        NA        NA
   #>   [92]        NA        NA        NA        NA        NA        NA        NA
   #>   [99]        NA        NA        NA        NA        NA        NA        NA
   #>  [106]        NA        NA        NA        NA        NA        NA        NA
   #>  [113]        NA        NA        NA        NA        NA        NA        NA
   #>  [120]        NA        NA        NA        NA        NA        NA        NA
   #>  [127]        NA        NA        NA        NA        NA        NA        NA
   #>  [134]        NA        NA        NA        NA        NA        NA        NA
   #>  [141]        NA        NA        NA        NA        NA        NA        NA
   #>  [148]        NA        NA        NA        NA        NA        NA        NA
   #>  [155]        NA        NA        NA        NA        NA        NA        NA
   #>  [162]        NA        NA        NA        NA        NA        NA        NA
   #>  [169]        NA        NA        NA        NA        NA        NA        NA
   #>  [176]        NA        NA        NA        NA        NA        NA        NA
   #>  [183]        NA        NA        NA        NA        NA        NA        NA
   #>  [190]        NA        NA        NA        NA        NA        NA        NA
   #>  [197]        NA        NA        NA        NA        NA        NA        NA
   #>  [204]        NA        NA        NA        NA        NA        NA        NA
   #>  [211]        NA        NA        NA        NA        NA        NA        NA
   #>  [218]        NA        NA        NA        NA        NA        NA        NA
   #>  [225]        NA        NA        NA        NA        NA        NA        NA
   #>  [232]        NA        NA        NA        NA        NA        NA        NA
   #>  [239]        NA        NA 154149785 154149785        NA        NA        NA
   #>  [246]        NA        NA        NA        NA        NA        NA        NA
   #>  [253]        NA        NA        NA        NA        NA        NA        NA
   #>  [260]        NA        NA        NA        NA        NA        NA        NA
   #>  [267]        NA        NA        NA        NA        NA 154149785        NA
   #>  [274]        NA        NA        NA        NA        NA        NA        NA
   #>  [281]        NA        NA        NA        NA        NA        NA        NA
   #>  [288]        NA        NA        NA        NA        NA        NA        NA
   #>  [295]        NA        NA        NA        NA        NA        NA        NA
   #>  [302]        NA        NA        NA        NA        NA        NA        NA
   #>  [309]        NA        NA        NA        NA        NA        NA        NA
   #>  [316]        NA        NA        NA        NA        NA        NA        NA
   #>  [323]        NA        NA        NA        NA        NA        NA        NA
   #>  [330]        NA        NA        NA        NA        NA        NA        NA
   #>  [337]        NA        NA        NA        NA        NA        NA        NA
   #>  [344]        NA        NA        NA        NA        NA        NA        NA
   #>  [351]        NA        NA        NA        NA        NA        NA        NA
   #>  [358]        NA        NA        NA        NA        NA        NA        NA
   #>  [365]        NA        NA        NA        NA        NA        NA        NA
   #>  [372]        NA        NA        NA        NA        NA        NA        NA
   #>  [379]        NA        NA        NA        NA        NA        NA        NA
   #>  [386]        NA        NA        NA        NA        NA        NA        NA
   #>  [393]        NA        NA        NA        NA        NA        NA        NA
   #>  [400]        NA        NA        NA        NA        NA        NA        NA
   #>  [407]        NA        NA        NA        NA        NA        NA        NA
   #>  [414]        NA        NA        NA        NA        NA        NA        NA
   #>  [421]        NA        NA        NA        NA        NA        NA        NA
   #>  [428]        NA        NA        NA        NA        NA        NA        NA
   #>  [435]        NA        NA        NA        NA        NA        NA        NA
   #>  [442]        NA        NA        NA        NA        NA        NA        NA
   #>  [449]        NA        NA        NA        NA        NA        NA        NA
   #>  [456]        NA        NA        NA        NA        NA        NA        NA
   #>  [463]        NA        NA        NA        NA        NA        NA        NA
   #>  [470]        NA        NA        NA        NA        NA        NA        NA
   #>  [477]        NA        NA        NA        NA        NA        NA        NA
   #>  [484]        NA        NA        NA        NA        NA        NA        NA
   #>  [491]        NA        NA        NA        NA        NA        NA        NA
   #>  [498]        NA        NA        NA        NA        NA        NA        NA
   #>  [505]        NA        NA        NA 154149785        NA        NA        NA
   #>  [512]        NA        NA        NA        NA        NA        NA        NA
   #>  [519]        NA        NA        NA        NA        NA        NA        NA
   #>  [526]        NA        NA        NA        NA        NA        NA        NA
   #>  [533]        NA        NA        NA        NA        NA        NA        NA
   #>  [540]        NA        NA        NA        NA        NA        NA        NA
   #>  [547]        NA        NA        NA        NA        NA        NA        NA
   #>  [554]        NA        NA        NA        NA        NA        NA        NA
   #>  [561]        NA        NA        NA        NA        NA        NA        NA
   #>  [568]        NA        NA        NA        NA        NA        NA        NA
   #>  [575]        NA        NA        NA        NA        NA        NA        NA
   #>  [582]        NA        NA        NA        NA        NA        NA        NA
   #>  [589]        NA        NA        NA        NA        NA        NA        NA
   #>  [596]        NA        NA        NA        NA        NA        NA        NA
   #>  [603]        NA        NA        NA        NA        NA        NA        NA
   #>  [610]        NA        NA        NA        NA        NA        NA        NA
   #>  [617]        NA        NA        NA        NA        NA        NA        NA
   #>  [624]        NA        NA        NA        NA        NA        NA        NA
   #>  [631]        NA        NA        NA        NA        NA        NA        NA
   #>  [638]        NA        NA        NA        NA        NA        NA        NA
   #>  [645]        NA        NA        NA        NA        NA        NA        NA
   #>  [652]        NA        NA        NA        NA        NA        NA        NA
   #>  [659]        NA        NA        NA        NA        NA        NA        NA
   #>  [666]        NA        NA        NA        NA        NA        NA        NA
   #>  [673]        NA        NA        NA        NA        NA        NA        NA
   #>  [680]        NA        NA        NA        NA        NA        NA        NA
   #>  [687]        NA        NA        NA        NA        NA        NA        NA
   #>  [694]        NA        NA        NA        NA        NA        NA        NA
   #>  [701]        NA        NA        NA        NA        NA        NA        NA
   #>  [708]        NA        NA        NA        NA        NA        NA        NA
   #>  [715]        NA        NA        NA        NA        NA        NA        NA
   #>  [722] 154149785        NA        NA        NA        NA        NA        NA
   #>  [729]        NA        NA        NA        NA        NA        NA        NA
   #>  [736]        NA        NA        NA        NA        NA        NA        NA
   #>  [743]        NA        NA        NA        NA        NA        NA        NA
   #>  [750]        NA        NA        NA        NA        NA        NA        NA
   #>  [757]        NA        NA        NA        NA        NA        NA        NA
   #>  [764]        NA        NA        NA        NA        NA        NA        NA
   #>  [771]        NA        NA        NA        NA        NA        NA        NA
   #>  [778]        NA        NA        NA        NA        NA        NA        NA
   #>  [785]        NA        NA        NA        NA        NA        NA        NA
   #>  [792]        NA        NA        NA        NA        NA        NA        NA
   #>  [799]        NA        NA        NA        NA        NA        NA        NA
   #>  [806]        NA        NA        NA        NA        NA        NA        NA
   #>  [813]        NA        NA        NA        NA        NA        NA        NA
   #>  [820]        NA        NA        NA        NA        NA        NA        NA
   #>  [827]        NA        NA        NA        NA        NA        NA        NA
   #>  [834]        NA        NA        NA        NA        NA        NA        NA
   #>  [841]        NA        NA        NA        NA        NA        NA        NA
   #>  [848]        NA        NA        NA        NA        NA        NA        NA
   #>  [855]        NA        NA        NA        NA        NA        NA        NA
   #>  [862]        NA        NA        NA        NA        NA        NA        NA
   #>  [869]        NA        NA        NA        NA        NA        NA        NA
   #>  [876]        NA        NA        NA        NA        NA        NA        NA
   #>  [883]        NA        NA        NA        NA        NA        NA        NA
   #>  [890]        NA        NA        NA        NA        NA        NA        NA
   #>  [897]        NA        NA        NA        NA        NA        NA        NA
   #>  [904]        NA        NA        NA        NA        NA        NA        NA
   #>  [911]        NA        NA        NA        NA        NA        NA        NA
   #>  [918]        NA        NA        NA        NA        NA        NA        NA
   #>  [925]        NA        NA        NA        NA        NA        NA        NA
   #>  [932]        NA        NA        NA        NA        NA        NA        NA
   #>  [939]        NA        NA        NA        NA        NA        NA        NA
   #>  [946]        NA        NA        NA        NA        NA        NA        NA
   #>  [953]        NA        NA        NA        NA        NA        NA 154149785
   #>  [960]        NA        NA        NA        NA        NA        NA        NA
   #>  [967]        NA        NA        NA        NA        NA        NA        NA
   #>  [974]        NA        NA        NA        NA        NA        NA        NA
   #>  [981]        NA        NA        NA        NA        NA        NA        NA
   #>  [988]        NA        NA        NA        NA        NA        NA        NA
   #>  [995]        NA        NA        NA        NA        NA        NA
   ```
   
   <sup>Created on 2023-06-19 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
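
   For long runs like this, a tally can be easier to read than the printed vector. The following is a minimal base R sketch, not something from the reprex: `tally_runs` is a hypothetical helper name, and the example input is simply the ten values shown in the original report.

   ```r
   # Hypothetical helper: summarise the results of repeated runs by counting
   # how many came back NA and listing the distinct non-NA values seen.
   tally_runs <- function(x) {
     list(
       n_runs = length(x),
       n_na = sum(is.na(x)),
       distinct_values = sort(unique(x[!is.na(x)]))
     )
   }

   # The ten values printed in the original report
   runs <- c(NA, NA, 154149785, NA, 154149785, NA,
             154149785, 154149785, 154149785, 154149785)
   res <- tally_runs(runs)
   res$n_na            # 4 of the 10 runs returned NA
   res$distinct_values # the only non-NA value seen is 154149785
   ```

   The same helper could be applied to the output of the `replicate()` call above.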



[GitHub] [arrow] westonpace commented on issue #36053: R and dplyr, summarizing a variable results in NA at random, while there is no NA in the subset of data.

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace commented on issue #36053:
URL: https://github.com/apache/arrow/issues/36053#issuecomment-1612384722

   Thanks both for the report, reproduction, and analysis.  This is a good find and an important bug to fix!

