Posted to dev@arrow.apache.org by "Sam Albers (Jira)" <ji...@apache.org> on 2020/03/25 17:27:00 UTC

[jira] [Created] (ARROW-8216) filter method for Dataset doesn't distinguish between empty strings and NAs

Sam Albers created ARROW-8216:
---------------------------------

             Summary: filter method for Dataset doesn't distinguish between empty strings and NAs
                 Key: ARROW-8216
                 URL: https://issues.apache.org/jira/browse/ARROW-8216
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 0.16.0
         Environment: R 3.6.3, Windows 10
            Reporter: Sam Albers


 

I have just noticed some odd behaviour with the filter method for Dataset: it does not seem to distinguish between empty strings and NAs.
{code:java}
library(arrow)
library(dplyr)
packageVersion("arrow")
#> [1] '0.16.0.20200323'
## Make sample parquet
starwars$hair_color[starwars$hair_color == "brown"] <- ""
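## use a dedicated folder so open_dataset() only sees this one file
dir <- file.path(tempdir(), "arrow-8216")
dir.create(dir, showWarnings = FALSE)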
fpath <- file.path(dir, 'data.parquet')
write_parquet(starwars, fpath)
## df in memory
df_mem <- starwars %>% 
 filter(hair_color == "")
## reading from the parquet
df_parquet <- read_parquet(fpath) %>% 
 filter(hair_color == "")
## using open_dataset
df_dataset <- open_dataset(dir) %>% 
 filter(hair_color == "") %>% 
 collect()
{code}
I'm pretty sure all these should return the same data.frame. Am I missing something?
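
For reference, here is a minimal base-R sketch of the distinction the three filters are expected to respect. The vector `hair` is a made-up stand-in for a column like `hair_color` and is not part of the reproduction above; `which()` is used only to mimic how `dplyr::filter()` keeps rows where the condition is TRUE and drops rows where it is FALSE or NA.

```r
# Hypothetical stand-in column: a value, an empty string, and a missing value
hair <- c("blond", "", NA)

# Comparing against "" gives NA (not TRUE) for the missing element
is_empty <- hair == ""
print(is_empty)

# which() keeps only the positions where the condition is TRUE,
# dropping both FALSE and NA, much like dplyr::filter() does for rows
empty_idx <- which(is_empty)
print(empty_idx)

# Missing values have to be detected separately with is.na()
na_idx <- which(is.na(hair))
print(na_idx)
```

Under these semantics, a filter on `hair_color == ""` should keep only the empty-string rows and leave out the rows where `hair_color` is NA.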

 


