Posted to jira@arrow.apache.org by "Pierre Gramme (Jira)" <ji...@apache.org> on 2022/01/12 16:22:00 UTC
[jira] [Created] (ARROW-15312) [R] filtering a dataset with is.na() misses some rows
Pierre Gramme created ARROW-15312:
-------------------------------------
Summary: [R] filtering a dataset with is.na() misses some rows
Key: ARROW-15312
URL: https://issues.apache.org/jira/browse/ARROW-15312
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 6.0.1
Environment: R 4.1.2 on Windows
arrow 6.0.1
dplyr 1.0.7
Reporter: Pierre Gramme
Hi!
I just found an issue when querying an Arrow dataset with dplyr and filtering on is.na(...) before collect(): rows where the column is NA are not returned.
It seems to be linked to columns that contain only one distinct value plus some NAs.
Can you reproduce the following as well?
{quote}
library(arrow)
library(dplyr)
ds_path = "test-arrow-na"
# y has a single distinct non-NA value (0) plus one NA; z has two distinct values plus one NA
df = tibble(x=1:3, y=c(0L, 0L, NA_integer_), z=c(0L, 1L, NA_integer_))
df %>% arrow::write_dataset(ds_path)
# OK: collect then filter returns row 3, as expected
arrow::open_dataset(ds_path) %>% collect() %>% filter(is.na(y))
# BUG: filter then collect (on y) returns a tibble with no rows; row 3 was expected
arrow::open_dataset(ds_path) %>% filter(is.na(y)) %>% collect()
# OK: filter then collect (on z) returns row 3, as expected
arrow::open_dataset(ds_path) %>% filter(is.na(z)) %>% collect()
{quote}
Thanks
Pierre