Posted to jira@arrow.apache.org by "Will Jones (Jira)" <ji...@apache.org> on 2022/06/30 16:19:00 UTC
[jira] [Closed] (ARROW-16641) [R] How to filter array columns?
[ https://issues.apache.org/jira/browse/ARROW-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Jones closed ARROW-16641.
------------------------------
Fix Version/s: (was: 9.0.0)
Assignee: Will Jones
Resolution: Information Provided
> [R] How to filter array columns?
> --------------------------------
>
> Key: ARROW-16641
> URL: https://issues.apache.org/jira/browse/ARROW-16641
> Project: Apache Arrow
> Issue Type: Wish
> Components: R
> Reporter: Vladimir
> Assignee: Will Jones
> Priority: Minor
>
> In the Parquet data we have, there is a column with the array data type (list<array_element <string>>), which flags records that have different issues. For each record, multiple values could be stored in the column. For example, `[A, B, C]`.
> I'm trying to perform a data filtering step and exclude some flagged records.
> Filtering is trivial for the regular columns that contain just a single value. E.g.,
> {code:java}
> flags_to_exclude <- c("A", "B")
> datt %>% filter(! col %in% flags_to_exclude)
> {code}
> Given the array column, is it possible to exclude records with at least one of the flags from `flags_to_exclude` using the arrow R package?
> I really appreciate any advice you can provide!
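
One possible approach, offered only as a minimal sketch: assuming the data has already been pulled into an R data frame (for example with collect()), the array column arrives as an R list column and can be tested row by row. The sample data and the helper name has_excluded_flag are hypothetical and exist purely for illustration. Whether the same filter can be evaluated inside arrow before collecting is not covered here.

```r
library(dplyr)

flags_to_exclude <- c("A", "B")

# Hypothetical stand-in for the collected data: `col` is a list column
# holding zero or more flags per record.
datt <- tibble::tibble(
  id  = 1:4,
  col = list(c("A", "B", "C"), c("C"), character(0), c("D", "B"))
)

# Returns TRUE for each record whose flag vector contains at least one
# of the excluded flags (hypothetical helper, not part of arrow or dplyr).
has_excluded_flag <- function(flags) {
  vapply(flags, function(x) any(x %in% flags_to_exclude), logical(1))
}

# Keep only the records that carry none of the excluded flags.
kept <- datt %>% filter(!has_excluded_flag(col))
```

Records with an empty flag vector are kept, because any() over an empty logical vector is FALSE. Since this runs in R memory after collecting, it may not be suitable for data that does not fit in memory.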