Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/05/03 12:15:00 UTC
[jira] [Commented] (ARROW-12596) [C++] Implement na.omit/complete.cases
[ https://issues.apache.org/jira/browse/ARROW-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338343#comment-17338343 ]
Joris Van den Bossche commented on ARROW-12596:
-----------------------------------------------
We should probably try to define the requirements in more detail (and without R-specific terms).
So if I understand it correctly, we want a kernel that takes a Table (or RecordBatch) and returns a new Table (or RecordBatch) with every row removed in which any of the values is null? This would basically be the Table equivalent of the {{drop_nulls}} kernel for Array described in ARROW-1568?
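To pin down those semantics without any Arrow (or R) specifics, here is a plain-Python reference sketch. It assumes columns are modeled as equal-length Python lists with None standing in for null, and both function names are hypothetical, not Arrow APIs:

{code:python}
from typing import Any, Dict, List


def drop_nulls_array(values: List[Any]) -> List[Any]:
    # Array-level semantics (hypothetical name): keep only the non-null values.
    return [v for v in values if v is not None]


def drop_nulls_table(columns: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    # Table-level semantics (hypothetical name): a row survives only if it
    # is non-null in every column. Assumes all columns have the same length.
    names = list(columns)
    if not names:
        return {}
    num_rows = len(columns[names[0]])
    keep = [all(columns[n][i] is not None for n in names) for i in range(num_rows)]
    return {n: [v for v, k in zip(columns[n], keep) if k] for n in names}


print(drop_nulls_table({"a": [1, None, 3, 4], "b": ["x", "y", None, "z"]}))
{code}

Note the difference from the Array case: a null in column "b" also removes the corresponding (valid) value in column "a", so the columns stay aligned.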
This is already possible to some extent by computing "is_valid" (or the inverse of "is_null") for each column, combining those masks with a logical AND, and then filtering the table with the result. This is what [~thisisnic] did in ARROW-12184 for R, and in Python it would look like:
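{code:python}
import functools
import pyarrow.compute as pc

# AND the per-column validity masks: True only where every column has a value
mask = functools.reduce(pc.and_, map(pc.is_valid, table.columns))
result = pc.filter(table, mask)
{code}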
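Wrapped as a reusable helper, the same mask-and-filter approach might look like the sketch below. The name {{drop_null_rows}} is hypothetical (not an existing Arrow function), and returning a column-less table unchanged is an assumption about the desired semantics (without the guard, {{functools.reduce}} raises on an empty column list):

{code:python}
import functools

import pyarrow as pa
import pyarrow.compute as pc


def drop_null_rows(table: pa.Table) -> pa.Table:
    """Hypothetical helper: keep only the rows that are non-null in every column."""
    if table.num_columns == 0:
        # Assumption: with no columns nothing can be null, so return the table as-is.
        return table
    # is_valid is True where a value is present; AND-ing across all columns
    # leaves True only for rows that are complete.
    mask = functools.reduce(pc.and_, map(pc.is_valid, table.columns))
    return pc.filter(table, mask)


table = pa.table({"a": [1, None, 3, 4], "b": ["x", "y", None, "z"]})
result = drop_null_rows(table)
print(result.to_pydict())
{code}

Since {{is_valid}} never returns null itself, the AND never has to deal with null inputs, so the choice between {{and_}} and {{and_kleene}} does not matter here.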
> [C++] Implement na.omit/complete.cases
> --------------------------------------
>
> Key: ARROW-12596
> URL: https://issues.apache.org/jira/browse/ARROW-12596
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Nic Crane
> Priority: Major
>
> The R package has a group of functions that allow the removal of all rows in a dataset that contain null values in any of their columns, but it'd be awesome if this was implemented at the C++ level.