Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/05/03 12:15:00 UTC

[jira] [Commented] (ARROW-12596) [C++] Implement na.omit/complete.cases

    [ https://issues.apache.org/jira/browse/ARROW-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338343#comment-17338343 ] 

Joris Van den Bossche commented on ARROW-12596:
-----------------------------------------------

We should probably try to define the requirements in more detail (and without R-specific terms). 

So if I understand it correctly, we want a kernel that receives a Table (or RecordBatch) and returns a new Table (or RecordBatch) with every row removed in which any of the values is null? This would basically be the Table equivalent of the {{drop_nulls}} kernel for Array described in ARROW-1568?
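To pin down those requirements without R-specific terms, here is a minimal reference sketch of the intended semantics in plain Python, with no Arrow involved. It assumes columns are given as a dict of equal-length lists, with {{None}} standing in for null. The function names {{complete_cases}} (returns a per-row mask) and {{drop_null_rows}} (returns the filtered columns) are hypothetical and only mirror the R pair from the issue title.

{code:python}
from typing import Any, Dict, List, Optional

# Hypothetical representation: column name -> list of values, None means null.
Columns = Dict[str, List[Optional[Any]]]


def complete_cases(columns: Columns) -> List[bool]:
    """Return True for each row index where no column holds a null (None)."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    num_rows = lengths.pop() if lengths else 0
    return [
        all(values[i] is not None for values in columns.values())
        for i in range(num_rows)
    ]


def drop_null_rows(columns: Columns) -> Columns:
    """Return new columns that keep only the rows flagged by complete_cases."""
    keep = complete_cases(columns)
    return {
        name: [value for value, k in zip(values, keep) if k]
        for name, values in columns.items()
    }


table = {"x": [1, None, 3, 4], "y": ["a", "b", None, "d"]}
print(complete_cases(table))  # [True, False, False, True]
print(drop_null_rows(table))  # {'x': [1, 4], 'y': ['a', 'd']}
{code}

A row survives only if it is non-null in every column, which is the Table-level analogue of dropping nulls from a single Array.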

This is already possible with existing kernels: compute "is_valid" (or "is_null") for each column, combine the resulting boolean masks with a logical AND, and then filter the table with the combined mask. This is what [~thisisnic] did in ARROW-12184 for R, and in Python it would look like:

{code:python}
import functools

import pyarrow.compute as pc

# Row mask: True only where every column holds a non-null value
mask = functools.reduce(pc.and_, map(pc.is_valid, table.columns))
result = pc.filter(table, mask)
{code}
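For illustration, here is the same is_valid/and_/filter approach packaged as a self-contained script with a small sample table. The helper name {{drop_rows_with_nulls}} and the sample data are made up for this example. The early return for zero-column tables is an assumed behaviour, added because {{functools.reduce}} without an initial value raises on an empty sequence.

{code:python}
import functools

import pyarrow as pa
import pyarrow.compute as pc


def drop_rows_with_nulls(table: pa.Table) -> pa.Table:
    """Hypothetical helper: drop every row that has a null in any column."""
    if table.num_columns == 0:
        # Nothing to check; assumed behaviour is to return the table unchanged.
        return table
    # One boolean ChunkedArray per column: True where the value is non-null.
    validity = [pc.is_valid(column) for column in table.columns]
    # is_valid never emits nulls, so a plain (non-Kleene) AND is sufficient.
    mask = functools.reduce(pc.and_, validity)
    return pc.filter(table, mask)


table = pa.table({
    "x": [1, None, 3, 4],
    "y": ["a", "b", None, "d"],
})
result = drop_rows_with_nulls(table)
print(result.to_pydict())  # {'x': [1, 4], 'y': ['a', 'd']}
{code}

The downside of this composition is that it materializes one intermediate boolean array per column. A dedicated C++ kernel could avoid that, for example by combining the columns' validity bitmaps directly.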

> [C++] Implement na.omit/complete.cases
> --------------------------------------
>
>                 Key: ARROW-12596
>                 URL: https://issues.apache.org/jira/browse/ARROW-12596
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Nic Crane
>            Priority: Major
>
> The R package has a group of functions that remove every row of a dataset containing a null value in any of its columns, but it'd be awesome if this was implemented at the C++ level.


