Posted to jira@arrow.apache.org by "Ian Cook (Jira)" <ji...@apache.org> on 2021/06/03 17:12:00 UTC
[jira] [Created] (ARROW-12960) [C++][R] Option for is_nan(null) to return true
Ian Cook created ARROW-12960:
--------------------------------
Summary: [C++][R] Option for is_nan(null) to return true
Key: ARROW-12960
URL: https://issues.apache.org/jira/browse/ARROW-12960
Project: Apache Arrow
Issue Type: Improvement
Components: C++, R
Reporter: Ian Cook
(This is the flip side of ARROW-12959.)
Currently the Arrow compute kernel {{is_nan}} always treats {{null}} as a missing value, returning {{null}} at positions of the input datum with {{null}} (missing) values.
It would be helpful to be able to control this behavior with an option. The option could be named {{null_is_nan}} or something similar. It would default to {{false}}, consistent with current behavior. When set to {{true}}, it would return {{true}} at positions of the input datum with {{null}} values.
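The proposed semantics can be sketched in plain C++ without any Arrow dependency. This is only an illustration under stated assumptions: the {{IsNanOptions}} struct, the {{IsNan}} function, and the use of {{std::optional}} to model nullable inputs and results are hypothetical stand-ins, not the actual Arrow kernel API. Only the option name {{null_is_nan}} and its default of {{false}} come from the proposal above.

```cpp
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical options struct; only the name null_is_nan is from the proposal.
struct IsNanOptions {
  bool null_is_nan = false;  // default keeps the current behavior
};

// std::nullopt models an Arrow null in both the input and the output.
std::vector<std::optional<bool>> IsNan(
    const std::vector<std::optional<double>>& input,
    const IsNanOptions& options = {}) {
  std::vector<std::optional<bool>> out;
  out.reserve(input.size());
  for (const auto& value : input) {
    if (!value.has_value()) {
      if (options.null_is_nan) {
        out.emplace_back(true);          // proposed: null counts as NaN
      } else {
        out.emplace_back(std::nullopt);  // current: null in, null out
      }
    } else {
      out.emplace_back(std::isnan(*value));
    }
  }
  return out;
}
```

With the default options, a null input position yields a null result; with {{null_is_nan}} set to {{true}}, the same position yields {{true}}. Non-null positions are unaffected by the option.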
Among other things, control over the result at {{null}} positions would enable the {{arrow}} R package to evaluate {{is.nan()}} consistently with the way base R does. In base R, {{is.nan()}} returns {{FALSE}} on {{NA}}. But in the {{arrow}} R package, it returns {{NA}}:
{code:r}
> is.nan(c(3.14, NA, NaN))
##[1] FALSE FALSE TRUE
> as.vector(is.nan(Array$create(c(3.14, NA, NaN))))
##[1] FALSE NA TRUE
{code}
I think solving this with an option in the C++ kernel is the best solution, because I suspect there are other cases in which users would want the ability to locate both NaN and {{null}} values without making two passes through the data. However, it would also be possible to solve this just in the R package, by changing the mapping of {{is.nan}} in the R package. If we choose to go that route, we should change this Jira issue summary to "[R] Make is.nan(NA) consistent with base R".
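The two-pass concern can be made concrete with a small sketch in plain C++. It assumes nullable values are modeled with {{std::optional<double>}}, and both function names are hypothetical; neither is part of Arrow. The first function locates nulls and NaN values in separate scans and combines the results, while the second produces the same mask in one scan, which is roughly what a kernel-level option would allow.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

using NullableDoubles = std::vector<std::optional<double>>;

// Two scans: one to find nulls, one to find NaN, then combine the masks.
std::vector<bool> NanOrNullTwoPasses(const NullableDoubles& input) {
  std::vector<bool> is_null(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    is_null[i] = !input[i].has_value();
  }
  std::vector<bool> result(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    const bool is_nan = input[i].has_value() && std::isnan(*input[i]);
    result[i] = is_null[i] || is_nan;
  }
  return result;
}

// One scan: each position is checked for null and NaN together.
std::vector<bool> NanOrNullOnePass(const NullableDoubles& input) {
  std::vector<bool> result(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    result[i] = !input[i].has_value() || std::isnan(*input[i]);
  }
  return result;
}
```

Both functions return the same mask for any input; the difference is only in how many times the data is traversed.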
--
This message was sent by Atlassian Jira
(v8.3.4#803005)