Posted to jira@arrow.apache.org by "Matthew Summersgill (Jira)" <ji...@apache.org> on 2021/03/04 17:19:01 UTC

[jira] [Commented] (ARROW-9083) [R] collect int64, uint32, uint64 as R integer type if not out of bounds

    [ https://issues.apache.org/jira/browse/ARROW-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295431#comment-17295431 ] 

Matthew Summersgill commented on ARROW-9083:
--------------------------------------------

 

Hate to dredge up an issue that was settled last year, but the current behavior yields inconsistent result types depending on the contents of an int64 column in my query: the same column comes back as integer64 when a value exceeds the 32-bit range and as a plain R integer when it does not (example below). [~npr], did you ever implement any kind of system option I can set to keep the int64 type regardless of the contents?

If not, would you be open to a PR that adds one? I'd imagine something along the lines of a user-configurable option, .Options$arrow.strict.integer64: if .Options$arrow.strict.integer64 is TRUE, the conversion skips the min/max range check and always returns integer64. In addition, an error would be thrown if this option was enabled and the bit64 package was not available.

 
{code:r}
str(collect(SparkR::sql("SELECT bigint('2147483648') AS x;"))$x[[1]])
## integer64 2147483648 
str(collect(SparkR::sql("SELECT bigint('1') AS x;"))$x[[1]])
## int 1
{code}
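
To make the proposal a bit more concrete, here is a rough sketch of the decision logic in plain R. The function name should_downcast_int64 and its arguments are hypothetical and exist only for illustration, and the option arrow.strict.integer64 is the one proposed above, not an existing setting. The actual conversion happens inside Array__as_vector rather than in R code like this, so treat it as a description of the intended behavior, not an implementation.

```r
# Hypothetical helper (not part of the arrow package): should an int64 column
# with the given extrema be returned as a native R integer vector?
should_downcast_int64 <- function(min_val, max_val,
                                  strict = isTRUE(getOption("arrow.strict.integer64")),
                                  has_bit64 = requireNamespace("bit64", quietly = TRUE)) {
  if (strict) {
    # Strict mode: never downcast, but integer64 needs bit64 to be usable.
    if (!has_bit64) {
      stop("arrow.strict.integer64 is TRUE but the bit64 package is not available")
    }
    return(FALSE)
  }
  # R integers are 32-bit, and the lowest bit pattern is reserved for
  # NA_integer_, so the usable range is symmetric around zero.
  min_val >= -.Machine$integer.max && max_val <= .Machine$integer.max
}

# The two values from the example above, with strict mode off and then on.
should_downcast_int64(1, 1, strict = FALSE)
should_downcast_int64(2147483648, 2147483648, strict = FALSE)
should_downcast_int64(1, 1, strict = TRUE, has_bit64 = TRUE)
```

With strict mode off, the result depends on the data, which is exactly the inconsistency described above; with it on, the answer is always "keep integer64", so the result structure no longer depends on the column's contents.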
 

> [R] collect int64, uint32, uint64 as R integer type if not out of bounds
> ------------------------------------------------------------------------
>
>                 Key: ARROW-9083
>                 URL: https://issues.apache.org/jira/browse/ARROW-9083
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{bit64::integer64}} can be awkward to work with in R (one example: https://github.com/apache/arrow/issues/7385). Often in Arrow we get {{int64}} types from [compute methods|https://github.com/apache/arrow/pull/7308] or other translation methods that auto-promote to the largest integer type, but they would fit fine in a 32-bit integer, which is R's native type. 
> When calling {{Array__as_vector}} on an int64, we could first call the minmax function on the array, and if the extrema are within the range of a 32-bit int, return a regular R integer vector. This would add a little bit of ambiguity as to what R type you'll get from an Arrow type, but I wonder if the benefits are worth it since you can't do much with an integer64 in R. (We could also make this optional, similar to ARROW-7657, so you could specify a "strict" mode if you are in a use case where roundtrip fidelity is more important than R usability.)
> Likewise, uint32 and uint64 could be kept as integers and prevent the conversion to double that is currently implemented.


