Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2022/03/10 23:27:00 UTC
[jira] [Commented] (ARROW-15168) [R] Add S3 generics to create main Arrow objects
[ https://issues.apache.org/jira/browse/ARROW-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504651#comment-17504651 ]
Jonathan Keane commented on ARROW-15168:
----------------------------------------
This sounds good to me. We do have a few of these helpers (though they aren't generics...) like {{arrow_table}}. I'm fine with transitioning all of those to {{as_...}} versions of themselves, or we could drop the {{as_}} and repurpose them (AFAIK {{arrow_table}} is literally an alias for {{Table$create}} right now.)
> [R] Add S3 generics to create main Arrow objects
> ------------------------------------------------
>
> Key: ARROW-15168
> URL: https://issues.apache.org/jira/browse/ARROW-15168
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Dewey Dunnington
> Priority: Major
>
> Right now we create Tables, RecordBatches, ChunkedArrays, and Arrays using the corresponding {{$create()}} functions (or a few shortcut functions). This works well for converting other Arrow or base R types to Arrow objects but doesn’t work well for objects in other packages (e.g., sf). This is related to ARROW-14378 in that it provides a mechanism for other packages to support writing objects to Arrow in a more Arrow-native form instead of serializing attributes that are unlikely to be readable in other packages. Many of these came up when experimenting with {{carrow}} when trying to provide seamless arrow package compatibility for S3 objects that wrap external pointers to C API data structures. S3 is a good way to do this because the other package doesn't have to put arrow, which is a heavy dependency, in {{Imports}}.
> For argument’s sake I’ll propose adding the following methods:
> - {{as_arrow_array(x, type = NULL)}} -> {{Array}}
> - {{as_arrow_chunked_array(x, type = NULL)}} -> {{ChunkedArray}}
> - {{as_arrow_record_batch(x, schema = NULL)}} -> {{RecordBatch}}
> - {{as_arrow_table(x, schema = NULL)}} -> {{Table}}
> - {{as_arrow_data_type(x)}} -> {{DataType}}
> - {{as_arrow_record_batch_reader(x, schema = NULL)}} -> {{RecordBatchReader}}
> I’ll note that we use {{as_adq()}} internally for similar reasons (to convert a few different object types into an arrow dplyr query when that’s the data structure we need).
> As part of this ticket, if we choose to move forward, we should implement the default methods with some internal consistency (i.e., somebody wanting to provide Arrow support in a package probably only has to implement {{as_arrow_array()}} to get most support).
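To make the "internal consistency" point concrete, here is a minimal base R sketch of how default methods could delegate to {{as_arrow_array()}}. It is only an illustration under stated assumptions: plain lists ({{StandInArray}}, {{StandInChunkedArray}}) stand in for the real Arrow classes so it runs without the arrow package, the {{my_geometry}} class is hypothetical, and only two of the proposed generics are shown.

```r
# Sketch only: plain lists stand in for Arrow's Array and ChunkedArray,
# so this runs in base R without the arrow package installed.

# Proposed generic: convert x to a (stand-in) Array.
as_arrow_array <- function(x, type = NULL, ...) {
  UseMethod("as_arrow_array")
}

# Default method: wrap the values as-is. A real implementation would
# presumably hand off to Array$create() here (assumption).
as_arrow_array.default <- function(x, type = NULL, ...) {
  structure(list(values = x, type = type), class = "StandInArray")
}

# Proposed generic: convert x to a (stand-in) ChunkedArray.
as_arrow_chunked_array <- function(x, type = NULL, ...) {
  UseMethod("as_arrow_chunked_array")
}

# Default method delegates to as_arrow_array(), so a class that only
# implements as_arrow_array() also gets chunked-array support.
as_arrow_chunked_array.default <- function(x, type = NULL, ...) {
  structure(
    list(chunks = list(as_arrow_array(x, type = type, ...))),
    class = "StandInChunkedArray"
  )
}

# A hypothetical third-party class implements a single method and converts
# itself to a portable representation instead of relying on its attributes.
as_arrow_array.my_geometry <- function(x, type = NULL, ...) {
  as_arrow_array(format(unclass(x)), type = "string")
}

geom <- structure(c(1.5, 2.5), class = "my_geometry")

# No as_arrow_chunked_array.my_geometry exists; the default method
# dispatches back to as_arrow_array.my_geometry for the conversion.
chunked <- as_arrow_chunked_array(geom)
```

The same delegation pattern would extend to the record batch and table generics, with each default method building its columns through {{as_arrow_array()}}.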