Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2022/03/10 23:27:00 UTC
[jira] [Commented] (ARROW-15168) [R] Add S3 generics to create main Arrow objects

    [ https://issues.apache.org/jira/browse/ARROW-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504651#comment-17504651 ] 

Jonathan Keane commented on ARROW-15168:
----------------------------------------

This sounds good to me. We do have a few of these helpers (though they aren't generics) like {{arrow_table}}. I'm fine with transitioning all of those to {{as_...}} versions of themselves, or we could drop the {{as_}} prefix and repurpose the existing helpers instead (AFAIK {{arrow_table}} is literally an alias for {{Table$create}} right now).

> [R] Add S3 generics to create main Arrow objects
> ------------------------------------------------
>
>                 Key: ARROW-15168
>                 URL: https://issues.apache.org/jira/browse/ARROW-15168
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dewey Dunnington
>            Priority: Major
>
> Right now we create Tables, RecordBatches, ChunkedArrays, and Arrays using the corresponding {{$create()}} functions (or a few shortcut functions). This works well for converting other Arrow or base R types to Arrow objects but doesn’t work well for objects in other packages (e.g., sf). This is related to ARROW-14378 in that it provides a mechanism for other packages to support writing objects to Arrow in a more Arrow-native form instead of serializing attributes that are unlikely to be readable in other packages. Many of these came up when experimenting with {{carrow}} when trying to provide seamless arrow package compatibility for S3 objects that wrap external pointers to C API data structures. S3 is a good way to do this because arrow is a heavy dependency and, with S3, the other package doesn't have to put it in {{Imports}}.
> For argument’s sake I’ll propose adding the following methods: 
> -   {{as_arrow_array(x, type = NULL)}} -> {{Array}} 
> -   {{as_arrow_chunked_array(x, type = NULL)}} -> {{ChunkedArray}} 
> -   {{as_arrow_record_batch(x, schema = NULL)}} -> {{RecordBatch}} 
> -   {{as_arrow_table(x, schema = NULL)}} -> {{Table}} 
> -   {{as_arrow_data_type(x)}} -> {{DataType}} 
> -   {{as_arrow_record_batch_reader(x, schema = NULL)}} -> {{RecordBatchReader}} 
> I’ll note that we use {{as_adq()}} internally for similar reasons (to convert a few different object types into an arrow dplyr query when that’s the data structure we need).
> As part of this ticket, if we choose to move forward, we should implement the default methods with some internal consistency (i.e., somebody wanting to provide Arrow support in a package probably only has to implement {{as_arrow_array()}} to get most support).
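
The "internal consistency" idea in the description can be sketched in base R without the arrow package. This is only an illustration under stated assumptions: the generic names and the {{type}} argument come from the ticket, but the {{...}} argument, the {{mock_array}} and {{mock_chunked_array}} stand-in objects, and the {{geo_point}} class are hypothetical and are not part of arrow or any other package.

```r
# Stand-in for an Arrow Array; NOT the arrow package's Array class.
new_mock_array <- function(values, type = NULL) {
  structure(list(values = values, type = type), class = "mock_array")
}

# Generics named after the ticket's proposal (the `...` is an assumption).
as_arrow_array <- function(x, type = NULL, ...) {
  UseMethod("as_arrow_array")
}

as_arrow_chunked_array <- function(x, type = NULL, ...) {
  UseMethod("as_arrow_chunked_array")
}

# Default array method: wrap the values as they are.
as_arrow_array.default <- function(x, type = NULL, ...) {
  new_mock_array(x, type = type)
}

# Default chunked-array method is built on as_arrow_array(), so any class
# that implements as_arrow_array() gets chunked-array support for free.
as_arrow_chunked_array.default <- function(x, type = NULL, ...) {
  structure(
    list(chunks = list(as_arrow_array(x, type = type))),
    class = "mock_chunked_array"
  )
}

# A hypothetical downstream class that implements only as_arrow_array().
as_arrow_array.geo_point <- function(x, type = NULL, ...) {
  new_mock_array(unclass(x)$wkt, type = "utf8")
}

pts <- structure(
  list(wkt = c("POINT (0 1)", "POINT (2 3)")),
  class = "geo_point"
)

# Dispatch falls through the default chunked method to the geo_point method.
chunked <- as_arrow_chunked_array(pts)
chunked$chunks[[1]]$type
```

In a real package the downstream method would presumably be registered conditionally (for example with {{registerS3method()}} at load time) so that arrow could stay out of {{Imports}}; that detail is left out of the sketch.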


