Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2021/12/20 17:59:00 UTC

[jira] [Created] (ARROW-15168) [R] Add S3 generics to create main Arrow objects

Dewey Dunnington created ARROW-15168:
----------------------------------------

             Summary: [R] Add S3 generics to create main Arrow objects
                 Key: ARROW-15168
                 URL: https://issues.apache.org/jira/browse/ARROW-15168
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Dewey Dunnington


Right now we create Tables, RecordBatches, ChunkedArrays, and Arrays using the corresponding {{$create()}} functions (or a few shortcut functions). This works well for converting other Arrow or base R types to Arrow objects but doesn’t work well for objects in other packages (e.g., sf). This is related to ARROW-14378 in that it provides a mechanism for other packages to support writing objects to Arrow in a more Arrow-native form instead of serializing attributes that are unlikely to be readable in other packages. Many of these came up while experimenting with {{carrow}}, when trying to provide seamless arrow package compatibility for S3 objects that wrap external pointers to C API data structures. S3 is a good way to do this because arrow is a heavy dependency and, with S3, the other package doesn't have to put arrow in {{Imports}}.

For argument’s sake I’ll propose adding the following generics:

-   {{as_arrow_array(x, type = NULL)}} -> {{Array}} 
-   {{as_arrow_chunked_array(x, type = NULL)}} -> {{ChunkedArray}} 
-   {{as_arrow_record_batch(x, schema = NULL)}} -> {{RecordBatch}} 
-   {{as_arrow_table(x, schema = NULL)}} -> {{Table}} 
-   {{as_arrow_data_type(x)}} -> {{DataType}} 
-   {{as_arrow_record_batch_reader(x, schema = NULL)}} -> {{RecordBatchReader}} 
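
As a rough sketch of what one of these generics could look like (assuming the arrow package is installed): the generic and its methods are defined locally here for illustration only, the default method body is an assumption about how it might fall back to the existing {{Array$create()}}, and the {{my_wrapper}} class is a hypothetical stand-in for an S3 class from another package.

```r
library(arrow)

# Proposed generic: dispatches on the class of `x`.
as_arrow_array <- function(x, type = NULL) {
  UseMethod("as_arrow_array")
}

# Assumed default: fall back to the existing constructor.
as_arrow_array.default <- function(x, type = NULL) {
  Array$create(x, type = type)
}

# Hypothetical S3 class from another package that wraps a plain vector.
new_my_wrapper <- function(values) {
  structure(list(values = values), class = "my_wrapper")
}

# The other package supplies a method instead of relying on attributes
# being serialized.
as_arrow_array.my_wrapper <- function(x, type = NULL) {
  Array$create(x$values, type = type)
}

arr <- as_arrow_array(new_my_wrapper(c(1.5, 2.5)))
```

Here the caller never needs to know how {{my_wrapper}} stores its data; the method decides what the Arrow-native representation is.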

I’ll note that we use {{as_adq()}} internally for similar reasons (to convert a few different object types into an arrow dplyr query when that’s the data structure we need).

As part of this ticket, if we choose to move forward, we should implement the default methods with some internal consistency (i.e., somebody wanting to provide Arrow support in a package probably only has to implement {{as_arrow_array()}} to get most support).
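
One way the internal consistency described above might work, sketched under assumptions (arrow installed; generics defined locally for illustration; {{my_wrapper}} is a hypothetical external class): the default {{as_arrow_chunked_array()}} method delegates to {{as_arrow_array()}}, so a package that implements only the array method gets chunked array support as well.

```r
library(arrow)

as_arrow_array <- function(x, type = NULL) UseMethod("as_arrow_array")
as_arrow_array.default <- function(x, type = NULL) {
  Array$create(x, type = type)
}

as_arrow_chunked_array <- function(x, type = NULL) {
  UseMethod("as_arrow_chunked_array")
}

# Assumed default: build a single-chunk ChunkedArray from whatever
# as_arrow_array() returns for this class.
as_arrow_chunked_array.default <- function(x, type = NULL) {
  chunked_array(as_arrow_array(x, type = type))
}

# A package author implements only the array method for their
# (hypothetical) class...
as_arrow_array.my_wrapper <- function(x, type = NULL) {
  Array$create(x$values, type = type)
}

# ...and the chunked array conversion works through the default method.
x <- structure(list(values = 1:3), class = "my_wrapper")
chunked <- as_arrow_chunked_array(x)
```

The same delegation pattern could presumably extend to the other proposed generics, though how far it should go is a design question for this ticket.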




--
This message was sent by Atlassian Jira
(v8.20.1#820001)