Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/22 12:34:16 UTC

[GitHub] [arrow] paleolimbot commented on pull request #12467: ARROW-15471: [R] ExtensionType support in R

paleolimbot commented on pull request #12467:
URL: https://github.com/apache/arrow/pull/12467#issuecomment-1075121124


   The key step that was missing for the roundtrip was `register_extension_type()`, which is needed so that Arrow C++ knows not to discard the extension metadata when it encounters the type (e.g., when reading a file)! See details below.
   
   I should probably export `ExtensionArray` and use `ExtensionArray$create()` rather than `new_extension_array()`, since that's more arrow-ish. Similarly, `ExtensionType$create()` might be a better home for extension type creation than `new_extension_type()`.
   
   Printing is a good point: it's definitely confusing in the case of an extension type!
   
   <details>
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   KoreanAge <- R6::R6Class(
     "KoreanAge", 
     inherit = ExtensionType,
     public = list(
       .array_as_vector = function(extension_array) {
         extension_array$storage()$as_vector() + 1
       }
     )
   )
   
   # constructor helpers
   korean_age <- function() {
     new_extension_type(
       int32(),
       "KoreanAge",
       charToRaw("Korean Age, but stored as the western age value"),
       type_class = KoreanAge
     )
   }
   
   korean_age_array <- function(age_korean) {
     new_extension_array(age_korean - 1, korean_age())
   }
   
   (arr <- korean_age_array(1:3))
   #> ExtensionArray
   #> <KoreanAge <Korean Age, but stored as the western age value>>
   #> [
   #>   0,
   #>   1,
   #>   2
   #> ]
   as.vector(arr)
   #> [1] 1 2 3
   
   # you need to register the type so that Arrow C++ keeps the extension type
   # and its metadata when it encounters it at the C++ level (importing via
   # the C data interface and reading files)
   register_extension_type(korean_age())
   
   tf <- tempfile()
   write_feather(arrow_table(col = arr), tf)
   
   tab <- read_feather(tf, as_data_frame = FALSE)
   
   type(tab$col)
   #> KoreanAge
   #> KoreanAge <Korean Age, but stored as the western age value>
   
   as.vector(tab$col)
   #> [1] 0 1 2
   ```
   
   </details>
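   To isolate what registration changes, here is a minimal sketch that reads the same file back before and after registering. It assumes the `new_extension_type()`, `new_extension_array()` and `register_extension_type()` helpers from this PR, plus an `unregister_extension_type()` counterpart as found in released versions of the arrow R package (names in this draft may differ). It also drops the custom R6 subclass for brevity. Unregistered, the column should come back as its plain `int32` storage type.

   ```r
   library(arrow, warn.conflicts = FALSE)

   # Demo type: a plain ExtensionType (no custom R6 subclass) with int32
   # storage, built with the same helper used in the example above
   age_type <- new_extension_type(
     int32(),
     "KoreanAge",
     charToRaw("Korean Age, but stored as the western age value")
   )
   arr <- new_extension_array(0:2, age_type)

   tf <- tempfile(fileext = ".feather")
   write_feather(arrow_table(col = arr), tf)

   # Before registration: Arrow C++ does not know "KoreanAge", so the column
   # is read back as its int32 storage type
   unregistered <- read_feather(tf, as_data_frame = FALSE)
   print(unregistered$col$type)

   # After registration: the same file is read back as the extension type
   register_extension_type(age_type)
   registered <- read_feather(tf, as_data_frame = FALSE)
   print(registered$col$type)

   # Remove the type from the global registry (assumed helper) so the name
   # can be registered again later in the same session, then clean up
   unregister_extension_type("KoreanAge")
   unlink(tf)
   ```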


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
