Posted to github@arrow.apache.org by "paleolimbot (via GitHub)" <gi...@apache.org> on 2023/02/22 14:49:45 UTC

[GitHub] [arrow] paleolimbot commented on issue #33784: [R] writing/reading a data.frame with column class 'list' changes column class

paleolimbot commented on issue #33784:
URL: https://github.com/apache/arrow/issues/33784#issuecomment-1440174266

   Echoing Nic's thanks for opening this. Our support for list columns is far from perfect. In particular, we drop names without warning, which we should fix.
   
   Supporting names internally in Arrow is hard because Arrow doesn't have an internal concept of named vector elements, so we would have to invent one. We will probably get there, most likely via an extension type, but in the meantime you will have to convert to and from Arrow yourself as a workaround.
   
   The two workarounds I can think of off the top of my head are (1) serialize list objects on the way in and unserialize them on the way out:
   
   ``` r
   library(tibble)
   
   tb <- tibble(list_column = list(c(a = 1, b = 2)))
   str(tb$list_column)
   #> List of 1
   #>  $ : Named num [1:2] 1 2
   #>   ..- attr(*, "names")= chr [1:2] "a" "b"
   
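   # serialize(x, NULL) returns a raw vector, so each list element
   # (names and other attributes included) becomes a blob of bytes
   serialize_list_col_to_binary <- function(x) {
     lapply(x, serialize, NULL)
   }
   
   # Reverse step: rebuild the original R object from each raw vector
   unserialize_list_col_from_binary <- function(x) {
     lapply(x, unserialize)
   }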
   
   # Write + read back:
   tmpf <- tempfile()
   tb$list_column <- serialize_list_col_to_binary(tb$list_column)
   arrow::write_feather(tb, tmpf)
   df2 <- arrow::read_feather(tmpf)
   df2$list_column <- unserialize_list_col_from_binary(df2$list_column)
   
   str(df2$list_column)
   #> List of 1
   #>  $ : Named num [1:2] 1 2
   #>   ..- attr(*, "names")= chr [1:2] "a" "b"
   ```
   
   ...or (2) convert the list elements yourself into types that fit better in Arrow. In your case, each list element could be a one-row data.frame:
   
   ``` r
   library(tibble)
   
   tb <- tibble(list_column = list(c(a = 1, b = 2)))
   str(tb$list_column)
   #> List of 1
   #>  $ : Named num [1:2] 1 2
   #>   ..- attr(*, "names")= chr [1:2] "a" "b"
   
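   # Turn each named vector into a one-row data.frame (one column per name);
   # NULL elements are passed through unchanged
   list_col_to_arrow_friendly <- function(x) {
     lapply(x, function(el) {
       if (is.null(el)) NULL else as.data.frame(as.list(el))
     })
   }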
   
   
   tb$list_column <- list_col_to_arrow_friendly(tb$list_column)
   str(tb$list_column)
   #> List of 1
   #>  $ :'data.frame':    1 obs. of  2 variables:
   #>   ..$ a: num 1
   #>   ..$ b: num 2
   
   
   # Write + read back:
   tmpf <- tempfile()
   arrow::write_feather(tb, tmpf)
   df2 <- arrow::read_feather(tmpf)
   str(df2$list_column)
   #> list<
   #>   tbl_df<
   #>     a: double
   #>     b: double
   #>   >
   #> > [1:1] 
   #> $ : tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
   #>  ..$ a: num 1
   #>  ..$ b: num 2
   #> @ ptype: tibble [0 × 2] (S3: tbl_df/tbl/data.frame)
   #>  ..$ a: num(0) 
   #>  ..$ b: num(0)
   ```
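   
   If you need the original named vectors back after reading, the conversion in (2) can be reversed. This is only a rough sketch: `arrow_friendly_to_list_col()` is a made-up helper name (not part of arrow), and it assumes every non-NULL element is a one-row data frame whose columns share a type. The example uses a plain `data.frame` as a stand-in for what `read_feather()` returns, so it runs without arrow:
   
   ``` r
   # Hypothetical helper (not part of arrow): turn one-row data frames
   # back into named vectors. Assumes exactly one row per element;
   # with more rows, unlist() would produce names like "a1", "a2".
   arrow_friendly_to_list_col <- function(x) {
     lapply(x, function(el) {
       if (is.null(el)) NULL else unlist(as.list(el))
     })
   }
   
   # Stand-in for a list column as it might come back from read_feather()
   list_column <- list(data.frame(a = 1, b = 2))
   restored <- arrow_friendly_to_list_col(list_column)
   str(restored)
   #> List of 1
   #>  $ : Named num [1:2] 1 2
   #>   ..- attr(*, "names")= chr [1:2] "a" "b"
   ```
   
   Unlike the serialize approach in (1), this keeps the data readable from other Arrow implementations, at the cost of only handling elements that fit the one-row shape.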
   
   <sup>Created on 2023-02-22 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
   
   

