Posted to issues@arrow.apache.org by "multimeric (via GitHub)" <gi...@apache.org> on 2023/03/02 06:11:21 UTC

[GitHub] [arrow] multimeric opened a new issue, #34409: Named lists cannot be serialized to a map column

multimeric opened a new issue, #34409:
URL: https://github.com/apache/arrow/issues/34409

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Let's start with a simple table that contains a named list in each row:
   ```R
   > x = tibble::tibble(id=1:3, metadata=list(list(a=1), list(b=2, c=3), list(d=3, e=4)))
   > x
   # A tibble: 3 × 2
        id metadata
     <int> <list>
   1     1 <named list [1]>
   2     2 <named list [2]>
   3     3 <named list [2]>
   ```
   
   Ideally, the `metadata` column would be converted to a `Map<string, int32>` in Arrow, but it is actually converted to a nested list column. This is often not the desired behaviour, because a named list in R is frequently used as a map/dictionary rather than as a plain list:
   
   ```r
   > as_arrow_table(x)
   Table
   3 rows x 2 columns
   $id <int32>
   $metadata: list<item: list<item <double>>>
   ```
   
   Secondly, even if I manually request arrow to use a map column, it fails:
   
   ```r
   > arrow_table(x, schema = schema(id=int32(), metadata = map_of(string(), int32())))
   Error: Invalid: Can only convert data frames to Struct type
   ```
   
   It would be wonderful if one or both of these pathways worked, to allow named lists to be serialized as arrow maps.
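
   For illustration, here is a minimal base-R sketch of the reshaping that an explicit conversion would need. In the Arrow format a map is physically a list of key/value structs, so each named list would have to become a set of key/value entries. The helper name `named_list_to_entries` is hypothetical, and I have not verified that arrow accepts the resulting data frames for a `map_of()` column, so treat that step as an assumption:

   ```r
   # Hypothetical helper (not part of arrow): turn one named list into a
   # two-column key/value data frame, one row per entry.
   named_list_to_entries <- function(x) {
     data.frame(
       key = names(x),
       value = unlist(x, use.names = FALSE),
       stringsAsFactors = FALSE
     )
   }

   # Same metadata as in the tibble above.
   metadata <- list(list(a = 1), list(b = 2, c = 3), list(d = 3, e = 4))

   # One key/value data frame per row of the original table.
   entries <- lapply(metadata, named_list_to_entries)
   entries[[2]]
   ```

   This only covers named lists whose values are all scalars of one type; nested or mixed-type values would need more care.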
   
   Session info below:
   <details>
   
   ```R
   > sessionInfo()
   R version 4.2.1 (2022-06-23)
   Platform: x86_64-pc-linux-gnu (64-bit)
   Running under: CentOS Linux 7 (Core)
   
   Matrix products: default
   BLAS:   /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRblas.so
   LAPACK: /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRlapack.so
   
   locale:
    [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
    [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
    [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
    [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
    [9] LC_ADDRESS=C               LC_TELEPHONE=C
   [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
   
   attached base packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base
   
   other attached packages:
   [1] tibble_3.1.8   arrow_11.0.0.2
   
   loaded via a namespace (and not attached):
    [1] fansi_1.0.3      utf8_1.2.2       assertthat_0.2.1 R6_2.5.1
    [5] lifecycle_1.0.3  magrittr_2.0.3   pillar_1.8.1     rlang_1.0.6
    [9] cli_3.6.0        vctrs_0.5.2      bit64_4.0.5      glue_1.6.2
   [13] purrr_1.0.1      bit_4.0.5        compiler_4.2.1   pkgconfig_2.0.3
   [17] tidyselect_1.2.0
   ```
   
   </details>
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] multimeric commented on issue #34409: [R] Named lists cannot be serialized to a map column

Posted by "multimeric (via GitHub)" <gi...@apache.org>.
multimeric commented on issue #34409:
URL: https://github.com/apache/arrow/issues/34409#issuecomment-1455431434

   I also note that this doesn't work with environments either, even though an environment is an even more explicit key-value data structure in R:
   ```R
   > y = tibble::tibble(a = 1:3, b=list(rlang::env(a=1), rlang::env(b=2), rlang::env(c=3)))
   > y
   # A tibble: 3 × 2
         a b
     <int> <list>
   1     1 <env>
   2     2 <env>
   3     3 <env>
   > arrow::as_arrow_table(y)
   Error: Unrecognized vector instance for type ENVSXP
   ```




[GitHub] [arrow] multimeric commented on issue #34409: [R] Named lists cannot be serialized to a map column

Posted by "multimeric (via GitHub)" <gi...@apache.org>.
multimeric commented on issue #34409:
URL: https://github.com/apache/arrow/issues/34409#issuecomment-1455401977

   Yes, although I'd be happy with an explicit conversion from a named list to a map column, whereas that issue is about round-tripping, which requires implicitly detecting when a named list should be serialized as a map.




[GitHub] [arrow] eitsupi commented on issue #34409: [R] Named lists cannot be serialized to a map column

Posted by "eitsupi (via GitHub)" <gi...@apache.org>.
eitsupi commented on issue #34409:
URL: https://github.com/apache/arrow/issues/34409#issuecomment-1452040990

   Maybe related to #15033?

