Posted to issues@arrow.apache.org by "multimeric (via GitHub)" <gi...@apache.org> on 2023/03/02 06:11:21 UTC
[GitHub] [arrow] multimeric opened a new issue, #34409: Named lists cannot be serialized to a map column
multimeric opened a new issue, #34409:
URL: https://github.com/apache/arrow/issues/34409
### Describe the bug, including details regarding any error messages, version, and platform.
Let's start with a simple table that contains a named list in each row:
```R
> x = tibble::tibble(id=1:3, metadata=list(list(a=1), list(b=2, c=3), list(d=3, e=4)))
> x
# A tibble: 3 × 2
     id metadata
  <int> <list>
1     1 <named list [1]>
2     2 <named list [2]>
3     3 <named list [2]>
```
From the outset, I would hope that the `metadata` column would be converted to a `Map<string, int32>` in Arrow, but it is actually converted to a list column. This is often not the desired behaviour, because a named list in R is frequently used as a map/dictionary rather than as a plain list:
```R
> as_arrow_table(x)
Table
3 rows x 2 columns
$id <int32>
$metadata <list<item: list<item: double>>>
```
Secondly, even if I explicitly ask Arrow for a map column via a schema, the conversion fails:
```r
> arrow_table(x, schema = schema(id=int32(), metadata = map_of(string(), int32())))
Error: Invalid: Can only convert data frames to Struct type
```
It would be wonderful if one or both of these pathways worked, to allow named lists to be serialized as arrow maps.
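As a rough illustration of what an explicit conversion might look like, the sketch below reshapes each named list into a two-column key/value data frame using base R only. The helper `named_list_to_kv` is hypothetical and not part of arrow. Whether arrow accepts a list of such data frames for a map type is an assumption: the "Can only convert data frames" error hints at it, but it is not confirmed here.

```R
# Hypothetical helper (not part of arrow): reshape one named list into a
# two-column key/value data frame, one row per entry.
named_list_to_kv <- function(x) {
  data.frame(
    key = names(x),
    value = unlist(x, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Same data as the `metadata` column above.
metadata <- list(list(a = 1), list(b = 2, c = 3), list(d = 3, e = 4))
kv <- lapply(metadata, named_list_to_kv)
kv[[2]]

# Assumption, untested here: a list of key/value data frames may be accepted
# where a map type is requested, for example:
# arrow::Array$create(kv, type = arrow::map_of(arrow::utf8(), arrow::float64()))
```

Note that the values here are doubles, since R literals such as `1` are doubles rather than integers; an `int32` value type would need `1L`-style values or an explicit cast.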
Session info below:
<details>
```R
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRblas.so
LAPACK: /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tibble_3.1.8 arrow_11.0.0.2
loaded via a namespace (and not attached):
[1] fansi_1.0.3 utf8_1.2.2 assertthat_0.2.1 R6_2.5.1
[5] lifecycle_1.0.3 magrittr_2.0.3 pillar_1.8.1 rlang_1.0.6
[9] cli_3.6.0 vctrs_0.5.2 bit64_4.0.5 glue_1.6.2
[13] purrr_1.0.1 bit_4.0.5 compiler_4.2.1 pkgconfig_2.0.3
[17] tidyselect_1.2.0
```
</details>
### Component(s)
R
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] multimeric commented on issue #34409: [R] Named lists cannot be serialized to a map column
Posted by "multimeric (via GitHub)" <gi...@apache.org>.
multimeric commented on issue #34409:
URL: https://github.com/apache/arrow/issues/34409#issuecomment-1455431434
I also note that this doesn't work with environments either, which are an even more explicit key-value data structure in R:
```R
> y = tibble::tibble(a = 1:3, b=list(rlang::env(a=1), rlang::env(b=2), rlang::env(c=3)))
> y
# A tibble: 3 × 2
      a b
  <int> <list>
1     1 <env>
2     2 <env>
3     3 <env>
> arrow::as_arrow_table(y)
Error: Unrecognized vector instance for type ENVSXP
```
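Since the error says the environment type (ENVSXP) is not recognized, one possible stopgap is to turn each environment into a named list with base R's `as.list()` before handing the column to arrow. This is only a sketch: it uses `list2env()` instead of `env()` so that it needs no extra package, and the resulting column is still a list of named lists, so it runs into the same list-versus-map behaviour described in the issue.

```R
# Build three environments with base R (stand-ins for the env() calls above).
envs <- list(
  list2env(list(a = 1)),
  list2env(list(b = 2)),
  list2env(list(c = 3))
)

# as.list() on an environment returns a named list of its bindings;
# the order of entries is not guaranteed for larger environments.
as_named <- lapply(envs, as.list)

# The column is now a plain list of named lists rather than environments.
y <- data.frame(a = 1:3)
y$b <- as_named
str(y$b[[2]])
```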
[GitHub] [arrow] multimeric commented on issue #34409: [R] Named lists cannot be serialized to a map column
Posted by "multimeric (via GitHub)" <gi...@apache.org>.
multimeric commented on issue #34409:
URL: https://github.com/apache/arrow/issues/34409#issuecomment-1455401977
Yes, although I'd be happy with an explicit conversion from a named list to a map column, whereas that issue is about round-tripping, which requires implicitly detecting when a named list should be serialized as a map.
[GitHub] [arrow] eitsupi commented on issue #34409: [R] Named lists cannot be serialized to a map column
Posted by "eitsupi (via GitHub)" <gi...@apache.org>.
eitsupi commented on issue #34409:
URL: https://github.com/apache/arrow/issues/34409#issuecomment-1452040990
Maybe related to #15033?