Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/04/27 21:18:00 UTC

[jira] [Commented] (ARROW-14379) [R] Create a custom extension of list that stores row-level metadata

    [ https://issues.apache.org/jira/browse/ARROW-14379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529086#comment-17529086 ] 

Dewey Dunnington commented on ARROW-14379:
------------------------------------------

Are there examples other than sf columns where this is relevant? It would be possible to make a generic list extension type that just calls {{serialize()}} on each element, but it would probably be slow, and we probably want to encourage other solutions. The other example I can think of is a list of models, maybe, for which the {{broom::glance()}} or {{broom::tidy()}} representation would fit into the Arrow format much better.
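
For illustration only, here is a minimal base R sketch of what "calling {{serialize()}} on each element" amounts to. The helper names {{serialize_elements()}} and {{unserialize_elements()}} are hypothetical and are not part of the arrow package; an actual extension type would store the resulting raw vectors in a binary column.

```r
# Hypothetical helpers (not part of the arrow package).
# Turn every element of a list into a raw vector with base R's serialize().
serialize_elements <- function(x) {
  lapply(x, function(el) serialize(el, connection = NULL))
}

# Restore the original R objects from the raw vectors.
unserialize_elements <- function(x) {
  lapply(x, unserialize)
}

# A list column whose elements carry their own attributes.
col <- list(
  structure(1:3, class = "my_class", note = "first"),
  structure("foo", attr1 = TRUE)
)

raw_col <- serialize_elements(col)
restored <- unserialize_elements(raw_col)
```

Every element round-trips with its attributes, but each value becomes an opaque blob that Arrow cannot filter or compress column-wise, which is the cost hinted at above.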

> [R] Create a custom extension of list that stores row-level metadata
> --------------------------------------------------------------------
>
>                 Key: ARROW-14379
>                 URL: https://issues.apache.org/jira/browse/ARROW-14379
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: R
>            Reporter: Jonathan Keane
>            Priority: Major
>
> Since lists can be nested, we should be able to store each element as something like {{list(value = "foo", attributes = list(attr1 = TRUE, attr2 = "baz"))}}, and then in the conversion back to R we can reconstitute the object by moving the {{attributes}} element back onto the value's attributes.
> This will be more efficient (since we get compression on the column + metadata/attributes), and we will also be able to filter these and use them in datasets, since each row has all of the information about itself that it needs to roundtrip.
> This would get us SF columns for free.
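
A rough base R sketch of the value/attributes layout described in the issue, assuming each element is split into a bare value plus a list of its attributes. The function names {{split_element()}} and {{restore_element()}} are hypothetical; this shows only the R-side round trip, not how the nested list would be mapped to an Arrow type.

```r
# Hypothetical helpers illustrating the proposed per-row layout.
# Split one element into its bare value and a list of its attributes.
split_element <- function(el) {
  attrs <- attributes(el)
  attributes(el) <- NULL
  list(value = el, attributes = if (is.null(attrs)) list() else attrs)
}

# Reattach the stored attributes to the bare value.
restore_element <- function(rec) {
  value <- rec$value
  attributes(value) <- rec$attributes
  value
}

el <- structure("foo", attr1 = TRUE, attr2 = "baz")
rec <- split_element(el)
back <- restore_element(rec)
```

Here {{rec}} has the shape given in the description, with a {{value}} of "foo" and an {{attributes}} list holding {{attr1}} and {{attr2}}, and {{back}} carries the same attributes as {{el}}.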



--
This message was sent by Atlassian Jira
(v8.20.7#820007)