Posted to jira@arrow.apache.org by "Neville Dipale (Jira)" <ji...@apache.org> on 2021/02/13 11:42:00 UTC

[jira] [Updated] (ARROW-11618) [Rust] [Parquet] String-based path column projection

     [ https://issues.apache.org/jira/browse/ARROW-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neville Dipale updated ARROW-11618:
-----------------------------------
    Issue Type: New Feature  (was: Task)

> [Rust] [Parquet] String-based path column projection
> ----------------------------------------------------
>
>                 Key: ARROW-11618
>                 URL: https://issues.apache.org/jira/browse/ARROW-11618
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust
>            Reporter: Neville Dipale
>            Priority: Major
>
> There is currently no way to select a column by its path, e.g. 'a.b.c'. We have to select the column by its index, which is not trivial for nested structures.
> For example, if a record has the following schema, the path of each field is shown in parentheses and the leaf column indices in square brackets:
> {code}
> schema:
>   a [struct]         ("a")        
>     b [struct]       ("a.b")      
>       c [int32]      ("a.b.c")    [0]
>       d [struct]     ("a.b.d")    
>         e [int32]    ("a.b.d.e")  [1]
>         f [bool]     ("a.b.d.f")  [2]
>       g [int64]      ("a.b.g")    [3]
> {code}
> If one wants to select 'a.b.d', they need to know that it spans 2 columns (1 to 2). This is inconvenient, and may push readers to read whole records just to avoid working out the indices.
> A string-based projection could allow one to select columns 1 and 2 via "a.b.d", or column 3 via "a.b.g".
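
A minimal sketch of how such a lookup might work, assuming the reader can list the dotted path of every leaf column in index order. The names `LEAF_PATHS` and `project_by_path` are hypothetical and are not part of the parquet crate's API. A leaf is selected when its path equals the requested path or sits beneath it at a '.' boundary, so "a.b.d" does not accidentally match a sibling field such as "a.b.de".

```rust
/// Leaf column paths for the example schema above, in leaf index order.
/// (Hypothetical constant, used only for illustration.)
const LEAF_PATHS: [&str; 4] = ["a.b.c", "a.b.d.e", "a.b.d.f", "a.b.g"];

/// Returns the indices of all leaf columns whose dotted path equals `path`
/// or is nested beneath it. (Hypothetical helper, not an existing API.)
fn project_by_path(leaf_paths: &[&str], path: &str) -> Vec<usize> {
    leaf_paths
        .iter()
        .enumerate()
        .filter(|(_, leaf)| match leaf.strip_prefix(path) {
            // Exact match, or the remainder starts a new path segment.
            Some(rest) => rest.is_empty() || rest.starts_with('.'),
            None => false,
        })
        .map(|(index, _)| index)
        .collect()
}
```

With this sketch, `project_by_path(&LEAF_PATHS, "a.b.d")` yields the indices 1 and 2, and `project_by_path(&LEAF_PATHS, "a.b.g")` yields index 3. The resulting index list could then be handed to whatever index-based projection the reader already supports.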



--
This message was sent by Atlassian Jira
(v8.3.4#803005)