Posted to issues@arrow.apache.org by "Neville Dipale (Jira)" <ji...@apache.org> on 2021/02/13 11:40:00 UTC
[jira] [Created] (ARROW-11618) [Rust] [Parquet] String-based path column projection
Neville Dipale created ARROW-11618:
--------------------------------------
Summary: [Rust] [Parquet] String-based path column projection
Key: ARROW-11618
URL: https://issues.apache.org/jira/browse/ARROW-11618
Project: Apache Arrow
Issue Type: Task
Components: Rust
Reporter: Neville Dipale
There is currently no way to select a column by its path, e.g. 'a.b.c'. We have to select the column by its index, which is not trivial for nested structures.
For example, if a record has the following schema, the column paths are shown in parentheses and the leaf column indices in square brackets:
{code}
schema:
  a [struct] ("a")
    b [struct] ("a.b")
      c [int32] ("a.b.c")
      d [struct] ("a.b.d")
        e [int32] ("a.b.d.e") [0]
        f [bool] ("a.b.d.f") [1]
      g [int64] ("a.b.g") [2]
{code}
If one wants to select 'a.b', they need to know that 'a.b' spans 3 columns (indices 0 to 2). This is inconvenient, and may push readers to read whole records just to avoid working out the indices.
A string-based projection could allow one to select columns 0 and 1 via "a.b.d", or column 2 via "a.b.g".
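As a rough illustration of the idea, and not a proposal for the actual parquet crate API, a path could be resolved to leaf column indices by matching it against each leaf's dotted path on whole segments. The helper names below (project_by_path, example_leaves) are hypothetical, and the leaf list contains only the three columns that carry an index in the example above.

```rust
/// Hypothetical helper, not part of the parquet crate: returns the indices of
/// all leaf columns whose dotted path equals `prefix` or lies beneath it.
fn project_by_path(leaf_paths: &[&str], prefix: &str) -> Vec<usize> {
    leaf_paths
        .iter()
        .enumerate()
        .filter(|(_, path)| {
            // Match whole path segments only, so "a.b" does not match "a.bc".
            match path.strip_prefix(prefix) {
                Some(rest) => rest.is_empty() || rest.starts_with('.'),
                None => false,
            }
        })
        .map(|(index, _)| index)
        .collect()
}

/// Assumed leaf columns from the example schema, listed in index order.
fn example_leaves() -> Vec<&'static str> {
    vec!["a.b.d.e", "a.b.d.f", "a.b.g"]
}
```

With this sketch, project_by_path(&example_leaves(), "a.b.d") yields indices 0 and 1, and "a.b.g" yields index 2, so the caller never has to count leaves by hand. The resulting indices could then be passed to whatever index-based projection the reader already accepts.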
--
This message was sent by Atlassian Jira
(v8.3.4#803005)