Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/10/12 13:29:00 UTC
[jira] [Updated] (ARROW-14286) [Python][Parquet] Allow to select
columns of a list field without requiring the list component names
[ https://issues.apache.org/jira/browse/ARROW-14286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-14286:
-----------------------------------
Labels: parquet pull-request-available (was: parquet)
> [Python][Parquet] Allow to select columns of a list field without requiring the list component names
> ----------------------------------------------------------------------------------------------------
>
> Key: ARROW-14286
> URL: https://issues.apache.org/jira/browse/ARROW-14286
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Assignee: Joris Van den Bossche
> Priority: Major
> Labels: parquet, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Subtask for ARROW-14196.
> Currently, if you have a list column whose elements are themselves nested (e.g. a list of structs), selecting a subset of that list column requires something like {{columns=["columnA.list.item.subfield"]}}. The "list.item" part is superfluous, since a list always contains a single child. So ideally we would allow specifying this as {{columns=["columnA.subfield"]}}.
> This also avoids relying on the exact name of the list item (item vs element), for which the default differs between Parquet and Arrow.
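
The expansion described in the issue can be sketched in plain Python. This is only an illustration of the idea, not the Arrow implementation: the schema model (dicts for structs, `("list", child_name, child_type)` tuples for lists) and the helper `expand_column_path` are hypothetical names invented for this example.

```python
# Hypothetical, simplified schema model (NOT PyArrow's API):
#   struct -> dict mapping field name to type
#   list   -> tuple ("list", child_name, child_type)
#   leaf   -> string naming the primitive type


def expand_column_path(schema, short_path):
    """Expand a short path such as 'columnA.subfield' into the full
    dotted path that spells out the intermediate list components."""
    node = schema
    full = []
    for name in short_path.split("."):
        if not isinstance(node, dict) or name not in node:
            raise KeyError(f"no field {name!r} in path {short_path!r}")
        full.append(name)
        node = node[name]
        # A list always has exactly one child, so step through it
        # implicitly, whatever that child happens to be called.
        while isinstance(node, tuple) and node[0] == "list":
            _, child_name, child_type = node
            full.extend(["list", child_name])
            node = child_type
    return ".".join(full)


# Same logical column, but the list child is named differently
# ("item" vs "element"), as mentioned in the issue.
item_style = {"columnA": ("list", "item", {"subfield": "int64", "other": "string"})}
element_style = {"columnA": ("list", "element", {"subfield": "int64", "other": "string"})}

print(expand_column_path(item_style, "columnA.subfield"))
print(expand_column_path(element_style, "columnA.subfield"))
```

With a resolution step along these lines, the same short selection {{columns=["columnA.subfield"]}} would work regardless of whether the list child is called "item" or "element".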
--
This message was sent by Atlassian Jira
(v8.3.4#803005)