Posted to issues@arrow.apache.org by "Maarten Ballintijn (Jira)" <ji...@apache.org> on 2020/05/09 18:34:00 UTC

[jira] [Commented] (ARROW-8746) [Python][Documentation] Add column limit recommendations Parquet page

    [ https://issues.apache.org/jira/browse/ARROW-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103416#comment-17103416 ] 

Maarten Ballintijn commented on ARROW-8746:
-------------------------------------------

[~wesm], you've mentioned this before, and since this is a fairly common use case, could you expand a bit on the following related questions? (Use cases include daily or minute data for a few tens of thousands of items, such as stocks or other financial instruments, IoT sensors, etc.)
 * Parquet standard - Do you think the issue is intrinsic to the Parquet standard? The ability to read a subset of the columns and/or row groups, and compact storage through RLE, categoricals, etc., all seem to suggest that the format is well suited to these use cases.
 * Parquet-C++ implementation - Is the issue with the current Parquet-C++ implementation, or with any of its dependencies? Is it something that could be fixed? Would a specialized implementation help? Is the problem related to going from Parquet -> Arrow -> Python/pandas? E.g., would a Parquet -> NumPy reader work better?
 * Alternatives - What would you recommend as a superior solution? Storing this data tall instead of wide? Using another storage format?
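
To make the "tall instead of wide" option concrete, here is a minimal sketch in plain Python. The sample data, the column names (date, item, value) and the helper wide_to_tall are all hypothetical and chosen only for illustration; this is not a recommendation from the Arrow developers, just one way the reshaping could look.

```python
# Hypothetical wide layout: one row per date, one column per instrument.
# With tens of thousands of instruments this means tens of thousands of columns.
wide_rows = [
    {"date": "2020-05-07", "AAA": 10.0, "BBB": 20.5, "CCC": None},
    {"date": "2020-05-08", "AAA": 10.2, "BBB": 20.1, "CCC": 5.0},
]


def wide_to_tall(rows, key="date"):
    """Turn one-column-per-item rows into (key, item, value) records."""
    tall = []
    for row in rows:
        for name, value in row.items():
            if name == key or value is None:
                continue  # skip the key column and missing observations
            tall.append({key: row[key], "item": name, "value": value})
    return tall


# Tall layout: always three columns, one row per (date, item) observation.
tall_rows = wide_to_tall(wide_rows)
for record in tall_rows:
    print(record)
```

In this sketch the tall table keeps three columns however many items are tracked, at the cost of more rows and a repeated item label on each row.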

I'd appreciate your (and others') insights.

Cheers, Maarten.

> [Python][Documentation] Add column limit recommendations Parquet page
> ---------------------------------------------------------------------
>
>                 Key: ARROW-8746
>                 URL: https://issues.apache.org/jira/browse/ARROW-8746
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Documentation, Python
>            Reporter: Wes McKinney
>            Priority: Major
>
> Users would be well advised not to write files with large numbers (> 1000) of columns


