Posted to jira@arrow.apache.org by "Jayjeet Chakraborty (Jira)" <ji...@apache.org> on 2021/06/21 12:34:00 UTC

[jira] [Closed] (ARROW-13126) Read out only the required columns from a Feather file on Disk

     [ https://issues.apache.org/jira/browse/ARROW-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayjeet Chakraborty closed ARROW-13126.
---------------------------------------
    Resolution: Fixed

> Read out only the required columns from a Feather file on Disk
> --------------------------------------------------------------
>
>                 Key: ARROW-13126
>                 URL: https://issues.apache.org/jira/browse/ARROW-13126
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Jayjeet Chakraborty
>            Priority: Major
>
> The LZ4-compressed Feather format works really well and is noticeably faster than Parquet when reading all the columns out. For single columns, however, it looks like the Feather format does not yet support reading only the required columns from disk. Are there any plans to add support for this? Here are some numbers to support my claim, from an experiment on a table with 17 columns and 5 million rows in uncompressed Parquet and LZ4-compressed Feather format. No memory mapping was involved.
>  
> pq_all_cols: 0.4179724836349487 ms
> feather_all_cols: 0.26202451705932617 ms
> pq_single_col: 0.10951032638549804 ms
> feather_single_col: 0.2119576358795166 ms
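
For readers who want to try a similar comparison, here is a minimal sketch in Python using pyarrow. It is not the reporter's benchmark script. The helper names (mean_seconds, read_feather_columns, read_parquet_columns) are hypothetical, and the sketch assumes pyarrow is installed and that the files already exist. The columns argument selects which columns end up in the returned table; whether it also reduces the bytes read from disk for Feather is exactly what this issue asks about.

```python
import time


def mean_seconds(fn, repeats=5):
    """Call fn() `repeats` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / repeats


def read_feather_columns(path, columns=None):
    """Read a Feather file, optionally restricted to `columns` (hypothetical helper)."""
    # Imported lazily so the timing helper above works without pyarrow installed.
    import pyarrow.feather as feather

    # memory_map=False matches the "no memory mapping" setup described in the issue.
    return feather.read_table(path, columns=columns, memory_map=False)


def read_parquet_columns(path, columns=None):
    """Read a Parquet file, optionally restricted to `columns` (hypothetical helper)."""
    import pyarrow.parquet as pq

    return pq.read_table(path, columns=columns)
```

As a usage example, mean_seconds(lambda: read_feather_columns("table.feather", ["col0"])) would time a single-column Feather read, where "table.feather" and "col0" are placeholder names. Passing columns=None reads every column, which corresponds to the all-columns rows above.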



--
This message was sent by Atlassian Jira
(v8.3.4#803005)