
Posted to jira@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2021/03/10 22:00:00 UTC

[jira] [Comment Edited] (ARROW-8981) [C++][Dataset] Add support for compressed FileSources

    [ https://issues.apache.org/jira/browse/ARROW-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299129#comment-17299129 ] 

Ben Kietzman edited comment on ARROW-8981 at 3/10/21, 9:59 PM:
---------------------------------------------------------------

I'm not sure this is worthwhile, actually. I think the cases where FileSources could benefit from blanket compression are format-specific: it's highly unlikely that Parquet would be read from a gzipped file. I think it'd be better to support compression on a per-FileFormat basis (for example, as an option on CsvFileFormat or as a variant of it).
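
To make the per-format idea concrete, here is a rough sketch in Python using only the standard library. It is not Arrow's API: CsvFormat, ColumnarFormat, and the compression field are hypothetical names invented for illustration. It only shows the shape of the design, where the format decides whether to decompress and the file source does not.

```python
import csv
import gzip
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CsvFormat:
    """Hypothetical CSV format with an opt-in compression setting."""

    compression: Optional[str] = None  # this sketch accepts only None or "gzip"

    def read(self, raw: bytes) -> List[List[str]]:
        # Decompression is decided by the format, not by the file source.
        if self.compression == "gzip":
            raw = gzip.decompress(raw)
        elif self.compression is not None:
            raise ValueError(f"unsupported compression: {self.compression}")
        return list(csv.reader(io.StringIO(raw.decode("utf-8"))))


@dataclass
class ColumnarFormat:
    """Hypothetical stand-in for a format such as Parquet, which already
    compresses its data internally and so exposes no whole-file option."""

    def read(self, raw: bytes) -> bytes:
        # Placeholder: a real reader would parse the file contents here.
        return raw


# A gzipped CSV payload is handled by configuring the format itself.
payload = gzip.compress(b"a,b\n1,2\n")
rows = CsvFormat(compression="gzip").read(payload)
print(rows)
```

With this shape, a compression setting exists only on formats where it makes sense, so there is nothing to ignore or reject for formats that never see externally compressed files.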


was (Author: bkietz):
I'm not sure this is worthwhile, actually. I think that the cases where FileSources could benefit from blanket compression are slim (it's highly unlikely that parquet would be read from a gzipped file). I think it'd be better to support compression on a per-FileFormat basis (an option for or variant of CsvFileFormat for example)

> [C++][Dataset] Add support for compressed FileSources
> -----------------------------------------------------
>
>                 Key: ARROW-8981
>                 URL: https://issues.apache.org/jira/browse/ARROW-8981
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 0.17.1
>            Reporter: Ben Kietzman
>            Priority: Major
>              Labels: dataset
>
> FileSource::compression_ is currently ignored. Ideally, compressed files and buffers would be decompressed on read. See ARROW-8942.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)