Posted to jira@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2021/03/10 22:00:00 UTC
[jira] [Comment Edited] (ARROW-8981) [C++][Dataset] Add support for compressed FileSources
[ https://issues.apache.org/jira/browse/ARROW-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299129#comment-17299129 ]
Ben Kietzman edited comment on ARROW-8981 at 3/10/21, 9:59 PM:
---------------------------------------------------------------
I'm not sure this is worthwhile, actually. I think that the cases where FileSources could benefit from blanket compression are format specific: it's highly unlikely that Parquet would be read from a gzipped file. I think it'd be better to support compression on a per-FileFormat basis (for example, as an option for, or a variant of, CsvFileFormat).
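To make the per-format idea concrete, here is a rough sketch of what it might look like. Every name in it (Codec, Source, Format, ParquetLikeFormat, CsvLikeFormat, StreamCodec) is a hypothetical stand-in and not Arrow's actual API, and guessing gzip from a ".gz" suffix is an assumption made only for illustration. It models the decision of which codec to unwrap and does no decompression itself.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration; these are not Arrow's real classes.
enum class Codec { kUncompressed, kGzip };

struct Source {
  std::string path;
};

// True if `s` ends with `suffix`.
inline bool EndsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

struct Format {
  virtual ~Format() = default;
  // Codec to unwrap before parsing. Default: the format reads the raw bytes.
  virtual Codec StreamCodec(const Source&) const { return Codec::kUncompressed; }
};

// Parquet-like: compression is handled inside the file format itself,
// so the byte stream is never unwrapped, whatever the file is called.
struct ParquetLikeFormat : Format {};

// CSV-like: text streams are commonly gzipped, so this format opts in.
struct CsvLikeFormat : Format {
  std::optional<Codec> codec;  // explicit user option; unset means "guess"

  Codec StreamCodec(const Source& source) const override {
    if (codec.has_value()) return *codec;
    // Assumption for this sketch: infer gzip from the file extension.
    return EndsWith(source.path, ".gz") ? Codec::kGzip : Codec::kUncompressed;
  }
};
```

With this shape, the compression decision belongs to the format rather than to the source: a CSV-like format can be told which codec to use or left to guess, while a Parquet-like format simply never asks.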
was (Author: bkietz):
I'm not sure this is worthwhile, actually. I think that the cases where FileSources could benefit from blanket compression are slim (it's highly unlikely that parquet would be read from a gzipped file). I think it'd be better to support compression on a per-FileFormat basis (an option for or variant of CsvFileFormat for example)
> [C++][Dataset] Add support for compressed FileSources
> -----------------------------------------------------
>
> Key: ARROW-8981
> URL: https://issues.apache.org/jira/browse/ARROW-8981
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 0.17.1
> Reporter: Ben Kietzman
> Priority: Major
> Labels: dataset
>
> FileSource::compression_ is currently ignored. Ideally files/buffers which are compressed could be decompressed on read. See ARROW-8942