Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/07/14 08:45:00 UTC
[jira] [Created] (ARROW-9459) [C++][Dataset] Make collecting/parsing statistics optional for ParquetFragment
Joris Van den Bossche created ARROW-9459:
--------------------------------------------
Summary: [C++][Dataset] Make collecting/parsing statistics optional for ParquetFragment
Key: ARROW-9459
URL: https://issues.apache.org/jira/browse/ARROW-9459
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Joris Van den Bossche
See some timing checks here: https://github.com/dask/dask/pull/6346#issuecomment-656548675
Parsing all statistics, even from a centralized {{_metadata}} file, can be quite expensive. If you know in advance that you are not going to use them (e.g. you only filter on the partition fields and otherwise read all the data), it would be useful to have an option to disable statistics parsing.
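To make the request concrete, here is a rough, pure-Python sketch of the kind of opt-out being asked for. Everything in it ({{ToyFragment}}, {{make_fragments}}, the {{collect_statistics}} flag, the byte encoding of min/max values) is hypothetical and only for illustration; it is not the Arrow C++ or pyarrow API, and the real option may end up looking quite different.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins for illustration only; none of these are Arrow APIs.
RawStats = Dict[str, Tuple[bytes, bytes]]  # column -> (encoded min, encoded max)
Stats = Dict[str, Tuple[int, int]]         # column -> (decoded min, decoded max)


@dataclass
class ToyFragment:
    path: str
    partition: Dict[str, int]                 # partition field -> value
    statistics: Optional[List[Stats]] = None  # one entry per row group, or None


def decode_stats(raw: RawStats) -> Stats:
    """Decode little-endian signed integers; stands in for the costly parsing step."""
    return {
        col: (int.from_bytes(lo, "little", signed=True),
              int.from_bytes(hi, "little", signed=True))
        for col, (lo, hi) in raw.items()
    }


def make_fragments(entries, collect_statistics: bool = True) -> List[ToyFragment]:
    """Build fragments from (path, partition, raw row-group stats) entries."""
    fragments = []
    for path, partition, raw_row_groups in entries:
        stats = None
        if collect_statistics:
            # Only pay the decoding cost when the caller asked for statistics.
            stats = [decode_stats(raw) for raw in raw_row_groups]
        fragments.append(ToyFragment(path, partition, stats))
    return fragments


def filter_by_partition(fragments, key: str, value: int) -> List[ToyFragment]:
    """Select fragments using partition values only; statistics are never consulted."""
    return [f for f in fragments if f.partition.get(key) == value]


def encode(n: int) -> bytes:
    """Encode an integer the way this toy metadata stores min/max values."""
    return n.to_bytes(8, "little", signed=True)


# Toy "_metadata" content: two files, one row group each, one column "x".
entries = [
    ("year=2019/part-0.parquet", {"year": 2019}, [{"x": (encode(0), encode(9))}]),
    ("year=2020/part-0.parquet", {"year": 2020}, [{"x": (encode(10), encode(19))}]),
]

fragments = make_fragments(entries, collect_statistics=False)
selected = filter_by_partition(fragments, "year", 2020)
```

With the flag off, partition-only filtering still works and no row-group statistics are decoded; callers that need row-group pruning would leave it on.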
cc [~rjzamora] [~bkietz] [~fsaintjacques]