Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/12/17 19:28:00 UTC

[jira] [Commented] (ARROW-9459) [C++][Dataset] Make collecting/parsing statistics optional for ParquetFragment

    [ https://issues.apache.org/jira/browse/ARROW-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251306#comment-17251306 ] 

Joris Van den Bossche commented on ARROW-9459:
----------------------------------------------

 ARROW-10131 (lazy parsing of statistics) has been implemented in the meantime, so an option to disable parsing should no longer be necessary. Closing this issue.
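
 To illustrate why lazy parsing makes a separate "disable" option redundant, here is a minimal Python sketch of the general idea. It is not the Arrow C++ implementation; the class and function names (RowGroupSketch, may_contain) and the "min,max" string encoding are hypothetical and chosen only for illustration.

```python
from functools import cached_property


class RowGroupSketch:
    """Hypothetical stand-in for a row group whose statistics are costly to decode."""

    def __init__(self, raw_stats):
        # raw_stats: undecoded statistics, here a dict of column -> "min,max" string
        self._raw_stats = raw_stats
        self.parse_count = 0  # counts how often decoding actually ran

    @cached_property
    def statistics(self):
        # Decoded on first access only; the result is cached afterwards.
        self.parse_count += 1
        parsed = {}
        for column, encoded in self._raw_stats.items():
            lo, hi = encoded.split(",")
            parsed[column] = (int(lo), int(hi))
        return parsed


def may_contain(row_group, column, value):
    """Return False only if the statistics prove the value is absent."""
    lo, hi = row_group.statistics[column]
    return lo <= value <= hi


row_groups = [RowGroupSketch({"x": "0,9"}), RowGroupSketch({"x": "10,19"})]

# Reading everything (or filtering only on partition fields) never touches
# .statistics, so nothing has been parsed at this point.
untouched = sum(rg.parse_count for rg in row_groups)

# Filtering on a data column triggers parsing, once per row group.
selected = [rg for rg in row_groups if may_contain(rg, "x", 12)]
```

 In this sketch, a caller that never filters on a data column pays no parsing cost at all, which is the behaviour the requested option was meant to provide.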

> [C++][Dataset] Make collecting/parsing statistics optional for ParquetFragment
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-9459
>                 URL: https://issues.apache.org/jira/browse/ARROW-9459
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: dataset, dataset-dask-integration
>             Fix For: 3.0.0
>
>
> See some timing checks here: https://github.com/dask/dask/pull/6346#issuecomment-656548675
> Parsing all statistics, even from a centralized {{_metadata}} file, can be quite expensive. If you know in advance that you are not going to use them (e.g. you only filter on the partition fields and otherwise read all data), it would be nice to have an option to disable parsing statistics.
> cc [~rjzamora] [~bkietz] [~fsaintjacques]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)