Posted to dev@drill.apache.org by "Damien Profeta (JIRA)" <ji...@apache.org> on 2017/09/15 23:09:01 UTC
[jira] [Created] (DRILL-5795) Filter pushdown for parquet handles multi rowgroup file
Damien Profeta created DRILL-5795:
-------------------------------------
Summary: Filter pushdown for parquet handles multi rowgroup file
Key: DRILL-5795
URL: https://issues.apache.org/jira/browse/DRILL-5795
Project: Apache Drill
Issue Type: Improvement
Components: Storage - Parquet
Reporter: Damien Profeta
DRILL-1950 implemented filter pushdown for Parquet files, but only for the case of one rowgroup per Parquet file. When a file contains multiple rowgroups, it detects that a rowgroup can be pruned but then still tells the drillbit to read the whole file, which leads to performance issues.
Having multiple rowgroups per file helps to handle partitioned datasets while still reading only the relevant subset of data, without ending up with more files than really needed.
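To illustrate the requested behaviour, here is a minimal Python sketch of per-rowgroup pruning based on min/max column statistics. It is not Drill code: the RowGroupStats class, the function names, and the sample values are all hypothetical, and it assumes a simple inclusive range filter on a single integer column.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical min/max statistics of one column for one rowgroup."""
    index: int
    min_value: int
    max_value: int


def may_match(rg: RowGroupStats, lower: int, upper: int) -> bool:
    """Return True if the rowgroup's [min, max] range overlaps [lower, upper]."""
    return rg.max_value >= lower and rg.min_value <= upper


def rowgroups_to_read(rowgroups: List[RowGroupStats], lower: int, upper: int) -> List[int]:
    """Return the indexes of the rowgroups that cannot be pruned by the filter."""
    return [rg.index for rg in rowgroups if may_match(rg, lower, upper)]


# One file holding three rowgroups (sample values, assumed for illustration).
file_rowgroups = [
    RowGroupStats(index=0, min_value=1, max_value=100),
    RowGroupStats(index=1, min_value=101, max_value=200),
    RowGroupStats(index=2, min_value=201, max_value=300),
]

# Filter: 120 <= col <= 150. Only rowgroup 1 overlaps this range.
selected = rowgroups_to_read(file_rowgroups, 120, 150)
print(selected)  # [1]
```

With this kind of selection, the reader would be asked to scan only the surviving rowgroups of the file rather than the whole file.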
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)