Posted to dev@parquet.apache.org by "Yash Datta (JIRA)" <ji...@apache.org> on 2014/11/08 15:48:33 UTC
[jira] [Created] (PARQUET-128) Optimize the parquet RecordReader
implementation when filterpredicate is pushed down
Yash Datta created PARQUET-128:
----------------------------------
Summary: Optimize the parquet RecordReader implementation when filterpredicate is pushed down
Key: PARQUET-128
URL: https://issues.apache.org/jira/browse/PARQUET-128
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Affects Versions: 1.6.0rc2
Reporter: Yash Datta
Fix For: parquet-mr_1.6.0
The RecordReader implementation currently reads all the columns before applying the filter predicate and deciding whether to keep or discard the row.
We can have a RecordReader that assembles only the columns on which filters are applied (usually only a few), applies the filter to decide whether to keep the row, and then either assembles the remaining columns or skips them accordingly.
The performance improvement from this change is seen to be significant, and it is larger when filtering returns only a small number of rows (which is usually the case) and when there are many columns.
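To make the proposal concrete, here is a minimal Java sketch of the two-phase assembly described above. It does not use the parquet-mr API: the `Column` class, the `readFiltered` method, and the in-memory arrays are all hypothetical stand-ins chosen for illustration, and a real column reader would skip values in encoded pages rather than index into an array. The per-column read counter is only there to show how much work is avoided.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class LazyRecordReaderSketch {

    /** Hypothetical stand-in for a column reader: one value per row, plus a read counter. */
    static final class Column {
        final String name;
        final Object[] values;
        int reads = 0;

        Column(String name, Object... values) {
            this.name = name;
            this.values = values;
        }

        Object read(int row) {
            reads++;
            return values[row];
        }
    }

    /**
     * Assembles records in two phases: first only the columns the filter needs,
     * then the remaining columns, and only for rows that passed the filter.
     */
    static List<Map<String, Object>> readFiltered(
            List<Column> columns,
            Set<String> filterColumns,
            Predicate<Map<String, Object>> predicate,
            int rowCount) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (int row = 0; row < rowCount; row++) {
            Map<String, Object> record = new LinkedHashMap<>();

            // Phase 1: read only the columns referenced by the filter.
            for (Column c : columns) {
                if (filterColumns.contains(c.name)) {
                    record.put(c.name, c.read(row));
                }
            }

            // Rejected rows never touch the remaining columns.
            if (!predicate.test(record)) {
                continue;
            }

            // Phase 2: the row is kept, so assemble the rest of it.
            for (Column c : columns) {
                if (!filterColumns.contains(c.name)) {
                    record.put(c.name, c.read(row));
                }
            }
            kept.add(record);
        }
        return kept;
    }

    public static void main(String[] args) {
        Column id = new Column("id", 1, 2, 3, 4);
        Column name = new Column("name", "a", "b", "c", "d");
        Column payload = new Column("payload", "w", "x", "y", "z");

        List<Map<String, Object>> kept = readFiltered(
                Arrays.asList(id, name, payload),
                Collections.singleton("id"),
                r -> (Integer) r.get("id") > 2,
                4);

        System.out.println("rows kept: " + kept.size());
        System.out.println("reads per column: id=" + id.reads
                + " name=" + name.reads + " payload=" + payload.reads);
    }
}
```

In this sketch the filter column is read once for every row, while each non-filter column is read only for the rows that pass, which is why the saving grows with the number of columns and with the selectivity of the filter.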
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)