Posted to dev@parquet.apache.org by "Ryan Blue (JIRA)" <ji...@apache.org> on 2016/03/09 19:27:40 UTC
[jira] [Resolved] (PARQUET-374) Add api to read dictionary from
each column chunk for predicate pushdown
[ https://issues.apache.org/jira/browse/PARQUET-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Blue resolved PARQUET-374.
-------------------------------
Resolution: Won't Fix
I'm marking this as "Won't fix" because PARQUET-384 includes the proposed API for accessing dictionaries.
> Add api to read dictionary from each column chunk for predicate pushdown
> ------------------------------------------------------------------------
>
> Key: PARQUET-374
> URL: https://issues.apache.org/jira/browse/PARQUET-374
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Zhenxiao Luo
> Assignee: Zhenxiao Luo
>
> A Parquet file's dictionaries could be used for predicate pushdown.
> For example, the SQL query:
> select * from table where column = 10;
> could skip reading the whole row group if the dictionary for column has values [5, 11, 17, 20],
> because 10 is not among them and so no row in that row group can match.
> This could save IO and improve performance.
> We implemented predicate pushdown using dictionaries in Presto for Parquet files, and benchmarks show up to a 40X speedup for selective queries.
> We need to add an API to ParquetFileReader so that it returns dictionaries for requested columns.
> If the column is not dictionary encoded in this row group, return null.
> If not all of the column's pages are dictionary encoded in this row group, return null.
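
The skip decision described in the quoted issue can be sketched as follows. This is only an illustration under stated assumptions, not the parquet-mr API or the Presto implementation: the class DictionaryFilterSketch and the method canSkipRowGroup are hypothetical names, and the dictionary is modeled as a plain Set<Integer> rather than a real dictionary page. A null dictionary stands for the "return null" cases above, where the row group cannot be ruled out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: none of these names come from parquet-mr or Presto.
public class DictionaryFilterSketch {

    /**
     * Decides whether a row group can be skipped for the predicate
     * "column = target".
     *
     * @param dictionary the distinct values of the column in this row group,
     *                   or null if the column is not fully dictionary encoded
     * @param target     the literal the column is compared against
     * @return true only when the dictionary proves that no row can match
     */
    public static boolean canSkipRowGroup(Set<Integer> dictionary, int target) {
        if (dictionary == null) {
            // Some pages may hold values that are not in any dictionary,
            // so the row group has to be read.
            return false;
        }
        // Every value in the row group appears in the dictionary, so a
        // missing target means no row satisfies the predicate.
        return !dictionary.contains(target);
    }

    public static void main(String[] args) {
        Set<Integer> dictionary = new HashSet<>(Arrays.asList(5, 11, 17, 20));
        System.out.println(canSkipRowGroup(dictionary, 10)); // true
        System.out.println(canSkipRowGroup(dictionary, 11)); // false
        System.out.println(canSkipRowGroup(null, 10));       // false
    }
}
```

Treating null as "cannot skip" keeps the filter conservative: it may read a row group that has no matches, but it never drops one that could contain a match.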
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)