Posted to dev@hive.apache.org by "Chao Sun (JIRA)" <ji...@apache.org> on 2016/11/04 16:56:58 UTC
[jira] [Created] (HIVE-15131) Change Parquet reader to read metadata on the task side
Chao Sun created HIVE-15131:
-------------------------------
Summary: Change Parquet reader to read metadata on the task side
Key: HIVE-15131
URL: https://issues.apache.org/jira/browse/HIVE-15131
Project: Hive
Issue Type: Bug
Components: Reader
Reporter: Chao Sun
Assignee: Chao Sun
Currently the {{ParquetRecordReaderWrapper}} still uses the {{readFooter}} API without filtering, which means it reads the metadata for all row groups every time. This can cause problems when the input dataset is particularly large and has many columns.
[PARQUET-84|https://issues.apache.org/jira/browse/PARQUET-84] introduced another API that allows row group filtering to be done on the task side. Hive should adopt this API.
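As a rough illustration of what task-side row group filtering means, the sketch below does not call Parquet at all. It models the idea with a hypothetical {{RowGroup}} class and a hypothetical {{selectRowGroups}} helper: each task keeps only the row groups that belong to its own split instead of handling metadata for every row group in the file. The midpoint rule used here (a row group belongs to the split that contains its midpoint) is an assumption made for illustration, and the offsets are invented sample values.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: RowGroup and selectRowGroups are hypothetical names,
// not part of the Parquet or Hive APIs.
class RowGroupFilterSketch {

    /** Hypothetical stand-in for the per-row-group entry in a file footer. */
    static final class RowGroup {
        final long startOffset;   // byte offset where the row group begins
        final long totalByteSize; // size of the row group in bytes

        RowGroup(long startOffset, long totalByteSize) {
            this.startOffset = startOffset;
            this.totalByteSize = totalByteSize;
        }

        long midpoint() {
            return startOffset + totalByteSize / 2;
        }
    }

    /**
     * Keeps only the row groups whose midpoint lies in [splitStart, splitEnd).
     * Assumption: ownership is decided by midpoint, so each row group is
     * claimed by exactly one split when the splits tile the file.
     */
    static List<RowGroup> selectRowGroups(List<RowGroup> all, long splitStart, long splitEnd) {
        List<RowGroup> selected = new ArrayList<>();
        for (RowGroup rg : all) {
            long mid = rg.midpoint();
            if (mid >= splitStart && mid < splitEnd) {
                selected.add(rg);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Invented footer with three row groups of 100 bytes each.
        List<RowGroup> footer = new ArrayList<>();
        footer.add(new RowGroup(4, 100));
        footer.add(new RowGroup(104, 100));
        footer.add(new RowGroup(204, 100));

        // A task assigned the byte range [100, 200) only keeps its own row groups.
        List<RowGroup> mine = selectRowGroups(footer, 100, 200);
        System.out.println(mine.size() + " of " + footer.size() + " row groups selected");
    }
}
```

With many row groups and many columns per row group, the unfiltered list is what every task pays for today; filtering by split range keeps the per-task metadata proportional to the split rather than to the whole file.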
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)