Posted to dev@hive.apache.org by "Chao Sun (JIRA)" <ji...@apache.org> on 2016/11/04 16:56:58 UTC

[jira] [Created] (HIVE-15131) Change Parquet reader to read metadata on the task side

Chao Sun created HIVE-15131:
-------------------------------

             Summary: Change Parquet reader to read metadata on the task side
                 Key: HIVE-15131
                 URL: https://issues.apache.org/jira/browse/HIVE-15131
             Project: Hive
          Issue Type: Bug
          Components: Reader
            Reporter: Chao Sun
            Assignee: Chao Sun


Currently the {{ParquetRecordReaderWrapper}} still uses the {{readFooter}} API without filtering, which means it has to read the metadata for all row groups every time. This could cause issues when the input dataset is particularly big and has many columns.

[PARQUET-84|https://issues.apache.org/jira/browse/PARQUET-84] introduced another API which allows row group filtering to be done on the task side. Hive should adopt this API.
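
To illustrate what task-side row group filtering means, here is a minimal, self-contained sketch. It does not use the Parquet API: {{RowGroup}} and {{filterBySplit}} are hypothetical names, and the rule that a row group belongs to a split when its midpoint falls inside the split's byte range is an assumption about how a range-style filter could behave. The point is only that each task keeps the row groups for its own split instead of materializing metadata for every row group in the file.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupFilterSketch {

    /** Hypothetical stand-in for one row group entry in a file footer. */
    static final class RowGroup {
        final long startOffset; // byte offset of the row group in the file
        final long totalSize;   // compressed size of the row group in bytes

        RowGroup(long startOffset, long totalSize) {
            this.startOffset = startOffset;
            this.totalSize = totalSize;
        }

        long midpoint() {
            return startOffset + totalSize / 2;
        }
    }

    /**
     * Keeps only the row groups whose midpoint lies in [splitStart, splitEnd).
     * Assumed rule: each row group is assigned to exactly one split.
     */
    static List<RowGroup> filterBySplit(List<RowGroup> all, long splitStart, long splitEnd) {
        List<RowGroup> kept = new ArrayList<>();
        for (RowGroup rg : all) {
            long mid = rg.midpoint();
            if (mid >= splitStart && mid < splitEnd) {
                kept.add(rg);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RowGroup> footer = new ArrayList<>();
        footer.add(new RowGroup(4, 100));   // midpoint 54
        footer.add(new RowGroup(104, 100)); // midpoint 154
        footer.add(new RowGroup(204, 100)); // midpoint 254

        // A task assigned bytes [0, 128) keeps only the first row group.
        System.out.println(filterBySplit(footer, 0, 128).size());
        // A task assigned bytes [128, 304) keeps the remaining two.
        System.out.println(filterBySplit(footer, 128, 304).size());
    }
}
```

With an unfiltered read, every task would handle all three entries; with the filter, each task only handles the entries for its own byte range, which matters most for files with many row groups and many columns.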



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)