Posted to issues@hawq.apache.org by "Shivram Mani (JIRA)" <ji...@apache.org> on 2016/09/12 23:07:20 UTC

[jira] [Updated] (HAWQ-886) Investigation of HAWQ/PXF support for ORC

     [ https://issues.apache.org/jira/browse/HAWQ-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivram Mani updated HAWQ-886:
------------------------------
    Summary: Investigation of HAWQ/PXF support for ORC  (was: Support PXF filter push down for ORC)

> Investigation of HAWQ/PXF support for ORC
> -----------------------------------------
>
>                 Key: HAWQ-886
>                 URL: https://issues.apache.org/jira/browse/HAWQ-886
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Shivram Mani
>             Fix For: backlog
>
>
> Currently, when HAWQ reads ORC files via PXF (using the default Hive profile), it does not push any filter information down to the underlying ORC reader. The only filtering possible right now is at the partition level, which is done generically for all Hive tables.
> ORC files internally store file-level, stripe-level and row-level statistics, including information such as the minimum and maximum values of each column. For more information, refer to https://orc.apache.org/docs/indexes.html
> The proposal here is to introduce a new PXF profile optimized for ORC files that leverages these statistics to improve the performance of HAWQ queries with predicates. We will also read data using the vectorized approach (VectorizedRowBatch), with a SearchArgument to build the filter, instead of the existing, more expensive row-based reader.
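The following is a minimal, self-contained sketch of the two ideas in the proposal, assuming a single long column and a `col < literal` predicate. It does not use the real ORC or Hive APIs (SearchArgument, VectorizedRowBatch). Every name in it (OrcPushdownSketch, StripeStats, Truth, lessThan, stripesToRead, filterBatch) is hypothetical. Stripe-level min/max statistics rule out stripes that cannot contain a match. Rows in the remaining stripes are then filtered one batch at a time into a "selected" index array, rather than being materialized and tested row by row.

```java
import java.util.ArrayList;
import java.util.List;

public class OrcPushdownSketch {

    // Hypothetical min/max statistics for one long column within one stripe.
    record StripeStats(long min, long max) {}

    // Three-valued result of testing a predicate against statistics alone.
    // (Illustrative only; not ORC's actual TruthValue enum.)
    enum Truth { YES, NO, MAYBE }

    // Evaluates the hypothetical predicate "col < literal" using only stats.
    static Truth lessThan(StripeStats s, long literal) {
        if (s.max() < literal) {
            return Truth.YES;   // every value in the stripe qualifies
        }
        if (s.min() >= literal) {
            return Truth.NO;    // no value in the stripe can qualify
        }
        return Truth.MAYBE;     // rows must be read and checked
    }

    // Returns the indices of stripes that cannot be skipped for "col < literal".
    static List<Integer> stripesToRead(List<StripeStats> stripes, long literal) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < stripes.size(); i++) {
            if (lessThan(stripes.get(i), literal) != Truth.NO) {
                keep.add(i);
            }
        }
        return keep;
    }

    // Batch-at-a-time filter: writes the positions of rows with
    // values[i] < literal into 'selected' and returns how many passed.
    static int filterBatch(long[] values, int size, long literal, int[] selected) {
        int n = 0;
        for (int i = 0; i < size; i++) {
            if (values[i] < literal) {
                selected[n++] = i;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<StripeStats> stripes = List.of(
                new StripeStats(0, 99),     // YES: all values below 100
                new StripeStats(100, 199),  // NO: skipped without reading
                new StripeStats(50, 150));  // MAYBE: must be read
        System.out.println("stripes to read: " + stripesToRead(stripes, 100));

        long[] batch = {120, 60, 101, 99, 150};
        int[] selected = new int[batch.length];
        int n = filterBatch(batch, batch.length, 100, selected);
        System.out.println("rows passing in batch: " + n);
    }
}
```

Running main prints `stripes to read: [0, 2]` and `rows passing in batch: 2` (positions 1 and 3). The stripe with statistics [100, 199] is never read, which is where the savings for predicate queries would come from.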



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)