Posted to dev@hive.apache.org by "Ferdinand Xu (JIRA)" <ji...@apache.org> on 2014/10/15 06:56:35 UTC

[jira] [Commented] (HIVE-8122) Make use of SearchArgument classes

    [ https://issues.apache.org/jira/browse/HIVE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171984#comment-14171984 ] 

Ferdinand Xu commented on HIVE-8122:
------------------------------------

Hi [~brocknoland], I'm currently working on this JIRA and have done some investigation on both the Parquet and Hive sides.
As I understand it, SearchArgument is a kind of predicate pushdown framework or mechanism. The latest Parquet project already supports vertical and horizontal partitions, and it uses FilterPredicate, which is similar to SearchArgument. See https://github.com/apache/incubator-parquet-mr/blob/0148455170be07f89bd6b9230960a6cd510c7ca6/parquet-column/src/main/java/parquet/filter2/predicate/FilterPredicate.java
I found an issue with the current filter solution on the Hive side. Hive implements filter pushdown by putting FILTER_EXPR_CONF_STR and FILTER_TEXT_CONF_STR into the conf and then passing it to the ParquetInputFormat. However, Parquet uses the FILTER_PREDICATE configuration, whose value is a serialized FilterPredicate.
Was this JIRA filed to enable the FilterPredicate feature provided by Parquet in the Hive code? If so, maybe we can use Parquet's mechanism by creating a FilterPredicate in the Hive code. See https://github.com/apache/incubator-parquet-mr/blob/5dafd127f3de7c516ce9c1b7329087a01ab2fc57/parquet-hadoop/src/main/java/parquet/hadoop/ParquetInputFormat.java#L163
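To make the mismatch concrete, here is a minimal, self-contained sketch of the situation as I understand it. It does not use the real Hive or Parquet classes: the conf is modeled as a plain Map, the key strings are placeholders rather than the actual constant values, the method names are hypothetical, and the "serialized" values are ordinary strings.

```java
import java.util.HashMap;
import java.util.Map;

public class PushdownSketch {
    // Constant names follow the discussion above; the string values are
    // placeholders, NOT the real values defined in Hive or Parquet.
    static final String FILTER_EXPR_CONF_STR = "example.hive.filter.expr";
    static final String FILTER_TEXT_CONF_STR = "example.hive.filter.text";
    static final String FILTER_PREDICATE = "example.parquet.filter.predicate";

    // Hypothetical model of what the Hive side does today: it stores the
    // filter expression and its text form under the two Hive keys.
    static void hivePushFilter(Map<String, String> conf, String serializedExpr, String exprText) {
        conf.put(FILTER_EXPR_CONF_STR, serializedExpr);
        conf.put(FILTER_TEXT_CONF_STR, exprText);
    }

    // Hypothetical model of the proposal: Hive builds a predicate and
    // stores it under the key that the Parquet side actually reads.
    static void hivePushFilterPredicate(Map<String, String> conf, String serializedPredicate) {
        conf.put(FILTER_PREDICATE, serializedPredicate);
    }

    // Hypothetical model of the Parquet side: it only looks at FILTER_PREDICATE.
    static String parquetReadFilter(Map<String, String> conf) {
        return conf.get(FILTER_PREDICATE);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Current behavior: the Hive keys are set, but the reader finds nothing.
        hivePushFilter(conf, "<serialized expression>", "id > 10");
        System.out.println("before: " + parquetReadFilter(conf));

        // Proposed behavior: the predicate is visible to the reader.
        hivePushFilterPredicate(conf, "gt(id, 10)");
        System.out.println("after: " + parquetReadFilter(conf));
    }
}
```

In the real code, the second step would presumably build an actual FilterPredicate on the Hive side and hand it to ParquetInputFormat instead of storing a string, but that part is an assumption on my side and needs to be checked against the Parquet source linked above.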
Please feel free to point out where I am wrong.

> Make use of SearchArgument classes
> ----------------------------------
>
>                 Key: HIVE-8122
>                 URL: https://issues.apache.org/jira/browse/HIVE-8122
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Brock Noland
>            Assignee: Ferdinand Xu
>
> ParquetSerde could be much cleaner if we used SearchArgument and associated classes like ORC does:
> https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)