Posted to dev@hive.apache.org by ashish-kumar-sharma <gi...@git.apache.org> on 2018/05/12 08:51:08 UTC

[GitHub] hive pull request #346: HIVE-12898: First commit

GitHub user ashish-kumar-sharma opened a pull request:

    https://github.com/apache/hive/pull/346

    HIVE-12898: First commit

    1. Predicate Pushdown For Nested field
    
    1.1 Objective
    
    In ORC (Optimized Row Columnar) files, every primitive-type column carries an index. A predicate is a condition on a column in the WHERE clause, and pushdown means skipping row groups, stripes, and blocks during a read by comparing the predicate against the metadata stored in the stripes. The metadata consists of the min, max, and sum values present in the given column.
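    
    The skipping described above can be pictured with a minimal sketch. It is a toy model, not the ORC reader API: the class RowGroupSkipSketch, the ColumnStats holder, and the sample values are all hypothetical, and only the min/max part of the metadata is modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of statistics-based skipping. All names are hypothetical;
// this is not the ORC reader API.
final class RowGroupSkipSketch {

    /** Hypothetical min/max statistics of one integer column in one row group. */
    static final class ColumnStats {
        final long min;
        final long max;

        ColumnStats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** For "col > literal", a row group is skippable when even its max fails the predicate. */
    static boolean canSkipGreaterThan(ColumnStats stats, long literal) {
        return stats.max <= literal;
    }

    /** Returns the indexes of the row groups that still have to be read. */
    static List<Integer> rowGroupsToRead(List<ColumnStats> groups, long literal) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            if (!canSkipGreaterThan(groups.get(i), literal)) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<ColumnStats> groups = Arrays.asList(
            new ColumnStats(1, 9), new ColumnStats(5, 10), new ColumnStats(8, 42));
        // Only the last group can contain a value greater than 10.
        System.out.println(rowGroupsToRead(groups, 10)); // prints [2]
    }
}
```

    In this sketch the first two row groups are never decoded, because their max value already rules out any row with a value greater than 10.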
    
    Currently, predicate pushdown only works for top-level columns of the schema. This change extends predicate pushdown to nested structures in Hive.
    
    
    1.2 Current state - 
     
    1.2.1 Schema
    struct<col1:int, col2:bigint,col3:struct<col4:int,col5:struct<col6:int>,col7:string>>
     
    1.2.2 Query 
    select col3.col5.col6 from table where col3.col5.col6 > 10;
     
    1.2.3 Conf 
    hive.io.filter.expr.serialized = "ASdni2enalfkncwjnlsdnfrnqwoglqernmgkqrg";
    hive.io.filter.text = "where col3.col5.col6 > 10";
     
    1.2.4 Pushdown Predicate not supported in Nested field
     
    Hive generates an ExprNodeGenericFuncDesc object whose column operand is of type ExprNodeFieldDesc. This expression is serialized and stored in hive.io.filter.expr.serialized.
    
    However, when the ExprNodeGenericFuncDesc object is parsed to generate the SearchArgument in ConvertAstToSearchArg(), there is a strict instanceof ExprNodeColumnDesc check on the operand. A nested field fails this check, so SearchArgument creation is skipped entirely for the predicate.
    
    
    1.2.5 Result - 
    
    builder.literal(SearchArgument.TruthValue.YES_NO_NULL);
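    
    The behaviour described in 1.2.4 can be pictured with a toy model. The classes below (ColumnExpr, FieldExpr) are hypothetical stand-ins for ExprNodeColumnDesc and ExprNodeFieldDesc, and convertGreaterThan is an illustrative helper, not a method of ConvertAstToSearchArg. The sketch only shows why a column-only check makes a nested-field predicate fall back to YES_NO_NULL.

```java
// Toy stand-ins for Hive's expression descriptors. All names here are
// hypothetical; the real classes live in org.apache.hadoop.hive.ql.plan.
final class ColumnOnlyCheckSketch {

    interface Expr {}

    /** Stand-in for a top-level column reference. */
    static final class ColumnExpr implements Expr {
        final String name;

        ColumnExpr(String name) {
            this.name = name;
        }
    }

    /** Stand-in for a field access on a struct-typed expression. */
    static final class FieldExpr implements Expr {
        final Expr parent;
        final String field;

        FieldExpr(Expr parent, String field) {
            this.parent = parent;
            this.field = field;
        }
    }

    /**
     * Mimics a converter that only accepts plain column operands.
     * Anything else yields the "cannot decide" truth value, so no
     * row group is ever skipped for that predicate.
     */
    static String convertGreaterThan(Expr operand, long literal) {
        if (!(operand instanceof ColumnExpr)) {
            return "YES_NO_NULL";
        }
        return "leaf(" + ((ColumnExpr) operand).name + " > " + literal + ")";
    }

    public static void main(String[] args) {
        Expr topLevel = new ColumnExpr("col1");
        Expr nested = new FieldExpr(new FieldExpr(new ColumnExpr("col3"), "col5"), "col6");
        System.out.println(convertGreaterThan(topLevel, 10)); // prints leaf(col1 > 10)
        System.out.println(convertGreaterThan(nested, 10));   // prints YES_NO_NULL
    }
}
```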
    
    1.3 Expected state - 
    
    1.3.1 Schema
    struct<col1:int, col2:bigint,col3:struct<col4:int,col5:struct<col6:int>,col7:string>>
     
    1.3.2 Query
    select col3.col5.col6 from table where col3.col5.col6 > 10;
     
    1.3.3 Conf
    hive.io.filter.expr.serialized = "ASdni2enalfkncwjnlsdnfrnqwoglqernmgkqrg";
    hive.io.filter.text = "where col3.col5.col6 > 10";
     
    1.3.4 Pushdown Predicate support in Nested field
     
    Hive generates an ExprNodeGenericFuncDesc object whose column operand is of type ExprNodeFieldDesc. This expression is serialized and stored in hive.io.filter.expr.serialized.
    
    When the ExprNodeGenericFuncDesc object is parsed to generate the SearchArgument in ConvertAstToSearchArg(), there should also be a check for ExprNodeFieldDesc, with a separate parsing path that converts the field name to a column ID and generates the PredicateLeaf nodes.
    
    1.3.5 Result
    
    leaf-0 = (LESS_THAN_EQUALS col3.col5.col6 10), expr = (not leaf-0)
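    
    One way to picture the field-name-to-column-ID step from 1.3.4 is the sketch below. It assumes that column IDs are assigned by a pre-order walk of the type tree with the root struct as ID 0, which is how ORC numbers its columns. The TypeNode class and the assignIds and resolve helpers are hypothetical names for illustration; the code in this pull request may do the mapping differently.

```java
import java.util.ArrayList;
import java.util.List;

// Toy type tree for the schema in 1.3.1. All names are hypothetical.
final class NestedColumnIdSketch {

    static final class TypeNode {
        final String name;
        final List<TypeNode> children = new ArrayList<>();
        int id;

        TypeNode(String name, TypeNode... kids) {
            this.name = name;
            for (TypeNode kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Assigns IDs in pre-order starting at nextId and returns the next free ID. */
    static int assignIds(TypeNode node, int nextId) {
        node.id = nextId++;
        for (TypeNode child : node.children) {
            nextId = assignIds(child, nextId);
        }
        return nextId;
    }

    /** Resolves a dotted path such as "col3.col5.col6" to its column ID, or -1 if absent. */
    static int resolve(TypeNode root, String dottedPath) {
        TypeNode current = root;
        for (String part : dottedPath.split("\\.")) {
            TypeNode next = null;
            for (TypeNode child : current.children) {
                if (child.name.equals(part)) {
                    next = child;
                    break;
                }
            }
            if (next == null) {
                return -1;
            }
            current = next;
        }
        return current.id;
    }

    /** struct<col1:int,col2:bigint,col3:struct<col4:int,col5:struct<col6:int>,col7:string>> */
    static TypeNode exampleSchema() {
        TypeNode root = new TypeNode("root",
            new TypeNode("col1"),
            new TypeNode("col2"),
            new TypeNode("col3",
                new TypeNode("col4"),
                new TypeNode("col5", new TypeNode("col6")),
                new TypeNode("col7")));
        assignIds(root, 0);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(resolve(exampleSchema(), "col3.col5.col6")); // prints 6
    }
}
```

    With the column ID in hand, the reader can look up the statistics of the leaf column col6 in the same way it does for a top-level primitive column.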


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Flipkart/hive nestedppd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/346.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #346
    
----
commit f3e46b62b4fab6877f2373c49d933ebe7119ec2f
Author: Aashish Kumar Sharma <aa...@...>
Date:   2018-05-12T08:47:39Z

    HIVE-12898: First commit

----

