Posted to issues@trafodion.apache.org by "Atanu Mishra (JIRA)" <ji...@apache.org> on 2016/05/07 02:40:12 UTC

[jira] [Closed] (TRAFODION-1662) Predicate push down revisited (V2)

     [ https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Atanu Mishra closed TRAFODION-1662.
-----------------------------------
    Resolution: Fixed

Updated Fix version field.

> Predicate push down revisited (V2)
> ----------------------------------
>
>                 Key: TRAFODION-1662
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1662
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: sql-exe
>    Affects Versions: 2.0-incubating
>            Reporter: Eric Owhadi
>            Assignee: Atanu Mishra
>              Labels: predicate, pushdown
>             Fix For: 2.0-incubating
>
>         Attachments: Advanced predicate push down feature.docx, Advanced predicate push down feature.docx, Performance results analyzing effects of optimizations introduced in pushdown V2.docx
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Currently, Trafodion predicate push down to HBase supports only the following case:
> <Column><op><Value> AND <Column><op><Value> AND ...
> and requires that:
> -	columns are "SERIALIZED" (can be compared using a binary comparator),
> -	the value's data type is not a superset of the column's data type,
> -	char-type columns are not case insensitive or upshifted,
> -	Big Numbers are not used.
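The SERIALIZED requirement exists because an HBase filter compares raw cell bytes, so pushdown is only safe when byte-wise comparison agrees with the column's value ordering. A minimal illustrative sketch (not Trafodion code) of an order-preserving integer encoding that a binary comparator can handle:

```java
import java.util.Arrays;

public class OrderPreservingEncoding {
    // Big-endian encoding with the sign bit flipped keeps signed-int order
    // under unsigned byte-wise comparison (what a binary comparator does).
    static byte[] encode(int v) {
        int u = v ^ 0x80000000; // flip sign bit so negatives sort before positives
        return new byte[] {
            (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
        };
    }

    public static void main(String[] args) {
        int[] vals = { -5, -1, 0, 1, 42 };
        // Each adjacent pair should compare "less than" byte-wise.
        for (int i = 0; i + 1 < vals.length; i++) {
            int cmp = Arrays.compareUnsigned(encode(vals[i]), encode(vals[i + 1]));
            System.out.println(vals[i] + " < " + vals[i + 1] + " byte-wise: " + (cmp < 0));
        }
    }
}
```

A column whose on-disk encoding does not have this property (the non-SERIALIZED case) cannot be filtered with a byte comparator, which is why such predicates are currently evaluated on the Trafodion side.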
> It suffers from several issues:
> -	Handling of nullable columns:
> When a nullable column is involved in the predicate, because of the way nulls are handled in Trafodion (either a missing cell, or a cell whose first byte is set to 0xFF), a binary compare cannot treat NULL the way SQL semantics require. The current behavior is therefore that null column values are never filtered out and are always returned, letting Trafodion perform a second-pass predicate evaluation to deal with nulls. This can quickly turn counterproductive for very sparse columns: we perform useless filtering at the region server side (since all nulls pass), and the optimizer has not been coded to turn the feature off on sparse columns.
> In addition, since null handling is done on the Trafodion side, the current code artificially pulls up all key columns to make sure that a null coded as an absent cell is correctly pushed up for evaluation at the Trafodion layer. This could be optimized by requiring only a single non-nullable column in the current code, but that is another story... As you will see below, the proposed new way of doing pushdown will handle 100% of nulls at the HBase layer, therefore requiring a non-nullable column to be added only when a nullable column is needed in the select list (not in the predicate).
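Since a Trafodion NULL can be stored either as an absent cell or as a cell whose null-indicator prefix byte is 0xFF, a byte comparator alone cannot recognize both forms; conceptually, a check like the following (an illustrative sketch, not the actual engine code) has to run above the comparator:

```java
public class NullCheck {
    // A nullable column value is SQL NULL if the cell is missing entirely,
    // or if the null-indicator prefix byte is 0xFF.
    static boolean isSqlNull(byte[] cell) {
        return cell == null || (cell.length > 0 && (cell[0] & 0xFF) == 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(isSqlNull(null));                       // missing cell
        System.out.println(isSqlNull(new byte[] { (byte) 0xFF })); // 0xFF-prefixed
        System.out.println(isSqlNull(new byte[] { 0x00, 0x2A }));  // real value
    }
}
```

Handling both encodings inside the HBase-side filter, as proposed below in the issue, is what removes the need for the second-pass evaluation in Trafodion.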
> -	Always returning predicate columns:
> SELECT a FROM t WHERE b > 10 always returns the b column to Trafodion, even if b is non-nullable. This is not necessary and results in useless network and CPU consumption, even when the predicate is not re-evaluated.
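The intended fix is simple set arithmetic over the columns the scan node actually consumes. A hypothetical sketch (method names are illustrative, not from the Trafodion code base):

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ColumnPruning {
    // Old behavior: fetch the output columns plus every predicate column.
    static Set<String> oldFetchSet(Set<String> output, Set<String> predicate) {
        Set<String> fetch = new LinkedHashSet<>(output);
        fetch.addAll(predicate);
        return fetch;
    }

    // New behavior: once the predicate is fully evaluated in the HBase-side
    // filter, only the columns the node itself outputs need to come back.
    static Set<String> newFetchSet(Set<String> output, Set<String> predicate) {
        return new LinkedHashSet<>(output);
    }

    public static void main(String[] args) {
        // SELECT a FROM t WHERE b > 10
        Set<String> out = Set.of("a"), pred = Set.of("b");
        System.out.println("old: " + oldFetchSet(out, pred));
        System.out.println("new: " + newFetchSet(out, pred));
    }
}
```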
> The new advanced predicate push down feature will do the following:
> Support any of these primitives:
> <col><op><value>
> <col><op><col>		(nice to have; a TPC-DS query survey showed a custom filter would have high cost for low value)
> IS NULL
> IS NOT NULL
> LIKE			-> to be investigated, not yet covered in this document
> And any combination of these primitives with an arbitrary number of ORs and ANDs grouped with parentheses, provided that within a given ( ) there are either only ORs or only ANDs, with no mixing of OR and AND inside ( ). I suspect the normalizer will always convert expressions so that this mixing never happens...
> And it will remove the two shortcomings of the previous implementation: all null cases will be handled at the HBase layer, never requiring re-evaluation and the associated pushing up of null columns; and predicate columns will not be pushed up unless the node needs them for some task other than predicate evaluation.
> Note that BETWEEN and IN predicates, when normalized into one of the forms supported above, will be pushed down too. Nothing in the code needs to change to support this.
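The restriction that each parenthesized group is either all-AND or all-OR maps naturally onto HBase's FilterList, which is itself either MUST_PASS_ALL or MUST_PASS_ONE. A self-contained sketch of evaluating such a tree of homogeneous groups (illustrative only; no HBase dependency, rows modeled as maps):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class FilterGroup implements Predicate<Map<String, Integer>> {
    enum Op { ALL, ONE } // like FilterList's MUST_PASS_ALL / MUST_PASS_ONE

    final Op op;
    final List<Predicate<Map<String, Integer>>> children;

    FilterGroup(Op op, List<Predicate<Map<String, Integer>>> children) {
        this.op = op;
        this.children = children;
    }

    @Override
    public boolean test(Map<String, Integer> row) {
        // Homogeneous group: all children ANDed, or all children ORed.
        return op == Op.ALL
            ? children.stream().allMatch(c -> c.test(row))
            : children.stream().anyMatch(c -> c.test(row));
    }

    public static void main(String[] args) {
        // (a > 1 AND b < 5) OR c = 7 : each group is homogeneous.
        Predicate<Map<String, Integer>> andGroup = new FilterGroup(Op.ALL,
            List.of(r -> r.get("a") > 1, r -> r.get("b") < 5));
        Predicate<Map<String, Integer>> whole = new FilterGroup(Op.ONE,
            List.of(andGroup, r -> r.get("c") == 7));

        System.out.println(whole.test(Map.of("a", 2, "b", 3, "c", 0)));
        System.out.println(whole.test(Map.of("a", 0, "b", 9, "c", 7)));
        System.out.println(whole.test(Map.of("a", 0, "b", 9, "c", 0)));
    }
}
```

Because the grouping is homogeneous, nesting FilterGroup instances is enough to express any expression the normalizer produces under the stated restriction.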
> Improvement of EXPLAIN:
> We currently do not show predicate push down information in the scan node. Two pieces of information are needed:
> -	whether predicate push down is used
> -	which columns are retrieved by the scan node (investigate why we get "column all" instead of accurate information)
> The first is used to determine whether all the conditions for push down are met, and the second to make sure we are not pushing up data from columns we don't need.
> Note that column info is shown inconsistently today; this needs to be fixed.
> Enablement: the existing ON/OFF CQD (HBASE_FILTER_PREDS) will be replaced with a multi-value CQD that enables various levels of push down optimization, similar to the PCODE optimization level.
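By way of illustration, enablement today and under the proposed scheme might look like the following; the multi-value level shown is a hypothetical placeholder, since only the ON/OFF form exists at the time of writing:

```sql
-- Current on/off switch:
CONTROL QUERY DEFAULT HBASE_FILTER_PREDS 'ON';

-- Proposed multi-value form (level values are hypothetical placeholders):
CONTROL QUERY DEFAULT HBASE_FILTER_PREDS '2';
```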



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)