You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/03/11 10:05:14 UTC

[GitHub] [incubator-doris] xinghuayu007 opened a new issue #5505: 【Proposal】Check conflict predicate and merge prediacates

xinghuayu007 opened a new issue #5505:
URL: https://github.com/apache/incubator-doris/issues/5505


   ### **1. Background**
   When a SQL query is like `select * from table1 where a = 1 and a = 2`, it contains two predicates: `a = 1`, `a = 2`. Obviously, the two predicates is conflict. This problem is defined as  **Predicate Conflict Problem**. When a SQL query is like `select * from table1 where a > 1 and a > 2`. Predicate `a > 1` and `a > 2` can be merged into one predicate `a > 2`. This problem is defined as **Predicate Merge Problem**.
   
   What situation these two problems will happen in?
   a. BI tool generate sql
   b. complicate sql with predicate pushdown
   
   ### **2. Design**
   **a. Predicate Conflict Problem**
   
   A WhereClause may be very complicated with many nested layers, such as ((D or ((A and B) or C))) and E).  In order to compute conviently, we can firstly transform the expression into **Primary Disjunctive Paradigm(主析取范式)**. Primary Disjunctive Paradigm is like (A and B and C) or (D and E and F) or (G and H and I). Item `(A and B and C)` is called 
   **Minimal Item(极小项)**. Every Minimal Item is consist of `AND Expression`. Therefore if we can check every item in Minimal Item wether is conflict with each other. If a Minimal Item has conflicts, it is false. if all Minimal Items are false, the whole expression is false.
   
   **b. Predicate Merge Problem**
   
   If two predicates can be merged, it means they share the same column,  and the predicate is like `column > constant` compounded with one column and one constant. There no every good methods to solve this problem. Presto traverses every sub-expression from bottom to top and merge every expression connected with `and` or `or`. If two expression can not be merged, suck as `a > 1 or b > 2`, it returns `{all}` to represet a no limit boundary. 
   We can compute the expression based on the result of Predicate Merge Problem. First compute every Minimal Item, because it is consist of `and`, which is easy to handle. Second compute two Minimal Items. It is easy to compute `and` firstly then `or` than compute `or` then `and`.
   
   
   ### **3. Implement**
   
   ![Untitled Diagram](https://user-images.githubusercontent.com/12771191/110767391-2bb3bb00-8291-11eb-9e0f-e6baf9db3be4.png)
   
   
   **Function Define**
   Expr optimizePredicate(Expr whereClause);
   Input Param: whereClause
   Output Param: return an optimized expression, if it is conflict, return a False BooleanLiteral expression
   
   **a. Generate Primary Disjunctive Paradigm**
   To transform a normal expression into Primary Disjunctive Paradigm, we use `**True Table(真值表)**` algorithm.  
   **b. Check Predicate Conflict**
   Then we can check every Minimal Item, if one Minimal Item is false, remove it. If all Minimal Items are false, return a False BooleanLiteral expression. 
   **c. Merge Predicate**
   First compute every Minimal Item, because it is consist of `and`, which is easy to handle. Second compute two Minimal Items. Last return an optimized expression.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org