You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/17 12:38:17 UTC

[GitHub] [arrow-datafusion] jackwener opened a new issue, #2254: [Optimizer] Infer is not null predicate from `where clause`

jackwener opened a new issue, #2254:
URL: https://github.com/apache/arrow-datafusion/issues/2254

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   There are some predicate include implicit `is not null`.
   
   ```sql
   select * from t1 where c1 > 0;  
   -> 
   select * from t1 where c1 > 0 and c1 is not null;
   ```
   
   Why do this?
   
   Expr in `select expr` and `where expr` is different.
   
   Same `expr` in `select` don't include implicit `is not null`.
   
   After this, 
   
   We can do this
   
   - eliminate self compare #2252
   - convert outer join to inner join #1585
   - .... other about predicate
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #2254: [Optimizer] Infer is not null predicate from `where clause`

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2254:
URL: https://github.com/apache/arrow-datafusion/issues/2254#issuecomment-1101759932

   I think this is a *very* subtle area of SQL and it depends on how the expression is being used
   
   For example, 
   
   ```sql
   select * from t1 where c1 > 0;  
   ```
   and 
   ```sql
   select * from t1 where c1 > 0 and c1 is not null;
   ```
   
   Are semantically equivalent (produce the same answers), but these are not:
   
   ```sql
   select c1 > 0  from t1;
   ```
   and
   ```sql
   select c1 > 0 and c1 is not null from t1;
   ```
   
   The reason is that the semantics of the `WHERE` clause are that only rows that evaluate to `true` are passed (if the row evaluates to `NULL` or `false` then they don't pass).
   
   However, in general, you still have to handle the three way logic with `NULL` -- I am not at all convinced we do this entirely correctly in the expression simplification pass. We should probably have two modes: `used in where` and `used elsewhere` or something πŸ€” 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on issue #2254: [Optimizer] Infer is not null predicate from `where clause`

Posted by GitBox <gi...@apache.org>.
jackwener commented on issue #2254:
URL: https://github.com/apache/arrow-datafusion/issues/2254#issuecomment-1100876623

   Note: It isn't a `optimizer rule` !
   
   It is because the following transformation can't terminate in Finite State Machine of rule.
   
   ```
   select * from t1 where c1 > 0;  
   -> 
   select * from t1 where c1 > 0 and c1 is not null;
   ->
   select * from t1 where c1 > 0 and c1 is not null and c1 is not null;
   ->
   .....
   ```
   
   So, In fact, this is a `rewrite` in planner build (Specifically, it is in `plan_selection`).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on issue #2254: [Optimizer] Infer is not null predicate from `where clause`

Posted by GitBox <gi...@apache.org>.
jackwener commented on issue #2254:
URL: https://github.com/apache/arrow-datafusion/issues/2254#issuecomment-1101946913

   > I am not at all convinced we do this entirely correctly in the expression simplification pass. We should probably have two modes: used in where and used elsewhere or something πŸ€”
   
   Yes, This is where the problem lies. Other optimizer rules depend on implicit `not null in where`, which must handle this problem. 
   
   I think we should infer `is not null`, so other rule don't need to do infer `not null` but use directly `is not null` inferred by this rewrite. For example, convert outer join to left/right outer join or inner join can use `is not null` directly instead of match `compare operator`.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on issue #2254: [Optimizer] Infer is not null predicate from `where clause`

Posted by GitBox <gi...@apache.org>.
jackwener commented on issue #2254:
URL: https://github.com/apache/arrow-datafusion/issues/2254#issuecomment-1101936021

   Yes, we just handle the expr in where.
   This rewrite is from spark.
   
   Andrew Lamb ***@***.***> 于 2022εΉ΄4月19ζ—₯ε‘¨δΊŒ 04:55ε†™ι“οΌš
   
   > I think this is a *very* subtle area of SQL and it depends on how the
   > expression is being used
   >
   > For example,
   >
   > select * from t1 where c1 > 0;
   >
   > and
   >
   > select * from t1 where c1 > 0 and c1 is not null;
   >
   > Are semantically equivalent (produce the same answers), but these are not:
   >
   > select c1 > 0  from t1;
   >
   > and
   >
   > select c1 > 0 and c1 is not null from t1;
   >
   > The reason is that the semantics of the WHERE clause are that only rows
   > that evaluate to true are passed (if the row evaluates to NULL or false
   > then they don't pass).
   >
   > However, in general, you still have to handle the three way logic with
   > NULL -- I am not at all convinced we do this entirely correctly in the
   > expression simplification pass. We should probably have two modes: used
   > in where and used elsewhere or something πŸ€”
   >
   > β€”
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/arrow-datafusion/issues/2254#issuecomment-1101759932>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AHI4SLNVFUPSCQ5WR4SY5RDVFXD3RANCNFSM5TT2L62Q>
   > .
   > You are receiving this because you authored the thread.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener closed issue #2254: [Optimizer] Infer is not null predicate from `where clause`

Posted by GitBox <gi...@apache.org>.
jackwener closed issue #2254: [Optimizer] Infer is not null predicate from `where clause`
URL: https://github.com/apache/arrow-datafusion/issues/2254


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on issue #2254: [Optimizer] Infer is not null predicate from `where clause`

Posted by GitBox <gi...@apache.org>.
jackwener commented on issue #2254:
URL: https://github.com/apache/arrow-datafusion/issues/2254#issuecomment-1101943252

   Great summary, this rewrite is from `spark`, I will read and consider carefully about this part. 
   
   I want to do this just because some expr rewrite is include this problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org