Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/11/19 09:50:54 UTC

[GitHub] [incubator-doris] acelyc111 opened a new issue #4928: [Proposal] Cost base optimize for conditions

acelyc111 opened a new issue #4928:
URL: https://github.com/apache/incubator-doris/issues/4928


   **Is your feature request related to a problem? Please describe.**
   Suppose there is a table:
   ```
   CREATE TABLE `test_table3` (
     `key_int` int(11) NULL COMMENT "",
     `key_varchar` varchar(16) NULL COMMENT "",
     `val_int` int(11) NULL COMMENT ""
   ) ENGINE=OLAP
   DUPLICATE KEY(`key_int`, `key_varchar`)
   COMMENT "OLAP"
   DISTRIBUTED BY HASH(`key_int`) BUCKETS 1
   PROPERTIES (
   "replication_num" = "1",
   "in_memory" = "false",
   "storage_format" = "V2"
   );
   ```
   And a query like:
   ```
   select count(1) from test_table3 where key_int = 1 and key_varchar in ('1','2','3','4','5','6','7','8','9');
   ```
   The scanner on the BE may evaluate the IN predicate first, even though it is more costly than the EQ predicate.
   
   **Describe the solution you'd like**
   We can sort conditions based on cost, roughly:
   - unary operators (IS) are cheaper than binary operators (EQ, NE, LT, LE, GT, GE)
   - binary operators are cheaper than IN/NOT IN
   
   Different column types also have different costs:
   - INT is generally cheaper than CHAR or VARCHAR
   - a shorter INT or VARCHAR is cheaper than a longer one
   
   We can sort conditions based on cost before doing the actual scan.
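
The ordering rules above could be sketched roughly as follows. This is only an illustration in Python, not the BE implementation: the `Predicate` class, the `OP_COST` and `TYPE_COST` tables, and the numeric tiers in them are all assumptions made up for the example. Only the relative ordering (IS, then binary operators, then IN/NOT IN; INT before CHAR/VARCHAR; shorter before longer) comes from the proposal.

```python
from dataclasses import dataclass

# Hypothetical cost tiers: only the relative order is taken from the proposal,
# the numbers themselves are placeholders.
OP_COST = {
    "IS": 1,
    "EQ": 2, "NE": 2, "LT": 2, "LE": 2, "GT": 2, "GE": 2,
    "IN": 3, "NOT IN": 3,
}
TYPE_COST = {"INT": 1, "CHAR": 2, "VARCHAR": 2}


@dataclass(frozen=True)
class Predicate:
    """Hypothetical description of one scan condition."""
    column: str
    op: str
    col_type: str
    width: int          # bytes for INT, declared length for CHAR/VARCHAR
    n_values: int = 1   # number of literals, e.g. the size of an IN list


def estimated_cost(p: Predicate):
    # Tuples compare element by element: operator tier first, then type tier,
    # then column width, then the number of literals to compare against.
    return (OP_COST[p.op], TYPE_COST[p.col_type], p.width, p.n_values)


def order_by_cost(preds):
    # sorted() is stable, so equally ranked conditions keep their original order.
    return sorted(preds, key=estimated_cost)


# The two conditions from the example query, in the unfavourable order.
preds = [
    Predicate("key_varchar", "IN", "VARCHAR", 16, n_values=9),
    Predicate("key_int", "EQ", "INT", 4),
]
ordered = order_by_cost(preds)
print([p.column for p in ordered])
```

With these assumed tiers, the EQ condition on `key_int` is placed ahead of the IN condition on `key_varchar`.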
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] acelyc111 commented on issue #4928: [Proposal] Cost base optimize for conditions

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #4928:
URL: https://github.com/apache/incubator-doris/issues/4928#issuecomment-730324657


   Ordering purely by Operation Type and Data Type can indeed give an incorrect estimate, because it ignores selectivity. But when two predicates have similar selectivity according to statistics such as cardinality, the type-based optimization can still be helpful.




[GitHub] [incubator-doris] wangbo commented on issue #4928: [Proposal] Cost base optimize for conditions

Posted by GitBox <gi...@apache.org>.
wangbo commented on issue #4928:
URL: https://github.com/apache/incubator-doris/issues/4928#issuecomment-730282616


   It looks more like ordering based on ```Operation Type``` and ```Data Type```.
   Where is ```Based on cost``` reflected?
   I wonder whether we need to collect column statistics, such as each column's cardinality.
   E.g., even though ```key_varchar in``` is more expensive than EQ, if ```key_varchar``` has higher selectivity, we can put the ```key_varchar``` predicate first.
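
One way to make the selectivity point concrete is the common heuristic of ordering independent filters by per-row cost divided by the fraction of rows they reject. The Python sketch below is only an illustration under stated assumptions: the per-row costs and pass fractions are invented numbers, not measurements or statistics from Doris, and the predicates are assumed to be independent.

```python
def expected_cost(order):
    """Expected per-row cost of evaluating (cost, pass_fraction) pairs in sequence.

    A later predicate is only evaluated on rows that passed all earlier ones.
    """
    total, surviving = 0.0, 1.0
    for cost, pass_fraction in order:
        total += surviving * cost
        surviving *= pass_fraction
    return total


def rank(pred):
    cost, pass_fraction = pred
    # A predicate that rejects nothing can never pay for itself: put it last.
    if pass_fraction >= 1.0:
        return float("inf")
    return cost / (1.0 - pass_fraction)


def order_by_rank(preds):
    # Ascending cost per rejected row.
    return sorted(preds, key=rank)


# Assumed numbers: EQ is cheap but keeps 90% of rows,
# IN is three times as costly but keeps only 1% of rows.
eq_pred = (1.0, 0.90)
in_pred = (3.0, 0.01)

best = order_by_rank([eq_pred, in_pred])
print(best, expected_cost(best), expected_cost([eq_pred, in_pred]))
```

Under these assumed numbers the more expensive IN predicate goes first, because it removes almost all rows before the EQ check runs. With similar pass fractions the cheaper predicate wins instead, which matches the type-based ordering.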

