You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/07/23 10:49:27 UTC

[GitHub] [incubator-doris] hexian55 opened a new issue #4162: left join 谓词下推bug

hexian55 opened a new issue #4162:
URL: https://github.com/apache/incubator-doris/issues/4162


   **Describe the bug**
   使用doris on es,es table left join olap table 发现存在谓词下推问题。导致es table被全量scan
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. create table
   CREATE TABLE table1
   (
       siteid INT DEFAULT '10',
       citycode SMALLINT,
       account_id BIGINT DEFAULT '1',
       test_id BIGINT DEFAULT '1',
       pv BIGINT SUM DEFAULT '0'
   )
   AGGREGATE KEY(siteid, citycode, account_id, test_id)
   DISTRIBUTED BY HASH(siteid) BUCKETS 10
   PROPERTIES("replication_num" = "1");
   
   
   CREATE EXTERNAL TABLE `table3` (
     `id` bigint(20) NULL COMMENT "id",
     `account_id` bigint(20) NULL COMMENT "id",
     `put_status` bigint(20) NULL COMMENT "状态"
   ) ENGINE=ELASTICSEARCH
   COMMENT "ELASTICSEARCH"
   PROPERTIES (
   "hosts" = "xxxx",
   "user" = "xxx",
   "password" = "xxx",
   "index" = "xxxx",
   "type" = "_doc",
   "transport" = "http",
   "enable_docvalue_scan" = "true",
   "enable_keyword_sniff" = "true"
   );
   
   
   2. Explain select a.id,a.account_id,a.put_status,sum(b.pv) from table3 a left join table1 b on a.account_id=b.account_id and a.id=b.test_id   and a.account_id=1 and b.account_id=1 group by a.id,a.account_id,a.put_status;
   
   
   3. result 
   +------------------------------------------------------------------------------------------------------------------+
   | Explain String                                                                                                   |
   +------------------------------------------------------------------------------------------------------------------+
   | PLAN FRAGMENT 0                                                                                                  |
   |  OUTPUT EXPRS:<slot 6> `a`.`id` | <slot 7> `a`.`account_id` | <slot 8> `a`.`put_status` | <slot 9> sum(`b`.`pv`) |
   |   PARTITION: UNPARTITIONED                                                                                       |
   |                                                                                                                  |
   |   RESULT SINK                                                                                                    |
   |                                                                                                                  |
   |   8:EXCHANGE                                                                                                     |
   |      tuple ids: 2                                                                                                |
   |                                                                                                                  |
   | PLAN FRAGMENT 1                                                                                                  |
   |  OUTPUT EXPRS:                                                                                                   |
   |   PARTITION: HASH_PARTITIONED: <slot 6> `a`.`id`, <slot 7> `a`.`account_id`, <slot 8> `a`.`put_status`           |
   |                                                                                                                  |
   |   STREAM DATA SINK                                                                                               |
   |     EXCHANGE ID: 08                                                                                              |
   |     UNPARTITIONED                                                                                                |
   |                                                                                                                  |
   |   7:AGGREGATE (merge finalize)                                                                                   |
   |   |  output: sum(<slot 9> sum(`b`.`pv`))                                                                         |
   |   |  group by: <slot 6> `a`.`id`, <slot 7> `a`.`account_id`, <slot 8> `a`.`put_status`                           |
   |   |  tuple ids: 2                                                                                                |
   |   |                                                                                                              |
   |   6:EXCHANGE                                                                                                     |
   |      tuple ids: 2                                                                                                |
   |                                                                                                                  |
   | PLAN FRAGMENT 2                                                                                                  |
   |  OUTPUT EXPRS:                                                                                                   |
   |   PARTITION: HASH_PARTITIONED: `a`.`account_id`, `a`.`id`                                                        |
   |                                                                                                                  |
   |   STREAM DATA SINK                                                                                               |
   |     EXCHANGE ID: 06                                                                                              |
   |     HASH_PARTITIONED: <slot 6> `a`.`id`, <slot 7> `a`.`account_id`, <slot 8> `a`.`put_status`                    |
   |                                                                                                                  |
   |   3:AGGREGATE (update serialize)                                                                                 |
   |   |  STREAMING                                                                                                   |
   |   |  output: sum(`b`.`pv`)                                                                                       |
   |   |  group by: `a`.`id`, `a`.`account_id`, `a`.`put_status`                                                      |
   |   |  tuple ids: 2                                                                                                |
   |   |                                                                                                              |
   |   2:HASH JOIN                                                                                                    |
   |   |  join op: LEFT OUTER JOIN (PARTITIONED)                                                                      |
   |   |  hash predicates:                                                                                            |
   |   |  colocate: false, reason: Node type not match                                                                |
   |   |  equal join conjunct: `a`.`account_id` = `b`.`account_id`                                                    |
   |   |  equal join conjunct: `a`.`id` = `b`.`test_id`                                                               |
   |   |  other join predicates: `a`.`account_id` = 1                                                                 |
   |   |  tuple ids: 0 1N                                                                                             |
   |   |                                                                                                              |
   |   |----5:EXCHANGE                                                                                                |
   |   |       tuple ids: 1                                                                                           |
   |   |                                                                                                              |
   |   4:EXCHANGE                                                                                                     |
   |      tuple ids: 0                                                                                                |
   |                                                                                                                  |
   | PLAN FRAGMENT 3                                                                                                  |
   |  OUTPUT EXPRS:                                                                                                   |
   |   PARTITION: RANDOM                                                                                              |
   |                                                                                                                  |
   |   STREAM DATA SINK                                                                                               |
   |     EXCHANGE ID: 05                                                                                              |
   |     HASH_PARTITIONED: `b`.`account_id`, `b`.`test_id`                                                            |
   |                                                                                                                  |
   |   1:OlapScanNode                                                                                                 |
   |      TABLE: table1                                                                                               |
   |      PREAGGREGATION: OFF. Reason: null                                                                           |
   |      PREDICATES: `b`.`account_id` = 1                                                                            |
   |      partitions=0/1                                                                                              |
   |      rollup: null                                                                                                |
   |      tabletRatio=0/0                                                                                             |
   |      tabletList=                                                                                                 |
   |      cardinality=0                                                                                               |
   |      avgRowSize=0.0                                                                                              |
   |      numNodes=1                                                                                                  |
   |      tuple ids: 1                                                                                                |
   |                                                                                                                  |
   | PLAN FRAGMENT 4                                                                                                  |
   |  OUTPUT EXPRS:                                                                                                   |
   |   PARTITION: RANDOM                                                                                              |
   |                                                                                                                  |
   |   STREAM DATA SINK                                                                                               |
   |     EXCHANGE ID: 04                                                                                              |
   |     HASH_PARTITIONED: `a`.`account_id`, `a`.`id`                                                                 |
   |                                                                                                                  |
   |   0:EsScanNode                                                                                                   |
   |      TABLE: table3                                                                                               |
   |      ES_QUERY_DSL: {"match_all": {}}                                                                             |
   |      ES index/type: ad_dsp_creative_index_new/_doc                                                               |
   |      tuple ids: 0                                                                                                |
   +------------------------------------------------------------------------------------------------------------------+
   
   
   4. ES_QUERY_DSL: {"match_all": {}}  , a.account_id=1没有带进来,导致es table全表scan
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] hexian55 commented on issue #4162: Doris on es left join 谓词下推bug

Posted by GitBox <gi...@apache.org>.
hexian55 commented on issue #4162:
URL: https://github.com/apache/incubator-doris/issues/4162#issuecomment-663388256


   @kangkaisen close? why?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] hexian55 commented on issue #4162: Doris on es left join 谓词下推bug

Posted by GitBox <gi...@apache.org>.
hexian55 commented on issue #4162:
URL: https://github.com/apache/incubator-doris/issues/4162#issuecomment-670855080


   ok, closed


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] hexian55 edited a comment on issue #4162: Doris on es left join 谓词下推bug

Posted by GitBox <gi...@apache.org>.
hexian55 edited a comment on issue #4162:
URL: https://github.com/apache/incubator-doris/issues/4162#issuecomment-670855080


   thanks please close.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] kangkaisen commented on issue #4162: Doris on es left join 谓词下推bug

Posted by GitBox <gi...@apache.org>.
kangkaisen commented on issue #4162:
URL: https://github.com/apache/incubator-doris/issues/4162#issuecomment-663791493


   > @kangkaisen close? why?
   
   Sorry,which should be a misoperation.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] kangkaisen closed issue #4162: Doris on es left join 谓词下推bug

Posted by GitBox <gi...@apache.org>.
kangkaisen closed issue #4162:
URL: https://github.com/apache/incubator-doris/issues/4162


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] chaoyli closed issue #4162: Doris on es left join 谓词下推bug

Posted by GitBox <gi...@apache.org>.
chaoyli closed issue #4162:
URL: https://github.com/apache/incubator-doris/issues/4162


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wutiangan commented on issue #4162: Doris on es left join 谓词下推bug

Posted by GitBox <gi...@apache.org>.
wutiangan commented on issue #4162:
URL: https://github.com/apache/incubator-doris/issues/4162#issuecomment-670401413


   @hexian55 
   this is not a bug. In a left join sql stmt,  if the join predicate of on-join-conditions(the join conditions in on clause) is completely bound by the left table, the predicate cannot be pushed down to the base table, but the predicate which is completely bound by the right table can be pushed down to the base table.
   
   for example:
   sql
   >  select * from t1 left join t2 on t1.k1 =  1 and t2.k1 = 1 and t1.k1 = t2.k1
   
   t1 table has only one column
   
   ```
   t1.k1 
   ------
   1      
   2      
   ```
   t2 table has only colum
   ```
   t1.k1
   -----
   1
   3
   ```
   the normal result is 
   ```
   t1.k1  | t2.k1
   -------------
   1      |    1
   2      |  NULLL
   ```
   If t1.k1 = 1 is pushed down to T1, only one record will be returned when scanning T1, and the above SQL will return only one record, this is wrong result, because it should return two rows.
   ```
   t1.k1  | t2.k1
   -------------
   1      |    1
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org